With anticipation for ATi and nVidia's next-generation video cards reaching fever pitch, and speculation running riot as to what features these GPUs will support, one question keeps emerging time and again - what's the difference between 2.0 and 3.0 shaders?

Of course, giving a full answer to this question would be a long, drawn-out and overly technical process (not to mention the fact that we probably won't see any real-world evidence of what can be done with the new model's added features for some time). However, it is worth investigating what basic features this new shader model will give us, as well as examining what it could mean from both a performance and image quality perspective.

So, without further ado, let's take a look at what the future of shaders has to offer...


Pixel Shader 3.0

Comparing Pixel Shader 3.0 to its 2.0 predecessor becomes complicated before we even begin, simply because PS 2.0 exists in two forms - the basic 2.0 standard, and 2.0 Extended (as seen in the GeForceFX line of GPUs). The differences between 2.0 and 2.0 Extended are many and varied, but let's begin by looking at the shader instruction counts available in these different shader versions:

Version         Minimum instruction count       Maximum instruction count
1.1             8                               8
1.2 / 1.3       12                              12
1.4             14                              14
2.0             64 arithmetic, 32 texture       64 arithmetic, 32 texture
2.0 Extended    96                              512
3.0             512                             32,768


For reference, I've also opted to show the instruction counts for all 1.x level shaders as well - this shows what a leap the move to 2.0 shaders was in this respect. As far as Pixel Shader 3.0 goes, the most striking aspect is the minimum instruction count, which jumps from the 96 instructions specified in the 2.0 Extended spec all the way up to 512 instructions. How fast a shader with that many instructions would actually run is open to debate (and depends largely on how powerful the hardware is), but this increase certainly adds to the flexibility on offer. To complicate things further, the ps_2_b profile has also been added to the DirectX 9 HLSL - this gives all the features of Pixel Shader 2.0, but adds in support for up to 512 instructions without the other advantages 2.0 Extended has to offer. It seems likely, given what we know, that this profile is designed mainly to make use of ATi's upcoming R420's abilities. It is also worth noting here that the minimum instruction count defines the number of instructions the hardware must be capable of executing to be called compliant with that shader model, rather than the minimum number of instructions a shader has to use. This is true of both pixel and vertex shaders.
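
To make those profiles a little more concrete, here is a minimal sketch of how the same shader source is targeted at different profiles using Microsoft's fxc HLSL compiler - the command lines, file names and entry point names below are purely illustrative:

// The same HLSL entry point compiled against different DirectX 9 profiles
// (fxc command lines shown as comments; all names here are illustrative):
//
//   fxc /T ps_2_0 /E MainPS /Fo main_ps20.obj shaders.hlsl   - 64 arithmetic + 32 texture instruction ceiling
//   fxc /T ps_2_b /E MainPS /Fo main_ps2b.obj shaders.hlsl   - PS 2.0 feature set, up to 512 instructions
//   fxc /T ps_3_0 /E MainPS /Fo main_ps30.obj shaders.hlsl   - 512 instruction minimum, full PS 3.0 feature set
//
// A long multi-tap filter or multi-light shader that compiles cleanly for
// ps_2_b or ps_3_0 will simply fail to build for ps_2_0 once it exceeds that
// profile's instruction ceiling, so the chosen profile has to match both the
// shader and the hardware it will run on.
sampler2D g_source;

float4 MainPS(float2 uv : TEXCOORD0) : COLOR
{
    return tex2D(g_source, uv);   // placeholder body - real shaders grow towards the limits above
}

The compiler rejects a shader outright when it cannot fit the requested profile, which is why developers typically end up authoring fallback versions of their most expensive effects.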

Away from the simple instruction count, we come to what could be considered the real meat of what Pixel Shader 3.0 offers. To be honest, from a gamer's point of view these extra features aren't included to add to image quality, but rather to make the lives of developers easier, and to give them the greater flexibility and power that they yearn for. Again, a lot of these features were in fact included in the 2.0 Extended specification, but are now also available in PS 3.0. Before going into these in any detail, let's take a brief look at the main new features on offer:

- Static and dynamic flow control
- Predication
- Dynamic branching
- New registers, and more temporary register usage allowed
- New gradient/texture instructions
- Centroid sampling
- Arbitrary swizzle

Centroid Sampling

It is probably worth starting with the feature on this list that became one of the enthusiast community's buzzwords last summer - centroid sampling. I'm sure everybody remembers Valve's "Shader Day" presentation, where they stated that anti-aliasing would only have any chance of working in Half-Life 2 on ATi's current hardware, thanks to its support for centroid sampling in hardware, which GeForceFX boards lack. Of course, as time went by Valve got AA working on both ATi and nVidia cards, and the whole thing was forgotten. But now, with the introduction of Pixel Shader 3.0, we see centroid sampling finally becoming available (and exposed for use, even on compliant 2.0 shader hardware) in DirectX 9.0c. So, what exactly is centroid sampling?

To start with, you need to look at the problem Valve were facing with Half-Life 2 - namely, that multisample anti-aliasing takes its samples from the center of a pixel, which can, in certain circumstances (particularly when light maps are used), mean that the wrong texel is sampled, thus causing artifacts. This is actually a subtle yet surprisingly common problem that can be seen (if you look hard enough) in a fair few games with MSAA enabled.

Centroid Sampling solves this issue by making sure that texture samples are always taken within the triangle being outputted, thus ensuring that the samples taken are always correct and so avoiding any unsightly artifacts.
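
As a rough illustration, here is a minimal sketch of how a pixel shader input might request centroid sampling in DirectX 9-era HLSL - the _centroid semantic suffix is the assumed mechanism here, and the sampler and input names are purely illustrative:

// With multisampling enabled, the _centroid suffix asks for the interpolated
// value to be taken from a point inside the covered portion of the triangle,
// rather than the pixel center, so a light map texel outside the primitive
// is never sampled by mistake (names here are illustrative).
sampler2D g_lightMap;

float4 LightmapPS(float2 lightmapUV : TEXCOORD0_centroid) : COLOR
{
    return tex2D(g_lightMap, lightmapUV);
}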

Dynamic flow control / Dynamic branching

One of the buzzwords that surrounded the launch of the GeForceFX was 'dynamic flow control' - part of the PS 2.0 Extended spec, it may begin to get more attention now that 3.0 shaders are upon us. Dynamic flow control is mainly a vehicle for gaining performance when executing shaders - in theory, it should prevent unnecessary work, cutting down the length of the shaders a GPU has to run by skipping instructions that aren't required. It also allows for more flexibility and makes life more convenient from the developer's perspective.

Dynamic branching follows this same path of making shaders more versatile while attempting to improve performance. For example, if a developer wanted to handle different numbers of lights via shaders, they would normally have to write a separate shader for each number of lights. With dynamic branching, however, the developer only needs to write a single shader, along with an option to select how many lights are in the scene each time. Obviously, this can potentially save a lot of development time.
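
To show the performance side of the idea, here is a minimal sketch of a ps_3_0 pixel shader using a dynamic branch to skip its expensive lighting maths for pixels that face away from the light - the names and the lighting model itself are illustrative, not taken from any particular title:

// Pixels that fail the test skip the diffuse/specular instructions entirely,
// rather than computing results that would be thrown away (illustrative names).
sampler2D g_normalMap;
float4    g_ambient;
float4    g_lightColor;
float     g_specularPower;

float4 LightingPS(float2 uv       : TEXCOORD0,
                  float3 lightDir : TEXCOORD1,
                  float3 viewDir  : TEXCOORD2) : COLOR
{
    float3 normal = normalize(tex2D(g_normalMap, uv).xyz * 2.0f - 1.0f);
    float  ndotl  = dot(normal, normalize(lightDir));

    float4 result = g_ambient;
    [branch]                        // hint to the compiler to emit a real dynamic branch
    if (ndotl > 0.0f)
    {
        // Only pixels actually facing the light pay for this block.
        float3 halfVec  = normalize(normalize(lightDir) + normalize(viewDir));
        float  specular = pow(saturate(dot(normal, halfVec)), g_specularPower);
        result += ndotl * g_lightColor + specular * g_lightColor;
    }
    return result;
}

Whether a branch like this actually saves time depends on how well the hardware copes with neighbouring pixels taking different paths - which is exactly the debate touched on below.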

At present however, there seem to be two conflicting schools of thought as to exactly how useful dynamic branching will be, particularly from the aforementioned performance standpoint - While some see it as a very useful addition to the specification, others have questioned just how well branching will perform as part of the graphics pipeline, believing that the way data is handled is unsuitable for the concept of branching. The proof, no doubt, will be in the pudding.

Predication

Predication should really sit alongside dynamic flow control and (particularly) dynamic branching, as it is another function designed to speed up the processing of shaders. The basic premise of predication is that when there are multiple possibilities as to which piece of data will be required in a given situation, rather than trying to guess which piece of data to send to the GPU (i.e. prediction), all the pieces of data are processed, and then only the correct data is sent on through the pipeline. Although processing all the possible outcomes is computationally expensive, the theory is that this will still be more efficient than guessing what data will be required. As with dynamic branching, there is plenty of discussion among the experts as to the usefulness of predication in a GPU, so again it will be a case of waiting to see how it performs once implemented (if it gets used a great deal at all).
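
The concept is easier to see in a short shader where both possible results are computed and the condition simply selects between them - a hedged sketch, with all names illustrative, of the sort of code a compiler may map onto predicated or select-style instructions rather than a real jump:

// Both candidate results are produced; the comparison only decides which one
// is kept, mirroring the "process everything, keep what's needed" idea above.
sampler2D g_diffuse;
float4    g_tintA;
float4    g_tintB;
float     g_threshold;

float4 SelectPS(float2 uv : TEXCOORD0) : COLOR
{
    float4 base    = tex2D(g_diffuse, uv);
    float4 resultA = base * g_tintA;
    float4 resultB = base * g_tintB;
    return (base.r > g_threshold) ? resultA : resultB;
}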

Arbitrary swizzle

Without going into too much technical detail on this one, arbitrary swizzle is what is known as a 'modifier', and is used to modify instructions and registers. Its use is mainly to reduce the number of instructions used in a shader, thus making it more efficient.
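
In HLSL the syntax looks the same regardless of the shader model - the difference lies in the assembly the compiler can generate. A small illustrative sketch:

// On hardware with arbitrary swizzle support (2.0 Extended / 3.0), a reordering
// like .wzxy can be folded straight into the instruction that consumes it; more
// limited hardware may need extra mov instructions to shuffle the components
// first (sampler and constant names here are illustrative).
sampler2D g_packedData;

float4 SwizzlePS(float2 uv : TEXCOORD0) : COLOR
{
    float4 packed = tex2D(g_packedData, uv);
    return packed.wzxy * 2.0f - 1.0f;   // reorder and rescale in one expression
}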



What does this mean for gamers?

Until we start seeing titles explicitly written to take advantage of Pixel Shader 3.0 features, it really won't make any difference at all. We've already seen one title (Far Cry) patched to include 3.0 shader support - its shaders may well have been recompiled to take advantage of some of the features listed above, and although this is unlikely to make any difference to image quality, we may see performance improvements using PS 3.0 over 2.0 - something to test once we get some 3.0 shader capable boards in our hands...

The feature set of Pixel Shader 3.0 does also allow for some nice new visual effects - mainly through the ability to use floating point values for texture filtering and frame buffer blending. In particular, frame buffer blending can allow for a realistic-looking 'motion blur' effect (brings back memories of 3Dfx's T-Buffer, doesn't it?), and floating point textures should improve the look of effects such as HDR (High Dynamic Range) rendering - an effect that we've already seen on PS 2.0 hardware in Half-Life 2, now taken to a new level.
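
As a rough idea of where floating point buffers come into play, here is a minimal sketch of a tone mapping pass reading an HDR scene back out of a floating point render target - the FP16 source, the exposure control and all the names are assumptions for illustration only:

// Values read from the floating point buffer can exceed 1.0; this pass simply
// compresses them back into displayable range (illustrative names throughout).
sampler2D g_hdrScene;    // assumed to be a floating point (e.g. FP16) render target
float     g_exposure;

float4 ToneMapPS(float2 uv : TEXCOORD0) : COLOR
{
    float3 hdr = tex2D(g_hdrScene, uv).rgb * g_exposure;
    float3 ldr = 1.0f - exp(-hdr);      // simple exponential tone mapping curve
    return float4(ldr, 1.0f);
}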

So, to sum up, PS 3.0 is definitely a big step forward from a hardware and developer point of view, while still offering some interesting new effects for all the eye-candy lovers. As with all new graphics technology, the real question marks are over how much (and how quickly) the features will be brought into use, and what the exact performance ramifications will be - there is a possibility that the first generation of 3.0 shader compliant hardware will offer nothing more than a 'checkbox feature' if performance isn't good enough. Again, we shall have to wait and see.

Now, with Pixel Shader 3.0 out of the way, let's move on to arguably the more exciting of the two shader models from an enhancement point of view...



Vertex Shader 3.0

As with before, let's begin by taking a look at the maximum instruction count the various Vertex Shader revisions support:

Version         Maximum instruction count
1.1             128
2.0             256
2.0 Extended    256 (can be higher using looping)
3.0             512 minimum


As you can see, Vertex Shader 3.0 brings the instruction count up from a maximum of 256 instructions in the 2.0 spec to a minimum of 512 instructions. There is no predefined maximum in the 3.0 specification; instead, the ceiling is set by the maximum number of instructions the hardware itself supports.

Now, let's take a look at what Vertex Shader 3.0 offers from a feature point of view:

- Static and dynamic flow control
- Predication
- Dynamic branching
- Indexing registers, and more temporary register usage allowed
- Vertex textures
- Vertex stream frequency

As you've probably already noticed, many of these features are the same as those I've just covered for PS 3.0, so the explanations given earlier apply here too - the concepts are basically the same.

This leaves us with possibly the biggest single leap that the 3.0 shader model offers us over its predecessors:

Vertex textures

The premise of vertex textures is actually quite simple. In the past, vertex shaders have had no way of accessing textures. This has now changed, as a function has been added to allow vertex shaders to perform texture lookups. This greatly expands the range of things you can do with vertex shaders, and once put to use it may well result in some very nice effects for users.

One potential use for this feature is the ability to do full displacement mapping. Displacement mapping was a highly-touted feature of DirectX 9, so it was somewhat disappointing when the limitations of current hardware in this area became clear. Currently available hardware (Matrox Parhelia aside) can only use pre-created displacement maps, whereas this new functionality should make fully programmable displacement mapping possible.
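
To make the idea concrete, here is a minimal sketch of a vs_3_0 vertex shader reading a height map and pushing each vertex out along its normal - a simple displacement mapping setup in which every name (the height map, the scale factor, the matrices) is purely illustrative:

// Vertex shaders have no derivatives, so the texture read has to use tex2Dlod
// with an explicit mip level; the application is assumed to have bound the
// height map to a vertex texture sampler (all names are illustrative).
float4x4  g_worldViewProj;
float     g_displacementScale;
sampler2D g_heightMap;

struct VS_IN
{
    float3 pos    : POSITION;
    float3 normal : NORMAL;
    float2 uv     : TEXCOORD0;
};

float4 DisplaceVS(VS_IN input, out float2 outUV : TEXCOORD0) : POSITION
{
    outUV = input.uv;
    // Read the displacement height in the vertex shader - impossible before VS 3.0.
    float height = tex2Dlod(g_heightMap, float4(input.uv, 0.0f, 0.0f)).r;
    // Push the vertex out along its normal by the sampled height.
    float3 displaced = input.pos + input.normal * height * g_displacementScale;
    return mul(float4(displaced, 1.0f), g_worldViewProj);
}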

Another great bonus of being able to read textures from the vertex shader will be the ability to create more realistic physics simulations. Being able to use textures and vertex shaders in this new way will be particularly useful for simulating objects like water or cloth in a more realistic manner.

Vertex stream frequency

Vertex stream frequency division is again a feature designed to make things more efficient. Normally, with older shader revisions, the vertex shader is called once per vertex, with all of its input registers refreshed each time. Vertex stream frequency can be used to ensure that the applicable registers are updated at a less frequent rate - once per instance of an object, for example - thus saving work and theoretically improving performance.
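
A minimal sketch of the shader side of this, as used for hardware instancing - the second vertex stream below is assumed to have been marked as per-instance data by the application (via IDirect3DDevice9::SetStreamSourceFreq in Direct3D 9), and all the names are illustrative:

// Stream 0 is fetched once per vertex; stream 1 is frequency-divided so its
// registers only change once per instance, carrying that instance's world
// matrix rows (names and layout here are illustrative).
float4x4 g_viewProj;

struct VS_IN
{
    // Stream 0 - per-vertex data
    float3 pos      : POSITION;
    float2 uv       : TEXCOORD0;
    // Stream 1 - per-instance data
    float4 instRow0 : TEXCOORD1;
    float4 instRow1 : TEXCOORD2;
    float4 instRow2 : TEXCOORD3;
};

void InstancedVS(VS_IN input,
                 out float4 outPos : POSITION,
                 out float2 outUV  : TEXCOORD0)
{
    // Build the instance's world-space position from the per-instance matrix rows,
    // then apply the shared view-projection transform.
    float4 localPos = float4(input.pos, 1.0f);
    float3 worldPos;
    worldPos.x = dot(localPos, input.instRow0);
    worldPos.y = dot(localPos, input.instRow1);
    worldPos.z = dot(localPos, input.instRow2);

    outUV  = input.uv;
    outPos = mul(float4(worldPos, 1.0f), g_viewProj);
}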


What does this mean for gamers?

As we've explored, the potential for 'cool eye candy' is generally greater with the improvements made in Vertex Shader 3.0 than those seen in Pixel Shader 3.0. Certainly, if some of these features are put into effect in future titles, and performance is up to scratch, then we have something to be excited about.


And that pretty much concludes this look at the near future of shaders. Obviously, over the coming months there are many debates to be had (both technical and otherwise) over the merits and uses of 3.0 shaders, particularly once we find out for certain which features next-generation hardware does and doesn't support (and perhaps more importantly, how fast next-generation hardware can run these features). The next twelve to eighteen months, as were the eighteen months just past, should prove to be very interesting....

