Cort Stratton is a senior programmer on Sony's ICE Team, based at Naughty Dog. He has contributed SPU rendering code to dozens of PS3 titles, including Uncharted 2, God of War 3, Bioshock, Grand Theft Auto 4 and Portal 2. Cort is also a part-time faculty member at the Art Institute of Santa Monica. He tweets, from time to time, as @postgoodism.

I almost let today slip by without posting; instead, here's a cute hack.

Ode to an Unsung Hero

Much has been written (on #AltDevBlogADay and elsewhere) about Timothy Lottes' Fast Approximate Anti-Aliasing algorithm (FXAA). Naturally, most of the attention focuses on the pixel shader, but what about the poor neglected vertex shader? Sure, it does almost nothing, but it does it so elegantly; let's give it some love! Here's the code (slightly simplified and reformatted):

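struct VS_Output
{
    float4 Pos : SV_POSITION;
    float2 Tex : TEXCOORD0;
};

VS_Output VS(uint id : SV_VertexID)
{
    VS_Output Output;
    // Bit 0 of the ID selects U (0 or 2); bit 1 selects V (0 or 2).
    Output.Tex = float2((id << 1) & 2, id & 2);
    // Map UV [0,2] to clip space: X to [-1,3], Y to [1,-3] (V points down).
    Output.Pos = float4(Output.Tex * float2(2,-2) + float2(-1,1), 0, 1);
    return Output;
}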

What does it do? Well, let's pretend we've bound the shader and are using it to draw a single triangle:

pImmediateContext->VSSetShader(fxaaVS, ..., ...);
pImmediateContext->Draw(3, 0);

The shader's only input is the vertex ID, a system-generated value that starts at zero and increases for every new vertex. So, the shader will be invoked three times with IDs 0, 1, and 2. This produces the following output:
ID=0 -> Pos=[-1, 1], Tex=[0,0]
ID=1 -> Pos=[ 3, 1], Tex=[2,0]
ID=2 -> Pos=[-1,-3], Tex=[0,2]

If we clip the resulting triangle to the visible region of homogeneous clip space (-1..1 in X and Y), we see that it exactly covers that square, and that the texture coordinates range from [0,0] in the upper-left corner to [1,1] in the lower-right. Aha, a full-screen quad!
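
As a sanity check, the same arithmetic can be replayed on the CPU. The sketch below is plain C++ rather than HLSL; Float2, VSOutput, EmulateVS and InterpolateTex are hypothetical names invented for this illustration, and the barycentric interpolation is only a stand-in for what the rasterizer does (it ignores perspective correction, which is harmless here because W is 1 at every vertex).

```cpp
#include <cassert>
#include <cstdint>

// CPU-side stand-ins for the HLSL types (hypothetical names).
struct Float2 { float x, y; };
struct VSOutput { Float2 pos; Float2 tex; };

// Same arithmetic as the vertex shader: bit 0 of the ID picks U, bit 1 picks V.
VSOutput EmulateVS(std::uint32_t id) {
    VSOutput out;
    out.tex = { float((id << 1) & 2u), float(id & 2u) };
    out.pos = { out.tex.x * 2.0f - 1.0f, out.tex.y * -2.0f + 1.0f };
    return out;
}

// 2D cross product, used to build barycentric weights.
float Cross(Float2 u, Float2 v) { return u.x * v.y - u.y * v.x; }

// Interpolate Tex at clip-space point p, roughly as the rasterizer would.
// Asserts that p lies inside (or on the edge of) the triangle.
Float2 InterpolateTex(Float2 p) {
    const VSOutput a = EmulateVS(0), b = EmulateVS(1), c = EmulateVS(2);
    const Float2 ab = { b.pos.x - a.pos.x, b.pos.y - a.pos.y };
    const Float2 ac = { c.pos.x - a.pos.x, c.pos.y - a.pos.y };
    const Float2 ap = { p.x - a.pos.x, p.y - a.pos.y };
    const float area = Cross(ab, ac);
    const float wb = Cross(ap, ac) / area;  // weight of vertex 1
    const float wc = Cross(ab, ap) / area;  // weight of vertex 2
    const float wa = 1.0f - wb - wc;        // weight of vertex 0
    assert(wa >= 0.0f && wb >= 0.0f && wc >= 0.0f);
    return { wa * a.tex.x + wb * b.tex.x + wc * c.tex.x,
             wa * a.tex.y + wb * b.tex.y + wc * c.tex.y };
}
```

Feeding the four corners of the clip square to InterpolateTex yields non-negative weights in every case, which is another way of saying the oversized triangle covers the whole screen.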

So what? Full-screen quads are easy! What makes this one interesting is that the shader is completely self-contained, with zero dependencies on external data. No vertex buffer, no index buffer, no constant buffers, no transformation matrices, and no unusual render state. Just bind the shader, draw a single triangle using auto-generated vertex IDs, and you're done. It doesn't even rely on any fancy HLSL-specific language features, so you could easily concoct a similar shader in Cg/GLSL/etc. Neat! It's almost as easy as the old immediate-mode days, before every draw call had to wrangle multiple buffers in GPU-mapped memory.

Digging (Arguably) Too Much Deeper

Clean though it may be, the shader is missing one very useful feature: all but the simplest full-screen shaders need the viewport resolution (or more specifically, the inverse resolution) so that neighboring texels can be sampled. These values could very easily be hard-coded or passed as a shader constant and then stashed into the Z and W coordinates of the output UVs, but in doing so you'd tarnish an otherwise self-sufficient piece of code. Is there another way?

Well, the vertex ID is a full 32-bit integer, and we're only using the lowest two bits. The D3D11 Draw() function allows you to specify the ID of the first vertex to draw. Since we're already not using the ID to index into memory, why not pack the viewport's width and height into the high 30 bits?

pImmediateContext->Draw(3, ((viewWidth&0x7FFF)<<17) | ((viewHeight&0x7FFF)<<2) | 0);

Unfortunately, this doesn't work. No matter what starting index you pass, the SV_VertexID you see in HLSL always starts at zero. There's probably a damned good reason for this, but for now it's ruined my day. There's also the possibility that you'll run into an over-eager graphics driver that will try to pre-fetch some ridiculously far-away vertex even though you're not using it, and probably end up triggering a segmentation fault.
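
For what it's worth, the bit arithmetic itself holds up; only the delivery mechanism fails. Here is a minimal CPU-side sketch of the packing and of the unpacking the shader would have needed, assuming both dimensions fit in 15 bits. PackStartVertex, UnpackedId and UnpackVertexId are hypothetical names for illustration, not part of D3D11.

```cpp
#include <cassert>
#include <cstdint>

// Pack two 15-bit dimensions above the two low bits that select the corner.
// Assumption: width and height are each at most 0x7FFF (32767).
std::uint32_t PackStartVertex(std::uint32_t viewWidth, std::uint32_t viewHeight) {
    assert(viewWidth <= 0x7FFFu && viewHeight <= 0x7FFFu);
    return ((viewWidth & 0x7FFFu) << 17) | ((viewHeight & 0x7FFFu) << 2);
}

struct UnpackedId {
    std::uint32_t corner;  // 0, 1 or 2: which triangle vertex this is
    std::uint32_t width;   // viewport width recovered from bits 17..31
    std::uint32_t height;  // viewport height recovered from bits 2..16
};

// What the shader would have done, had the ID arrived as start + {0, 1, 2}.
UnpackedId UnpackVertexId(std::uint32_t id) {
    return { id & 3u, (id >> 17) & 0x7FFFu, (id >> 2) & 0x7FFFu };
}
```

Because the original shader only looks at the two low bits of the ID, the position and UV math would not have needed to change at all.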

There is another option: in the pixel shader, we could use the ddx()/ddy() functions to compute the screen-space partial derivative of the incoming texture coordinates. Since our UVs range from 0 to 1 across the screen, the partial derivative in each axis is exactly the inverse viewport resolution! But now you're doing an extra bit of completely redundant ALU work in every single invocation of the pixel shader; on one hand, ALU is cheap (and getting cheaper), but most likely this is still too steep a price to pay for a bit of extra elegance and CPU-side simplicity.
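
To see why the derivative comes out as the inverse resolution, here is a small C++ sketch that mimics the idea on the CPU. It assumes U is sampled at pixel centers and uses a forward difference between neighboring pixels as a rough stand-in for ddx(); UAtPixel and DdxU are hypothetical helper names.

```cpp
#include <cassert>

// U coordinate at the center of pixel column x, for a viewport `width` pixels wide.
double UAtPixel(int x, int width) {
    assert(width > 0);
    return (x + 0.5) / width;
}

// Difference in U between horizontally adjacent pixels (a stand-in for ddx(uv).x).
double DdxU(int x, int width) {
    return UAtPixel(x + 1, width) - UAtPixel(x, width);
}
```

For a 1280-pixel-wide viewport, DdxU returns 1/1280 (up to floating-point rounding) regardless of which column you ask about; the same reasoning applies vertically with the viewport height.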

[Update: As readers have pointed out in the comments, HLSL Shader Model 4+ provides a mechanism for querying the dimensions of a texture (via the GetDimensions() method), as well as applying an integer offset to texture reads (via the optional offset parameter to Sample()). In many full-screen passes, the source texture has the same dimensions as the destination viewport; if so, you're golden!]

That's it; nothing earth-shattering, just a clever little nugget.