## Practical MLAA on the GPU - My take

In days where FXAA seems to be the anti-aliasing method of choice, due to it being easy to implement and fast to execute. I'm going to take a look at how the "French MLAA paper" (Morphological Antialiasing on GPU) approached anti-aliasing. This has two reasons: first and foremost it's because the paper uses a *Recursive Doubling* approach to implement an iterative gather algorithm in shaders (as 'popularized' by Interactive Summed-Area Table Generation for Glossy Environmental Re?ections). And second, because the code is quite hard to understand if you don't know any French (I don't, and because of this I might have made some mistakes; bear with me).

# Overview

The algorithm consists of 4 different steps implemented in 5 different shaders to accomplish what Crytek calls a *fancy edge blur*. I'll outline the steps first and then go into detail about what each part does.

- Edge detection
- Count line lengths of log
_{4}(maxLineLength) passes - Determine blend weights
- Blur (I won't go into this here because it's a simple blur filter based on the previously calculated weights)

# 1. Edge detection

Although there are many ways to do edge detection in a pixel shader, the paper decided to implement this using a difference based on color, it makes sense to do this because blurring edges based on normal/depth. And as it turns the most recent version at the time of writing also supports color + depth edge detection. Currently the shader is implemented by converting colors to LAB color space and calculating a Euclidian distance between two colors and optionally do a depth compare; when this distance exceeds a certain threshold you have your edge.

The paper uses and outputs to two textures, a mask and the texture used to count line lengths and they are laid out like this:

- The mask consists of 2 channels:
- R gets a 1 if there is a
**horizontal**edge (zero otherwise) - G gets a 1 if there is a
**vertical**edge (zero otherwise) - The line length texture consists of 4 channels:
- B and A get a value of 1/255 if there is a
**horizontal**discrepancy (zero otherwise) - R and G get a value of 1/255 if there is a
**vertical**discrepancy (zero otherwise)- If there is a horizontal discrepancy, this means that there is a vertical line and vice versa! Keep this in mind for the Count line lengths shader.

- B and A get a value of 1/255 if there is a

- R gets a 1 if there is a

For my implementation I dropped the mask because the line length texture can serve the exact same purpose, except that it's data is in different channels and checks of equals 1 should be converted to doesn't equal 0. This made the flow of the algorithm a lot easier and saved some memory.

# 2. Count line lengths

The process of determining the line lengths uses a technique called *recursive doubling* which is the reason this part of the process gets it's logarithmic runtime performance of O(log_{4}(maxLineLength)) this basically comes down to doing 4 passes for a maximum length of 256 pixels. To see how this works we should first see exactly how the line-length buffer is structured.

Basically, the R channel stores how many pixels a certain line has to the left of it and the G channel stores how many there are to the right; this means that for each pixel you can look up the length of the edge by summing up either the R and G channels for horizontal length and the R and G channels for vertical length.

Orange represents the alpha channel. This is the content of the buffer when the algorithm is done processing.

The gist of the algorithm is pretty simple and I'll go over it briefly. Just keep in mind that the following loop is done per channel (eg. 4 times).

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | // PreviousLengths is initialized to 1.f/255.f for edges and 0 for non-edges float4 currentLengths = tex2D(PreviousLengths, Tex); float4 currentDelta = currentLengths * PixelSize.zzww; const float Threshold = Level / 255.f; if(currentLengths.r >= Threshold) { float2 newTex = Tex - float2(currentDelta.r, 0); for(int k = 0; k < 3; k++) { float oneDelta = tex2D(PreviousLengths, newTex).r; currentLengths.r += oneDelta; newTex.x -= oneDelta * PixelSize.z; } } |

For the first pass Level is initialized to 1 (then to 4, 16 and 64) so this check only does the length count only if it thinks it should still be counting the lengths of the edges. PixelSize is initialized to (1/width, 1/height, 255/width, 255/height) so when multiplying by zzww we convert between increments in 1/255th to increments in 1/width and 1/height.

The interesting part, however, is inside the loop as it moves more pixels to the side depending on the value in the R channel it retrieves. This has a effect that if the value at that pixel is 0, nothing changes and currentLengths doesn't get incremented.

The different colors do not indicate different channels, they are merely different lines.

When doing normal point-sampling when you reach 0 you'll know the line has ended and the loop makes sure you stop there. Hower; as shown by Nicolas Vizerie in MLAA (MorphoLogical AntiAliasing) on the GPU using Direct3D9.0 using bilinear filtering can reduce the amount of texture fetches by testing two lines at a time.

# 3. Determine blend weights

The blend weights are calculated from a pre-generated lookup table (look for tabAires in the source-code, I didn't bother to re-implement it). However, the basic gist of the table is that on the vertical axis is the size (eg. the sum of two channels in the LineLength texture) of the line and on the horizontal is the size in one direction (eg. one of the two channels in the LineLength texture). The content of the table are the areas below the triangles that the edges form and they are almost all handled by the formula `0.5 * (1. - (2 * j + 1) / (float)S)`

the other formulas in the lookup table are there to help in edge cases where the equation tails off. Each pixel in the lookup table has a range of 0.25 ? pixelvalue ? 0.5 because the blending of a certain pixel has a maximum contribution of four other pixels.

The shader, although lengthy has quite a straightforward implementation that basically checks the endpoints of the lines to see if there is exists an edge orthogonal to it. If that's the case, it uses the shortest of the two line-segments and the total size of the original line to determine the weights in the lookup table. This process is done four times:

- Horizontal for current pixel
- Vertical for current pixel
- Horizontal for one pixel to the left
- Vertical for one pixel above

# Further references

- MLAA Demo 2 by @synulation Optimized DX10 version of the previous version of the shader.
- Practical Morphological Anti-Aliasing Other MLAA on the GPU technique.
- MLAA (MorphoLogical AntiAliasing) on the GPU using Direct3D9.0 Another MLAA implementation (also uses the old version of the paper).
- Morphological Antialiasing on GPU

- Interactive Summed-Area Table Generation for Glossy Environmental Re?ections