## The Rendering Equation - A Pictorial Introduction

As I hinted at in my last post, I was initially planning on discussing how to isolate various terms in the rendering equation. The idea was that we could compare results to the equivalent terms in our runtime renderers to ensure we're in the ballpark in terms of physical correctness. After I began writing this post, my scope started to drift and it morphed into what we have here. I decided I'd run with it.

The basic premise I had is to create a simple scene in pbrt and then try and reproduce the same scene via a runtime renderer to hopefully give some insights on what goes into synthesizing a scene and how it relates to the actual rendering equation (*). This post is mainly aimed at budding graphics programmers and interested non-graphics programmers in that the concepts are quite elementary and should be familiar to most. I've already mentioned the rendering equation a couple of times but don't let that deter you; I will try and keep things on an intuitive level rather than an academic, theoretical level. If I succeed then hopefully most of you will still be reading by the time we get to the end of the post. For those who find themselves wandering off, I've added lots of pictures to aid the descriptions. In fact, I'm not even going to include the actual rendering equation in this whole post.

So let's get started. I created an image of a test scene in pbrt using its direct lighting integrator, i.e. indirect light isn’t simulated. It's a pretty simple scene composed of a ground plane, the ubiquitous Stanford Dragon, and a stack of spheres colored pure red, green, blue, and white, bottom to top, respectively. The environment map is from the very excellent sIBL archive. The scene is deliberately kept very simple in terms of lighting in that all lighting is assumed to be infinitely far away, and therefore can be approximated with directional lights only, which we'll take advantage of later.

(*) I'm assuming the hemispherical formulation of the rendering equation throughout, and more specifically the subset of it which RTR calls the *reflectance equation*.

*Fig 1. pbrt target image*

## Directives for direction

Let's examine what a typical runtime renderer is doing. Whether you're using forward lighting, deferred lighting, or deferred shading is an implementation detail at this point, and extraneous to this discussion. Conceptually what's going on is something like so.


    Color TraditionalRuntimeComputePixel()
    {
        Color col = black;
        // Sum the contribution of every light in the scene.
        for (int i = 0; i < num_lights; ++i)
            col += SurfaceReflectance() * light_color[i] * Dot(N, L);
        return col;
    }


Where L is the light direction and N is the surface normal. Here is the rendering equation based version:

    Color RenderEqnComputePixel()
    {
        Color col = black;
        // Sum the incoming radiance over a set of sampled directions.
        for (int i = 0; i < num_directions; ++i)
            col += SurfaceReflectance() * IncomingRadiance() * Dot(N, L) * solid_angle;
        return col;
    }

I've included these snippets here as additional evidence of what we already intuitively know... that since light is additive, the more lights we use, the less bright each one should be if we're trying to recreate a known lighting environment. That is, light_color = IncomingRadiance() * solid_angle. The more lights we use, the smaller solid angle each one covers, and the dimmer the resultant light color should be.
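To make that relationship concrete, here is a minimal runnable sketch of the two loops side by side. Everything in it beyond the two function names is my own assumption for illustration: a single-channel `double` stands in for `Color`, the `RadianceSample` and `DirectionalLight` types are hypothetical, and the dot product is clamped to zero so that light from behind the surface is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { double x, y, z; };

double Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One sampled direction of the environment (hypothetical type, single channel).
struct RadianceSample {
    Vec3 direction;      // unit vector from the surface towards the light
    double radiance;     // incoming radiance arriving from that direction
    double solid_angle;  // steradians covered by this sample
};

// A classic runtime directional light (hypothetical type, single channel).
struct DirectionalLight {
    Vec3 direction;
    double color;
};

// light_color = IncomingRadiance() * solid_angle
DirectionalLight ToDirectionalLight(const RadianceSample& s) {
    assert(s.solid_angle >= 0.0);
    return {s.direction, s.radiance * s.solid_angle};
}

double RenderEqnComputePixel(double reflectance, const Vec3& n,
                             const std::vector<RadianceSample>& samples) {
    double col = 0.0;
    for (const RadianceSample& s : samples)
        col += reflectance * s.radiance * std::max(0.0, Dot(n, s.direction)) * s.solid_angle;
    return col;
}

double TraditionalRuntimeComputePixel(double reflectance, const Vec3& n,
                                      const std::vector<DirectionalLight>& lights) {
    double col = 0.0;
    for (const DirectionalLight& light : lights)
        col += reflectance * light.color * std::max(0.0, Dot(n, light.direction));
    return col;
}
```

Converting each sample with `ToDirectionalLight` and feeding the result to the traditional loop gives the same pixel value as the rendering equation loop. Splitting one sample into two samples of half the solid angle halves each light's color and leaves the sum unchanged.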

Let's try a very simplistic runtime render to start off with. We can replace the SurfaceReflectance function with a simple constant term (a.k.a. Lambertian reflection). Next, let’s try and inject some lighting. What’s the simplest way we could do this? Well, since our pbrt scene only consists of infinitely distant light, one approach would be to decompose the environment map into a finite set of directional lights which we can then feed into our runtime renderer. One basic way of doing this would be to take a cubemap and create a new directional light for each texel, pointing in the opposite direction to that texel. (We'd want to weight each light color by the solid angle of the texel it came from.) Although this is conceptually simple, it's going to generate an awful lot of lights, and as we'll see, only a small number are really needed. Here is an application which can decompose an HDR environment map into a user-defined number of directional lights. With only 4 lights we get something like this:

*Fig 2. diffuse term with 4 lights*

Upping to 8 directional lights, we get the image below. Notice that the incoming light is sufficiently low frequency that it only made a slight difference when jumping from 4 to 8 lights. I've omitted the 16 light version here, but jumping to 16 lights makes very little visible difference to the diffuse term at all.

*Fig 3. diffuse term with 8 lights*
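For the naive one-light-per-texel decomposition mentioned above, the weight for each light is the solid angle of its cubemap texel. The sketch below shows one way to compute it, assuming each cube face spans [-1, 1] at unit distance from the cube center. The function names are mine, and this is not necessarily how the decomposition application does it.

```cpp
#include <cassert>
#include <cmath>

// Signed solid angle subtended by the rectangle [0,x] x [0,y] on a cube face
// that sits at distance 1 from the center of the cube.
double AreaElement(double x, double y) {
    return std::atan2(x * y, std::sqrt(x * x + y * y + 1.0));
}

// Solid angle (in steradians) of texel (col, row) on one face of a cubemap
// whose faces are face_size x face_size texels.
double TexelSolidAngle(int col, int row, int face_size) {
    assert(face_size > 0);
    assert(col >= 0 && col < face_size && row >= 0 && row < face_size);
    const double texel = 2.0 / face_size;  // the face spans [-1, 1] on both axes
    const double x0 = -1.0 + col * texel, x1 = x0 + texel;
    const double y0 = -1.0 + row * texel, y1 = y0 + texel;
    return AreaElement(x0, y0) - AreaElement(x0, y1)
         - AreaElement(x1, y0) + AreaElement(x1, y1);
}

// Intensity of the directional light generated from one texel:
// the radiance stored in the texel weighted by the texel's solid angle.
double TexelLightIntensity(double texel_radiance, int col, int row, int face_size) {
    return texel_radiance * TexelSolidAngle(col, row, face_size);
}

// Sanity check helper: all texels on all six faces should cover the full sphere.
double CubemapTotalSolidAngle(int face_size) {
    double one_face = 0.0;
    for (int row = 0; row < face_size; ++row)
        for (int col = 0; col < face_size; ++col)
            one_face += TexelSolidAngle(col, row, face_size);
    return 6.0 * one_face;
}
```

Texels near the center of a face cover more solid angle than texels in the corners, which is why the weighting matters. Summed over all six faces the solid angles add up to 4π steradians, the full sphere.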

## Making visibility visible

So now we've injected some lighting information into the scene but we're still quite a bit off. Next let's take a look at light visibility. In the above image, we're arbitrarily overestimating the light hitting surfaces since there is nothing stopping light from traveling right through geometry in the scene. What we need is a way to block out occluded light from each light source. In the offline world, a simple visibility approximation can be computed by casting multiple rays for each point over a hemisphere, computing the percentage of rays which don't collide with other geometry, and using that to modulate the final sum of the incoming light. This is also known as ambient occlusion.
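The ray casting approach just described can be sketched in a few lines. This is a toy version under my own assumptions: occluders are spheres only, rays are distributed uniformly over the hemisphere, and the random seed is fixed so that results are repeatable. None of the names come from pbrt.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct Vec3 { double x, y, z; };
struct Sphere { Vec3 center; double radius; };

double Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// True if the ray origin + t * dir (dir is unit length) hits the sphere for some t > 0.
bool RayHitsSphere(const Vec3& origin, const Vec3& dir, const Sphere& s) {
    const Vec3 oc{origin.x - s.center.x, origin.y - s.center.y, origin.z - s.center.z};
    const double b = Dot(oc, dir);
    const double c = Dot(oc, oc) - s.radius * s.radius;
    const double disc = b * b - c;
    if (disc < 0.0) return false;        // the ray's line misses the sphere
    return -b + std::sqrt(disc) > 1e-6;  // far intersection lies ahead of the origin
}

// Fraction of rays cast over the hemisphere around `normal` that hit nothing.
double HemisphereVisibility(const Vec3& point, const Vec3& normal,
                            const std::vector<Sphere>& occluders, int num_rays) {
    assert(num_rays > 0);
    std::mt19937 rng(1234);  // fixed seed: repeatable results
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const double kTwoPi = 6.283185307179586;
    int unoccluded = 0;
    for (int i = 0; i < num_rays; ++i) {
        // Uniform direction on the sphere, flipped into the normal's hemisphere.
        const double z = 1.0 - 2.0 * uniform(rng);
        const double r = std::sqrt(std::max(0.0, 1.0 - z * z));
        const double phi = kTwoPi * uniform(rng);
        Vec3 dir{r * std::cos(phi), r * std::sin(phi), z};
        if (Dot(dir, normal) < 0.0) dir = Vec3{-dir.x, -dir.y, -dir.z};

        bool blocked = false;
        for (const Sphere& s : occluders) {
            if (RayHitsSphere(point, dir, s)) { blocked = true; break; }
        }
        if (!blocked) ++unoccluded;
    }
    return static_cast<double>(unoccluded) / num_rays;
}
```

The returned fraction is the value used to modulate the summed incoming light at that point: 1 means fully open to the environment and 0 means fully enclosed.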

Here is a fun experiment for all you folks working on real-time renderers out there. It produces some nice shots as well as being instructive as to how we can go about producing more realistic renders. One of the advantages to keeping the environment light described in terms of a set of directional lights is that we can compute an approximation to the visibility term for all pixels affected by a single light simultaneously on the GPU by simply rendering a shadow map. If the shadow map determines that the pixel we are shading is in shadow from that particular directional light, we don't add in its contribution. Running through the same process for each direction results in a shadow map for each light. Here is what 4 shadow casting lights look like:

*Fig 4. results of 4 shadow casting lights*

Not so hot. The shadow function here is a very simple *compare one depth to another and discard* type function. More sophisticated statistically based algorithms such as variance or exponential shadowing could help a lot but for now let’s just take what we have and blur the $#!* out of it. Applying a brute force multi-tap PCF technique to the same 4 lights results in something like this:

*Fig 5. results of 4 filtered shadow casting lights*

Wow, that's a big difference, much better. The more lights we decompose the environment into, the closer we converge to the target image. Adding 4 more shadow casting lights for a total of 8 looks like this:

*Fig 6. results of 8 filtered shadow casting lights*

And below is 16 shadow casting lights:

*Fig 7. results of 16 filtered shadow casting lights*
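For reference, the difference between the hard depth comparison and the brute force multi-tap PCF used for the filtered shots can be sketched on the CPU as follows. This is a simplified illustration, not the shader behind these images: the `ShadowMap` struct, the box-shaped kernel, and the constant depth bias are all assumptions of mine.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Depth of the closest occluder as seen from one directional light (row-major).
struct ShadowMap {
    int width, height;
    std::vector<float> depth;
};

// Hard test: compare one depth to another. Returns 1 if lit, 0 if in shadow.
float HardShadow(const ShadowMap& map, int x, int y, float receiver_depth, float bias) {
    x = std::max(0, std::min(x, map.width - 1));   // clamp taps to the map edges
    y = std::max(0, std::min(y, map.height - 1));
    const float occluder_depth = map.depth[y * map.width + x];
    return (receiver_depth - bias <= occluder_depth) ? 1.0f : 0.0f;
}

// Percentage closer filtering: average the hard test over a
// (2 * radius + 1)^2 neighborhood of shadow map texels.
float PcfShadow(const ShadowMap& map, int x, int y, float receiver_depth,
                float bias, int radius) {
    assert(radius >= 0);
    float lit = 0.0f;
    int taps = 0;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            lit += HardShadow(map, x + dx, y + dy, receiver_depth, bias);
            ++taps;
        }
    }
    return lit / static_cast<float>(taps);
}
```

Note that PCF filters the results of the depth comparisons rather than the depths themselves. The fractional value it returns scales that light's contribution instead of switching it fully on or off, which is what softens the shadow edges.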

## Reflecting on reflectance

Since our lighting is broken down into discrete directional components, it is very easy to slot in different surface reflectance functions also. Careful conservation of energy results in the following specular term.

*Fig 8. the specular term (with 16 filtered shadow casting lights)*
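As a rough illustration of what conservation of energy means for a specular term, the sketch below normalizes a Phong-style lobe and checks numerically that it reflects no more energy than it receives. The choice of lobe and its `(n + 2) / (2π)` normalization factor are my assumptions for the example, not necessarily the specular model behind the image above.

```cpp
#include <cassert>
#include <cmath>

// Phong-style lobe cos^n(alpha), scaled so that it conserves energy.
// alpha is the angle between the mirror direction and the outgoing direction.
double NormalizedLobe(double cos_alpha, double exponent) {
    assert(exponent >= 0.0);
    const double kPi = 3.14159265358979323846;
    if (cos_alpha <= 0.0) return 0.0;
    return (exponent + 2.0) / (2.0 * kPi) * std::pow(cos_alpha, exponent);
}

// Integrates lobe * cos(theta) over the hemisphere for the head-on case,
// where the mirror direction coincides with the surface normal (alpha == theta).
// A result of 1 means all incoming energy is reflected and none is created.
double ReflectedEnergyFraction(double exponent, int steps) {
    assert(steps > 0);
    const double kPi = 3.14159265358979323846;
    const double d_theta = 0.5 * kPi / steps;
    double total = 0.0;
    for (int i = 0; i < steps; ++i) {
        const double theta = (i + 0.5) * d_theta;  // midpoint rule
        // The ring of directions at angle theta covers 2*pi*sin(theta)*d_theta steradians.
        const double ring = 2.0 * kPi * std::sin(theta) * d_theta;
        total += NormalizedLobe(std::cos(theta), exponent) * std::cos(theta) * ring;
    }
    return total;
}
```

Without the normalization factor, raising the exponent to tighten the highlight would also make the surface reflect less total light. With it, the highlight gets brighter as it gets smaller and the integral stays at 1.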

The Fresnel equations describe how to combine the diffuse and specular light terms together. More often than not, Schlick’s approximation to those equations is used. For a very readable treatment of this topic, I recommend Real-Time Rendering 3rd Edition. The result of combining the diffuse and specular, while still not quite there yet, gives us something in the ballpark of the target image. Both are shown below.

*Fig 9. Runtime rendering of the combined diffuse and specular terms*


*Fig 10. pbrt target image (again)*
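For the curious, Schlick's approximation mentioned above is small enough to write out in full. The `SchlickFresnel` function follows the standard published form. The `CombineDiffuseSpecular` blend underneath it is just one simple way of using the Fresnel term to weight the two lighting terms, and is an assumption of mine rather than the exact combination used for Fig 9.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Schlick's approximation to Fresnel reflectance.
// f0 is the reflectance at normal incidence; cos_theta is the cosine of the
// angle between the incoming light and the (micro)surface normal.
double SchlickFresnel(double f0, double cos_theta) {
    assert(f0 >= 0.0 && f0 <= 1.0);
    const double c = std::max(0.0, std::min(1.0, cos_theta));
    return f0 + (1.0 - f0) * std::pow(1.0 - c, 5.0);
}

// Hypothetical blend (single channel): light that is reflected specularly
// is not available to the diffuse term.
double CombineDiffuseSpecular(double diffuse, double specular,
                              double f0, double cos_theta) {
    const double f = SchlickFresnel(f0, cos_theta);
    return (1.0 - f) * diffuse + f * specular;
}
```

At normal incidence the function returns `f0`, and it climbs towards 1 at grazing angles, which is why even dull surfaces become mirror-like when viewed edge on.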

## A stepping off point

In the test above, we decomposed an environment map into a small number of directional lights, briefly looked at an approximation to visibility, and finally added some definition to the surfaces by adding a specular term. Looking a little deeper, each of those topics leads to a whole separate world of adventure in itself (or complexity, depending on how much this type of stuff tickles your fancy). Much of the research done in rendering is really just trying to simplify these quantities into more tractable bite-sized chunks by making other simplifying assumptions along the lines of what we did here.

Each year a new slew of research is done into each of these topics. Papers and articles which use terms such as importance sampling, Monte Carlo integration, or probability density functions are likely trying to better answer our first question - **how can we efficiently sample the incoming light?** Which naturally leads onto a second related question - **how can we efficiently store the incoming light?** For more on this question check out papers on spherical harmonics and irradiance volumes.

Any papers which deal with shadowing, occlusion, or acceleration structures are likely trying to answer our next question - **how can we efficiently approximate visibility?** Those of you who have worked with the rendering equation in the past will have no doubt seen that the actual equation itself doesn't contain a visibility term (area formulations do, but that's a different context). This is just another one of those simplifying assumptions/tradeoffs made in the realtime world (and the offline world also, for that matter) to speed up rendering. It is a reasonable assumption to make if we only want to capture the direct lighting component since it generally ignores indirect light. When we do bypass the visibility approximation, we soon see that the whole rendering equation becomes recursive, which is why we hear of more advanced rendering algorithms limiting themselves to 1 or 2 recursions or bounces.

And finally, the question - **what does the surface look like?** - is dealt with in a field described by BRDFs, isotropic / anisotropic reflection, transmittance, Fresnel, and sub-surface scattering, to name a few.

So that's about it for this post. Thanks to Chester Hsieh and Jay Hsia for reading over it and providing feedback. An additional thanks to David Neubelt for providing further clarifications.