So, I couldn’t think of anything cool and advanced (that isn’t covered by my NDA) to talk about on such short notice, so I figured I’d start with something easy.  My apologies to all the veterans on the list, since it’s basic stuff you already know.

 

We all know that when it comes to programming, there aren’t many language types as fun and exciting as assembly.  Unfortunately, in this crazy world of power lunches and tight deadlines, we don’t really get as many chances to write in assembly as we’d like to.  However, being able to read and understand that alien language in your debugger’s disassembly tab is something that every programmer needs to be able to do.  It’s essential for debugging crashes that occur in release builds only, diagnosing optimization-related compiler bugs, and better understanding what the compiler is thinking so you can make more informed optimization decisions.

 

Since all non-handheld gaming platforms are based around PowerPC, I’ll be focusing on that.  Maybe I’ll update this to include ARM/NEON or MIPS someday.  VFPU would be awesome but I’m not sure if that’s supposed to be secret (can anyone verify?)

 

Basic Calling Convention

 

The first thing you can do is familiarize yourself with the PowerPC calling convention and ABI.  If you know the calling convention and some basic instructions, you can extract almost any information you need. 

 

Let's start with something simple that comes up often.  You step into a function and want to see the arguments that were passed in to it.  Mousing over the variable either gives you something like 0xFFFFFFFF or no value at all.  What can you do?  Well, let's see what the generated code looks like for an update function:

 

 

void Doooooosh( Bag* bag,  DooooooshLevel level)
{  
mflr             r12  
bl               __savegprlr + 0034h (82eea8c4h)  
stfd             fr31,-38h(r1)  
stwu             r1,-90h(r1)  
mr               r31,r3  
mr               r30,r4

 

There are a few things you need to know about the PowerPC calling convention.

 

1) for non-member functions or static member functions, small non-float arguments ( int, bool, pointers, etc... ) are passed in r3 through r10, in order

 

2) for C++ member functions, r3 is always the this pointer, and the function arguments are passed in r4 through r10

 

3) more often than not, float arguments are passed in using the floating point registers ( fr1, for example )

 

So, knowing this is a C style standalone function, all you have to do is set a breakpoint early in the function and look at r3 and r4.  Later on you'll see why it has to be early.  To get the real values, all we have to do is open up a watch and cast each register to its expected type:

 

( Bag * )r3
( DooooooshLevel )r4

 

When working with a C++ member function we'd use r4 and r5 instead, and we could also get the this pointer using:

 

( SomeClassName * )r3
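
If you'd rather not count registers in your head, the rules above boil down to a little arithmetic.  Here is a minimal sketch of it.  ArgRegister is a made-up helper ( it is not part of any debugger ), and it assumes every argument before the one you want is also a small integer or pointer.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, for illustration only: given the zero-based position
// of a small integer/pointer argument in the C++ signature, return the
// general purpose register it arrives in at function entry.
// Returns an empty string when the argument does not fit in r3..r10.
std::string ArgRegister(int argIndex, bool isMemberFunction)
{
    assert(argIndex >= 0);

    // For member functions r3 holds 'this', so real arguments start at r4.
    const int firstArgReg = isMemberFunction ? 4 : 3;
    const int reg = firstArgReg + argIndex;

    if (reg > 10)
        return "";  // too many arguments to fit in r3..r10

    return "r" + std::to_string(reg);
}
```

For Doooooosh( ) above, ArgRegister( 0, false ) gives r3 for bag and ArgRegister( 1, false ) gives r4 for level.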

 

As a side note, if you're wondering how the proper values end up in the right registers to make a function call, it's set up like this:

 

Doooooosh( bag, level );
mr               r4,r30
mr               r3,r31
bl               Doooooosh (8293d0e8h)

 

mr is the "move register" instruction.  In the above example, mr  r4, r30 will take the contents of r30 and copy it to r4.  We must assume that r31 and r30 contain the bag pointer and level respectively.  Since all C function calls expect their arguments to be in r3 and up, we have to copy all our arguments to those registers.  bl stands for "branch and link" and is how functions are usually called: it stashes the return address in the link register and then branches to the function.

 

Now there is a catch.  Remember when I said we have to look at these registers early in the function?  At the very beginning of Doooooosh( ) we can assume that the bag pointer and level will be in r3 and r4 respectively.  That's just how function arguments are passed in.  But what if Doooooosh( ) calls another function?  Won't that called function also need its argument in r3?  The point is that just because your function arguments are originally in r3 and r4 doesn’t mean you can expect them to stay there for long.  Taking a look back at the original example, you'll see

 

mr               r31,r3  
mr               r30,r4

 

Basically, this is the code saying "I understand that r3 is probably going to get overwritten very soon so I'm going to back up its value in r31".  Any time after these two register moves have executed, we can get the function arguments ( more safely ) like this:

 

( Bag * )r31
( DooooooshLevel )r30

 

Remember, on the PS3 and Xenon, r3 through r10 are considered volatile and r14 through r31 are considered general use non-volatile.  Non-volatile means that if you stick a value in r30 and then make a function call, when that function call returns r30 will be just as you left it.  That is why at the beginning of Doooooosh( ) we save all the argument registers ( r3 and r4 ) into safe non-volatile registers ( r31 and r30 ).
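
To see why the backup matters, here is a toy model of the register file.  It is only a sketch: Gprs, Mr and AfterCall are names I made up, the junk value is arbitrary, and the model only covers the two register ranges mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Gprs = std::array<std::uint32_t, 32>;  // toy model: r0..r31

// Models 'mr rTo, rFrom': copy one register into another.
void Mr(Gprs& regs, int to, int from)
{
    assert(to >= 0 && to < 32 && from >= 0 && from < 32);
    regs[to] = regs[from];
}

// Models what the caller may assume once a 'bl' has returned:
// r3..r10 may hold anything (here a recognizable junk value),
// while r14..r31 come back exactly as they were left.
Gprs AfterCall(Gprs regs)
{
    for (int r = 3; r <= 10; ++r)
        regs[r] = 0xDEADBEEFu;
    return regs;
}
```

Copy r3 into r31 with Mr, run the result through AfterCall, and r31 still holds the argument while r3 is garbage.  That is exactly the situation you are in when you break halfway through a function.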

 

Some More Debugging Tips

 

Don’t be afraid to go back in the call stack if the info you need can’t be found by the above method.  For example, I wanted to examine a string that only existed in a function further up the call stack.  The solution was to go up one call in the call stack and look for the bl function call.  A few lines above that, we were copying the function argument from r30 to r4 ( like we always do for function arguments ).  I moused over r30, cast it to a char *, and that gave me the string.  Remember that this usually only works for the non-volatile registers r14 to r31.  That is because any function that wants to use one of them first has to “spill” ( copy ) the old value into its stack frame, and Visual Studio and the SN debugger are usually able to look in the stack frame to retrieve the saved register values.

 

Getting local variables stored in registers can be a little tricky.  While I don’t think there is any one way that works 100% of the time, there are a couple of tricks you can use that may help you through.  I'm sure with a little imagination, you'll figure it out.

 

1) If the local variable is passed as the first argument of a function, look for it in the corresponding register ( r3 for a C function or r4 for a C++ member function ) right before the function call ( bl ).  If you need to catch it a little earlier, start at the function call and work backwards.  If you know that the value in r31 is moved into r3 right before the function call, then work your way up the code and see where r31 is being set.  The lesson is don’t be afraid to work backwards.

 

2) Look for landmarks.  Often, the generated assembly won't match the code very well.  Sometimes in mixed view, you'll have what looks like 10 lines of perfectly good C++ code that seem to have no assembly code generated.  That's when landmarks come in handy.  If you have something like this

 

float x;
x = sqrt( y );

 

manually scan through the function and look for some assembly opcode that looks like it could correspond to a floating point square root.  From there, you can see what the code does with the result and better trace through the assembly.  Some other good landmarks include incrementing, trig functions, floating point multiplies, loop conditions, NULL checks, and any other function that would have some stand out opcodes.

 

3) Look for constant initializers.  If you have something like this

 

int x = 123;

 

and you see some assembly in the function that looks like

 

li r30, 123

 

You may have found a hint that r30 corresponds to x at this point in time.  By the way, in case you didn’t already know, li stands for "load immediate" and it loads an immediate value into a register.  Note that you can only load ( signed ) 16 bit constants in this way.  32 bit constants usually take two instructions: lis loads the upper 16 bits ( already shifted left ), and then an ori or addi fills in the lower 16.
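
If you want to turn a pair of immediates you spotted in the disassembly back into the constant from the source, this is the arithmetic.  A minimal sketch: Li, Lis and Ori are my own stand-ins for the instructions of the same name, and they model 32 bit register values only.

```cpp
#include <cstdint>

// li rD, imm : the 16 bit immediate is sign-extended into the register.
constexpr std::uint32_t Li(std::int16_t imm)
{
    return static_cast<std::uint32_t>(static_cast<std::int32_t>(imm));
}

// lis rD, hi : hi lands in the upper 16 bits, the lower 16 bits are zero.
constexpr std::uint32_t Lis(std::uint16_t hi)
{
    return static_cast<std::uint32_t>(hi) << 16;
}

// ori rD, rS, lo : ORs a zero-extended 16 bit immediate into the low half.
constexpr std::uint32_t Ori(std::uint32_t value, std::uint16_t lo)
{
    return value | lo;
}

// int x = 123;         ->  li  r30, 123
static_assert(Li(123) == 123u, "li loads small constants directly");

// int y = 0x12345678;  ->  lis r30, 0x1234  then  ori r30, r30, 0x5678
static_assert(Ori(Lis(0x1234), 0x5678) == 0x12345678u, "lis + ori builds 32 bits");
```

So if you see a lis followed by an ori on the same register, glue the two immediates together and search the source for the result.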

 

4) If the local variable is used in a conditional, see what is being compared.  Compares look something like this

 

if( player_controller < 8 )  
cmpwi            cr6,r3,8  
bge              cr6, CPlusPlusSucks::AndSoDoesThisFunc + 0064h (8283457ch)

 

Most compare instructions begin with cmp.  Here you are comparing r3 with 8 and setting some result flags in cr6.  bge means branch if greater than or equal.  It checks the cr6 result flags that were set by the compare and then branches if appropriate.  Notice that the branch tests the opposite of the C++ condition: when player_controller is 8 or more, it jumps over the body of the if.  The point is that we know for sure that at this point r3 is player_controller.  If needed we can work our way backwards and look for useful information.
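
The flipped condition trips people up, so here is the compare and branch pair written out as a small model.  It is a sketch under my own naming: CrField, Cmpwi and Bge are stand-ins I invented, and only the less than, greater than and equal flags are modeled.

```cpp
#include <cstdint>

// The three flags a signed compare leaves in a condition register field.
struct CrField
{
    bool lt;
    bool gt;
    bool eq;
};

// cmpwi crN, rA, imm : signed compare of a register against an immediate.
constexpr CrField Cmpwi(std::int32_t reg, std::int16_t imm)
{
    return CrField{ reg < imm, reg > imm, reg == imm };
}

// bge crN, target : the branch is taken when the "less than" flag is clear.
constexpr bool Bge(CrField cr)
{
    return !cr.lt;
}

// if( player_controller < 8 ) { body }
// The body runs only when the bge is NOT taken.
static_assert(!Bge(Cmpwi(7, 8)), "7 < 8, so we fall through into the body");
static_assert(Bge(Cmpwi(8, 8)), "8 is not < 8, so the body is skipped");
```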

 

Stack Frame: When All Else Fails

 

The above diagram is what the stack frame could look like on Xenon.  If there is some weird bug you have to track down and all else fails, including good old fashioned logical thinking about the problem at a higher level, you can draw out one of these stack diagrams and extract more information than you could get using some of the previous techniques.  

 

PPC updates its stack all at once at the beginning of the function, unlike LoseTel which seems to do it as you pop and push.  The code will look something like this

 

stwu   r1, -96(r1)

 

Obviously r1 is the stack pointer, and stwu is a clever way of telling people to shut up.  Errr... I mean it's “store word with update.”  In a single instruction it stores the old value of r1 at the target address ( r1 - 96 in this example ) and then updates r1 to that new address.  The update direction is negative because the stack grows towards low memory.  Since the caller’s SP ends up saved exactly where the new SP points, every stack frame starts with a pointer back to the previous one, and this is exactly what we want.
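
If the store and update order is hard to picture, here it is as a few lines of C++.  This is only a sketch: Machine and StwuR1 are names I made up, memory is a sparse map instead of real RAM, and addresses are assumed to be 32 bit.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy machine: just the stack pointer and a sparse memory of 32 bit words.
struct Machine
{
    std::uint32_t r1;                               // stack pointer
    std::map<std::uint32_t, std::uint32_t> memory;  // address -> stored word
};

// Models 'stwu r1, offset(r1)'.
void StwuR1(Machine& m, std::int32_t offset)
{
    assert(offset < 0);  // allocating a frame moves r1 toward low memory

    const std::uint32_t ea = m.r1 + static_cast<std::uint32_t>(offset);
    m.memory[ea] = m.r1;  // store the caller's stack pointer at the new address
    m.r1 = ea;            // then point r1 at the new frame
}
```

After StwuR1( m, -96 ), the word that r1 points at is the caller's stack pointer.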

 

This can get you a few things.  First, it enables you to get a call stack in some cases where the debugger goes nuts.  It allows you to get the value of params that are too big or too numerous to pass in registers.  It also leads to your religious coworkers calling you a witch and trying to burn you for your black magic.
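
Rebuilding a call stack by hand is mostly a matter of following those saved stack pointers from frame to frame.  Here is a minimal sketch of that walk.  Memory and WalkBackChain are hypothetical names, memory is a map standing in for a memory window in your debugger, and I am assuming the outermost frame stores 0 as its saved stack pointer.  Where the return address lives inside each frame depends on the ABI, so that part is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using Memory = std::map<std::uint32_t, std::uint32_t>;  // address -> 32 bit word

// Starting at the current stack pointer, follow the saved stack pointers
// from frame to frame and return the address of every frame visited.
std::vector<std::uint32_t> WalkBackChain(const Memory& mem,
                                         std::uint32_t sp,
                                         std::size_t maxFrames = 64)
{
    assert(maxFrames > 0);

    std::vector<std::uint32_t> frames;
    while (sp != 0 && frames.size() < maxFrames)
    {
        frames.push_back(sp);

        const auto it = mem.find(sp);
        // Stop on unreadable memory, on the assumed 0 terminator, or if the
        // chain fails to move toward high memory (corrupt stack).
        if (it == mem.end() || it->second <= sp)
            break;

        sp = it->second;  // the caller's frame
    }
    return frames;
}
```

The sanity check on direction is there because the times you need this trick are usually the times the stack has been stomped on.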

 

Here is a quick way to decipher instructions you may not know:

 

if it starts with "L", it’s probably a load

 

if it starts with "S", it’s probably a store ( instructions starting with "sl" and "sr" are bitshifting operations )

 

if it starts with "F", it’s probably a floating point math instruction

 

if it starts with a "B" it’s a branch.

 

if it has an "i" at the end of it, it's probably taking input from an immediate rather than a register.
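
Just for fun, here are those rules of thumb as code.  It is a sketch and nothing more: GuessInstructionKind and TakesImmediate are names I invented, and like the rules themselves they will guess wrong on some mnemonics.

```cpp
#include <cassert>
#include <string>

// Guess what an instruction does from its lowercase mnemonic alone.
std::string GuessInstructionKind(const std::string& mnemonic)
{
    assert(!mnemonic.empty());

    // "sl" and "sr" are shifts, so test them before the generic 's' rule.
    if (mnemonic.compare(0, 2, "sl") == 0 || mnemonic.compare(0, 2, "sr") == 0)
        return "shift";

    switch (mnemonic[0])
    {
        case 'l': return "load";
        case 's': return "store";
        case 'f': return "float";
        case 'b': return "branch";
        default:  return "unknown";
    }
}

// A trailing 'i' usually means one operand is an immediate, not a register.
bool TakesImmediate(const std::string& mnemonic)
{
    assert(!mnemonic.empty());
    return mnemonic.back() == 'i';
}
```

Feed it the instructions from this post: stwu comes back as a store, bl as a branch, and cmpwi is flagged as taking an immediate.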

 

    That's the very basics.  Hopefully that should be enough to get you started reading and understanding your code’s disassembly.  Real understanding only comes with practice, so when you have free time (during rebuilds?), look at random bits of code in optimized and unoptimized builds and see how they differ.  Don’t just look at the code and see a bunch of instructions, one of which may or may not be a bl with a function name.  Instead, try to really understand every instruction and what the code is doing.  It's not easy, but someday you’ll be a hero to your unenlightened coworkers who truly believe that optimized builds cannot be debugged by humans.

 

Love,
    Jaymin Kessler