Source Indexing is Underused Awesomeness
If you’ve ever had to debug code that was not built on your machine – whether looking at crash dumps or debugging live code – then you need source indexing.
If you’ve ever wasted time trying to find the source file (or the right version of the source file) used to build a DLL or EXE you are debugging then you need source indexing.
Source indexing (also known as source server) is free, fast, available since Visual Studio 2005, easy to use, and it ensures that the correct source file will always appear in your debugger, whether you’re debugging a crash from yesterday or yesteryear.
The problem is that most people don’t know that source indexing exists. In this post I’m going to explain why you need it, what it is, and how to use it for C++ development.
The length of this article makes source indexing look more complicated than it really is. Here’s the short version.
On your build machine you need to:
- Modify srcsrv\srcsrv.ini (from the debugging tools) so that MYSERVER points at your Perforce server
- Run p4index.cmd to embed source indexing information in your PDBs
On your development machines you need to:
- Click the check box to enable source server in your debugger
- Create a srcsrv.ini file (in your Visual Studio install directory, in “common7\ide”) to tell VS to stop popping up security warnings
That’s it. Read on for the details.
The need for source indexing
When you step through code in your debugger – whether it’s Visual Studio or windbg – you probably want to have the debugger show you your source code. Even if you are stepping through assembly language instructions it is foolishly inefficient to not have the source code right there.
If you built all of the binaries on your machine then it is straightforward for the debugger to retrieve the correct source files. When VC++ builds your code it embeds the full paths to all of the source files in the PDB file. It also embeds records that associate ranges of instructions with a particular line in a source file. In optimized code this mapping is imperfect, but it gets you to the right area.
If you are debugging locally built binaries and you have modified one of the source files since doing the build then Visual Studio will detect this (by comparing file signatures) and warn you of the potential problem. So far so good.
If you are debugging binaries from your build machine (you do have a build machine, don’t you?), especially if those binaries are from a few days or weeks ago, then this system doesn’t work. The first problem is that the source file paths may not match, since the build machine may have a different directory structure for its enlistment. This can be addressed by manually locating the files on your machine, but that extra step is inconvenient.
The larger problem is that the version of the source files used to build the binaries may not match what is on your machine. Visual Studio will warn you that the files don’t match, but then you need to play a tedious game of find-the-version as you try to sync to the correct version of the source file in the correct branch.
Wouldn’t it be nice if as you stepped through code the debugger would just automatically retrieve the correct source files?
What is source indexing?
What we need is a way to embed some extra information in the PDB file. Assuming that we are using Perforce for our version control (I’m a huge fan and it’s free for limited use, so try it) then we can uniquely identify each checked in file by its Perforce path and version number. As an example, on my local machine I have this source file:
This file exists on all clients of my Perforce database, but the path may be different on each client. However using the “p4 have” command I can get the depot path, which is:
The depot path is not only a universal identifier of which file we are talking about; it also contains the version number of the file. Thus, if we embed the path above in a PDB file, and associate it with the existing file path (which is already associated with blocks of instructions), then we have all of the information needed to retrieve the correct version of the correct file when debugging.
That is exactly what source indexing does. It is a simple and efficient process that embeds version-control path and version-number information into the PDB. Both Visual Studio and windbg support this data and can use this information to automatically retrieve the necessary source files.
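To make the idea concrete, here is a minimal Python sketch (not part of the source indexing tools) that splits one line of “p4 have” output into its depot path, revision, and local path. It assumes the usual “depot path#revision - local path” line format; the sample paths and the names HaveEntry and parse_have_line are made up for illustration.

```python
import re
from typing import NamedTuple


class HaveEntry(NamedTuple):
    """One synced file as reported by 'p4 have' (hypothetical helper type)."""
    depot_path: str  # universal identifier, the same on every client
    revision: int    # the revision this client is synced to
    local_path: str  # where the file lives on this particular client


# Assumed line format: <depot path>#<revision> - <local path>
_HAVE_LINE = re.compile(r"^(?P<depot>//.+)#(?P<rev>\d+) - (?P<local>.+)$")


def parse_have_line(line: str) -> HaveEntry:
    """Split a single 'p4 have' output line into its three parts."""
    match = _HAVE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a 'p4 have' line: {line!r}")
    return HaveEntry(match["depot"], int(match["rev"]), match["local"])


if __name__ == "__main__":
    # Hypothetical depot and client paths, for illustration only.
    sample = r"//depot/demo/src/main.cpp#7 - c:\work\demo\src\main.cpp"
    print(parse_have_line(sample))
```

The depot path and revision together are what gets embedded in the PDB; the local path is only the key used to find them.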
Source indexing is also supported for other version control systems (SourceSafe, TFS, and CVS) and can be extended to support arbitrary version control systems. However, since my experience is exclusively with Perforce, that is all I will discuss.
Running source indexing on your build machine
The source indexing tools and documentation are installed with the Windows debuggers, which come with the Windows SDK. The default path, for a 64-bit install, is:
c:\Program Files\Debugging Tools for Windows (x64)\srcsrv
You should probably copy this directory and check it in so that any modifications that you make are preserved. You might also want to look at srcsrv.doc since it gives lots of extra information about using source indexing.
The logic of source indexing is written in Perl so you’ll need to install that in order to use source indexing.
You then need to modify srcsrv\srcsrv.ini so that the MYSERVER variable contains the name of your Perforce server. You can just copy the server address from the output of “p4 info”. In my case the variable in srcsrv.ini looks like this:
To do source indexing with Perforce you will need to run p4index.cmd. Running it with -? gives help on the command line options. The only ones I use are source, symbols, and debug. The source option is used to point at the enlistment containing your source code, the symbols option is used to point at a directory which contains (recursively) all of your PDB files, and the debug option just tells it to list all of the PDBs as it indexes them.
Because source indexing records the current version numbers of source files it will only record useful results if you build from checked-in source files. Source indexing isn’t storing your source files – it’s just storing their paths and versions – so any modifications in checked out files will not be recorded. That’s why source indexing only makes sense on a build machine (or build enlistment) which by its very nature is building from the checked in files. The build machine doesn’t need to be synced to latest – the “p4 have” command returns the currently synced version not the latest version – but each source file has to be synced to some version.
My build batch file contains the following three lines to implement source indexing:
call p4index -source=%builddepot% -symbols=%builddepot%\output -debug
That’s it. At this point the source indexing information is embedded in my PDB files (using a tiny amount of space) and I archive them just like normal, in my case using symbol server.
Using source indexing in your debugger
While source indexing is the process of putting version control information into your symbol files, source server is the set of tools that extract that information and get the correct source file for use in the debugger.
Because source indexing is sometimes known as source server it makes people think that it is related to symbol servers. It’s not. Symbol servers (perhaps the subject of another post) are for helping the debugger automatically find the right symbol files. Source indexing is for finding the right source files. Since the source indexing information is stored in the symbol (PDB) files the first step is to make sure that you have symbols loaded. Retrieving those symbol files from a symbol server is certainly a good idea, but is orthogonal to the topic of retrieving source files once you have the symbols.
The next step depends on what debugger you are using. If you are using windbg then simply type “.srcfix” into the command window. Source server is now enabled and source files will be automatically retrieved as you step through code or navigate the call stack. You will get a security alert, because in order to retrieve the file source server must run the command specified in the PDB file. This is a security risk if somebody gives you a malicious PDB file, so you should examine the command to make sure it isn’t running an executable that could be exploited. Then you should probably tell it to not ask you every time.
In Visual Studio there is a checkbox to enable source server. Go to Tools -> Options -> Debugging -> General and check “Enable source server support”. You should probably also check “Print server diagnostics to the Output window” to help diagnose any problems that might occur. As with windbg, a security warning will come up.
Unfortunately it lacks a “stop asking me” button so, as John Robbins said, you will soon want to fly to Redmond to punish someone. This is an annoyance and a security flaw because this dialog quickly trains you to click “Run” without reading the message. The non-obvious solution to this is to create a srcsrv.ini file in “common7\ide” in your Visual Studio install directory and mark p4.exe as being a trusted command, like this:
For greater security you can specify the path to the trusted command, like this:
After creating the srcsrv.ini file you need to stop debugging and then resume debugging before Visual Studio will notice – proof positive that windbg has the superior debugging user interface.
Visual Studio will save the retrieved source files in your personal AppData folder, whereas windbg saves them in the shared ProgramData folder. The important factor is that they are extracted to a location far away from your enlistment – debugging an old executable doesn’t require syncing your enlistment to old files.
There are two basic stages to source indexing. First the Perl script does a “p4 have ...” in the specified source directory, to get a list of all of the source files. This tends to be pretty fast. As long as you have specified the correct directory this stage should work correctly. Look at the “Source root” output to make sure it is where you think it is, or hack the Perl script to print some of the data retrieved.
In the next stage the Perl script recursively looks for all PDB files in the specified symbols directory. It runs “srctool -r” on each PDB to get a list of source files, and then looks up each file in the information returned by the p4 have command. If it finds a match then it stores the Perforce path and version information in a special block of data. Only the PDBs for EXEs and DLLs are indexed; any vc100.pdb files will not be indexed.
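The matching step can be sketched in a few lines of Python. This is an illustration of the idea rather than the actual Perl logic: it assumes paths are compared case-insensitively, and the function names and sample paths are hypothetical.

```python
import ntpath


def normalize(path: str) -> str:
    """Fold case and slashes so Windows paths compare reliably (assumed behavior)."""
    return ntpath.normcase(ntpath.normpath(path))


def match_sources(pdb_sources, have_map):
    """Pair each source file named in a PDB with its Perforce 'depot#rev' spec.

    pdb_sources: local paths as listed by something like 'srctool -r'.
    have_map:    {local path: '//depot/path#rev'} built from 'p4 have' output.
    Returns (indexed, unindexed): a dict of matches and a list of misses.
    """
    lookup = {normalize(local): spec for local, spec in have_map.items()}
    indexed, unindexed = {}, []
    for src in pdb_sources:
        spec = lookup.get(normalize(src))
        if spec is None:
            unindexed.append(src)  # e.g. compiler or OS headers
        else:
            indexed[src] = spec
    return indexed, unindexed


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    have = {r"c:\build\src\main.cpp": "//depot/demo/src/main.cpp#7"}
    sources = [r"C:\Build\Src\Main.cpp", r"c:\program files\vc\include\vector"]
    print(match_sources(sources, have))
```

Files that fall into the unindexed list simply keep their original path in the PDB, which is why compiler and operating system sources are unaffected.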
Typical output for source indexing with the -debug option looks like this:
ssindex.cmd [STATUS] : Server ini file: c:\builddepot\srcsrv\srcsrv.ini
ssindex.cmd [STATUS] : Source root : c:\builddepot\Source
ssindex.cmd [STATUS] : Symbols root : c:\builddepot\Source
ssindex.cmd [STATUS] : Control system : P4
ssindex.cmd [STATUS] : P4 program name: p4.exe
ssindex.cmd [STATUS] : P4 Label : <N/A>
ssindex.cmd [STATUS] : Old path root : <N/A>
ssindex.cmd [STATUS] : New path root : <N/A>
ssindex.cmd [STATUS] : Partial match : Not enabled
ssindex.cmd [STATUS] : Running... this will take some time...
wrote C:\...\index453A.stream to c:\builddepot\plugins\AutoQuad.pdb ...
wrote C:\...\indexD346.stream to c:\builddepot\plugins\Mandelbrot.pdb ...
zero source files found ...
zero source files found ...
The cost to having too inclusive a source directory is that “p4 have ...” will have to retrieve more file names, but this is very cheap. The cost to having too inclusive a symbols directory is that the “dir *.pdb /s” phase will take longer and the process will try to source index more PDBs. In both cases it is usually best to err on the side of more, and only specify a more restrictive directory set if (typically for symbols) performance requires it. In many cases source indexing completes in less than ten seconds and performance is not an issue.
Note that the vc100.pdb files, which are intermediate PDB files, don’t have source information so the “zero source files found …” message is normal and expected.
You can troubleshoot particular PDB files using the “pdbstr” command to view the raw source indexing stream. An example command would be:
pdbstr -r -s:srcsrv -p:outputfile.pdb
Typical output looks like this:
SRCSRV: ini ------------------------------------------------
DATETIME=Fri Oct 28 17:48:53 2011
SRCSRV: variables ------------------------------------------
P4_EXTRACT_CMD=p4.exe -p %fnvar%(%var2%) print -o %srcsrvtrg% -q "//%var3%#%var4%"
SRCSRV: source files ---------------------------------------
The “variables” section defines the command line (P4_EXTRACT_CMD) that will be executed to retrieve files from version control; the debugger actually executes p4.exe. The “source files” section then maps file system paths to Perforce paths and version numbers.
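If you want to inspect this stream from a script, a small Python sketch like the following can split it into sections. It assumes each entry in the “source files” section is a “*”-delimited line whose fields correspond to %var1% through %var4% in the extract command; the function names and the sample entry are hypothetical.

```python
def parse_srcsrv_stream(text: str) -> dict:
    """Group the lines of a srcsrv stream by their 'SRCSRV: <name> ---' header."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("SRCSRV:"):
            # Header lines look like "SRCSRV: variables ------"
            name = line[len("SRCSRV:"):].strip(" -")
            current = sections.setdefault(name, [])
        elif current is not None and line.strip():
            current.append(line)
    return sections


def depot_spec(source_line: str) -> tuple:
    """Return (local path, '//depot/path#rev') for one source-file entry.

    Assumes fields are var1*var2*var3*var4, matching the
    "//%var3%#%var4%" part of P4_EXTRACT_CMD.
    """
    fields = source_line.split("*")
    local, depot, rev = fields[0], fields[2], fields[3]
    return local, f"//{depot}#{rev}"


if __name__ == "__main__":
    # The source-file entry below is made up for illustration.
    sample = "\n".join([
        "SRCSRV: ini ------------------------------------------------",
        "DATETIME=Fri Oct 28 17:48:53 2011",
        "SRCSRV: source files ---------------------------------------",
        r"c:\build\src\main.cpp*MYSERVER*depot/demo/src/main.cpp*7",
        "SRCSRV: end ------------------------------------------------",
    ])
    for entry in parse_srcsrv_stream(sample)["source files"]:
        print(depot_spec(entry))
```

Feeding it the output of the pdbstr command above is a quick way to check that a particular file made it into the index with the revision you expect.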
You can also use srctool.exe to examine how source indexing is working. Some useful options are:
- srctool -c: summarizes how many files were indexed and how many weren’t. Note that a lot of the files used to build your code are actually operating system and compiler source files, which is why many are not indexed.
- srctool -u: lists all of the source files that were not indexed – look here to see if any of your files were missed.
- srctool -r: lists all of the source files listed in the PDB.
You can also troubleshoot source indexing by modifying the Perl script to print out additional information, such as how many source file names were retrieved from Perforce.
Note that if you generate source files as part of your build process and don’t check them in then they will not be indexed, because they won’t appear in the output of the “p4 have ...” command. Source files that are copied to a new location before they are compiled also won’t be indexed.
What about static libraries?
Source indexing embeds version control information in PDB files, and .lib files don’t really have PDB files – so what do you do? If the .lib files are built by your build machine and then linked into DLLs and EXEs then you don’t need to do anything. When you run source indexing on the DLLs and EXEs then the source files used to build the contained libraries will also be indexed. As long as those libraries were built from the source files that your build machine is currently synced to (which should be the case, unless you are doing odd procedures) then all will be well.
Source indexing is easy to set up, costs little to run, and can have persistent benefits for years to come. There is really no excuse not to use it. Some days I make no use of source indexing, and other days it saves me from wasting time trying to track down dozens of source files.
Mozilla has been using source indexing since 2008, and there is lots more information available, such as this article on Source Server and Symbol Server Support in TFS 2010.