Posts by Niklas Frykholm
  1. What Is In a Name? ( Counting comments... )
  2. Scripted Network Debugging ( Counting comments... )
  3. Finding nearby stuff ( Counting comments... )
  4. Code Share: Source Censoring, Part 2 ( Counting comments... )
  5. What is gimbal lock and why do we still have to worry about it? ( Counting comments... )
  6. A Bug in Object Replication and Message Reordering ( Counting comments... )
  7. Why Lua? ( Counting comments... )
  8. Garbage Collection and Memory Allocation Sizes ( Counting comments... )
  9. Four meditations on bad design decisions ( Counting comments... )
  10. A Formal Language for Data Definitions ( Counting comments... )
  11. Bitsquid Foundation Library ( Counting comments... )
  12. A Data-Oriented, Data-Driven System for Vector Fields -- Part 3 ( Counting comments... )
  13. A Data-Oriented, Data-Driven System for Vector Fields -- Part 2 ( Counting comments... )
  14. A Data-Oriented, Data-Driven System for Vector Fields - Part 1 ( Counting comments... )
  15. A new way of organizing header files ( Counting comments... )
  16. Cleaning bad code ( Counting comments... )
  17. A simpler design for asynchronous APIs ( Counting comments... )
  18. Matrices, Rotation, Scale and Drifting ( Counting comments... )
  19. Hack Day Report ( Counting comments... )
  20. Read my lips: No more loading screens ( Counting comments... )
  21. Playing (with) Video ( Counting comments... )
  22. Embracing Dynamism ( Counting comments... )
  23. Inheriting Velocity in Ragdolls ( Counting comments... )
  24. PIMPL vs Pure Virtual Interfaces ( Counting comments... )
  25. Caring by Sharing: The Bitsquid Documentation System ( Counting comments... )
  26. Sensible Error Handling -- Part 3 ( Counting comments... )
  27. Sensible Error Handling - Part 2 ( Counting comments... )
  28. Sensible Error Handling: Part 1 ( Counting comments... )
  29. 5 Tips for Programmer Productivity ( Counting comments... )
  30. Platform Specific Resources ( Counting comments... )
  31. A Pragmatic Approach to Performance ( Counting comments... )
  32. Code Share: Source Censoring ( Counting comments... )
  33. An Example in Data-Oriented Design: Sound Parameters ( Counting comments... )
  34. Low-Level Animation -- Part 2 ( Counting comments... )
  35. Caring by Sharing: Header Hero ( Counting comments... )
  36. Managing Decoupling Part 4 -- The ID Lookup Table ( Counting comments... )
  37. A Simple Roll-Your-Own Documentation System ( Counting comments... )
  38. An idea for better watch windows ( Counting comments... )
  39. Fixing memory issues in Lua ( Counting comments... )
  40. An in-place parsing experiment ( Counting comments... )
  41. Lightweight Lua Bindings -- Part 2 ( Counting comments... )
  42. Lightweight Lua Bindings ( Counting comments... )
  43. Strings Redux ( Counting comments... )
  44. Monitoring your game ( Counting comments... )
  45. Write A Script For It ( Counting comments... )
  46. Universal Undo, Copy and Paste ( Counting comments... )
  47. Extreme Bug Hunting ( Counting comments... )
  48. Collaboration and Merging ( Counting comments... )
  49. A Tiny Expression Language ( Counting comments... )
  50. Managing Decoupling Part 3 - C++ Duck Typing ( Counting comments... )
  51. Managing Coupling Part 2 — Polling, Callbacks and Events ( Counting comments... )
  52. Managing Decoupling ( Counting comments... )
Technology/ Code /

I like to roll my own documentation systems. There, I’ve said it. Not for inline documentation, mind you. For that there is Doxygen and for that I am grateful. Because while I love coding, there is fun coding and not-so-fun coding, and writing C++ parsers tends to fall in the latter category.

So for inline documentation I use Doxygen, but for everything else, I roll my own. Why?

I don’t want to use Word or Pages or any other word processing program because I want my documents to be plain text that can be diffed and merged when necessary. And I want to be able to output it as clean HTML or in any other format I may like.

I don’t want to use HTML or LaTeX or any other presentation-oriented language, because I want to be able to massage the content in various ways before presenting it. Reordering it, adding an index or a glossary, removing deprecated parts, etc. Also, writing <p> gets boring very quickly.

I don’t want to use a Wiki, because I want to check in my documents together with the code, so that code versions and document versions match in the repository. I definitely don’t want to manage five different Wikis, corresponding to different engine release versions. Also, Wiki markup languages tend to be verbose and obtuse.

I could use an existing markup language, such as DocBook, Markdown or ReStructured Text. But all of them contain lots of stuff that I don’t need and lack some stuff that I do need. For example I want to include snippets of syntax highlighted Lua code, margin notes and math formulas. And I want to do it in a way that is easy to read and easy to write. Because I want there to be as few things as possible standing in the way of writing good documentation.

So I roll my own. But as you will see, it is not that much work.

I’ve written a fair number of markup systems over the years (perhaps one too many, but hey, that is how you learn) and I’ve settled on a pretty minimalistic structure that can be implemented in a few hundred lines of Ruby. In general, I tend to favor simple minimalistic systems over big frameworks that try to ”cover everything”. Covering everything is usually impossible and when you discover that you need new functionality, the lightweight systems are a lot easier to extend than the behemoths.

There are two basic components to the system. Always two there are, a parser and a generator. The parser reads the source document and converts it to some kind of structured representation. The generator takes the structured representation and converts it to an output format. Here I’ll only consider HTML, because to me that is the only output format that really matters.

To have something concrete to talk about, let’s use this source document, written in a syntax that I just made up:

@h1 Flavors of ice cream
 
My favorite ice cream flavors are:
 
@li Strawberry
@li Seagull

The Parser

The most crucial point of the system is what the structured representation should look like. How should the parser communicate with the generator? My minimalistic solution is to just let the representation be a list of lines, with each line consisting of a type marker and some text.

(:h1, ”Flavors of...”)
(:empty, ””)
(:text, ”My favorite...”)
(:empty, ””)
(:li, ”Strawberry”)
(:li, ”Seagull”)

To some this will probably seem like complete heresy. Surely I need some kind of hierarchical representation. How can I otherwise represent things like a list-in-a-list-in-a-cat-in-a-hat?

No problem, to represent a list item nested in another list, I just use a @li_li tag and a corresponding :li_li type marker. If someone wants three or more levels of nesting I suggest that they rewrite their document. This is supposed to be readable documentation, not Tractatus Logico-Philosophicus. I simply don’t think that deep nesting is important enough to warrant a complicated hierarchical design. As I said, I prefer the simple things in life.

So, now that we know the output format, we can write the parser in under 20 lines:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
class Parser
  attr_reader :lines
 
  def initialize()
    @lines = []
  end
 
  def parse(line)
    case line
    when /^$/
      @lines << {:type => :empty, :line => ""}
    when /@(\S+)\s+(.*)$/
      @lines << {:type => $1.intern, :line => $2}
    when /^(.*)$/
      @lines << {:type => :text, :line => line}
    end
  end
end

Of course you can go a lot fancier with the parser than this. For example, you can make a more Markdown-like syntax where you create lists by just starting lines with bullet points. But this doesn’t really change the basic structure, you just need to add more whens in your case-statement.

One useful approach, as you make more advanced parsers, is to have markers that put the parser in a particular state. For example, you could have a marker @lua that made the parser consider all the lines following it to be of type :lua until the marker @endlua was reached.

The Generator

A useful trick when writing HTML generators is to always keep track of the HTML tags that you have currently opened. This lets you write a method context(tags) which takes a list of tags as arguments and closes and opens tags so that exactly the tags specified in the list are open.

With such a method available, it is simple to write the code for outputting tags:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
class Generator
  def h1(line)
    context(%W(h1 #{"a name=\"#{line}\""}))
    print line
  end
 
  def text(line)
    context(%w(p))
    print line
  end
 
  def empty(line)
    context(%w())
    print line
  end
 
  def li(line)
    context(%w(ul li))
    print line
    context(%w(ul))
  end
end

Notice how this works. The li() method makes sure that we are in a <ul> <li> context, so it closes all other open tags and opens the right ones. Then, after printing its content, it says that the context should just be <ul> which forces the closure of the <li> tag. If we wanted to support the :li_li tag, mentioned above, we could write it simply as:

1
2
3
4
5
6
7
class Generator
  def li_li(line)
    context(%w(ul li ul li))
    print line
    context(%w(ul li ul))
  end
end

Notice also that this approach allows us to just step through the lines in the data structure and print them. We don’t have to look back and forward in the data structure to find out where a <ul> should begin and end.

The rest of the Generator class implements the context() function and handles indentation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
class Generator
  def initialize()
    @out = ""
    @context = []
    @indent = 0
  end
 
  def print(s)
    @out << ("  " * @indent) << s << "\n"
  end
 
  def open(ci)
    print "<#{ci}>"
    @indent += 1
  end
 
  def close(ci)
    @indent -= 1
    print "</#{ci[/^\S*/]}>"
  end
 
  def context(c)
    i = 0
    while @context[i] != nil && @context[i] == c[i]
      i += 1
    end
    while @context.size > i
      close(@context.last)
      @context.pop
    end
    while c.size > @context.size
      @context.push( c[@context.size] )
      open(@context.last)
    end
  end
 
  def format(lines)
    lines.each {|line| self.send(line[:type], line[:line])
    context(%w())
    return @out
  end
end

Used as:

1
2
3
parser = Parser.new
text.each_line {|line| parser.parse(line)}
puts Generator.new.format(parser.lines)

So there you have it, the start of a custom documentation system, easy to extend with new tags in under 100 lines of Ruby code.

There are some things I haven’t touched on here, like TOC generation or inline formatting (bold and emphasized text). But it is easy to write them as extensions of this basic system. For example, the TOC could be generated with an additional pass over the structured data. If there is enough interest I could show an example in a follow-up post.

(This has also been posted to the BitSquid blog. Check out our new website.)