Posts by Niklas Frykholm
  1. What Is In a Name? ( Counting comments... )
  2. Scripted Network Debugging ( Counting comments... )
  3. Finding nearby stuff ( Counting comments... )
  4. Code Share: Source Censoring, Part 2 ( Counting comments... )
  5. What is gimbal lock and why do we still have to worry about it? ( Counting comments... )
  6. A Bug in Object Replication and Message Reordering ( Counting comments... )
  7. Why Lua? ( Counting comments... )
  8. Garbage Collection and Memory Allocation Sizes ( Counting comments... )
  9. Four meditations on bad design decisions ( Counting comments... )
  10. A Formal Language for Data Definitions ( Counting comments... )
  11. Bitsquid Foundation Library ( Counting comments... )
  12. A Data-Oriented, Data-Driven System for Vector Fields -- Part 3 ( Counting comments... )
  13. A Data-Oriented, Data-Driven System for Vector Fields -- Part 2 ( Counting comments... )
  14. A Data-Oriented, Data-Driven System for Vector Fields - Part 1 ( Counting comments... )
  15. A new way of organizing header files ( Counting comments... )
  16. Cleaning bad code ( Counting comments... )
  17. A simpler design for asynchronous APIs ( Counting comments... )
  18. Matrices, Rotation, Scale and Drifting ( Counting comments... )
  19. Hack Day Report ( Counting comments... )
  20. Read my lips: No more loading screens ( Counting comments... )
  21. Playing (with) Video ( Counting comments... )
  22. Embracing Dynamism ( Counting comments... )
  23. Inheriting Velocity in Ragdolls ( Counting comments... )
  24. PIMPL vs Pure Virtual Interfaces ( Counting comments... )
  25. Caring by Sharing: The Bitsquid Documentation System ( Counting comments... )
  26. Sensible Error Handling -- Part 3 ( Counting comments... )
  27. Sensible Error Handling - Part 2 ( Counting comments... )
  28. Sensible Error Handling: Part 1 ( Counting comments... )
  29. 5 Tips for Programmer Productivity ( Counting comments... )
  30. Platform Specific Resources ( Counting comments... )
  31. A Pragmatic Approach to Performance ( Counting comments... )
  32. Code Share: Source Censoring ( Counting comments... )
  33. An Example in Data-Oriented Design: Sound Parameters ( Counting comments... )
  34. Low-Level Animation -- Part 2 ( Counting comments... )
  35. Caring by Sharing: Header Hero ( Counting comments... )
  36. Managing Decoupling Part 4 -- The ID Lookup Table ( Counting comments... )
  37. A Simple Roll-Your-Own Documentation System ( Counting comments... )
  38. An idea for better watch windows ( Counting comments... )
  39. Fixing memory issues in Lua ( Counting comments... )
  40. An in-place parsing experiment ( Counting comments... )
  41. Lightweight Lua Bindings -- Part 2 ( Counting comments... )
  42. Lightweight Lua Bindings ( Counting comments... )
  43. Strings Redux ( Counting comments... )
  44. Monitoring your game ( Counting comments... )
  45. Write A Script For It ( Counting comments... )
  46. Universal Undo, Copy and Paste ( Counting comments... )
  47. Extreme Bug Hunting ( Counting comments... )
  48. Collaboration and Merging ( Counting comments... )
  49. A Tiny Expression Language ( Counting comments... )
  50. Managing Decoupling Part 3 - C++ Duck Typing ( Counting comments... )
  51. Managing Coupling Part 2 — Polling, Callbacks and Events ( Counting comments... )
  52. Managing Decoupling ( Counting comments... )
Technology/ Code /

Recently, I've become increasingly dissatisfied with the standard C++ way of organizing header files (one .h file and one .cpp file per class) and started experimenting with alternatives.

I have two main problems with the ways headers are usually organized.

First, it leads to long compile times, especially when templates and inline functions are used. Fundamental headers like array.h and vector3.h get included by a lot of other header files that need to use the types they define. These, in turn, get included by other files that need their types. Eventually you end up with a messy nest of header files that get included in a lot more translation units than necessary.

Sorting out such a mess once it has taken root can be surprisingly difficult. You remove an #include statement somewhere and are greeted by 50 compile errors. You have to fix these one by one by inserting missing #include statements and forward declarations. Then you notice that the Android release build is broken and needs additional fixes. This introduces a circular header dependency that needs to be resolved. Then it is on to the next #include line -- remove it, rinse and repeat. After a day of this mind-numbingly boring activity you might have reduced your compile time by four seconds. Hooray!

Compile times have an immediate and important effect on programmer productivity and through general bit rot they tend to grow over time. There are many things that can increase compile times, but relatively few forces that work in the opposite direction.

It would be a lot better if we could change the way we work with headers, so that we didn't get into this mess to begin with.

My second problem is more philosophical. The basic idea behind object-oriented design is that data and the functions that operate on it should be grouped together (in the same class, in the same file). This idea has some merits -- it makes it easier to verify that class constraints are not broken -- but it also leads to problems. Classes get coupled tightly with concepts that are not directly related to them -- for example things like serialization, endian-swapping, network synchronization and script access. This pollutes the class interface and makes reuse and refactoring harder.

Class interfaces also tend to grow indefinitely, because there is always "more useful stuff" that can be added. For example, a string class (one of my pet peeves) could be extended with functionality for tokenization, path manipulation, number parsing, etc. To prevent "class bloat", you could write this code as external functions instead, but this leads to a slightly strange situation where a class has some "canonized" members and some second-class citizens. It also means that the class must export enough information to allow any kind of external function to be written, which kind of breaks the whole encapsulation idea.

In my opinion, it is much cleaner to organize things by functionality than by type. Put the serialization code in one place, the path manipulation code in another place, etc.

My latest idea about organization is to put all type declarations for all structs and classes in a single file (say types.h):

struct Vector3 {
	float x, y, z;
};
 
template <class T>
class Array<T> {
public:
	Array() : _capacity(0), _size(0), _data(0) {}
	~Array() {free(_data);}
	unsigned _capacity;
	unsigned _size;
	T *_data;
};
 
class IFileSystem;
class INetwork;

Note that types.h has no function declarations, but it includes the full data specification of any struct or class that we want to use "by value". It also has forward declarations for classes that we want to use "by reference". (These classes are assumed to have pure virtual interfaces. They can only be created by factory functions.)

Since types.h only contains type definitions and not a ton of inline code, it ends up small and fast to compile, even if we put all our types there.

Since it contains all type definitions, it is usually the only file that needs to be included by external headers. This means we avoid the hairy problem with a big nest of headers that include other headers. We also don’t have to bother with inserting forward declarations in every header file, since the types we need are already forward declared for us in types.h.

We put the function declarations (along with any inline code) in the usual header files. So vector3.h would have things like:

inline Vector3 operator+(const Vector3 &a, const Vector3 &b)
{
	Vector3 res;
	res.x = a.x + b.x;
	res.y = a.y + b.y;
	res.z = a.z + b.z;
	return res;
}

.cpp files that wanted to use these operations would include vector3.h. But .h files and other .cpp files would not need to include the file. The file gets included where it is needed and not anywhere else.

Similarly, array.h would contain thinks like:

template <class T>
void push_back(Array<T> &a, const T &item)
{
	if (a._size + 1 > a._capacity)
		grow(a);
	a._data[a._size++] = item;
}

Note that types.h only contains the constructor and the destructor for Array<T>, not any other member functions.

Furthermore, I prefer to design classes so that the "zero-state" where all members are zeroed is always a valid empty state for the class. That way, the constructor becomes trivial, it just needs to zero all member variables. We can also construct arrays of objects with a simple memset().

If a class needs a more complicated empty state, then perhaps it should be an abstract interface-class instead of a value class.

For IFileSystem, file_system.h defines the virtual interface:

class IFileSystem
{
	virtual bool exists(const char *path) = 0;
	virtual IFile *open_read(const char *path) = 0;
	virtual IFile *open_write(const char *path) = 0;
	...
};
 
IFileSystem *make_file_system(const char *root);
void destroy_file_system(IFileSystem *fs);

Since the “open structs” in types.h can be accessed from anywhere, we can grop operations by what they do rather than by what types they operate on. For example, we can put all the serialization code in serialization.h and serialization.cpp. We can create a file path.h that provides path manipulation functions for strings.

An external project can also "extend" any of our classes by just writing new methods for it. These methods will have the same access to the Vector3 data and be called in exactly the same way as our built-in ones.

The main drawback of this model is that internal state is not as "protected" as in standard object-oriented design. External code can "break" our objects by manipulating members directly instead of using methods. For example, a stupid programmer might try to change the size of an array by manipulating the _size field directly, instead of using the resize() method.

Naming conventions can be used to mitigate this problem. In the example above, if a type is declared with class and the members are preceded by an underscore, the user should not manipulate them directly. If the type is declared as a struct, and the members do not start with an underscore, it is OK to manipulate them directly. Of course, a stupid programmer can still ignore this and go ahead and manipulate the members directly anyway. On the other hand, there is no end to the things a stupid programmer can do to destroy code. The best way to protect against stupid programmers is to not hire them.

I haven’t yet written anything really big in this style, but I've started to nudge some files in the Bitsquid codebase in this direction, and so far the experience has been positive.

This has also been posted to The Bitsquid blog.