Aurora: Stupid C++ Tricks

I have always wished for my computer to be as easy to use as my telephone; my wish has come true because I can no longer figure out how to use my telephone.
-- Bjarne Stroustrup

Introduction

A while ago Charles Nicholson wrote an article about how they handled asserts in their engine. I kind of liked the title "Stupid C++ Tricks" and it triggered this article idea. Then, of course, Noel Llopis managed to beat me to it, and that after I'd given him a hard time about his infrequent posts :) So without dwelling too much on why there are so many books and articles about traps, pitfalls, stupidity and secrets regarding C and C++, here you go: some more stupid tricks. Don't try this at home.

The Pimpl idiom

One of the early books I bought on C++ was James Coplien's 'Acid Book' (as Meyers calls it). Much of the stuff in there is bread and butter today, although if you haven't read it, you should. One of the things James (or Jim, how nice of a name is that) introduced was the Pimpl idiom. Private Implementation is a happy interpretation of the weird name; the more plausible one is pointer to implementation. In simple terms it's a compiler firewall, or an opaque type that effectively hides the implementation of any class from the outside.

How does this work in practice? Well, you simply abuse the forward declaration mechanism a little:

// in the header
class Foo
{
public:
	Foo();
	~Foo();

private:
	struct Pimpl; // forward declaration to internal structure
	Pimpl* m; // opaque pointer to actual data
};

// in the cpp file
#include <string>

struct Foo::Pimpl
{
	std::string name;
};

Foo::Foo()
	: m( new Pimpl)
{
}

Foo::~Foo()
{
	delete m;
}

Of course the above implementation has all sorts of problems. What happens during assignment? Or copy construction? The compiler-generated versions just copy the pointer, so two objects end up deleting the same Pimpl, and a proper deep copy can take quite some time depending on how heavy your structure is. Also consider that you've potentially split the memory allocation between two places. Luckily, in most cases you will have fairly heavy classes that need this kind of idiom, perhaps even singleton classes where construction and destruction isn't really an issue since it happens once during a program. You can of course easily protect the copy/assignment for those kinds of classes.
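Protecting the copy/assignment could look something like the sketch below. It assumes a C++11 compiler for "= delete"; on older compilers the usual trick is to declare both functions private and never define them. The class name Registry and its members are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// in the header (hypothetical class, for illustration only)
class Registry
{
public:
	Registry();
	~Registry();

	// The default copy would duplicate the raw pointer and end in a
	// double delete, so both operations are switched off.
	Registry(const Registry&) = delete;
	Registry& operator=(const Registry&) = delete;

	void setName(const char* name);
	std::size_t nameLength() const;

private:
	struct Pimpl; // forward declaration to internal structure
	Pimpl* m;     // opaque pointer to actual data
};

// in the cpp file
struct Registry::Pimpl
{
	std::string name;
};

Registry::Registry() : m(new Pimpl) {}
Registry::~Registry() { delete m; }

void Registry::setName(const char* name)
{
	assert(name != 0);
	m->name = name;
}

std::size_t Registry::nameLength() const { return m->name.size(); }
```

With this in place, something like "Registry copy = registry;" is rejected by the compiler instead of blowing up at runtime.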

But what if we want to have a fast, local and opaque type? Oh, you mean eat and have the cake at the same time? Yes, since we're programmers, that's always on our mind (who said you can't do this?). With some care you can actually just declare the memory right in the class and use it like this:

class Foo
{
public:
  Foo();
 ~Foo();

private:
  char mMemory[128 - sizeof(void*)];
  struct Pimpl;
  Pimpl& m;
};

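// in the cpp file
#include <new> // for placement new

struct Foo::Pimpl
{
};

Foo::Foo()
	: m( *new(mMemory) Pimpl ) // construct the Pimpl inside the local buffer
{
  // A char buffer carries no alignment guarantee of its own; a Pimpl with
  // strict alignment requirements needs an explicitly aligned buffer.
  static_assert(sizeof(Pimpl) <= sizeof(mMemory), "Not enough memory for the Pimpl");
}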

Foo::~Foo()
{
  m.~Pimpl();
}

Whoa! Ok. So there are some really crazy things going on here. First, I've used the static_assert construct from C++11. Feel free to use anything else, just as long as you catch at compile time when the memory size declared in Foo is too small. A silly simple static assert can be written like this:

	#define STATIC_ASSERT(expr) typedef char(&foobar)[ (expr) ? 1 : -1 ]

Another thing I've done is binding a reference to the char buffer as a shorthand to the pimpl. It's a reference merely because a single dot is half the typing of the arrow needed for a pointer. I could have in each of the member functions just implicitly recreated the pointer to the Pimpl, but I'm lazy. When pinched for space you could remove the reference though. The sum of the parts is that we can now instantiate Foo on the stack, have no dynamic memory allocations and still don't know anything about the internals of the class.
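For the pinched-for-space case, here is a rough sketch of how dropping the reference might look. It is not the only way to do it: I'm assuming C++11 for alignas and std::max_align_t, and the Label class, its members and the self() helper are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// in the header (hypothetical class, for illustration only)
class Label
{
public:
	Label();
	~Label();
	Label(const Label&) = delete;
	Label& operator=(const Label&) = delete;

	void setText(const char* text);
	std::size_t length() const;

private:
	struct Pimpl;
	Pimpl& self();             // recreates the shorthand on demand
	const Pimpl& self() const;

	// Aligned for any ordinary type, so the whole class is exactly the buffer.
	alignas(std::max_align_t) unsigned char mMemory[128];
};

// in the cpp file
struct Label::Pimpl
{
	std::string text;
};

Label::Label()
{
	static_assert(sizeof(Pimpl) <= sizeof(mMemory), "Not enough memory for the Pimpl");
	static_assert(alignof(Pimpl) <= alignof(std::max_align_t), "Pimpl is over-aligned");
	new(mMemory) Pimpl; // construct in place, no heap involved
}

Label::~Label()
{
	self().~Pimpl();
}

// C++17 code may want to wrap these casts in std::launder.
Label::Pimpl& Label::self() { return *reinterpret_cast<Pimpl*>(mMemory); }
const Label::Pimpl& Label::self() const { return *reinterpret_cast<const Pimpl*>(mMemory); }

void Label::setText(const char* text)
{
	assert(text != 0);
	self().text = text;
}

std::size_t Label::length() const { return self().text.size(); }
```

The price is one cast per member function instead of one pointer per object, which is usually a fair trade when there are many small instances.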

So the big win of using this slightly convoluted way of declaring memory and members is that we completely hide how the class is implemented. Thus we can update the code without the clients being any wiser, or even needing to know. This translates directly to fewer rebuilds of the code, classes that are exportable across DLLs and less pollution of namespaces. The last thing is actually pretty important: it makes it much easier to use different compiler settings on the translation unit itself, without the other units caring. Why does this matter? Well, there is one big fat ugly elephant header, windows.h, that lurks in the corner. That particular header is not ANSI (nor any other standard) compliant; it needs a host of Microsoft specific extensions even to compile. Plus it leaks numerous macros etc. outside itself. In short, you don't want to pull this header into one of your core headers if you can help it. Plus it takes forever to parse.

Using pimpl, you can enable the language extensions on only those few files that actually need to pull in windows.h to work; the vast majority of the other files should be cross platform and standard compliant (hey, you've written your own code according to the standard, right?). If never leaving MSVC is a reality for you, then this might not be so important, but for the rest of us, different compilers will act differently for code that doesn't follow the standard. And I hate that, since it means that you have to track down each and every occurrence where the code is bad.

Reusing variable names

There are some variable names that are sacred. To any mathematician, i, j and k are automatically index variables. They can be nothing else (and sometimes consecutive letters with or without subscripts are conscripted into service). x, y, z and w are all related to homogeneous vectors. Now, for non games programmers this might not be true, although I suspect that the variable i still does some heavy duty as an index variable in for loops.

One of the more common things you do while you program is loop over a collection, e.g. an array, and do something to it. Debugging practices usually lead you to the following code snippet:

Foobar foobars[MAX_FOOBARS];
int numFoobars; // initialized somewhere else by the code

for(int i=0; i < numFoobars; i++) {
	// create a shortcut since foobars might be a complex container
	const Foobar& foobar = foobars[i];
	dostuff(foobar);
}

In the past, compilers had mighty problems ending the scope of the loop variable i when the loop ended, so copying the loop and declaring i all over again would make them sad. Today this should be a non-issue, but it might bite you in the posterior if you have an older compiler. Anyways, reusing the variable i in later loops should be no problem with a reasonably modern compiler. Often, for clarity, I break up several operations done on an array into multiple passes over the data, with the same loop duplicated in places:

Foobar foobars[MAX_FOOBARS];
int numFoobars ;

for(int i=0; i < numFoobars; i++) {
	const Foobar& foobar = foobars[i];
	precondition(foobar);
}

for(int i=0; i < numFoobars; i++) {
	const Foobar& foobar = foobars[i];
	funstuff(foobar);
}

for(int i=0; i < numFoobars; i++) {
	const Foobar& foobar = foobars[i];
	postfixup(foobar);
}

Here I've reused the variable name i in all three loops. Although the meaning of the variable hasn't really changed, the three could actually be referring to different places on the stack, as the compiler might not bother with reclaiming space, or they might just be kept in registers in optimized builds. Reusing the variable name is however slightly clearer than if I had used different names for the three loops.
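To convince yourself that each loop really owns its i, a small experiment along these lines should do. The function threePasses and the outer i are made up purely to show the scoping; hiding an outer variable like this is not something I'd recommend in real code.

```cpp
#include <cassert>

// Hypothetical helper: three passes over the array, each with its own i.
int threePasses(int* values, int count)
{
	int i = -1; // an unrelated outer i, hidden inside each loop below

	for(int i=0; i < count; i++)   // pass 1: this i hides the outer one
		values[i] += 1;

	for(int i=0; i < count; i++)   // pass 2: a brand new i
		values[i] *= 2;

	int sum = 0;
	for(int i=0; i < count; i++)   // pass 3: and another
		sum += values[i];

	assert(i == -1); // the outer i was never touched by the loops
	return sum;
}
```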

Assignment

Coming from Turbo Pascal, learning C++ was interesting. One of the very first really serious bugs I had was related to an incorrect assignment inside an if statement. Pascal has a single = meaning test for equality, and := as assignment. Taking that into account, here was the bug:

FILE* file = fopen(filename, "rt");
if( file = NULL )
	return false;
char buffer[1024] = {0};
fread(buffer, 1, 1024, file);
fclose(file);

So the fread never saw a valid file. The condition assigns NULL to file and then tests the result, which is false, so the early return is skipped and the code carries on with a null pointer. After double and triple checking that the file really was on disk, I started staring at the code. Sure enough, after a while the penny dropped and I saw the assignment. Two things came out of this though: a reluctance to assign anything inside a test, and the habit of reversing the order so that the rvalue is on the left, making it impossible to accidentally assign instead of testing, like this:

FILE* file = fopen(filename, "rt");
if( NULL = file ) // ERROR: Can not assign to 0, it's not an lvalue
	return false;
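A related safety net, sketched here on the assumption that the pointer never needs to be reseated: declare the pointer itself const, and the accidental assignment becomes a compile error no matter which side the constant is on. The function name readPrefix and its parameters are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: reads up to size bytes from the start of a file.
bool readPrefix(const char* filename, char* buffer, std::size_t size, std::size_t* bytesRead)
{
	assert(filename != NULL && buffer != NULL && bytesRead != NULL);

	// The pointer itself is const, so "file = NULL" would not compile.
	std::FILE* const file = std::fopen(filename, "rb");
	if( NULL == file ) // constant on the left, as above
		return false;

	*bytesRead = std::fread(buffer, 1, size, file);
	std::fclose(file);
	return true;
}
```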

There are some legitimate cases of assignment inside tests. The most common one is that you have a function that returns a pointer, and depending on whether that pointer is null you want to do things. You could have written this like:

if( getInstance() )
	getInstance()->foobar();

Of course that will cause two function calls to a potentially costly function. Better would be to:

Foobar* foo = getInstance();
if( foo )
	foo->foobar();

My argument against this is that for one, it's one extra line of code. Size is really the enemy. The other, much worse effect is that now I have yet another variable name to keep track of and potentially collide with! Ok, selecting good names actually occupies more time than I'm comfortable with. If I literally just call this variable instance, then I'm fine with that. If only the scope was small enough. Well, we can get a small scope, since each if statement opens up a new scope!

if( Foobar* instance = getInstance() )
	instance->foobar();

This (evil singleton) gets the best of both worlds: you can test for nullness and only call the function once. And you don't have to come up with any great name that can last throughout the function. Of course, one might argue that having null pointers at all is bad practice and that you should always keep default objects that do nothing for these kinds of singleton constructs, however that is a whole other article!
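Here is a small self-contained sketch of the scoping at work. The Foobar struct, the call counter gCalls and the poke function are stand-ins I made up so that the single call per test can actually be observed.

```cpp
#include <cassert>

// Hypothetical singleton, just enough to count what happens.
struct Foobar
{
	int hits;
	void foobar() { ++hits; }
};

static int gCalls = 0; // how many times getInstance() has run

Foobar* getInstance()
{
	static Foobar instance = { 0 };
	++gCalls;
	return &instance;
}

void poke()
{
	if( Foobar* instance = getInstance() )
		instance->foobar();
	else
		assert(instance == 0); // the name is still in scope here, and null

	// instance is out of scope again, so the name is free for reuse
	if( Foobar* instance = getInstance() )
		instance->foobar();
}
```

Each if costs exactly one call to getInstance(), and the two instance variables never see each other.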

Local classes and enums

Sometimes you just want to do quick and dirty things. Sometimes it's even justified. One of these cases is when you call specialized functions which require interfaces for callbacks. The typical scenario is that a function requires you to define a new concrete class and instantiate it to act as a callback. Something like this:

struct MyHandler : public Handler {
   virtual void onError(int code) { }
};

bool foobar(const char* command) {
  MyHandler handler;
  foreignApi(&handler);
  return true;
}

The problem is that now the MyHandler class is visible in the file scope, which is a bad thing. Or at least it would force me to choose a longer name, to avoid possible name clashes during, for example, bulk builds (compiling several .cpp files in a hierarchy as one to speed up link and compile times). There is however no problem moving this class into the function itself! While C++ doesn't support nested function declarations, in a twist of things you can declare classes inside functions:

bool foobar(const char* command) {
  struct MyHandler : public Handler {
    virtual void onError(int code) { }
  } handler;

  foreignApi(&handler);
  return true;
}

Well, it might not be so pretty, but at least all the details of the other API are gone, and now I can concentrate on the public API instead (consisting of foobar, which returns a bool given a string, not too bad). This translates to enums as well, and they frequently do clash. I subscribe to the philosophy that longer identifiers don't necessarily make code clearer, context will. Actually, shorter identifiers often make the code more readable, which also happens to coincide with my inherent laziness. But the shorter the identifiers, the more important context becomes (i.e. namespaces and scopes).
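For the enum case, a sketch along these lines shows two functions happily using the same short enumerator names without ever clashing. Both functions and all the names in them are invented for the example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical command parser; the enum is invisible outside the function.
bool runCommand(const char* command)
{
	assert(command != 0);

	enum Kind { kUnknown, kStart, kStop }; // short names, no clash risk

	Kind kind = kUnknown;
	if( std::strcmp(command, "start") == 0 )
		kind = kStart;
	else if( std::strcmp(command, "stop") == 0 )
		kind = kStop;

	return kind != kUnknown;
}

// A second function reuses the very same names for something else entirely.
int speedFor(bool running)
{
	enum { kStop = 0, kStart = 100 };
	return running ? kStart : kStop;
}
```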

The local class thing has one big, nay, huge drawback or limitation. The local class has no linkage. Linkage-schminkage you say, why would I possibly need that? I ask myself the same thing, and wonder about the insanity that is C++. However, it means that the most obvious use of this is not legal, indeed doesn't work at all:

void magicsort(int* start, int count) {
  struct MySort
  {
	bool operator()(int a, int b) const { return a < b; }
  };

  std::sort(start, start + count, MySort()); // Does not work in C++98/03!
}

It turns out that, up to and including C++03, template parameter types need to have linkage. Stupid language. So it's back to declaring the little predicates in the parent scope. It would have been sweet to have them locally in the function, and C++11 finally lifts the restriction, but until your compiler catches up I guess we have to do that in higher level languages instead.
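If you can assume a C++11 compiler, the linkage restriction on template arguments is gone, so the local predicate just works, and a lambda gives you the same thing with even less typing. A minimal sketch; magicsortDescending is a name I made up for the lambda variant.

```cpp
#include <algorithm>
#include <cassert>

void magicsort(int* start, int count) {
  assert(count >= 0);

  struct MySort
  {
    bool operator()(int a, int b) const { return a < b; }
  };

  std::sort(start, start + count, MySort()); // fine from C++11 onwards
}

void magicsortDescending(int* start, int count) {
  assert(count >= 0);

  // A lambda is the shorter spelling of the same local predicate.
  std::sort(start, start + count, [](int a, int b) { return a > b; });
}
```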

Switch fallthroughs

Fallthrough is one of those things that looks kind of useless in the beginning. Admittedly, it is confusing and you probably won't miss it terribly much. But it's there and it actually does have some uses. One of its many uses is when you create fairly simple state machines that need to go from one state to another, and need to do it directly and not the next time the function is called. Consider:

int sentry(Data* data)
{
	// Local state descriptions. Also serve as some
	// kind of documentation as to what we're doing.
	enum State {
		kUninitialized = 0,
		kWalking,
		kKilling,
		kVictoryDance
	};

	static State state = kUninitialized;
	switch(state)
	{
	case kUninitialized:
		init(data);
		state = kWalking; // otherwise init() would run again next call
		// fall through

	case kWalking:
		if( !enemyInReach(data) )
			break;
		state = kKilling;
		// fall through

	case kKilling:
		hackAtEnemy(data);
		if( !enemyDead(data) )
			break;
		state = kVictoryDance;
		startDance(data);
		// fall through

	case kVictoryDance:
		if( !danceDone(data) )
			break;
		state = kUninitialized;
	}

	return state; // assumption: the caller wants to know the current state
}

Here we want the effect of the edge in the state machine graph to happen right now, in this frame, and not at some later stage. Note that each break in the switch effectively just skips ahead and returns from the function.

Of course the function in itself is really evil since it has static state, you could only run one of these AI routines at a time in the game the way this is written. Ok, so I cheated and made up a completely artificial example.
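If you wanted several sentries at once, one way out (a sketch, not the only option) is to move the state into the per-sentry data. Everything here is invented for the example: the Sentry struct, its counters standing in for the game queries, and the tick function.

```cpp
#include <cassert>

// Hypothetical per-sentry data; the state lives here instead of in a static.
struct Sentry
{
	enum State { kUninitialized = 0, kWalking, kKilling, kVictoryDance };

	State state;
	int enemyDistance; // steps until the enemy is in reach
	int enemyHealth;   // hits left before the enemy is dead
	int danceFrames;   // frames left of the victory dance
};

void tick(Sentry& s)
{
	switch(s.state)
	{
	case Sentry::kUninitialized:
		s.danceFrames = 0;
		s.state = Sentry::kWalking;
		// fall through

	case Sentry::kWalking:
		if( s.enemyDistance > 0 ) {
			--s.enemyDistance;
			break;
		}
		s.state = Sentry::kKilling;
		// fall through

	case Sentry::kKilling:
		--s.enemyHealth;
		if( s.enemyHealth > 0 )
			break;
		s.state = Sentry::kVictoryDance;
		s.danceFrames = 2;
		// fall through

	case Sentry::kVictoryDance:
		if( s.danceFrames > 0 ) {
			--s.danceFrames;
			break;
		}
		s.state = Sentry::kUninitialized;
		break;

	default:
		assert(false && "unknown sentry state");
	}
}
```

The fallthrough logic is unchanged; the only difference is that every sentry carries its own state around, so two of them can be at different points in the graph during the same frame.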

In closing

I guess some of these concepts would have made me run screaming for the woods when I first started out programming C++. I guess sometimes you learn what works over time. For the longest time I considered assignments inside if statements the devil. However, that was probably more colored by my first encounter with them (and the cursing that ensued). Today I use them whenever it makes sense. I guess that's what they call growth? What are your horror stories of code that you would have kicked yourself for a couple of years ago?