We should forget about small efficiencies, say about 97% of the time.
-- Donald Knuth
Introduction
I'm going to do something different today, instead of bashing on C++ as I usually do, I'm going to point at some good things about it. In fact, it's even involving templates, my old archenemy (Bowser is nothing compared to them). So while I'm happily downloading The Orange Box on Steam, I'm writing this very quick and dirty post.
Sorting
There are numerous texts on sorting and the different algorithms, their complexities and their merits. This article is not going to go into any of that. You might even go, hey there is a library function for this, why reinvent the wheel? Turns out that there are two different functions, the old trusty qsort from the standard C library and the new std::sort from the C++ library (STL). You might think that they both perform about the same? Not really.
If we look at qsort, it takes a function callback to do the actual compare between two items since it the library function itself doesn't know anything about the data. You send in the stride and the range to the function. On the plus side, the function does not know about the data at compile time and can be put in a library. The function callback is pretty bad though, it will cause an indirection for each compare. The function call will in worst case cause a stack reference to store the return address. The compiler has no chance to actually rearrange the calling convention, since the function signature actually defines that the callback needs to be a standard C function.
Some actual tests
Ok, that's enough jabbering from me. Let's run some tests. We're going to do some fairly quick and dirty tests, not particularly good actually. At most they can show some indication on what can happen, but as Christer Ericson mentions, simple tests like these are not always so good, or just downright bad. But for the special case when you do have a large-ish array of linear (small) things that you need to sort with a simple short comparison function (ok, I've probably lost most of you here?) this could be somewhat true.
static int oldStyleCompare( const void* a_, const void* b_ ) { const float a = *(float*)a_; const float b = *(float*)b_; if( a < b ) return -1; if( a > b ) return 1; return 0; } void doOldSort( float* begin, float* end ) { std::qsort((void*)begin, end-begin, sizeof(begin[0]), oldStyleCompare); }
That's the old sort that we all know and love.
namespace { struct NewCompare { bool operator()( const float a, const float b ) const { return a < b; } }; } void doNewSort( float* begin, float* end ) { std::sort(begin, end, NewCompare()); }
Putting this together into a small program that just runs this over a random array a couple of times results in this output:
688.89 ms, std::qsort 356.20 ms, std::sort, inline functor 668.34 ms, std::sort function call
From the output we can see that using std::sort and giving the function a chance to inline the whole thing is faster than the regular qsort. Even though std::sort actually expands out to a lot more code, it's faster. Turning around and preventing std::sort from inlining the comparison shows that the times goes up and is now comparable to the regular qsort. Now I would take these results with a grain of salt, the arrays were different, the algorithms might be different and I'm running on my non deterministic PC (and it runs Vista, thus all bets are off). But something is going on here definitely and the function call to do the compare for qsort and the fact that it can be inlined by std::sort looks like a win.
The other side
Ok, so why are not everybody doing this? Hm, well, there are some drawbacks to std::sort as well, which are nicely swept under the rug. One, it's freaking templates. It also includes <algorithm>, which usually is on par with windows.h. Hate hate. The C version uses such amazing things as prototypes and prebuilt libraries (fancy such advanced features), which if you are careful can induce zero inclusion cost... (yepp, just declare the prototype locally, it's not as it will actually change).
Compile times are important, it's something that is usually not covered until it's way too late. The term "death by a thousand cuts" comes to mind. Don't underestimate the gain from having rapid turnaround times for the programmers. Going with a solution that causes you to use the possibly slower qsort, but having faster compile times (maybe 1 minute faster) does add up over the course of a project. Then the programmers are less frustrated and more eager to tackle the runtime performance problems and maybe comes up with a better algorithm that doesn't require as much sorting, thus speeding up the game in the end more than the guys who went with the std::sort method. No, really. It could happen.
In closing
Of course Scott Meyers already discussed this in length in his book Effective STL. This is just another article reinforcing that std::sort is in fact quite superior to the old workhorse qsort. Despite what I've said there, I quite like the algorithms from STL. Since they're completely orthogonal from the containers (which have all sorts of problems, see EA STL for a discussion) they work quite nicely on just regular old pointers. You can thus do binary searches, emulate std::map etc with just a regular old pointer allocated through malloc... sweet. My favourite is std::equal_range (which is also discussed in depth in Meyer's book). Now, don't go over and "fix" all of your sorting. But it's certainly interesting and if you are already paying the price of including <algorithm> then you can consider using std::sort instead of qsort.