When "new" is Too Slow
I was working on some prototype code that needed to be super fast. After getting basic functionality up and running, I found that the biggest chunk of time was getting lost in operator new. In this particular chunk of code, I had some simulated recursion that was making lots and lots of little objects that were being pushed and popped onto a stack. The issue was that these objects are tiny and transient. This means that all the overhead imposed by new and delete will get charged again and again and again.
There are a number of choices available to you when you do this. I'll pass on one that breaks a number of conventional OO design rules, specifically is-a/has-a. In my case, I had some objects that have two purposes in life - to hold data and to live in a stack. If you are a fan of STL, you will probably haul out stack<t> and call it good. In my case, STL's performance was heinous, so I chose to smudge the is-a/has-a line and make my little object into a stack element rather than using a stack element to hold it:
class StackPoint {
public:
StackPoint(unsigned int x, unsigned int y) : m_next(NULL)
{
m_pt.x = x; m_pt.y = y;
}
StackPoint (POINT pt) : m_pt(pt), m_next(NULL) { }
virtual ~StackPoint () { }
int X() { return m_pt.x; }
int Y() { return m_pt.y; }
void SetX(int x) { m_pt.x = x; }
void SetY(int y) { m_pt.y = y; }
StackPoint *GetNext() { return m_next; }
void SetNext(StackPoint *next) { m_next = next; }
private:
POINT m_pt;
StackPoint *m_next;
};
class MyStack {
public:
MyStack() : m_stack(NULL) { }
virtual ~MyStack();
bool IsEmpty() { return m_stack == NULL; }
void Push(StackPoint *item) { item->SetNext(m_stack); m_stack = item; }
StackPoint *Pop();
private:
StackPoint *m_stack;
};
In these snippets, I'm leaving out some of the methods, but they're simple to write. Pop() returns null if the stack is empty, or the top of the stack otherwise, resetting the stack pointer. The destructor rips the entire stack, disposing each element in turn. From these two classes, I can now create an allocator like this:
class StackPointAllocator {
public:
StackPointAllocator() { }
virtual ~StackPointAllocator() { }
StackPoint *Alloc(int x, int y);
void Free(StackPoint *p);
private:
MyStack m_stack;
};
In this case, the StackPointAllocator has a method called Alloc which will do one of two things. If m_stack.IsEmpty() is true, it will call operator new. If m_stack.IsEmpty() is false, it will pop the top item off, set x and y and return it. In the typical case, there is an item left on the stack and so "allocation" (really recycling), is only a few assignment statements. Note that m_stack is a NOT a pointer, which means that its destructor will automatically get called when the allocator goes out of scope or is deleted.
To give you a sense of the power of doing this, the original problem with allocation was costly because I was allocating tens of millions of these objects, each of which has a very short life span.Even though the individual cost was small, multiplying by 10M added up. I'd rather do 40M assignments. Once the code was refactored (easy), the time for allocation totally vanished from the performance profile.
Now, the real C++ programmer question is "Why didn't I override global operator new for StackPoint?"
Overriding global operator new means that all code that allocates these objects will go through the global allocator - that's a bigger deal. I use this code in other places that shouldn't use this allocator. I end up making the allocators I use stack objects, which means that objects in the free list disappear when they go out of scope.
Classic engineering dilemma: which is more important? performance or OO design? In this case, the API is NOT public, so the dirty little secret is hidden away, making this a perfectly acceptable choice. Performance, FTW. In addition, there were two other considerations - readability and expedience. Readability was not lost at all, and since I could knock this out inside a half hour, I won on expedience too.