Welcome to Atalasoft Community Sign in | Help

When "new" is Too Slow

I was working on some prototype code that needed to be super fast.  After getting basic functionality up and running, I found that the biggest chunk of time was getting lost in operator new.  In this particular chunk of code, I had some simulated recursion that was making lots and lots of little objects that were being pushed and popped onto a stack.  The issue was that these objects are tiny and transient.  This means that all the overhead imposed by new and delete will get charged again and again and again.

There are a number of choices available to you when you do this.  I'll pass on one that breaks a number of conventional OO design rules, specifically is-a/has-a.  In my case, I had some objects that have two purposes in life - to hold data and to live in a stack.  If you are a fan of STL, you will probably haul out stack<t> and call it good.  In my case, STL's performance was heinous, so I chose to smudge the is-a/has-a line and make my little object into a stack element rather than using a stack element to hold it:

class StackPoint {
    StackPoint(unsigned int x, unsigned int y) : m_next(NULL)
        m_pt.x = x; m_pt.y = y;
    StackPoint (POINT pt) : m_pt(pt), m_next(NULL) { }
    virtual ~StackPoint () { }
    int X() { return m_pt.x; }
    int Y() { return m_pt.y; }
    void SetX(int x) { m_pt.x = x; }
    void SetY(int y) { m_pt.y = y; }
    StackPoint *GetNext() { return m_next; }
    void SetNext(StackPoint *next) { m_next = next; }
    POINT m_pt;
    StackPoint *m_next;

class MyStack {
    MyStack() : m_stack(NULL) { }
    virtual ~MyStack();
    bool IsEmpty() { return m_stack == NULL; }
    void Push(StackPoint *item) { item->SetNext(m_stack); m_stack = item; }
    StackPoint *Pop();
    StackPoint *m_stack;


In these snippets, I'm leaving out some of the methods, but they're simple to write.  Pop() returns null if the stack is empty, or the top of the stack otherwise, resetting the stack pointer.  The destructor rips the entire stack, disposing each element in turn.  From these two classes, I can now create an allocator like this:

class StackPointAllocator {
    StackPointAllocator() {  }
    virtual ~StackPointAllocator() { }

    StackPoint *Alloc(int x, int y);
    void Free(StackPoint *p);
    MyStack m_stack;

 In this case, the StackPointAllocator has a method called Alloc which will do one of two things.  If m_stack.IsEmpty() is true, it will call operator new.  If m_stack.IsEmpty() is false, it will pop the top item off, set x and y and return it.  In the typical case, there is an item left on the stack and so "allocation" (really recycling), is only a few assignment statements.  Note that m_stack is a NOT a pointer, which means that its destructor will automatically get called when the allocator goes out of scope or is deleted.

To give you a sense of the power of doing this, the original problem with allocation was costly because I was allocating tens of millions of these objects, each of which has a very short life span.Even though the individual cost was small, multiplying by 10M added up.  I'd rather do 40M assignments.  Once the code was refactored (easy), the time for allocation totally vanished from the performance profile.

Now, the real C++ programmer question is "Why didn't I override global operator new for StackPoint?"

Overriding global operator new means that all code that allocates these objects will go through the global allocator - that's a bigger deal.  I use this code in other places that shouldn't use this allocator.  I end up making the allocators I use stack objects, which means that objects in the free list disappear when they go out of scope.

Classic engineering dilemma: which is more important? performance or OO design?  In this case, the API is NOT public, so the dirty little secret is hidden away, making this a perfectly acceptable choice.  Performance, FTW.  In addition, there were two other considerations - readability and expedience.  Readability was not lost at all, and since I could knock this out inside a half hour, I won on expedience too. 

Published Monday, May 05, 2008 9:46 AM by Steve Hawley


Wednesday, May 07, 2008 1:00 AM by DotNetKicks.com

# Performance vs. Object Oriented Design

You've been kicked (a good thing) - Trackback from DotNetKicks.com

Thursday, May 08, 2008 4:20 PM by Steve Hawley

# re: When "new" is Too Slow

I was looking at some of the commentary on Reddit and once you filter out the invective, there are some interesting points present, but they don't apply in my circumstances.

One thing that's missing from my article is that this is part of an image processing algorithm in which speed counts a great deal.  It's not that I'm doing anything particularly hard, but when you have to do anything on the order of hundreds of millions of times, the little stuff starts to matter a lot.  A scant few instructions can balloon into more time than you might think.

The main issue was that in performance profiling, the number one consumer of time was operator new and operator delete, being called on the order of 1.3 million times each.

The obvious solution is to change the algorithm (which I had already done, which is why operator new was only being called 1.3 million times instead of 2.1 million times).

Then the question is why not use vector<POINT> - not so simple - the actual data structure represented is not a POINT - it's more complicated.  This means that every push and pop invokes the copy constructor,  which will be 2.6 million copy constructors.  Tried it - not acceptable.

In my solution, the cost of push (it inlines) is two assignments - roughly 3 instructions in the final optimized code.  Pop is similarly cheap (it's not inline as it is debug assert heavy).

Placement new?  Wonderful tool if you know that you will never exceed a maximum size.  I can't make that guarantee without wasting a lot of memory.

Exception safety?  The code never throws.  Nothing that it uses throws.  It will never throw by design - therefore, no exception overhead.  If things have to clean up from an exception in the middle of this code I've done something horribly wrong.

Override operator new? As I said, the object is needed in other places - I can't override it - the constraints don't allow it.

Thanks for your comments, I appreciate the time you've taken to think it through.  You might also consider registering here and commenting to the article directly so everyone can benefit from your thoughts.

Anonymous comments are disabled