Welcome to Atalasoft Community Sign in | Help

Libpng – You’re Doing It Wrong

I took a three day hiatus from developing DotImage 9.0 to work on an experiment: how would a 100% managed PNG decoder perform, compared to libpng?

Using only C#, I knocked together a very basic chunk of code that has a very strong .NET feel/smell to it to decode PNG images.  Internally, I build a map of the available chunks in the file and then I ask for them by type.  This is different from most PNG consuming code – the spec encourages you to read chunks in possible expected order, which requires a very state machiney layout.  State machines aren’t bad, per se, but they do lead to code that is not always easy to read or maintain.  So instead of waiting for a chunk of the right type to arrive, I just ask for it by name.

For the required zlib compression, I used DotNetZip, in particular the Zlib assembly.  The main reason for this choice was that Ionics implementation is a filter stream.  I give their stream another stream to work with and it will decompress it while I read from it.  This is awesome, because I don’t need to worry about the underlying implementation.

Well, that’s actually a lie.  I do have to worry about it.  PNG allows compressed image data to be spread across multiple PNG chunks in the file.  So I created a ChunkStream that give a stream a List<ChunkDescriptors> that describe where image data chunks reside, it will read/seek through the chunks as needed.

In sum, my reading model for image data is this Stream->ChunkStream->ZLibStream.

So here is my code for decoding a non-interlaced PNG image:

List<ChunkDescriptor> chunks = _chunkMap[ChunkType.IDAT];
ChunkStream chunkstm = new ChunkStream(_stm, chunks);
ZlibStream stm = new ZlibStream(chunkstm, CompressionMode.Decompress);
for (int y = 0; y < Header.Height; y++)
    int filter = stm.ReadByte();
    if (filter < 0)
        throw new Exception("unexpected EOF");
    if (!FilterIsValid(filter))
        throw new Exception("invalid filter");
    int count = stm.Read(currLine, 0, bytesPerLine);
    Unfilter(currLine, prevLine, (PngAdaptiveFilter)filter, bytesPerLine);
    OnScanlineRead(new ScanlineEventArgs(y, currLine));
    byte[] temp = prevLine;
    prevLine = currLine;
    currLine = temp;

That’s all that it takes – this code is easy to read, easy to maintain, and nicely abstracted.  Do you see the total lack of special case handling?  All that is wrapped up in the stream model and instead, I can concentrate on the decoding.

I did some testing and found that compared with a libpng, this code takes a little less than twice as long.  With some tweaking, I can make it better.  In my case, the bottleneck is not where I expected.  It’s not in I/O.  It’s not in reordering color channels.  It’s not in using an event driven model for reporting scanlines read.  It’s not in mapping out the chunks at the start.  In my particular test case, the bottleneck is the Paeth predictor that was used on the test file to improve compression.

Here is the Paeth predictor as I originally implemented it:

private static int PaethPredictor(int a, int b, int c)
    int p = a + b - c;
    int pa = Math.Abs(p - a);
    int pb = Math.Abs(p - b);
    int pc = Math.Abs(p - c);
    if (pa <= pb && pa <= pc)
        return a;
    else if (pb <= pc)
        return b;
    return c;

This is very simple code, so it’s hard to imagine how you might make it better.  I took this approach:

private static int PaethPredictor(int a, int b, int c)
    int bmc = b - c;
    int pa = bmc < 0 ? -bmc : bmc;
    int amc = a - c;
    int pb = amc < 0 ? -amc : amc;
    int apbmcmc = a + b - c - c;
    int pc = apbmcmc < 0 ? -apbmcmc : apbmcmc;
    if (pa <= Math.Min(pa, pc))
        return a;
    else if (pb <= pc)
        return b;
    return c;

I eliminated the calls to Math.Abs, which helps a little.  I also broke apart the expression p which also helped a little bit.  When I say “a little”, I mean that it shaves off maybe a few hundredths of a second.  To read my source image 100 times took 1500ms and after these basic rearrangements, I got it down to 1100ms.  These tiny changes help because the Paeth Predictor gets called 7.5 million times in my test code.  If I inline it (seriously hurting readability), the result is slightly faster.  At this point the pure managed code runs at 1.5x libpng, which is way better than I expected.

I looked at the image decoding in libpng and while the authors clearly went to a great deal of effort to handle errors and special cases and to make the code as readable as possible, ultimately I think they’re doing it wrong.  My code is memory efficient, easy to read, easy to maintain, and will never leak memory

Published Tuesday, February 23, 2010 12:24 PM by Steve Hawley


No Comments
Anonymous comments are disabled