Steve's Tech Talk : DotPdf–Generate PDF Documents from .NET

I’m very proud to announce the release of DotPdf, a new toolkit for .NET from Atalasoft. It provides a new set of tools for generating your own PDF documents. This is exciting for us because it is a new product that dovetails well with our commitment to PDF as a document format.

In the past, we’ve made available tools for generating image-only PDFs (using our PdfEncoder) and searchable PDFs (using our PdfTranslator, part of our OCR toolkit). Both are built on the same suite of low-level tools that I’ve been building out for Atalasoft for the past six years. I’m comfortable using these tools, but they really aren’t made for general consumption. They resemble other tools on the market in that, to use them effectively, you need at least a cursory, if not an intimate, understanding of the PDF specification. If you don’t fully understand the implications of what you are doing at that level, you are more likely to generate bad PDFs.

Because I worked at Adobe on Acrobat versions 1.0–4.0, I have no problem understanding the specification and its deeper intent (I was either responsible for those sections or worked with the people who were). Not everyone can say the same. So one of my goals for the PDF generation toolkit was simple: NO BAD PDFs. In designing this kit with Kevin Hulse, we tried to find the right balance among flexibility, abstraction, usability, and extensibility, while hiding the actual details of PDF generation so that you never need to worry about which keys are optional or required in the dictionaries within the PDF, and so on.

Trying to picture things from our customers’ point of view, I wanted the abstraction to feel like working with a WinForms Graphics object. Ideally, I would have subclassed Graphics outright, but it’s a sealed class (why? shame on MS). So there is a layer of code that feels like the Graphics object, but it is not the layer you are most likely to use. Instead, we also built a shape abstraction with a number of canned shapes to work with, along with the ability to create new shapes or compose existing ones. For example, if you want a star shape with text inside it, composing the two separate shapes into a new shape is very, very easy.
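The idea of composing a star and a text shape into a new shape is an instance of the classic Composite pattern. The sketch below shows that pattern in Java (close in syntax to C#); it is not DotPdf’s API. The `Shape`, `Star`, `TextShape`, and `CompositeShape` types are hypothetical, and "rendering" here just appends readable drawing commands to a list instead of emitting PDF content.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape abstraction, for illustration only (not DotPdf's API).
// "Rendering" just appends human-readable drawing commands to a list.
interface Shape {
    void render(List<String> out);
}

// A star centered at (cx, cy) with the given outer radius.
record Star(double cx, double cy, double radius) implements Shape {
    public void render(List<String> out) {
        out.add("star at (" + cx + ", " + cy + ") r=" + radius);
    }
}

// A single run of text starting at (x, y).
record TextShape(double x, double y, String text) implements Shape {
    public void render(List<String> out) {
        out.add("text \"" + text + "\" at (" + x + ", " + y + ")");
    }
}

// A composite is itself a Shape, so composites can be nested and reused
// anywhere a single shape is expected.
final class CompositeShape implements Shape {
    private final List<Shape> children = new ArrayList<>();

    CompositeShape add(Shape child) {
        children.add(child);
        return this; // allow chaining
    }

    @Override
    public void render(List<String> out) {
        for (Shape child : children) {
            child.render(out); // children draw in insertion order
        }
    }
}

class CompositeDemo {
    public static void main(String[] args) {
        // A "star with text inside it" built from two existing shapes.
        Shape badge = new CompositeShape()
                .add(new Star(100, 100, 50))
                .add(new TextShape(90, 95, "Hi"));
        List<String> commands = new ArrayList<>();
        badge.render(commands);
        commands.forEach(System.out::println);
    }
}
```

Because `CompositeShape` implements the same interface as its children, a composed badge can itself be added to a larger composite, which is what makes "make your own shape" cheap in this kind of design.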

We provide several text shapes for different tasks. Don’t like them? Make your own. Kevin was given the task of taking our existing Barcode Writing API and adapting it for PDF. It took him a scant few days, and the output is not a simple image but actual drawing primitives. If you have our barcode writing toolkit, you get a barcode shape for free!

One surprise in this kit is something you won’t see in other PDF toolkits: round tripping. We can embed the shape information in the PDF itself in a way that lets us identify it, pull it back out, and reconstruct the shapes. This means that you can create incomplete document templates (letterhead, chapter page layouts, billing statements, and so on) and store them in a repository. Your business logic can then retrieve them later and fill them out on a per-customer basis. If the pages weren’t generated by DotPdf in the first place, that’s OK: the pages are still there, although they are opaque to our tools today.
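The post doesn’t describe how DotPdf stores shape information inside the PDF, so the following is only an illustration of the round-tripping idea: shape descriptions are encoded into a text payload that could travel with a page and then decoded back into equal objects. The `ShapeRecord` fields, the tab-separated line format, and the `ShapeCodec` helper are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical description of one shape; the fields are illustrative only.
record ShapeRecord(String kind, double x, double y, String text) {}

// Illustrative codec: one tab-separated line per shape. This is NOT
// DotPdf's embedding format, just a demonstration of a lossless round trip.
final class ShapeCodec {
    private ShapeCodec() {}

    // Escape backslashes first, then tabs and newlines, so that every
    // record stays on a single line with exactly four fields.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n");
    }

    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                sb.append(next == 't' ? '\t' : next == 'n' ? '\n' : next);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    static String encode(List<ShapeRecord> shapes) {
        StringBuilder sb = new StringBuilder();
        for (ShapeRecord s : shapes) {
            sb.append(escape(s.kind())).append('\t')
              .append(s.x()).append('\t')   // Double.toString round-trips exactly
              .append(s.y()).append('\t')
              .append(escape(s.text())).append('\n');
        }
        return sb.toString();
    }

    static List<ShapeRecord> decode(String payload) {
        List<ShapeRecord> shapes = new ArrayList<>();
        for (String line : payload.split("\n")) {
            if (line.isEmpty()) continue;
            String[] f = line.split("\t", -1); // -1 keeps an empty last field
            shapes.add(new ShapeRecord(unescape(f[0]),
                    Double.parseDouble(f[1]), Double.parseDouble(f[2]),
                    unescape(f[3])));
        }
        return shapes;
    }
}

class RoundTripDemo {
    public static void main(String[] args) {
        List<ShapeRecord> template = List.of(
                new ShapeRecord("text", 72, 720, "Account summary"),
                new ShapeRecord("text", 72, 700, "Tab\there, backslash \\ too"));
        String payload = ShapeCodec.encode(template);
        System.out.println(ShapeCodec.decode(payload).equals(template)); // true
    }
}
```

The key property for templates is that decoding yields objects equal to the ones that were encoded, so business logic can reload a stored page layout, modify it, and write it out again.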

We also aimed for scalability. I wanted to be able to generate documents potentially thousands of pages long. One of my early tests generated a 1,000-page PDF document with an image on each page in the most naïve way possible, that is, building all the pages up front and then saving them. The result: 10 seconds. If the pages are all dense text, 1,000 pages take 3 minutes. By comparison, putting the raw text into Word and letting Acrobat X generate the PDF takes close to 6 minutes, longer if you count the two minutes it took just to copy the text to the clipboard. Remember that Acrobat is written in C++ and this toolkit is in C#.
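Converting those timings to per-page figures makes the comparison easier to read. This is simple arithmetic on the numbers quoted above, nothing more:

```latex
\begin{aligned}
\text{image pages:} \quad & 10\,\text{s} / 1000 = 10\,\text{ms per page} \\
\text{dense text:} \quad & 180\,\text{s} / 1000 = 180\,\text{ms per page} \\
\text{Word + Acrobat X:} \quad & \approx 360\,\text{s} / 1000 \approx 360\,\text{ms per page} \\
\text{speedup on text:} \quad & 360 / 180 \approx 2
\end{aligned}
```

Counting the two minutes of clipboard copying (about 480 s in total), the gap grows to roughly 2.7 times.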

I also set a goal of being able to render Moby Dick with concise, straightforward code. I was able to do just that, in about 100 lines of code. The code also handles chapter numbering, chapter headers, and automatic page numbering (centered, though it doesn’t have to be; alternating left/right justification would be trivial).
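To show why alternating justification is trivial, here is a hedged sketch, not DotPdf code: the alignment is chosen from the page number’s parity, and an x offset is computed from the page width, margin, and label width. The `Align` enum, the `PageNumbering` helper, the book convention that odd-numbered pages are right-hand pages, and the use of points as units are all assumptions for illustration.

```java
// Hypothetical page-number placement helper; not part of DotPdf.
enum Align { LEFT, CENTER, RIGHT }

final class PageNumbering {
    private PageNumbering() {}

    // Common book convention (an assumption here): odd-numbered pages are
    // right-hand pages, so the number goes at the right (outer) margin;
    // even-numbered pages put it at the left.
    static Align alternating(int pageNumber) {
        return pageNumber % 2 == 1 ? Align.RIGHT : Align.LEFT;
    }

    // X coordinate, in points from the page's left edge, at which a
    // page-number label of width labelWidth should start.
    static double labelX(Align align, double pageWidth, double margin, double labelWidth) {
        return switch (align) {
            case LEFT -> margin;
            case CENTER -> (pageWidth - labelWidth) / 2.0;
            case RIGHT -> pageWidth - margin - labelWidth;
        };
    }
}

class PageNumberDemo {
    public static void main(String[] args) {
        // US Letter is 612 points wide; 72 points is a one-inch margin.
        for (int page = 1; page <= 4; page++) {
            Align a = PageNumbering.alternating(page);
            System.out.println(page + " " + a + " x=" + PageNumbering.labelX(a, 612, 72, 20));
        }
    }
}
```

Switching from centered to alternating numbering then amounts to calling `alternating(page)` instead of always passing `Align.CENTER`.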

In the coming weeks, I will be posting blog articles about DotPdf, including cookbook examples that show how to get the most out of it.

Download it and give it a try. I’m very proud of this kit. And remember: NO BAD PDFs.

Published Tuesday, December 06, 2011 3:20 PM by Steve Hawley
