Recent blog posts
This is our new blog page. If you're looking for posts before 2012, see our archive.
This blog will cover some of the complexities that lurk behind programmatically creating, editing, or filling in a text form field in a PDF document using DotPdf.
There are a number of challenges in presenting an API for a “simple” PDF text field (in our API, it's a TextWidgetAnnotation). There has to be a balance between hiding complexity and presenting flexibility. This is even more challenging because the PDF specification for how the contents of a text field should appear has gone through three completely different and largely incompatible revisions over the current life of the PDF specification.
In DotPdf, I try to shield you from the complexities of the PDF specification. In this case, I’m left somewhat helpless because of the changes in PDF.
The possibilities for how a text field may appear include:
Nothing supplied. Your viewer makes the decision on how the field contents appear.
A default appearance. In PDF this is an embedded stri...
In Part One of this series, I designed an API for a Document Tree Viewer now called WebDocumentRepository. Since then, I have completed its coding and created a working sample of it using our WebDocumentViewer and WebDocumentThumbnailer. The new control is easy to set up and configure:
_repo = new WebDocumentRepository(_viewer, _thumbs, $("#Tree"));
The constructor takes the WebDocumentViewer control, the WebDocumentThumbnailer, and the jQuery object where the tree view is to be placed. Once that has been called, the AddDocument method of the WebDocumentRepository is used to add each document:
_repo.AddDocument(doc.Path, doc.Name, doc.Url, doc.ToolTip);
Path – The folder structure to display in the Repository Viewer (e.g. “Documents/Educational”)
Name – The display name of the Document in the Repository Viewer
URL – The URL that the WebDocumentViewer will use to open the file
ToolTip – The display...
While working on our recent port of DotPdf to Java, I added a number of tools to our overall toolset to enable more functional programming in the class IterableHelper, which is available in the ProductAPI jar.
Of course, one of the tools I put in was fold, which has the following signature:
public static <State, T> State fold(Folder<State, T> folder, State initialState, Iterable<T> sequence)
This particular signature lets me do all kinds of interesting folds which I need in my code base. Technically, I don’t need them, but there are circumstances where a fold or a function based on fold is much easier to read and maintain. Recently, I ported a chunk of code that summed the width of all columns in an iterable data structure. In F#, I might write this function to do the job:
let totalWidth cols = cols |> List.fold (fun sum elem -> sum + elem.Width) 0.0
I could write this in Java, but this is what it will look like (without lambda expressio...
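As a rough illustration of a fold with the signature quoted above, here is a minimal sketch written without lambda expressions. The `Folder` interface, its `apply` method, and the `Column` type are hypothetical stand-ins invented for this example; the actual definitions in IterableHelper may differ.

```java
import java.util.Arrays;
import java.util.List;

public class FoldSketch {
    // Hypothetical stand-in for Folder; the real interface may look different.
    public interface Folder<State, T> {
        State apply(State state, T item);
    }

    // Threads a running state through every element of the sequence.
    public static <State, T> State fold(Folder<State, T> folder, State initialState, Iterable<T> sequence) {
        State state = initialState;
        for (T item : sequence) {
            state = folder.apply(state, item);
        }
        return state;
    }

    // Hypothetical column type that carries only a width.
    public static final class Column {
        public final double width;
        public Column(double width) { this.width = width; }
    }

    // Sums column widths, mirroring the F# totalWidth function.
    public static double totalWidth(Iterable<Column> cols) {
        return fold(new Folder<Double, Column>() {
            @Override
            public Double apply(Double sum, Column col) {
                return sum + col.width;
            }
        }, 0.0, cols);
    }

    public static void main(String[] args) {
        List<Column> cols = Arrays.asList(new Column(1.5), new Column(2.0), new Column(2.5));
        System.out.println(totalWidth(cols));
    }
}
```

The anonymous inner class plays the role that `fun sum elem -> sum + elem.Width` plays in the F# version, which is where most of the extra length comes from.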
Two weeks ago, we surveyed our customers to find out what features they wanted to see in future versions of our SDKs. There were many inspiring feature suggestions that ended up logged in TFS. One feature, suggested more than once, stood out as a perfect feature to implement externally, then write about the process and provide sample code that developers could use directly next to our SDK.
The goal of this new feature is to create a WebDocumentRepositoryViewer with a tree view of the structure of documents within the repository. It’s essentially an HTML5 treeview of available documents that links to our web viewers to display the selected document. When a document is selected in the repository, the WebDocumentViewer and WebDocumentThumbnailer open the new file, allowing you to scroll through the pages as usual – all with no plugins necessary.
It is my plan to create this control using a jQuery control called FancyTree, and provide the following API ...
General Atalasoft SDK Changes
Drag and Drop Thumbnails in the Web Document Viewer Control
Customers have asked and now we've delivered!
You now have the ability to rearrange pages in a document through the WebDocumentViewer by dragging thumbnails in the WebDocumentThumbnailer.
Changes will automatically be reflected in the connected WebDocumentViewer. Calling save() after rearranging the thumbnails will cause the users’ changes to the document to automatically save in a new file on the server side.
Enhanced IE11 Support
With the release of 10.4.1, we added support for IE11. As with any new browser, there are complexities that are not always apparent up front.
DotImage PDF Bundle Changes
PDF Forms in the Web Document Viewer
Harnessing the power o...
Last month I attended Microsoft’s developer show TechEd North America 2014 in Houston, Texas, and with all this time passed, the conference is a blur. However, I do remember each of the firsts I accomplished:
✓ First business trip.
✓ First visit to Texas.
✓ Created first interactive demonstration application (much harder than it sounds).
✓ Rode my first mechanical bull.
✓ Watched the Houston Astros win a game (I hear that’s an actual first).
At the Atalasoft booth we gave away more than 300 shirts and received over a thousand entries for our 3D printer giveaway. Additionally, we showed our interactive demo numerous times and it did not irreparably break! (Achievement Unlocked).
I’d like to highlight a few things from the event and give some shout-outs:
Josh D’Ambrosio won the 3D printer. Congratulations!
Creating a product that is an API presents many challenges as an architect. There are a number of axes that describe trade-offs that are omnipresent when adding support for a particular feature. For example, you might have an easy-to-understand public abstraction at the cost of a challenging (or unreliable) private implementation. I’m going to take you through the process I went through in order to implement a feature in DotPdf for a customer.
The back story is that the PDF specification includes a misfeature called “PDF Portfolios”. In the PDF specification, these are called “Portable Collections” (a portfolio in the real world is a collection of documents that you carry). This feature is a way in which a number of documents/files can be embedded within a single PDF file and accessed from within the viewer’s UI. The embedded documents need not be PDF, but could be a Word doc, email, text, images, etc. The resulting embedded files can be prese...
A Domain-Specific Language (DSL) is a small language used to make routine tasks in a particular problem domain easier. Examples of DSLs include spreadsheet macros, the Unix software build utility known as Make, and the simple DSL implemented by the virtual machine I wrote to parse PDF.
When you consider the syntax of most modern-ish programming languages (I’m looking at you C++, Java, C#, F#), nearly all of them are a hodge-podge of DSLs jammed together. This is sometimes a horrible thing, and unfortunately it’s our own fault. It stems from how we got here in the first place and how we saw our problem domain.
The first thing that comes to mind is assignment, which is the first DSL. Value mutation is a direct reflection of the initial implementation of hardware. We had memory that was used to hold numbers and we needed a way to put/get values into/from cells. Thus was born the “move” instruction (or load/store instructions in accumulator ...
With the new Tesseract 3.02 engine available as an add-on for Atalasoft DotImage, I have been more interested in the quality of OCR results. When I scour the internet for OCRed documents, I find that many of them have words that are misspelled due to a misinterpreted character or omitted letter. I thought about spellcheck being able to solve this issue, and after experimentation I believe it can only make minor improvements to the overall OCR results without very sophisticated integration.
With DotImage, the OcrEngine object is set up to be very extensible, giving hooks into many major steps of the OCR process. Using DotImage, I came up with two simple algorithms, “Missing Letter” and “Single Incorrect Letter”, that use an open source .NET spell checking engine:
In several of the raw OCR results from my sample set I noticed that there would be words that were completely missing a letter. The spell check engine provided good guesses when a let...
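To make the “Missing Letter” idea concrete, here is a minimal sketch of one way such a repair could work. It is not the post's algorithm and does not use DotImage or a spell checking engine: a plain in-memory word set (a hypothetical stand-in for the engine's dictionary) is checked against every word formed by inserting one letter into the OCR output.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class MissingLetterSketch {
    // Returns every dictionary word reachable by inserting one lowercase
    // letter anywhere in the given word. The dictionary is an assumption:
    // a real spell checking engine would supply its own suggestions.
    public static Set<String> candidates(String word, Set<String> dictionary) {
        Set<String> found = new TreeSet<String>();
        for (int i = 0; i <= word.length(); i++) {
            for (char c = 'a'; c <= 'z'; c++) {
                String candidate = word.substring(0, i) + c + word.substring(i);
                if (dictionary.contains(candidate)) {
                    found.add(candidate);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<String>(Arrays.asList("letter", "later", "document"));
        System.out.println(candidates("leter", dictionary));
        System.out.println(candidates("documnt", dictionary));
    }
}
```

When more than one candidate comes back, a real integration would still need some way to rank them, for example using the confidence values the OCR engine reports.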
Hi there! I’m Kevin Hulse, the newish Solutions Enablement Specialist at Atalasoft. You may have worked with me directly after I started working at Atalasoft as a Developer Support Engineer nearly six years ago. Since starting here, I have worked in almost every department from Support to Engineering and now Marketing (watch out Sales). I hope to begin a small series of blogs on all things OCR and plan on providing interesting, technical-minded posts on our products, our customers, and document imaging in general, as well as posts on things that I simply find interesting enough to talk about.
Speaking of products, with the release of DotImage 10.4.1, our OCR libraries have been upgraded to handle version 3.02 of the Tesseract OCR Engine. This upgrade includes a few small improvements to speed and accuracy of processing as well as an increased ability to use new data packages to support more extended character sets. Additionally, here’s a list of all the langua...