Recent blog posts
This is our new blog page. If you're looking for posts before 2012, see our archive.
Two weeks ago, we surveyed our customers to find out what features they wanted to see in future versions of our SDKs. There were many inspiring feature suggestions that ended up logged in TFS. One feature, suggested more than once, stood out as a perfect candidate to implement outside the SDK, so that I could write about the process and provide sample code that developers could use directly alongside our SDK.
The goal of this new feature is to create WebDocumentRepositoryViewer with a tree view of the structure of documents within the repository. It’s essentially an HTML5 treeview of available documents that links to our web viewers to display the selected document. When a document is selected in the repository, the WebDocumentViewer and WebDocumentThumbnailer open the new file allowing you to scroll through the pages as usual – all with no plugins necessary.
It is my plan to create this control using a jQuery control called FancyTree, and provide the following API ...
General Atalasoft SDK Changes
Drag and Drop Thumbnails in the Web Document Viewer Control
Customers have asked and now we've delivered!
You now have the ability to rearrange pages in a document through the WebDocumentViewer by dragging thumbnails in the WebDocumentThumbnailer.
Changes will automatically be reflected in the connected WebDocumentViewer. Calling save() after rearranging the thumbnails will cause the users’ changes to the document to automatically save in a new file on the server side.
Enhanced IE11 Support
With the release of 10.4.1, we added support for IE11. As with any new browser, there are complexities that are not always apparent up front.
DotImage PDF Bundle Changes
PDF Forms in the Web Document Viewer
Harnessing the power o...
Last month I attended Microsoft’s developer show TechEd North America 2014 in Houston, Texas, and with all this time passed, the conference is a blur. However, I do remember each of the firsts I accomplished:
✓ First business trip.
✓ First visit to Texas.
✓ Created first interactive demonstration application (much harder than it sounds).
✓ Rode my first mechanical bull.
✓ Watched the Houston Astros win a game (I hear that’s an actual first).
At the Atalasoft booth we gave away more than 300 shirts and received over a thousand entries for our 3D printer giveaway. Additionally, we showed our interactive demo numerous times and it did not irreparably break! (Achievement Unlocked).
I’d like to highlight a few things from the event and give some shout-outs:
Josh D’Ambrosio won the 3D printer. Congratulations!
Creating a product that is an API presents many challenges as an architect. There are a number of axes that describe the trade-offs you face whenever you add support for a particular feature. For example, you might have an easy-to-understand public abstraction at the cost of a challenging (or unreliable) private implementation. I’m going to take you through the process I followed to implement a feature in DotPdf for a customer.
The back story is that the PDF specification includes a misfeature called “PDF Portfolios”. In the PDF specification, these are called “Portable Collections” (a portfolio in the real world is a collection of documents that you carry). This feature is a way in which a number of documents/files can be embedded within a single PDF file and accessed from within the viewer’s UI. The embedded documents need not be PDF, but could be a Word doc, email, text, images, etc. The resulting embedded files can be prese...
A Domain-Specific Language (DSL) is a small language used to make routine tasks in a particular problem domain easier. Examples of DSLs include spreadsheet macros and the Unix software build utility known as Make. The virtual machine I wrote to parse PDF also implements a simple DSL.
When you consider the syntax of most modern-ish programming languages (I’m looking at you C++, Java, C#, F#), nearly all of them are a hodge-podge of DSLs jammed together. This is sometimes a horrible thing, and unfortunately it’s our own fault. It stems from how we got here in the first place and how we saw our problem domain.
The first thing that comes to mind is assignment, which is the first DSL. Value mutation is a direct reflection of the initial implementation of hardware. We had memory that was used to hold numbers and we needed a way to put/get values into/from cells. Thus was born the “move” instruction (or load/store instructions in accumulator ...
With the new Tesseract 3.02 engine available as an add-on for Atalasoft DotImage, I have been more interested in the quality of OCR results. When I scour the internet for OCRed documents, I find that many of them have words that are misspelled due to a misinterpreted character or omitted letter. I thought spell checking might solve this issue, but after experimentation I believe that, without very sophisticated integration, it can only make minor improvements to the overall OCR results.
With DotImage, the OcrEngine object is set up to be very extensible, giving hooks into many major steps of the OCR process. Using DotImage and an open source .NET spell checking engine, I came up with two simple algorithms: “Missing Letter” and “Single Incorrect Letter”.
In several of the raw OCR results from my sample set I noticed that there would be words that were completely missing a letter. The spell check engine provided good guesses when a let...
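To make the two ideas concrete, here is a minimal sketch in Python rather than .NET. It is not the code from the post and does not use the DotImage OcrEngine hooks: the tiny `DICTIONARY` set, the function names, and the rule of accepting a correction only when exactly one dictionary word matches are all assumptions made for illustration.

```python
import string

# Hypothetical stand-in word list. The post used an open source .NET spell
# checking engine, which is not reproduced here.
DICTIONARY = {"document", "image", "letter", "scan"}

ALPHABET = string.ascii_lowercase


def missing_letter_candidates(word):
    """All strings formed by inserting one letter at any position."""
    return {
        word[:i] + ch + word[i:]
        for i in range(len(word) + 1)
        for ch in ALPHABET
    }


def single_incorrect_letter_candidates(word):
    """All strings formed by replacing exactly one letter."""
    return {
        word[:i] + ch + word[i + 1:]
        for i in range(len(word))
        for ch in ALPHABET
        if ch != word[i]
    }


def correct(word, dictionary=DICTIONARY):
    """Return a known word unchanged, a unique one-edit fix (lowercased),
    or the original word when no unambiguous fix exists."""
    lower = word.lower()
    if lower in dictionary:
        return word
    for generate in (missing_letter_candidates, single_incorrect_letter_candidates):
        matches = generate(lower) & dictionary
        if len(matches) == 1:  # assumption: only accept unambiguous fixes
            return matches.pop()
    return word
```

With this word list, `correct("documnt")` finds the fix through the missing-letter pass and `correct("imaqe")` through the single-incorrect-letter pass. A word with no unambiguous match is returned untouched, which keeps the sketch from making OCR output worse.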
Hi there! I’m Kevin Hulse, the newish Solutions Enablement Specialist at Atalasoft. You may have worked with me directly after I started working at Atalasoft as a Developer Support Engineer nearly six years ago. Since starting here, I have worked in almost every department from Support to Engineering and now Marketing (watch out Sales). I hope to begin a small series of blogs on all things OCR and plan on providing interesting, technical-minded posts on our products, our customers, and document imaging in general, as well as posts on things that I simply find interesting enough to talk about.
Speaking of products, with the release of DotImage 10.4.1, our OCR libraries have been upgraded to handle version 3.02 of the Tesseract OCR Engine. This upgrade includes a few small improvements to speed and accuracy of processing as well as an increased ability to use new data packages to support more extended character sets. Additionally, here’s a list of all the langua...
In addition to writing code from the ground up, we also work with other library developers and we package the libraries in a C# or Java API which is typically easier to work with or more convenient for our customers. Many of our customers aren’t comfortable working with C++ libraries or sometimes the C++ libraries have awkward interfacing and that’s fine. We’re very good at taking this type of API and presenting it in a way that feels right for .NET or the JVM and integrates with the rest of our code base.
Still, the same way that you work with us, we have to, at times, work with other library writers and we find bugs every now and again.
Here are five tips for working with library creators to get the most out of your interactions.
Make sure that you are using the library correctly. The reason is that (hopefully) your library has a model of operation that lends itself to a particular model of usage. For example, some librarie...
I ran into a failing C# unit test today, with the following output:
But was: True
Seriously. I stopped it in the debugger and the property that was being checked was “true”. I set it to a local and that was also “true” in the debugger.
So when is it the case that true != true?
The answer, to me, was straightforward: in C#, “true” is supposed to be the 32-bit value 0x00000001, but in many languages, “true” is defined as “anything that is not ‘false’” (that is, anything other than 0).
Since the code that was generating the value was C++/CLI interfacing with C, it seemed pretty clear where the issue was – I opened up a Memory window in the debugger and dropped the member onto it, which showed that the value for the boolean was 0x00002080 (or something like that). The culprit was C++ code that was calling a low-level C function, passing in two locals by reference. ...
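The mismatch between the two definitions of “true” can be modeled with plain integers. The sketch below is Python, not the C# or C++/CLI code from the post. The function names are hypothetical, and 0x00002080 is only the approximate value quoted above.

```python
MASK_32 = 0xFFFFFFFF


def strict_equals_true(raw):
    """Compare the raw 32-bit pattern against the canonical true, 0x00000001."""
    return (raw & MASK_32) == 0x00000001


def c_style_truthy(raw):
    """C-style test: anything that is not zero counts as true."""
    return (raw & MASK_32) != 0


def normalize(raw):
    """Collapse any nonzero pattern to the canonical 1 before handing it on."""
    return 1 if c_style_truthy(raw) else 0


leaked = 0x00002080  # approximate value seen in the Memory window

print(c_style_truthy(leaked))                 # truthy by the C definition
print(strict_equals_true(leaked))             # but not bit-for-bit equal to 1
print(strict_equals_true(normalize(leaked)))  # equal once normalized
```

Normalizing at the boundary between the two worlds, the equivalent of writing `value != 0` before storing the result in a managed boolean, is one way to keep a stray bit pattern from leaking through.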
I spent last week in Antwerp, Belgium attending the Devoxx conference. It was an interesting experience, not the least of which was because I was there playing a trio of roles: developer, presenter, and evangelist.
In the developer role, I was looking for new technologies that we could make use of in current or future work. Interesting things I saw included Genymotion (a high-speed Android emulator) and the Dart programming language.
My presentation on parsing PDF in Java went well, although I went way too fast – it happens when you are both a little nervous and very passionate about your topic. Hopefully the talk will be up on parleys.com soon – at this writing, the channel page is empty.
What struck me the most was working the trade show floor. It’s always a bit of a challenge at developer conferences in that we, as developers, can be introverted and do not want to interact with other people. Very often, people who walked by...