
Atalasoft Imaging SDK Development Blog

Document Imaging and Developer Commentary


Recent blog posts

This is our new blog page. If you're looking for posts before 2012, see our archive.

Hammers Vs. Nails


While working on our recent port of DotPdf to Java, I added a number of tools to the class IterableHelper, which is available in the ProductAPI jar, to enable a more functional style of programming. Of course, one of the tools I put in was fold, which has the following signature:

public static <State, T> State fold(Folder<State, T> folder, State initialState, Iterable<T> sequence)

This particular signature lets me do all kinds of interesting folds that I need in my code base. Technically, I don’t need them, but there are circumstances where a fold, or a function based on fold, is much easier to read and maintain. Recently, I ported a chunk of code that summed the width of all columns in an iterable data structure. In F#, I might write this function to do the job:

let totalWidth cols = cols |> List.fold (fun sum elem -> sum + elem.Width) 0.0

I could write this in Java, but this is what it will look like (without lambda expressio...
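For readers who want to see the shape of such a fold in Java, here is a minimal sketch. The `Folder` interface's method name (`apply`), the `Column` class, and its `getWidth()` accessor are assumptions made for illustration; this is not the actual IterableHelper source from the ProductAPI jar. It uses an anonymous class rather than a lambda, matching the no-lambda setting the post mentions.

```java
import java.util.Arrays;
import java.util.List;

public class FoldSketch {

    // Hypothetical folder interface: combines the running state with one element.
    public interface Folder<State, T> {
        State apply(State state, T elem);
    }

    // Left fold: thread the state through every element of the sequence, in order.
    public static <State, T> State fold(Folder<State, T> folder, State initialState, Iterable<T> sequence) {
        State state = initialState;
        for (T elem : sequence) {
            state = folder.apply(state, elem);
        }
        return state;
    }

    // Hypothetical column type standing in for the real data structure.
    public static class Column {
        private final double width;
        public Column(double width) { this.width = width; }
        public double getWidth() { return width; }
    }

    // Sum of all column widths, written with an anonymous class (no lambdas).
    public static double totalWidth(Iterable<Column> cols) {
        return fold(new Folder<Double, Column>() {
            @Override
            public Double apply(Double sum, Column elem) {
                return sum + elem.getWidth();
            }
        }, 0.0, cols);
    }

    public static void main(String[] args) {
        List<Column> cols = Arrays.asList(new Column(1.5), new Column(2.0), new Column(3.25));
        System.out.println(totalWidth(cols)); // prints 6.75
    }
}
```

Because `Folder` has a single abstract method, with Java 8 lambdas the anonymous class collapses to `fold((sum, elem) -> sum + elem.getWidth(), 0.0, cols)`.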


Posted by Steve Hawley on 08/25/2014 with 0 comments

Document Tree Viewer with Atalasoft, Part One


Two weeks ago, we surveyed our customers to find out what features they wanted to see in future versions of our SDKs. There were many inspiring suggestions, which ended up logged in TFS. One feature, suggested more than once, stood out as a perfect candidate to implement externally, write about, and release as sample code that developers could use directly alongside our SDK. The goal of this new feature is to create a WebDocumentRepositoryViewer with a tree view of the structure of documents within the repository. It’s essentially an HTML5 tree view of available documents that links to our web viewers to display the selected document. When a document is selected in the repository, the WebDocumentViewer and WebDocumentThumbnailer open the new file, allowing you to scroll through the pages as usual, all with no plugins necessary. I plan to build this control with a jQuery plugin called FancyTree and provide the following API ...


Posted by Kevin Hulse on 08/21/2014 with 0 comments

Atalasoft 10.5 SDKs Released


General Atalasoft SDK Changes

Drag and Drop Thumbnails in the Web Document Viewer Control

Customers have asked, and now we've delivered! You can now rearrange pages in a document through the WebDocumentViewer by dragging thumbnails in the WebDocumentThumbnailer. Changes are automatically reflected in the connected WebDocumentViewer. Calling save() after rearranging the thumbnails saves the user’s changes to a new file on the server side.

Enhanced IE11 Support

With the release of 10.4.1, we added support for IE11, and with any new browser there are complexities that are not always seen up front. With 10.5, we have found and fixed corner cases involving small thumbnails, right-to-left languages, and JavaScript alert errors that appeared when there was no logical error.

DotImage PDF Bundle Changes

PDF Forms in the Web Document Viewer

Harnessing the power o...


Posted by Rick Casucci on 07/30/2014 with 0 comments

TechEd 2014


Last month I attended Microsoft’s developer show TechEd North America 2014 in Houston, Texas, and now that some time has passed, the conference is a blur. However, I do remember each of the firsts I accomplished:

✓ First business trip.
✓ First visit to Texas.
✓ Created my first interactive demonstration application (much harder than it sounds).
✓ Rode my first mechanical bull.
✓ Watched the Houston Astros win a game (I hear that’s an actual first).

At the Atalasoft booth we gave away more than 300 shirts and received over a thousand entries for our 3D printer giveaway. Additionally, we showed our interactive demo numerous times and it did not irreparably break! (Achievement Unlocked.) I’d like to highlight a few things from the event and give some shout-outs: Josh D’Ambrosio won the 3D printer. Congratulations! ...


Posted by Kevin Hulse on 06/30/2014 with 0 comments Tags: TechEd2014

Anatomy of a Feature Request


Creating a product that is an API presents many challenges for an architect. There are a number of axes describing trade-offs that are omnipresent when adding support for a particular feature. For example, you might have an easy-to-understand public abstraction at the cost of a challenging (or unreliable) private implementation. I’m going to walk you through the process I followed to implement a feature in DotPdf for a customer. The back story is that PDF includes a misfeature known as “PDF Portfolios”; in the PDF specification, these are called “Portable Collections” (a portfolio in the real world is a collection of documents that you carry). This feature lets a number of documents or files be embedded within a single PDF file and accessed from within the viewer’s UI. The embedded documents need not be PDF; they could be a Word doc, email, text, images, etc. The resulting embedded files can be prese...


Posted by Steve Hawley on 06/25/2014 with 0 comments

Your Whole Programming Language is a Set of Domain-Specific-Languages


A Domain-Specific Language (DSL) is a small language used to make routine tasks in a particular problem domain easier. Examples of DSLs include spreadsheet macros and Make, the Unix software build utility; the virtual machine I wrote to parse PDF also implements a simple DSL. When you consider the syntax of most modern-ish programming languages (I’m looking at you, C++, Java, C#, F#), nearly all of them are a hodgepodge of DSLs jammed together. This is sometimes a horrible thing, and unfortunately it’s our own fault. It stems from how we got here in the first place and how we saw our problem domain. The first thing that comes to mind is assignment, which is the first DSL. Value mutation is a direct reflection of the initial implementation of hardware. We had memory that was used to hold numbers, and we needed a way to put values into cells and get values out of them. Thus was born the “move” instruction (or load/store instructions in accumulator ...
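To make the load/store idea concrete, here is a toy accumulator machine in Java. The instruction set (LOAD, ADD, STORE), the memory layout, and all class names are invented for illustration and do not describe any real CPU or the author's PDF virtual machine; the point is only that a high-level assignment like `c = a + b` is, underneath, a sequence of moves between memory cells and a single register.

```java
import java.util.Arrays;
import java.util.List;

public class AccumulatorMachine {

    // A toy instruction set: every operation goes through the single accumulator.
    enum Op { LOAD, ADD, STORE }

    // One instruction: an opcode plus the memory address it operates on.
    static final class Instr {
        final Op op;
        final int addr;
        Instr(Op op, int addr) { this.op = op; this.addr = addr; }
    }

    // Execute the program against the given memory cells and return the memory.
    static int[] run(List<Instr> program, int[] memory) {
        int acc = 0; // the accumulator register
        for (Instr instr : program) {
            switch (instr.op) {
                case LOAD:  acc = memory[instr.addr]; break;       // get a value from a cell
                case ADD:   acc = acc + memory[instr.addr]; break; // combine with another cell
                case STORE: memory[instr.addr] = acc; break;       // put the value into a cell
            }
        }
        return memory;
    }

    public static void main(String[] args) {
        // Cells 0, 1, 2 play the roles of variables a, b, c.
        int[] memory = {2, 3, 0};
        // The assignment "c = a + b" expressed as load/add/store.
        List<Instr> program = Arrays.asList(
            new Instr(Op.LOAD, 0),
            new Instr(Op.ADD, 1),
            new Instr(Op.STORE, 2));
        System.out.println(Arrays.toString(run(program, memory))); // prints [2, 3, 5]
    }
}
```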


Posted by Steve Hawley on 05/05/2014 with 0 comments

Improving OCR Results: Adding Spellcheck


With the new Tesseract 3.02 engine available as an add-on for Atalasoft DotImage, I have become more interested in the quality of OCR results. When I scour the internet for OCRed documents, I find that many of them have words that are misspelled due to a misinterpreted character or an omitted letter. I thought spellcheck might solve this issue, but after experimenting I believe it can only make minor improvements to overall OCR results without very sophisticated integration. In DotImage, the OcrEngine object is set up to be very extensible, giving hooks into many major steps of the OCR process. Using DotImage, I came up with two simple algorithms that use an open-source .NET spell-checking engine: “Missing Letter” and “Single Incorrect Letter.”

Missing Letter

In several of the raw OCR results from my sample set, I noticed words that were missing a letter entirely. The spell check engine provided good guesses when a let...
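The two strategies can be sketched as candidate generators. In the sketch below, a plain `Set<String>` of lowercase words stands in for the .NET spell-checking engine the post used, and the method names are hypothetical; it illustrates the idea only, not the DotImage OcrEngine hooks or the original code. "Missing Letter" tries inserting each letter a–z at every position, while "Single Incorrect Letter" tries replacing each character with every other letter, keeping only candidates that appear in the dictionary.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class OcrSpellFix {

    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";

    // "Missing Letter": insert one letter at each position (including the end).
    static Set<String> missingLetter(String word, Set<String> dictionary) {
        Set<String> hits = new TreeSet<>();
        for (int i = 0; i <= word.length(); i++) {
            for (char c : ALPHABET.toCharArray()) {
                String candidate = word.substring(0, i) + c + word.substring(i);
                if (dictionary.contains(candidate)) {
                    hits.add(candidate);
                }
            }
        }
        return hits;
    }

    // "Single Incorrect Letter": replace one character with a different letter.
    static Set<String> singleIncorrectLetter(String word, Set<String> dictionary) {
        Set<String> hits = new TreeSet<>();
        for (int i = 0; i < word.length(); i++) {
            for (char c : ALPHABET.toCharArray()) {
                if (c == word.charAt(i)) continue; // skip the unchanged word
                String candidate = word.substring(0, i) + c + word.substring(i + 1);
                if (dictionary.contains(candidate)) {
                    hits.add(candidate);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("document", "imaging", "scanner"));
        System.out.println(missingLetter("documnt", dictionary));         // prints [document]
        System.out.println(singleIncorrectLetter("imagimg", dictionary)); // prints [imaging]
    }
}
```

In practice, a caller would first check whether the OCRed word is already in the dictionary and only generate candidates for unknown words; when several candidates come back, picking one needs extra context such as OCR confidence.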


Posted by Kevin Hulse on 03/18/2014 with 0 comments

Some Introduction, Some Tesseract


Hi there! I’m Kevin Hulse, the newish Solutions Enablement Specialist at Atalasoft. You may have worked with me directly when I started at Atalasoft as a Developer Support Engineer nearly six years ago. Since then, I have worked in almost every department, from Support to Engineering and now Marketing (watch out, Sales). I hope to begin a small series of blog posts on all things OCR, and I plan to write interesting, technical-minded posts on our products, our customers, and document imaging in general, as well as posts on things I simply find interesting enough to talk about. Speaking of products, with the release of DotImage 10.4.1, our OCR libraries have been upgraded to handle version 3.02 of the Tesseract OCR Engine. This upgrade includes a few small improvements to processing speed and accuracy, as well as an increased ability to use new data packages to support more extended character sets. Additionally, here’s a list of all the langua...


Posted by Kevin Hulse on 02/27/2014 with 0 comments Tags: OCR, Tesseract

How to Work With Library Developers/Support


In addition to writing code from the ground up, we also work with other library developers, and we package their libraries in a C# or Java API that is typically easier to work with or more convenient for our customers. Many of our customers aren’t comfortable working with C++ libraries, or the C++ libraries have awkward interfaces, and that’s fine. We’re very good at taking this type of API and presenting it in a way that feels right for .NET or the JVM and integrates with the rest of our code base. Still, just as you work with us, we at times have to work with other library writers, and every now and again we find bugs. Here are five tips for working with library creators to get the most out of your interactions.

Make sure that you are using the library correctly. The reason is that (hopefully) your library has a model of operation that lends itself to a particular model of usage. For example, some librarie...


Posted by Steve Hawley on 01/14/2014 with 0 comments

When is boolean not a boolean?


I ran into a failing C# unit test today, with the following output:

  Expected: True
  But was:  True

Seriously. I stopped it in the debugger, and the property being checked was “true”. I assigned it to a local, and that was also “true” in the debugger. So when is it the case that true != true? The answer, to me, was straightforward: in C#, “true” is supposed to be the 32-bit value 0x00000001, but in many languages, “true” is defined as “anything that is not ‘false’” (aka, 0). Since the code generating the value was C++/CLI interfacing with C, it seemed pretty clear where the issue was. I opened up a Memory window in the debugger and dropped the member onto it, which showed that the value for the boolean was 0x00002080 (or something like that). The culprit was C++ code that was calling a low-level C function, passing in two locals by reference. ...
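Java has no integer-valued booleans, but the mismatch can still be modeled by treating the raw word written by the C code as an `int`. This sketch contrasts the C rule ("nonzero is true") with an exact comparison against 1, which is effectively what a bit-pattern comparison of two "true" values does, and shows one common remedy: normalizing at the interop boundary so only 0 or 1 escapes. The value 0x00002080 is the one quoted above; the class and method names are hypothetical.

```java
public class NativeBool {

    // C convention: any nonzero value counts as true.
    static boolean cTruth(int raw) {
        return raw != 0;
    }

    // Exact bit-pattern check against the canonical "true" value of 1.
    static boolean canonicalTrue(int raw) {
        return raw == 1;
    }

    // Normalize at the interop boundary so only 0 or 1 ever escapes.
    static int normalize(int raw) {
        return raw != 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        int raw = 0x00002080; // the stray value seen in the Memory window
        System.out.println(cTruth(raw));                   // prints true
        System.out.println(canonicalTrue(raw));            // prints false
        System.out.println(canonicalTrue(normalize(raw))); // prints true
    }
}
```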


Posted by Steve Hawley on 11/26/2013 with 0 comments