OCR/ICR: How It's Used and How to Build Applications to Read Text

Today's digital document libraries need to be searchable, and office workers need to be able to index and pull data from within these documents. Traditionally, this is done by an office worker keying in each document's contents, which is slow and expensive compared to having a computer do the same task. Computers are not yet perfect at reading text off a scanned document, but any document that can be programmatically transformed into raw text data saves the business time and money.

Optical Character Recognition (OCR) is a method by which software "reads" the characters in an otherwise flat, scanned image and converts them into text. The resulting text can be placed anywhere programmatically, and it is necessary in larger document workflows and for discoverability.
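To make the idea of software "reading" an image concrete, here is a minimal Python sketch of template matching, one of the simplest ways to recognize characters. It is a toy under stated assumptions, not how any particular engine works: the 3x5 glyph bitmaps, the fixed-width layout, and the names TEMPLATES, split_glyphs, mismatch, and recognize are all invented for illustration.

```python
# Toy 3x5 bitmap templates ('#' = ink, '.' = background).
# These glyph shapes are made up for illustration only.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

GLYPH_WIDTH = 3  # assumed: every glyph is 3 pixels wide, with 1 blank column between glyphs


def split_glyphs(image_rows):
    """Cut a one-line image into fixed-width glyph cells."""
    width = len(image_rows[0])
    glyphs = []
    for start in range(0, width, GLYPH_WIDTH + 1):
        glyphs.append([row[start:start + GLYPH_WIDTH] for row in image_rows])
    return glyphs


def mismatch(glyph, template):
    """Count the pixels where the glyph and the template disagree."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(image_rows):
    """Return the closest-matching character for each glyph cell."""
    text = []
    for glyph in split_glyphs(image_rows):
        best = min(TEMPLATES, key=lambda ch: mismatch(glyph, TEMPLATES[ch]))
        text.append(best)
    return "".join(text)


# A flat "scanned" image of four glyphs; the second glyph has one
# stray ink pixel in its middle row to imitate scanner noise.
SCANNED = [
    "###.###.###.#..",
    ".#..#.#..#..#..",
    ".#..###..#..#..",
    ".#..#.#..#..#..",
    ".#..###.###.###",
]

print(recognize(SCANNED))
```

Because each cell is scored against every template and the closest one wins, a single noisy pixel does not change the result. That tolerance is also why recognition is imperfect: a badly degraded glyph can end up closer to the wrong template.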

Intelligent Character Recognition (ICR) follows the same software concept, but is tuned to recognize handwriting rather than printed text. Both technologies require a software engine to power the processing that turns images into useful data.

Atalasoft's Document Transformation Engines

Atalasoft provides a set of developer components for building applications with built-in, industry-proven document transformation engines. The tools available can save countless hours of configuring smart algorithms and provide a familiar, customizable experience for the development team. The toolkits that help with OCR are:


  • .NET Platform
  • Core Document Imaging SDK
  • Provides document cleanup functions
  • Provides a class structure and document streaming for linking to a custom or preferred OCR engine
  • Supports our OCR add-ons for:
    • Google Tesseract Engine
    • ABBYY
    • GlyphReader Engine
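
The pattern described in the list above, cleanup functions feeding an interchangeable OCR engine, can be sketched in a few lines. The sketch below is in Python for brevity and does not use or mirror the Atalasoft .NET API; every name in it (OcrEngine, binarize, InkCountingEngine, transform) is hypothetical, and the stand-in engine only counts ink pixels where a real adapter would call an actual recognition library.

```python
from typing import Callable, List, Protocol

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


class OcrEngine(Protocol):
    """Hypothetical interface that any engine adapter would satisfy."""

    def recognize(self, image: Image) -> str: ...


def binarize(image: Image, threshold: int = 128) -> Image:
    """Cleanup step: force every pixel to pure black or pure white."""
    return [[0 if px < threshold else 255 for px in row] for row in image]


class InkCountingEngine:
    """Stand-in engine for the demo; it reports ink pixels instead of text."""

    def recognize(self, image: Image) -> str:
        ink = sum(px == 0 for row in image for px in row)
        return f"{ink} ink pixels"


def transform(
    image: Image,
    cleanup_steps: List[Callable[[Image], Image]],
    engine: OcrEngine,
) -> str:
    """Run each cleanup step in order, then hand the result to the engine."""
    for step in cleanup_steps:
        image = step(image)
    return engine.recognize(image)


# A tiny gray "scan" of a plus sign; none of its pixels are pure black.
page = [
    [250, 40, 245],
    [30, 35, 20],
    [240, 60, 255],
]

print(transform(page, [], InkCountingEngine()))          # no cleanup
print(transform(page, [binarize], InkCountingEngine()))  # with cleanup
```

Since transform depends only on the recognize method, swapping one engine for another means writing a new adapter class rather than changing the pipeline. The two calls at the end also show why cleanup matters: the same engine sees no ink in the raw gray scan and five ink pixels once the image is binarized.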

WingScan with VRS

  • .NET with a JavaScript browser control
  • Captures documents through a website from a scanner
  • Provides an interface to Kofax's VirtualReScan (VRS) technology, which delivers images that are cleaned up, correctly oriented, and otherwise optimized so that OCR and ICR engines can extract the most data

