Q10363 - INFO: TesseractEngine - Overview

Deprecation Notice:

In DotImage, TesseractEngine refers specifically to Tesseract version 2 (2.0.1, to be precise). This engine was deprecated in DotImage 11.0 and removed in 11.1. Please use Tesseract3Engine, our most recent Tesseract-based option.

Q104XX - INFO: Tesseract3Engine - Overview  (Coming soon)


The Tesseract Engine (class name TesseractEngine) is an open-source engine that Atalasoft provides free of charge to customers who purchase the OCR Package. It is a commercial-quality OCR engine originally developed at HP between 1985 and 1995. HP and UNLV open-sourced the engine in 2005.


The Tesseract engine is fast and royalty-free at runtime, although it is less capable than the other engines supported by DotImage. In particular, it lacks segmentation and performs poorly on low-quality documents.

Supported Languages

The TesseractEngine supports the following languages:

  • Dutch
  • English
  • French
  • German
  • Italian
  • Portuguese
  • Spanish

Supported Output Formatters

The TesseractEngine supports the following output formatters and provides a structure that allows you to build your own.

  • Text
  • PDF

Deployment

The assemblies listed below are required for deployment.

  • Atalasoft.dotImage.OCR.Tesseract
  • Atalasoft.dotImage
  • Atalasoft.dotImage.OCR
  • Atalasoft.dotImage.Lib
  • System
  • System.Data
  • System.Drawing

Additionally, the Tesseract language files must be accessible. The toolkit installer places them in the DotImage directory automatically. When deploying, you must either copy the OcrResources directory into your application directory or pass its location explicitly to the TesseractEngine constructor. See the TesseractEngine class documentation for details.
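Because a missing resource directory is a common deployment failure, it can help to check for it at startup, before the engine is constructed. The sketch below is written in Python only to show the lookup logic for the two options described above. `resolve_ocr_resources` is a hypothetical helper and is not part of DotImage. It assumes the language files live in a directory named `OcrResources`. In a real DotImage application, the resolved path is what you would hand to the TesseractEngine constructor. Check the class documentation for the exact overload.

```python
from pathlib import Path
from typing import Optional

# Assumption: the language files ship in a directory named "OcrResources".
# This helper is hypothetical and is not part of the DotImage API.
RESOURCE_DIR_NAME = "OcrResources"


def resolve_ocr_resources(app_dir: str, explicit_dir: Optional[str] = None) -> Path:
    """Return the directory holding the Tesseract language files.

    Option 1: an explicit location (the value you would pass to the engine constructor).
    Option 2: an OcrResources directory copied into the application directory.
    """
    if explicit_dir is not None:
        path = Path(explicit_dir)
        # An explicit location is authoritative: never silently fall back.
        if not path.is_dir():
            raise FileNotFoundError(f"configured OCR resource directory not found: {path}")
        return path

    path = Path(app_dir) / RESOURCE_DIR_NAME
    if path.is_dir():
        return path
    raise FileNotFoundError(f"no {RESOURCE_DIR_NAME} directory found in {app_dir}")
```

Making the explicit location authoritative, with no fallback, is a deliberate choice in this sketch. A mistyped path then fails at startup instead of quietly picking up a different copy of the resources.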


The Tesseract Engine is used in exactly the same way as the other OCR engines, all of which inherit from the same base class in Atalasoft.dotImage.OCR.

Special Considerations

Once the Tesseract Engine has been used (that is, once Recognize has been called with a language), you cannot switch to a different language. The engine initializes itself the first time a document is recognized, and any attempt to change the language after that point throws an exception.
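The language lock behaves like a small state machine. The first recognition fixes the language, and every later request for a different language fails. The Python sketch below models only that rule. `LanguageLockedEngine`, `recognize`, and `engine_for` are hypothetical names, not the DotImage API. The real engine's exception type and method signatures may differ. The per-language pool is likewise an assumption about a possible workaround, and it presumes that separate engine instances do not share initialization state.

```python
from typing import Dict, Optional


class LanguageLockedEngine:
    """Hypothetical model of the TesseractEngine language lock (not the DotImage API)."""

    def __init__(self) -> None:
        self._language: Optional[str] = None  # unset until the first recognition

    @property
    def language(self) -> Optional[str]:
        return self._language

    def recognize(self, document: str, language: str) -> str:
        if self._language is None:
            # First recognition: the engine initializes for this language.
            self._language = language
        elif language != self._language:
            # Any later attempt to switch languages is rejected.
            raise RuntimeError(
                f"engine already initialized for {self._language!r}; "
                f"cannot switch to {language!r}"
            )
        # Placeholder for real OCR output.
        return f"[{self._language}] {document}"


def engine_for(language: str, pool: Dict[str, LanguageLockedEngine]) -> LanguageLockedEngine:
    """Hypothetical workaround: keep one engine instance per language."""
    if language not in pool:
        pool[language] = LanguageLockedEngine()
    return pool[language]
```

Repeated calls with the original language keep working. Only a change of language is refused, which matches the behavior described above.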
