DotImage OCR Tesseract Engine
An add-on with runtime royalty-free options
The DotImage OCR Tesseract Engine is a royalty-free desktop OCR toolkit based on Google's open-source Tesseract OCR. This low-cost option for intelligent document capture integrates seamlessly into DotImage's OCR interface.
The Tesseract engine is based on a natural learning algorithm that accurately recognizes characters in a scanned document. Integrating Tesseract into DotImage's clean .NET interface lets .NET developers easily add OCR and searchable PDF output to their desktop or server applications without having to track and pay for each desktop deployment.
- Support for the languages Dutch, English, French, German, Italian, Portuguese, and Spanish
- Ability to determine character, word, and line size and location
- Reports confidence of each recognized character
- Output to Text or Searchable PDF
- Royalty-free desktop licensing
Extracting text from a document with the Tesseract engine takes only a few lines:
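// Create the Tesseract OCR engine.
Atalasoft.Ocr.Tesseract.TesseractEngine engine = new Atalasoft.Ocr.Tesseract.TesseractEngine();
// List the output formats (MIME types) the engine can produce.
string[] mimeTypes = engine.SupportedMimeTypes();
// Recognize the images in myImageSource and write the result as plain text.
string selectedMimeType = "text/plain";
engine.Translate(myImageSource, selectedMimeType, @"C:\output.txt");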
By handling the engine's events, you can clean up images before recognition to improve accuracy. The following article provides more information:
Improve OCR Color Images' Read Quality
"Thin client features such as navigation, zooming, and image zoning coupled with OCR have leveraged imaging into a more active part of our abstraction process.
When selecting technology to leverage in our workflow application, we looked at imaging apart from technology, selecting the product that was the best solution and most configurable.
We're a Java shop and committed to that technology, but chose Atalasoft DotImage even though it's a .NET product."
- Joe Aparo, Senior Developer, EBSCO