<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Atalasoft Knowledgebase : OCR : Tesseract</title><description>Atalasoft Knowledgebase : OCR : Tesseract RSS 2.0 Feed</description><link>http://www.atalasoft.com/KB/</link><webMaster>admin@atalasoft.com</webMaster><lastBuildDate>Wed, 19 Jun 2013 20:56:21 GMT</lastBuildDate><ttl>20</ttl><generator>Atalasoft Knowledgebase</generator><item><title>TesseractEngine - Overview</title><link>http://www.atalasoft.com/KB/article.aspx?id=10363</link><description>&lt;B&gt;Abstract:&lt;/B&gt; &lt;H2&gt;TesseractEngine&lt;/H2&gt;&lt;P&gt;The Tesseract Engine, class name &lt;FONT face="Lucida Sans Typewriter"&gt;TesseractEngine&lt;/FONT&gt;, is an open-source engine that Atalasoft provides at no charge to those who purchase the OCR Package. It is a commercial-quality OCR engine originally developed at HP between 1985 and 1995. HP and UNLV open-sourced this engine in 2005.&lt;/P&gt;&lt;H2&gt;Features&lt;/H2&gt;&lt;P&gt;The Tesseract engine is fast and runtime royalty-free, although it is not quite as powerful as the other engines supported by DotImage. 
In particular, it lacks segmentation and it is not very good at recognizing low-quality documents.&lt;/P&gt;&lt;H2&gt;Supported Languages&lt;/H2&gt;&lt;P&gt;The &lt;FONT face="Lucida Sans Typewriter"&gt;TesseractEngine&lt;/FONT&gt; supports the following languages:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Dutch &lt;LI&gt;English &lt;LI&gt;French &lt;LI&gt;German &lt;LI&gt;Italian &lt;LI&gt;Portuguese &lt;LI&gt;Spanish &lt;/LI&gt;&lt;/UL&gt;&lt;H2&gt;Supported Output Formatters&lt;/H2&gt;&lt;P&gt;The &lt;FONT face="Lucida Sans Typewriter"&gt;TesseractEngine&lt;/FONT&gt; supports the following output formatters and provides a structure that allows you to build your own.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Text &lt;LI&gt;PDF &lt;/LI&gt;&lt;/UL&gt;&lt;H2&gt;Deployment&lt;/H2&gt;&lt;P&gt;The assemblies listed below are required for deployment.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT face="Lucida Sans Typewriter"&gt;Atalasoft.dotImage.OCR.Tesseract&lt;/FONT&gt; &lt;LI&gt;&lt;FONT face="Lucida Sans Typewriter"&gt;Atalasoft.dotImage&lt;/FONT&gt; &lt;LI&gt;&lt;FONT face="Lucida Sans Typewriter"&gt;Atalasoft.dotImage.OCR&lt;/FONT&gt; &lt;LI&gt;&lt;FONT face="Lucida Sans Typewriter"&gt;Atalasoft.dotImage.Lib&lt;/FONT&gt; &lt;LI&gt;&lt;FONT face="Lucida Sans Typewriter"&gt;System&lt;/FONT&gt; &lt;LI&gt;&lt;FONT face="Lucida Sans Typewriter"&gt;System.Data&lt;/FONT&gt; &lt;LI&gt;&lt;FONT face="Lucida Sans Typewriter"&gt;System.Drawing&lt;/FONT&gt; &lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Additionally, the Tesseract language files must be accessible. These are automatically placed in the DotImage directory during toolkit installation. 
When deploying, you must either copy the OcrResources folder to your application directory or tell the engine its location explicitly by passing it into th</description><pubDate>Thu, 06 Dec 2012 04:04:00 GMT</pubDate><dc:creator>Robin Sale</dc:creator></item><item><title>Deploy a project using OCR</title><link>http://www.atalasoft.com/KB/article.aspx?id=10141</link><description>&lt;B&gt;Abstract:&lt;/B&gt; &lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;Note: &lt;/STRONG&gt;This information is specific to DotImage at the time it was written and may change slightly in future versions.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Every OcrEngine has different deployment requirements.  Atalasoft has tried to formalize this process as much as possible and to provide guidelines on the mechanism for deployment.  Licensing is covered in another topic.  This topic covers how to ensure that an OcrEngine will be able to start and will be able to find its own resources.&lt;/P&gt;&lt;P&gt;In your SDK installation, you will find a folder named "OcrResources".  This folder is the general folder for all supported OCR engines.  
Within it you will see a structure like this:&lt;/P&gt;&lt;BLOCKQUOTE style="MARGIN-RIGHT: 0px" dir=ltr&gt;&lt;P&gt;EngineManufacturer1&lt;/P&gt;&lt;P&gt;                &amp;lt;Files&amp;gt;&lt;/P&gt;&lt;P dir=ltr&gt;EngineManufacturer2&lt;/P&gt;&lt;BLOCKQUOTE style="MARGIN-RIGHT: 0px" dir=ltr&gt;&lt;BLOCKQUOTE style="MARGIN-RIGHT: 0px" dir=ltr&gt;&lt;P dir=ltr&gt;&amp;lt;Files&amp;gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;&lt;P dir=ltr&gt;...&lt;/P&gt;&lt;P dir=ltr&gt;EngineManufacturer&lt;EM&gt;N&lt;/EM&gt;&lt;/P&gt;&lt;P dir=ltr&gt;...&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P dir=ltr&gt;In general, loading and locating resources is handled by Atalasoft or by the engine itself and requires no work by the client, but custom situations may require the client to handle this.&lt;/P&gt;&lt;P dir=ltr&gt;To sort this out, let's start with a few definitions:&lt;/P&gt;&lt;P dir=ltr&gt;&lt;STRONG&gt;engine resources folder&lt;/STRONG&gt; - the folder that contains the OCR Engine's resource files&lt;BR&gt;&lt;STRONG&gt;OCR resources folder&lt;/STRONG&gt; - the top-level folder of all OCR Engine resources, called "OcrResources"&lt;BR&gt;&lt;STRONG&gt;application folder&lt;/STRONG&gt; - the folder where your application is installed&lt;BR&gt;&lt;STRONG&gt;assembly folder&lt;/STRONG&gt; - the folder that contains the dotImage assembly files (i.e., Atalasoft.dotImage.Ocr.dll), this ma</description><pubDate>Tue, 03 Jul 2012 09:55:00 GMT</pubDate><dc:creator>Robin Sale</dc:creator></item></channel></rss>