Q10456 - PRB: AbbyyEngine Ignores Some Content (such as tables)

Atalasoft's implementation of the AbbyyEngine OCR engine is tuned to avoid getting caught up in particularly "busy" images, such as documents that contain pictures or other graphics in addition to text. As a side effect, some content (such as tables) can be treated as an image region and left unrecognized.

However, there are times when a document has content that you want to force the engine to read.

The fix is to handle the PageLocation event and override the regions that AbbyyEngine detected:

// creating new AbbyyEngine instance
AbbyyEngine engine = new AbbyyEngine();

// BEFORE calling engine.Initialize(), set up this handler
engine.PageLocation += engine_PageLocation;

// perform any other pre-initialization setup here

engine.Initialize();


// NOW you can use the AbbyyEngine as usual
... code here ...

// cleanup when done
engine.ShutDown();
engine.Dispose();


// This is the engine_PageLocation handler. It converts each OcrImageRegion into an
// AbbyyOcrTextRegion so that recognition runs within those formerly ignored regions.
private void engine_PageLocation(object sender, OcrPageLocationEventArgs e)
{
    e.RegionsOut = new OcrRegionCollection();
    foreach (OcrRegion ocrRegion in e.RegionsIn)
    {
        e.RegionsOut.Add(ocrRegion is OcrImageRegion
            ? new AbbyyOcrTextRegion(ocrRegion.PolygonBounds)
            : ocrRegion);
    }
}
