Atalasoft

Tesseract OCR Performance

Last post 04 Apr 2008, 4:02 PM by RickM. 5 replies.
  •  11 Mar 2008, 9:19 AM 13453

    Tesseract OCR Performance

    We have been working on a solution using the GlyphReader OCR engine and, out of curiosity, recently tried replacing it with Atalasoft's new Tesseract implementation in 6.0. We didn't spend much time testing it, but at first blush it seemed far slower than GlyphReader and less accurate. Has anyone else experienced this performance (or lack thereof) from Tesseract?

    I wonder how Google gets it to perform well, if that is indeed what they are using.

  •  12 Mar 2008, 4:54 PM 13469 in reply to 13453

    Re: Tesseract OCR Performance

    Tesseract performance is directly related to the size of the images fed into it. Try experimenting with reducing the size of your images. You may find that you can speed things up significantly with no reduction in accuracy.
  •  21 Mar 2008, 2:35 PM 13520 in reply to 13469

    Re: Tesseract OCR Performance

    When you say reduce the size, do you mean reduce the dimensions? Also, will this work for GlyphReader too? That is, will this trick improve its performance, or does it already do this internally, so that reducing the image size won't improve performance much more?

  •  24 Mar 2008, 11:36 AM 13538 in reply to 13520

    Re: Tesseract OCR Performance

    Yes, I do mean the image dimensions.

    The actual performance in GlyphReader or Tesseract depends a lot on the particular images you are using. Your best bet is to make some test samples of different sizes and load them up in one of our OCR demos.
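
    As a rough aid for preparing those test samples, here is a minimal Python sketch that lists target dimensions at several scale factors while keeping the aspect ratio. It does not use the Atalasoft API; the function name `scaled_sizes`, the scale factors, and the 2550 x 3300 page size (a letter page at 300 DPI) are illustrative assumptions.

```python
def scaled_sizes(width, height, scales):
    """Return (width, height) pairs for each scale factor, preserving aspect ratio.

    Hypothetical helper for planning OCR test samples; results are truncated
    to whole pixels and never drop below 1 pixel.
    """
    sizes = []
    for scale in scales:
        if scale <= 0:
            raise ValueError("scale factors must be positive")
        sizes.append((max(1, int(width * scale)), max(1, int(height * scale))))
    return sizes


if __name__ == "__main__":
    # Assumed example: a letter-size page scanned at 300 DPI.
    for w, h in scaled_sizes(2550, 3300, [1.0, 0.75, 0.5]):
        print(f"{w} x {h}")
```

    Resampling the same page to each of these sizes and timing the OCR run on each should show where speed improves without accuracy dropping.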

  •  02 Apr 2008, 11:45 AM 13589 in reply to 13538

    Re: Tesseract OCR Performance

    If I want to reduce the image size, is ResampleDocumentCommand the way to go (with an appropriate method), or is there another recommended way?

    Thanks.

  •  04 Apr 2008, 4:02 PM 13608 in reply to 13589

    Re: Tesseract OCR Performance

    ResampleDocumentCommand is indeed the best way to go about resizing your image for OCR. I recommend setting the DocumentMethod property to ResampleDocumentMethod.AreaAverage for the fastest results.
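
    To illustrate what area averaging means in general, here is a minimal pure-Python sketch that shrinks a grayscale image by an integer factor, replacing each block of pixels with its mean. This is not Atalasoft's implementation and does not call ResampleDocumentCommand; the function name `area_average_downsample` and the list-of-rows image representation are assumptions made for the example.

```python
def area_average_downsample(pixels, factor):
    """Shrink a grayscale image by an integer factor using block averaging.

    `pixels` is a list of equal-length rows of integer intensities.
    Each factor x factor block becomes one output pixel holding the
    block's mean (integer division). Leftover edge rows/columns that do
    not fill a whole block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    if not pixels:
        return []
    out_height = len(pixels) // factor
    out_width = len(pixels[0]) // factor
    block_area = factor * factor
    result = []
    for by in range(out_height):
        row = []
        for bx in range(out_width):
            total = 0
            # Sum every source pixel that falls inside this block.
            for y in range(by * factor, (by + 1) * factor):
                for x in range(bx * factor, (bx + 1) * factor):
                    total += pixels[y][x]
            row.append(total // block_area)
        result.append(row)
    return result


if __name__ == "__main__":
    image = [
        [0, 0, 255, 255],
        [0, 0, 255, 255],
        [10, 20, 30, 40],
        [10, 20, 30, 40],
    ]
    print(area_average_downsample(image, 2))
```

    Because every source pixel contributes to exactly one output pixel, the work is a single pass over the image, which is consistent with this kind of method being a fast choice for shrinking.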
