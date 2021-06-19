



Some brief notes on how to run OCR in Python with some popular engines, plus a few quirks and tips

Modern OCR systems

An OCR (Optical Character Recognition) system converts images that contain valuable information, usually in the form of text, into machine-readable data. In most cases, running OCR with one of the available tools is the first step in extracting data from a paper document or a scan-based PDF.

A quick web search turns up plenty of links to open source and commercial tools, but in recent years two OCR engines, Google Vision and Tesseract, have pulled well ahead of their competitors.

Tesseract is an offline, open source text recognition engine with a full-featured API. It can be easily integrated into any business project through one of several Python wrapper modules, pytesseract being one example.
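As a rough illustration of the wrapper approach, here is a minimal sketch of calling Tesseract through pytesseract. It assumes the Tesseract binary, the `pytesseract` package, and Pillow are installed; the helper names `build_tesseract_config` and `ocr_with_tesseract` and the default option values are my own choices for illustration, not part of pytesseract.

```python
def build_tesseract_config(psm: int = 3, oem: int = 3) -> str:
    """Build the option string that pytesseract passes through to Tesseract.

    psm is the page segmentation mode, oem the OCR engine mode.
    (Hypothetical helper; the defaults here are assumptions.)
    """
    return f"--oem {oem} --psm {psm}"


def ocr_with_tesseract(image_path: str, lang: str = "eng", psm: int = 3) -> str:
    """Run Tesseract locally on one image file and return the recognized text."""
    # Imported lazily so this module still loads where the OCR stack is missing.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(
            image, lang=lang, config=build_tesseract_config(psm)
        )
```

Calling `ocr_with_tesseract("scan.png")` on a local file (the file name is a placeholder) would return the recognized text as a plain string. Everything runs on your own machine, so no credentials or network access are needed.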

Google Vision, in contrast, runs on a remote Google server rather than locally. To start using the Google Vision API in your project, you need to complete a few configuration steps, such as providing valid credentials as described in the official guide. In addition, you may be charged for text recognition requests that exceed the free usage limit, as described in Google’s pricing policy.
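For comparison, a request to the remote service might look like the sketch below. It assumes the `google-cloud-vision` client library (version 2 or later) is installed and that credentials are already configured, for example through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The helper names `extract_text` and `ocr_with_google_vision` are hypothetical and only serve the illustration.

```python
def extract_text(response) -> str:
    """Pull the full recognized text out of a text-detection response.

    The first annotation holds the whole text block; later ones hold single words.
    (Hypothetical helper, not part of the client library.)
    """
    if response.error.message:
        raise RuntimeError(f"Vision API error: {response.error.message}")
    annotations = response.text_annotations
    return annotations[0].description if annotations else ""


def ocr_with_google_vision(image_path: str) -> str:
    """Send one image file to Google Vision and return the recognized text."""
    # Imported lazily so this module still loads without the client library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()  # picks up the configured credentials
    with open(image_path, "rb") as image_file:
        image = vision.Image(content=image_file.read())
    response = client.text_detection(image=image)
    return extract_text(response)
```

Each call to `ocr_with_google_vision` is a network request that counts toward your quota, so it is worth caching results while experimenting.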

Despite the fundamental differences in usage and available options, both tools attract virtually the same level of interest from web users, judging by Google Trends.

In the rest of this post, I’ll run OCR in Python on both engines and, along the way, compare their performance on real images (recreated or scanned by the author to mimic documents of varying initial quality).

