Optical Character Recognition may not ring a bell for many; the acronym OCR stands a better chance, but it still raises a few eyebrows. Even so, plenty of users need character recognition software to create editable digital copies of physical documents. The scenario is simple: scan a document, then feed the digital result into an OCR program that detects the characters and makes the text editable.

The success of the entire procedure rests on the OCR software’s ability to detect the letters accurately. Professional solutions packed with useful features and delivering extraordinary results are available on the market, at substantial cost, of course. However, cheaper alternatives equipped with only a minimum set of tools can be employed for less complicated jobs.

PDF OCR from PDFZilla is an uncomplicated program that comes free of charge. It features only the necessary options. The interface is intuitive enough to let even the greenest beginner get by without having to ask for directions. Everything is in plain view and within easy reach. The toolbar comprises navigation buttons that let you browse the document page by page or jump straight to the first or last page. Additional helpers include zoom options for a better view of the document (there is also a button that shows the document at actual size).

The application can double as a PDF reader: a side panel displays all the pages of the open document, letting you jump comfortably to a particular part of the text. Although not rich in options, PDF OCR provides a simple way to read your PDFs.

The character recognition part of the application, which is the main purpose of the software, is easy to handle and has no configuration settings for the recognition itself. You do, however, get to choose which pages should be processed. There is some flexibility in this regard, as you can define a page range to convert to editable format, convert only the selected page, or set the app to go through all the “sheets.”
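
The three page-selection modes can be sketched in a few lines of Python. This is only an illustration of the logic, not PDF OCR's actual code; the function name `pages_to_convert` and its parameters are hypothetical.

```python
def pages_to_convert(total_pages, mode, current_page=1, first=1, last=None):
    """Return the 1-based page numbers an OCR run would cover.

    Hypothetical helper: mode is "all", "current", or "range".
    """
    if total_pages < 1:
        raise ValueError("document has no pages")
    if mode == "all":
        # Go through every page of the document.
        return list(range(1, total_pages + 1))
    if mode == "current":
        # Convert only the page that is currently selected.
        if not 1 <= current_page <= total_pages:
            raise ValueError("current page is outside the document")
        return [current_page]
    if mode == "range":
        # Convert an inclusive range; an open end means "up to the last page".
        last = total_pages if last is None else last
        if not 1 <= first <= last <= total_pages:
            raise ValueError("invalid page range")
        return list(range(first, last + 1))
    raise ValueError(f"unknown mode: {mode!r}")
```

For instance, `pages_to_convert(10, "range", first=2, last=4)` selects pages 2, 3 and 4, while an out-of-bounds range raises an error instead of silently converting nothing.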

An advantage of PDF OCR, besides the fact that it is absolutely free of charge, is that it is not restricted to English text. The supported languages also include French, German, Italian, Dutch, Spanish, Portuguese and Basque. If your text is not in one of these languages, you can try the “Other Languages” set, although we did not test it.

Once the document you want to convert is loaded and you have defined the pages to process, you can start the operation. While the task is running, you can view the completion progress and current status (page progress included). Resource usage does not exactly flatter the software: CPU usage is intense, with 50% being the lowest value recorded during our tests, and in our case the application demanded more than 85% of the CPU almost constantly. Larger projects will definitely take a toll on system performance, but jobs of only a few pages finish quite quickly.

The test results met our expectations: they were not incredible, but not disappointing either. Not everything came out smoothly, although the outcome was satisfactory. A handful of tips would have been a welcome addition in a help file of some sort, for example that text on a colored background impedes accurate character detection, or that easily legible fonts improve the quality of the end result.

Moving over to the editing window, we found the options toned down to a minimum, too. Only the basics are included in this part of the software, but text formatting (bold, italic and underline), alignment options (left, right and justify) and font choices have not been omitted. The developer also included search and replace, so you can change recurring characters automatically.

Based on our tests, PDF OCR will not perform miracles at detecting characters, but provided that the text is clean and nothing hinders detection, the app does a pretty good job. This does not mean the result won’t need proofreading, however. In the case of a Spanish text, for instance, although the document was extremely clean, PDF OCR still mistook the uppercase letter “M” for two characters: “|” and “V”. On top of that, in plenty of cases the lowercase “n” was read as “m”, or a comma was read as a full stop. With English texts there is less hassle, as accuracy is higher, but the same tips apply for improving the quality of the resulting document.
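
If you have a known-good copy of a passage, misreads like these can be listed automatically before proofreading. The following is a minimal sketch using Python's standard `difflib` module; the function name `ocr_mismatches` and the sample strings are our own illustration and have nothing to do with PDF OCR's internals.

```python
from difflib import SequenceMatcher


def ocr_mismatches(expected, recognized):
    """Compare a reference text with OCR output.

    Returns a similarity ratio between 0 and 1 and a list of
    (expected_fragment, recognized_fragment) pairs for every spot
    where the two texts differ.
    """
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    mismatches = [
        (expected[i1:i2], recognized[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replaced, inserted or deleted spans
    ]
    return matcher.ratio(), mismatches


# Hypothetical sample: an uppercase "M" read as "|" followed by "V".
ratio, diffs = ocr_mismatches("Madrid", "|Vadrid")
print(diffs)  # [('M', '|V')]
```

The list of differing fragments gives a quick overview of which characters the OCR engine tends to confuse, which is handy input for the search and replace function in the editing window.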

The conclusion is that PDF OCR will ease your work, but only if the scanned document is clear enough for the characters to be interpreted accurately. It is extremely easy to use and lets you define the page range you want to turn into editable text. The resources consumed during brief projects can be overlooked, but not when you’re dealing with larger jobs.

The Good

PDF OCR comes with an intuitive interface that does not create any confusion among users. The basic set of options it provides is in plain view. It can also be used as a PDF reader as it permits you to browse through the document or adjust zoom level for a better view.

The Bad

The resources used for converting the PDF to editable text are far from negligible. Combined, the two processes of the application can require even more than 80% of your CPU. Also, character recognition gets close to flawless only if certain conditions regarding the quality of the PDF are met.

The Truth

For a free piece of software, PDF OCR does a good job if the initial PDF file is a standard one, without colored text or background and without unusual fonts. Resource usage is pretty high, but this can be overlooked for small projects.
