Simple Zonal OCR
Use OCR Text to Name a File
Simple Zonal OCR is what its name implies: a program that is simple to set up
and use. It runs OCR on an area of a document and then uses the OCR text to
move and rename the file.
Ideal Uses
Ideal uses for this product are the automatic filing of internal
documents that contain a number, such as Work Orders, Shipping
Documents, and Delivery Tickets. Virtually any document that
maintains its formatting and cannot have a barcode put on it is an ideal
candidate for this program. Although the program can read more than numbers,
numbers can be validated and Fuzzy Logic can be applied to improve
results.
Capture Two Values
Usually only one value is captured; however, the program can be set up to read
two values so the user can create a simple filing system with it. For instance,
when filing Work Orders, the customer’s number and the work order number can be
captured. The document can then be placed in the customer’s folder and named
with the work order number.
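The two-value filing scheme can be sketched in a few lines of Python. This is only an illustration of the idea, not the program's own code: the function names, the "archive" root folder, and the sample numbers are all hypothetical.

```python
import shutil
from pathlib import Path


def build_destination(root: Path, customer_number: str,
                      work_order_number: str, suffix: str = ".pdf") -> Path:
    """Return root/<customer_number>/<work_order_number><suffix>."""
    return root / customer_number / f"{work_order_number}{suffix}"


def file_document(source: Path, root: Path,
                  customer_number: str, work_order_number: str) -> Path:
    """Move a scanned file into the customer's folder, named by work order."""
    destination = build_destination(
        root, customer_number, work_order_number, source.suffix
    )
    # Create the customer's folder the first time that customer is seen.
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(destination))
    return destination


if __name__ == "__main__":
    # Hypothetical values: customer 1001, work order 425317.
    print(build_destination(Path("archive"), "1001", "425317"))
```

The first captured value selects the folder and the second becomes the file name, so each customer's work orders end up together.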
Fuzzy Logic
The Fuzzy Logic used in Simple Zonal OCR was used in a custom application
created by eDocfile. The results were audited by an independent firm, which
found that 1 out of 1,000 documents failed and had to be manually indexed.
Fuzzy Logic assumes that the same OCR mistakes will always be made on the same
documents. For instance, with one font the letter “O” may be returned when it
should have been a “0”, or perhaps a “|”, “/”, “\”, or “I” instead of a “1”.
Fuzzy Logic swaps the character for what it should be. The Fuzzy Logic is user
configurable. In other words, the program does not know in advance which
mistakes will be made; it is up to the user to discover them and configure the
program so that they are not repeated.
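The character swapping described above amounts to a user-maintained substitution table. The Python sketch below shows the general technique only; the table contents come from the examples in the text, while the names NUMERIC_FIXES and apply_fuzzy_fixes are hypothetical and do not reflect how Simple Zonal OCR stores its settings.

```python
# Hypothetical substitution table for a numeric field, built from the
# misreads mentioned above: "O" read for "0"; "|", "/", "\" and "I" for "1".
NUMERIC_FIXES = {
    "O": "0",
    "|": "1",
    "/": "1",
    "\\": "1",
    "I": "1",
}


def apply_fuzzy_fixes(ocr_text: str, fixes: dict) -> str:
    """Replace each known misread character with the intended one."""
    # str.maketrans accepts a dict that maps single characters to strings.
    return ocr_text.translate(str.maketrans(fixes))


if __name__ == "__main__":
    print(apply_fuzzy_fixes("42O|/3", NUMERIC_FIXES))  # prints 420113
```

A table like this is only safe for fields where the replaced characters can never legitimately appear, which is one reason numeric fields suit this approach.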
Choice of OCR engines
It utilizes the OCR engine found in Microsoft Office Document Imaging (MODI).
This engine is one of the best in the world and is part of the Microsoft
Office 2003 and 2007 suites. If Office is not available, it can also use the
award-winning Tesseract engine. Both engines are very good, and results will
vary depending upon the engine being used. For instance, on standard-sized text
with a common font Tesseract may be better than MODI, but on a large, unusual
font MODI may perform better.
Batch Scanning with Blank Page Separation
Batch scanning can be done with Simple Zonal OCR because the program can use
blank page separators. When preparing documents, the user places a blank page
between each file before it is scanned. When the scanned image is processed,
the software searches for blank pages; when one is found, it is dropped and a
new document is started.
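The separation step can be sketched as a simple split over the scanned pages. This is a minimal Python illustration under stated assumptions: pages are represented here by their OCR text and a page counts as blank when that text is empty, whereas a real implementation would more likely inspect the page image. The names split_on_blank_pages and text_is_blank are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def split_on_blank_pages(pages: Sequence[Page],
                         is_blank: Callable[[Page], bool]) -> List[List[Page]]:
    """Group pages into documents, dropping every blank separator page."""
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_blank(page):
            # A separator closes the current document, if it has any pages.
            if current:
                documents.append(current)
                current = []
            continue
        current.append(page)
    if current:
        documents.append(current)
    return documents


def text_is_blank(page_text: str) -> bool:
    """Assumed blank test: the page produced no OCR text at all."""
    return not page_text.strip()


if __name__ == "__main__":
    batch = ["WO 1 p1", "WO 1 p2", "", "WO 2 p1"]
    print(split_on_blank_pages(batch, text_is_blank))
```

Consecutive separators and a trailing separator produce no empty documents, because a document is only emitted once it holds at least one page.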
File Validation
Simple Zonal OCR can apply an EasyPattern to validate the OCR text.
EasyPatterns are simple to set up and use, and they work like regular
expressions: they check for a pattern. For instance, a pattern could check
that only digits were returned and that there is a fixed number of them,
perhaps 6 digits “[6digits]”. Or perhaps a 6-digit number that begins with 42,
“42[4digits]”, should be used to validate the OCR. If the text fails
validation, the file is moved to a folder for manual processing.
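For readers more familiar with regular expressions, the two checks above can be approximated in Python as follows. This is a rough equivalent, not EasyPattern syntax, and the names SIX_DIGITS, STARTS_WITH_42, and is_valid are hypothetical.

```python
import re

# Regular-expression approximations of the two EasyPatterns quoted above.
SIX_DIGITS = re.compile(r"[0-9]{6}")        # roughly "[6digits]"
STARTS_WITH_42 = re.compile(r"42[0-9]{4}")  # roughly "42[4digits]"


def is_valid(ocr_text: str, pattern: "re.Pattern") -> bool:
    """Return True only when the whole OCR value matches the pattern."""
    # Surrounding whitespace from the OCR zone is ignored; anything else
    # that does not fit the pattern fails validation.
    return pattern.fullmatch(ocr_text.strip()) is not None


if __name__ == "__main__":
    for value in ("425317", "435317", "4253"):
        folder = "filed" if is_valid(value, STARTS_WITH_42) else "manual"
        print(value, "->", folder)
```

Matching the whole value rather than searching within it matters here: a 7-digit misread that merely contains a valid 6-digit run should still be sent for manual processing.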
Manual Processing of Failed Files
In a perfect world there would be no mistakes; however, it is not a perfect
world, and every OCR engine makes mistakes. The key is to find
the mistakes (Validation) and then quickly correct them. Simple Zonal OCR places
the files that fail in a separate folder. A viewer included with the
program allows the user to quickly view the file and enter the correct
information. Once the information is entered, the file is closed and processed,
and the next file is opened automatically, allowing the user to cycle quickly
through the failed files.
Features: