Desktop OCR market serves up two strong contenders
After years of widespread competition, the desktop optical character recognition (OCR) market has come to be dominated by two products: Caere Corp.'s OmniPage and ScanSoft Inc.'s TextBridge. Most recently, the options available to government agencies and departments have been winnowed by Caere's acquisition of two major competitors (Calera Recognition Systems Inc., developer of WordScan, and Recognita Corp., maker of Recognita Plus).
Fortunately for consumers, this wasn't simply a case of Caere buying off the competition. Instead of disposing of the other products, Caere incorporated parts of the products' OCR technologies into OmniPage Pro 10. The end result is a product with impressive recognition accuracy. Although TextBridge lags a tad behind in accuracy, the program offers an extremely easy-to-use interface that is well suited to novice users.
To test OmniPage and TextBridge, we scanned the same set of 50 typewritten, magazine and spreadsheet pages. In each conversion, we looked not only for the accuracy with which the program translated text but also for the fidelity with which it reproduced formatting. In addition, we scored the programs according to their ease of use and flexibility in handling different types of hard copy.
OmniPage emerged as the decided winner, but TextBridge was a strong contender that may be better suited to agencies or departments with novice users.
Caere's OmniPage Pro
Thanks in part to the recent incorporation of newly acquired technologies from Calera and Recognita, Caere's OmniPage Pro 10 has become the standard to beat in desktop OCR. The program's accuracy in our tests with Microsoft Corp. Word documents was almost perfect (making, for example, only two mistakes on an 800-cell spreadsheet). Overall, OmniPage turned in an accuracy rate of more than 99 percent on our tests. Moreover, OmniPage is especially strong at handling degraded pages, such as faxes that may not have come through clearly. Additionally, OmniPage did a fine job maintaining elements of original pages, including font characteristics and sizes, column layout and color graphics.
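To put the spreadsheet result in perspective, the arithmetic can be sketched as follows. The review does not state how its accuracy rates were calculated, so this is only an illustration that assumes accuracy is the share of items (here, spreadsheet cells) recognized without error; the function name accuracy_rate is hypothetical.

```python
def accuracy_rate(total_items: int, errors: int) -> float:
    """Return the percentage of items recognized correctly.

    Assumption: accuracy = (items without errors) / (total items).
    The review does not say which formula its figures are based on.
    """
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    if not 0 <= errors <= total_items:
        raise ValueError("errors must be between 0 and total_items")
    return 100.0 * (total_items - errors) / total_items


# Two mistakes on an 800-cell spreadsheet, as reported in the review.
spreadsheet_accuracy = accuracy_rate(800, 2)
print(f"{spreadsheet_accuracy:.2f}%")  # prints "99.75%"
```

Under that assumption, two errors in 800 cells works out to 99.75 percent, which is consistent with the overall rate of more than 99 percent reported for OmniPage.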
OmniPage's new interface has a set of tabs that lets users easily choose from three processing modes: AutoOCR, manual or OCR wizard. Using the automatic mode, we merely clicked the start button and OmniPage scanned and recognized pages using preset options. Furthermore, zoning (specifying areas on a page containing text to be recognized) and OCR now occur in a single step, which speeds the entire process. On an Intel Corp. 300 MHz Pentium II PC, a page was scanned and recognized in an average of 20 seconds. The manual mode enabled us to draw zones and choose options, such as whether the proofing step was invoked. We particularly liked the ability to change preferences at any time during the OCR process. For example, you can redraw zones on a page and recognize the text again without rescanning the document, which speeds work. Furthermore, we set the software to automatically scan documents at regular intervals (such as every 30 seconds) from our flatbed scanner.
The improved OCR proofreader now includes five zoom levels, making it easier to compare recognized pages against the original document image. Moreover, a new voice read-back feature spoke the converted document as we read along from the original. For spreadsheets or other numeric material, that feature was indispensable for verifying recognition results.
OmniPage's identification of tables and other layout characteristics (such as fonts) was superb. Still, for those few misses, it's now easier to correct errors immediately. For example, the new Table Editing Window lets you revise the contents of cells before saving the document as a word processing or spreadsheet file. Finally, the OmniPage Pro package includes a personal edition of OmniPage Web, which converts multiple-page documents (up to 10 pages) into hyperlinked World Wide Web sites.
The bottom line: The strides Caere has made in improving OCR accuracy and OmniPage's usability, along with the low upgrade price, make the program an excellent choice.
ScanSoft's TextBridge Pro 9.0 Business Edition
TextBridge Pro 9.0 Business Edition, introduced in June 1999, runs nose to nose with the previous release of OmniPage (9.0) in features. Moreover, TextBridge held up well in our testing—matching the accuracy of OmniPage 10 on some pages.
TextBridge software is extremely simple to use, a benefit to departments with novice users. We had no trouble understanding the cleanly laid out user interface, and scanning and recognition are simple. Configurable icons enable you to Get Pages, Recognize and Send finished text to another application. Similarly, tools for managing difficult documents will not confound casual users, but they offer the control experienced workers demand. For example, to get the best scan of our color magazine pages, we simply clicked the Page Type button and picked the desired icon; there was no need to delve into complex dialogs.
TextBridge Pro 9.0 captured and maintained color and grayscale images in our converted documents. Furthermore, we easily zoned oblong and L-shaped regions, then split and merged zones, essential tasks when converting precise areas of pages.
Overall, recognition results were very good, with the program turning in an accuracy rate of just less than 99 percent on our tests. The software detected text on a tinted background and also correctly recognized reverse type, columns and drop caps. However, TextBridge Pro had a bit of difficulty maintaining text attributes—sometimes substituting plain text for boldface or rendering text as italic when the original was normal. TextBridge adeptly handled tables, which are often part of agency reports. Furthermore, the software analyzed the scanned page and let us add, move or remove cell lines before recognition occurred.
As you proofread documents, part of the original scan appears within the toolbar for comparison. But you can't zoom in on the image or view the recognized text alongside the original page. Even so, TextBridge lists alternate spelling suggestions, which accelerates the proofing process.
Perhaps the biggest draw of this version, though, centers on four Portable Document Format output options. With one option, when TextBridge encountered a suspect word, it substituted the actual scanned image in the finished Adobe Systems Inc. Acrobat file to maintain the document's integrity. Other PDF options enable you to save converted files with no word images or, conversely, render the entire page as an image. In fact, some agencies use this latter feature exclusively, bypassing the OCR step. Pages exported to HTML closely resembled originals, but TextBridge doesn't have the Web site building capability found in OmniPage.
In all, government users faced with going from paper to digital documents will benefit from TextBridge's simple interface design along with its formatted Web page output and PDF creation capabilities.
-- Heck is an InfoWorld contributing editor and manager of electronic promotions at Unisys Corp., Blue Bell, Pa.
OmniPage Pro 10
Price and Availability
OmniPage Pro 10 is available for $99 as an upgrade to owners of any OCR software—or for $499 to new users. It is available through General Services Administration resellers, including BTG Inc., Comark Government & Education Sales Inc., CompUSA, Government Micro Resources Inc. and GTSI.
This latest upgrade offers a refreshed OCR engine for better accuracy, improved formatting commands, plus greater ease of use with features such as a new interface and voice readback. Fast, reliable conversion of paper documents, especially complex layouts and spreadsheets, means workers spend less time recreating documents electronically and therefore are more productive.
TextBridge Pro 9.0 Business Edition
Price and Availability
TextBridge Pro 9.0 Business Edition sells on the open market for $499. It is available directly from ScanSoft and through electronic resellers Outpost.com and Softmart. Government customers receive volume discounts.
Agencies tasked with converting numerous paper documents to Adobe's Portable Document Format or Hypertext Markup Language for the World Wide Web will benefit from TextBridge Pro's multiple output options. In addition, this version reliably saves electronic files in standard desktop formats, such as Microsoft Word. Recognition accuracy and page layout retention are very good, so converted documents should require minimal retyping and reformatting.