OCR gets paper into the digital workflow

The follow-up version to ScanSoft Inc.'s TextBridge Pro 9.0 Business Edition was expected to boost accuracy, maintain page formatting better and close in on the fine capabilities of its closest competitor, Caere Corp.'s OmniPage 10 [see review, FCW, Nov. 1, 1999].

Since that evaluation, the optical character recognition landscape has changed considerably. Notably, ScanSoft Inc. acquired Caere Corp. this year, leaving the resulting company with two complementary products:

* OmniPage appeals to agencies that need to precisely maintain the format of original pages and perform ancillary tasks, such as making fill-in forms.

* The new TextBridge Pro Millennium Business Edition has been refitted to better handle large-scale conversion of documents while providing decent recognition accuracy.

As with past versions, the Millennium Business Edition is appropriate for novice and expert users. The uncluttered interface puts all the major functions within a single mouse click: scanning pages, recognizing content and saving the converted material in multiple digital formats.

Still, there are many functional improvements and performance updates that would justify an upgrade or a new purchase. For example, improved memory management lets users specify a temporary disk file to stage pages while awaiting recognition, a feature that enables users to process documents containing a few hundred pages without having to start and stop several times. You can also schedule the processing of very large jobs to occur at off hours.

Also, it's easy to manipulate scanned pages using the thumbnail pane. Using the pane, you can select a page image, change its zones (the areas that are recognized) and rearrange the processing order of pages. You can even scan side-by-side pages of a book at the same time, yet recognize them as two separate pages.

Previously, scanning could be invoked from your word processor. Now, a new Instant Access feature works with Microsoft Corp.'s FrontPage 2000 HTML Web editor and Print Shop ProPublisher 2000, plus most Windows text programs. This saves time: for example, converted text can appear automatically in a Web page while you are working in FrontPage.

Our various accuracy measurements (which include recognition errors per page and format retention) still put TextBridge Pro slightly behind OmniPage 10, but not by much. Credit several Millennium Edition internal enhancements for the better showing. For instance, the algorithm to recognize tables was redone, so there weren't as many misplaced cells or stray lines as before. Moreover, you can quickly edit the entire recognized table or individual cells from within TextBridge.

TextBridge Pro's recognition engine borrows a few tricks from OmniPage, which should interest any agency working internationally. In all, 56 languages are accepted. The software also understands multiple languages on the same page if they belong to the same language group.

Document recomposition, which means that the original page layout is maintained, has improved, although it's still not as good as OmniPage Pro 10. When I saved documents in Microsoft Word format, TextBridge reproduced multiple columns and generally kept color pictures in the same location as the original. However, type size and style were misread several times, even when scanning high-quality original documents.

That said, TextBridge's conversion of documents to Adobe Systems Inc.'s Acrobat format is one of the best we've seen. Version 9.0 did an excellent job of saving recognized pages as Portable Document Format files, and this update is even better. The Millennium Edition compressed files up to 27 percent smaller than Version 9.0 did, and the quality of images was noticeably improved. This ability to efficiently convert large amounts of paper into PDF format should be especially advantageous to government entities.

Overall, we'd recommend TextBridge Pro Millennium Business Edition for general government use. Although it's not the most accurate at converting certain complex documents, the PDF output is among the best.

Mike Heck is an InfoWorld contributing editor and manager of electronic promotions at Unisys Corp. in Blue Bell, Pa.

