OCR gets paper into the digital workflow

ScanSoft Inc.'s TextBridge Pro Millennium Business Edition provides fast document conversion and output to Web formats

The follow-up to ScanSoft Inc.'s TextBridge Pro 9.0 Business Edition was expected to boost accuracy, preserve page formatting better and close the gap with its closest competitor, Caere Corp.'s OmniPage 10 [see review, FCW, Nov. 1, 1999].

Since that evaluation, the optical character recognition landscape has changed considerably. Notably, ScanSoft Inc. acquired Caere Corp. this year, leaving the combined company with two complementary products:

* OmniPage appeals to agencies that need to precisely maintain the format of original pages and perform ancillary tasks, such as creating fill-in forms.

* The new TextBridge Pro Millennium Business Edition has been refitted to better handle large-scale conversion of documents while providing decent recognition accuracy.

As with past versions, the Millennium Business Edition is appropriate for both novice and expert users. The uncluttered interface puts all the major functions a single mouse click away: scanning pages, recognizing content and saving the converted material in multiple digital formats.

There are also many functional improvements and performance updates that would justify an upgrade or a new purchase. For example, improved memory management lets users specify a temporary disk file to stage pages while they await recognition, so documents containing a few hundred pages can be processed without starting and stopping several times. You can also schedule very large jobs to run during off hours.

It's also easy to manipulate scanned pages using the thumbnail pane. From the pane, you can select a page image, change its zones (the areas to be recognized) and rearrange the order in which pages are processed. You can even scan facing pages of a book at the same time, yet recognize them as two separate pages.

Previously, scanning could be invoked from your word processor. Now, a new Instant Access feature works with Microsoft Corp.'s FrontPage 2000 HTML Web editor and Print Shop ProPublisher 2000, plus most Windows text programs. This saves time; for example, converted text can appear automatically in the Web page you are editing in FrontPage.

Our accuracy measurements (which include recognition errors per page and format retention) still put TextBridge Pro slightly behind OmniPage 10, but not by much. Credit several internal enhancements in the Millennium Edition for the better showing. For instance, the table-recognition algorithm was redone, so there were fewer misplaced cells and stray lines than before. Moreover, you can quickly edit the entire recognized table or individual cells from within TextBridge.

TextBridge Pro's recognition engine borrows a few tricks from OmniPage, which should interest any agency working internationally. In all, 56 languages are supported. The software also recognizes multiple languages on the same page if they belong to the same language group.

Document recomposition, which means maintaining the original page layout, has improved, although it's still not as good as in OmniPage Pro 10. When I saved documents in Microsoft Word format, TextBridge reproduced multiple columns and generally kept color pictures in the same location as in the original. However, type size and style were misread several times, even when scanning high-quality original documents.

That said, TextBridge's conversion of documents to Adobe Systems Inc.'s Acrobat format is one of the best we've seen. Version 9.0 did an excellent job of saving recognized pages as Portable Document Format files, and this update is even better. The Millennium Edition produced files up to 27 percent smaller than Version 9.0 did, and image quality was noticeably improved. This ability to efficiently convert large amounts of paper into PDF files should be especially advantageous to government entities.

Overall, we'd recommend TextBridge Pro Millennium Business Edition for general government use. Although it's not the most accurate at converting certain complex documents, its PDF output is among the best.

Mike Heck is an InfoWorld contributing editor and manager of electronic promotions at Unisys Corp. in Blue Bell, Pa.
