Document conversion I: Adobe quickens Capture
- By Patrick Marshall
- Apr 10, 2000
Adobe Systems Inc.'s Acrobat Capture is a program with a very clearly defined
mission: scanning hard-copy documents and converting them into searchable
Acrobat Portable Document Format files that can be made accessible to users
on a local-area network, the Internet or other media.
Unlike ZyLab International Inc.'s ZyImage, the other program we tested
for this review of scanning/conversion programs (see Page 40), Capture doesn't
allow you to include existing document files in the process, nor does it
offer the breadth of search tools and output choices that ZyImage delivers.
But the functionality that Capture does provide is crafted into an extremely
easy-to-use and highly scalable package. With the just-released Version
3.0 of the program, Adobe has transformed Capture into a tool suitable for
larger enterprises that have heavy loads of documents to move to the World
Wide Web or other electronic media.
Acrobat Capture is available in two versions. The Personal
Edition is limited to converting 20,000 pages, after which the right to
convert additional pages can be purchased.
The new Cluster Edition, which we tested, not only comes with no page
limit but also offers load balancing across multiple workstations. The software
also supports dual- and quad-processor systems. Finally, administrators
can dole out various processing jobs to specified workstations. You might,
for example, route all documents to the workstation of a designated staff
person who checks the character recognition results.
Installing Capture on multiple Windows NT workstations was simple. (The
program is not yet designated as Windows 2000 compliant, although I ran
Capture on a Windows 2000 server and workstation without any problems.)
To get workstations working together on projects, choose Station/Join
Workgroup from Capture's main menu, then locate the primary Capture workstation.
Capture's new interface offers a detailed view of all the pieces of a conversion
workflow. The panel on the left has four tabs: Configure, Scan, Submit and
Watch. Clicking on the Configure tab will display all the available workflows.
Creating new workflows, as well as editing existing ones, is easy, thanks
to a generous set of provided templates. You can drag and drop steps from
one workflow to another, and customizing each step is straightforward.
The Scan tab provides access to scanner configuration tools, and you'll
find that Capture supports a wide range of scanners. The Submit tab opens
a dialog box that enables you to specify files for processing. Finally,
the Watch tab allows you to set up folders to be monitored for new files
to be processed. Once you've set up the watched directories, Capture checks
them periodically and automatically runs the specified workflow on any new
files it finds.
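The watched-folder behavior described above amounts to a simple polling loop. The Python sketch below only illustrates that general technique and is not Adobe's code; the function names `poll_once` and `watch`, the 30-second default interval and the idea of passing the workflow in as a callback are all assumptions made for the example.

```python
import os
import time
from typing import Callable, List, Optional, Set


def poll_once(watch_dir: str, seen: Set[str],
              run_workflow: Callable[[str], None]) -> List[str]:
    """Scan watch_dir once and run the workflow on files not seen before."""
    new_files: List[str] = []
    # Sort by name so files are processed in a predictable order.
    for entry in sorted(os.scandir(watch_dir), key=lambda e: e.name):
        if entry.is_file() and entry.path not in seen:
            seen.add(entry.path)
            run_workflow(entry.path)
            new_files.append(entry.path)
    return new_files


def watch(watch_dir: str, run_workflow: Callable[[str], None],
          interval_s: float = 30.0, max_polls: Optional[int] = None) -> None:
    """Poll watch_dir every interval_s seconds (forever if max_polls is None)."""
    seen: Set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        if polls:
            time.sleep(interval_s)  # wait between checks, not before the first
        poll_once(watch_dir, seen, run_workflow)
        polls += 1
```

A real watcher would also have to wait until a scanner or user has finished writing a file before processing it, which this sketch does not attempt.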
Once you've set up the workgroup, it's easy to specify which steps in a
workflow (from scanning to character recognition to exporting) can be
performed on which workstations.
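Assigning steps to stations, combined with the Cluster Edition's load balancing, can be pictured as a table that maps each workflow step to the machines allowed to run it. The review doesn't say how Capture chooses among eligible stations, so this sketch simply assumes round-robin; the `StepRouter` class and the step and station names are hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterator, List


class StepRouter:
    """Hypothetical router: maps each workflow step to the stations allowed
    to run it and hands out those stations in round-robin order."""

    def __init__(self, assignments: Dict[str, List[str]]) -> None:
        self._next: Dict[str, Iterator[str]] = {}
        for step, stations in assignments.items():
            if not stations:
                raise ValueError(f"step {step!r} has no stations assigned")
            # Copy the list so later edits by the caller don't affect routing.
            self._next[step] = cycle(list(stations))

    def route(self, step: str) -> str:
        """Return the station that should run the next job for this step."""
        if step not in self._next:
            raise KeyError(f"no stations configured for step {step!r}")
        return next(self._next[step])
```

For example, with OCR assigned to two stations, successive `route("ocr")` calls alternate between them, while a step assigned to a single station always lands there.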
There are a few things to keep in mind when setting up workgroups. For
starters, if your workgroup contains more than 10 workstations, you will
have to locate the workgroup hub on a file server or a workstation running
Windows NT Server because Windows NT Workstation has a limit of 10 connections.
And if you're running heavy workloads, you'll have to experiment to see
whether it's more effective to move work steps to a separate workstation
or to run them on a multiprocessor system.
Unlike ZyImage, Capture makes it very easy to check on the accuracy of the
program's OCR. This new version of Capture improves on the earlier toolset
with the QuickFix utility, which provides an excellent set of tools for
checking and repairing suspect words. You can make submission of suspect
words to QuickFix part of any workflow, in which case the workflow will
pause at the appropriate stage for an editor to complete the checking procedure.
QuickFix offers an effective interface for document checking. Suspects
are presented in a table, with the first column displaying the image of
the word and the second showing the suggested spelling. For each suspect,
you can accept the suggestion, delete the word or edit it.
QuickFix also offers some unexpected flexibility in that it allows you to
sort the suspect entries alphabetically, by their order of appearance, by
degree of confidence or by "reason."
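QuickFix's table and sort options map naturally onto a list of records sorted by different keys. In this hypothetical Python sketch, the `Suspect` fields, the 0.0-to-1.0 confidence scale and the `resolve` helper are assumptions chosen to mirror the columns and actions described above; they are not QuickFix's actual data model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Suspect:
    word: str          # text as the OCR engine read it
    suggestion: str    # suggested spelling shown in the second column
    position: int      # order of appearance in the document
    confidence: float  # assumed scale: 0.0 (least) to 1.0 (most confident)
    reason: str        # why the word was flagged


# One sort key per ordering mentioned in the review.
SORT_KEYS: Dict[str, Callable[[Suspect], object]] = {
    "alphabetical": lambda s: s.word.lower(),
    "appearance": lambda s: s.position,
    "confidence": lambda s: s.confidence,  # least confident first
    "reason": lambda s: (s.reason, s.position),
}


def sort_suspects(suspects: List[Suspect], order: str) -> List[Suspect]:
    return sorted(suspects, key=SORT_KEYS[order])


def resolve(s: Suspect, action: str, edited: Optional[str] = None) -> Optional[str]:
    """Apply an editor's decision; None means the word is deleted."""
    if action == "accept":
        return s.suggestion
    if action == "delete":
        return None
    if action == "edit":
        if edited is None:
            raise ValueError("edit requires replacement text")
        return edited
    raise ValueError(f"unknown action {action!r}")
```

Sorting by confidence with the least certain words first lets an editor spend time where errors are most likely.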
Capture's OCR engine is accurate enough, however, that you'll rarely
need to make corrections. We found that the program did an excellent job
of zoning and recognizing text, even on complex pages loaded with graphics.
If your documents are in good shape, you may never need QuickFix, although
the utility does come in handy if you're importing tattered, faded or otherwise
damaged documents.
Another new feature in Version 3.0 is automatic creation of document links
during recognition. You might, for example, set the program to create a
table of contents, bookmarks and indexes, or to turn e-mail addresses and
URLs into live links, whenever it finds appropriately formatted text.
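Turning appropriately formatted text into links is, at its simplest, pattern matching over the recognized text. The regular expressions below are deliberately simple assumptions that cover common e-mail and URL forms; Capture's own recognition rules aren't described in this review, and the sketch ignores the table-of-contents, bookmark and index features entirely.

```python
import re
from typing import List, Tuple

# Deliberately simple, assumed patterns; real-world address syntax is broader.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# A URL starts with http://, https:// or www. and must not end in punctuation.
URL_RE = re.compile(r"\b(?:https?://|www\.)[^\s<>\"]*[^\s<>\".,;:!?)]")


def find_links(text: str) -> List[str]:
    """Return link targets in the order they appear in the recognized text."""
    found: List[Tuple[int, str]] = []
    for m in EMAIL_RE.finditer(text):
        found.append((m.start(), "mailto:" + m.group()))
    for m in URL_RE.finditer(text):
        target = m.group()
        if target.startswith("www."):
            target = "http://" + target  # bare www. addresses get a scheme
        found.append((m.start(), target))
    found.sort()
    return [target for _, target in found]
```

Excluding trailing punctuation from the final URL character keeps a sentence-ending period out of the link target.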
Capture's main display panel allows administrators to track the status
of documents, view the status of workflow steps on the local workstation
and view the status of all workflows on all stations in the workgroup. The
interface is concise and easy to grasp. The only thing we found ourselves
wishing for was more flexible alert tools. The program provides audio and
visual alerts if a warning is logged or a manual step is awaiting execution.
But these alerts require that Capture be running on your local workstation.
The end result of a Capture workflow is a cross-platform PDF document that
is eminently searchable on a LAN or via the Internet using Adobe Acrobat
4.0. By the way, you also can "push" a copy of the final document to specified
e-mail addresses as part of the workflow.
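Capture handles e-mail delivery as a workflow step; outside Capture, a comparable "push" could be approximated with Python's standard `email` and `smtplib` modules, as in this sketch. The function names, the subject line and the `localhost` SMTP host are assumptions for illustration, not Capture's implementation.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path
from typing import List, Optional


def build_pdf_mail(pdf_path: str, sender: str, recipients: List[str],
                   subject: Optional[str] = None) -> EmailMessage:
    """Build a message with the converted PDF attached."""
    path = Path(pdf_path)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject or f"Converted document: {path.name}"
    msg.set_content(f"The converted document {path.name} is attached.")
    # Attaching turns the message into multipart/mixed automatically.
    msg.add_attachment(path.read_bytes(), maintype="application",
                       subtype="pdf", filename=path.name)
    return msg


def push_pdf(pdf_path: str, sender: str, recipients: List[str],
             smtp_host: str = "localhost") -> None:
    """Send the PDF through an (assumed) SMTP relay at smtp_host."""
    msg = build_pdf_mail(pdf_path, sender, recipients)
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Keeping message construction separate from sending makes the first part easy to check without a mail server.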
Acrobat Capture is a powerful, scalable and easy-to-use solution for
turning hard-copy documents into searchable, online documents that faithfully
reproduce the layout and formatting of the original. If you're only going
to be dealing with scanned documents, and if PDF files are suitable as output,
you won't find a more capable or easier-to-use package.