Army to scan 17 million Gulf War records
- By Elana Varon
- Jan 21, 1996
When President Clinton formed an advisory group last year to examine evidence of illness among Persian Gulf War veterans, it meant the Defense Department would have to review and declassify millions of records from the 1991 conflict. The Army responded to this mandate by fielding a new imaging and text-retrieval system that is expected to become fully operational this month.
The Army is among several military agencies inspecting Gulf War documents, only some of which contain medical information. Because Army officials did not know which of more than 17 million pages of operations records the task force might need, the service had to be able to find the right ones quickly.
So the Alexandria, Va.-based Army Center for Military History spent $2.5 million on a pair of scanners, work-flow and text-retrieval software, workstations and support services to automate the review process. For at least the next two years, and probably longer, the agency will use the system to search scanned documents for key words and phrases and then route relevant documents both to reviewers, for declassification, and to investigators, said Sam Budak, project manager with the Army Information Management Support Center.
Records that are declassified will be posted on the World Wide Web by the Defense Technical Information Center. The Navy and Marine Corps will use the Army's imaging system to process their own records for the Gulf War declassification project.
The system uses Eastman Kodak Co. scanners and Intrafed Inc.'s PowerScan and StageWorks optical character recognition (OCR) and image-capture software to convert the documents to digital form. Reviewers will then use Excalibur EFS, a text search and retrieval package by Excalibur Technologies Corp., to retrieve the required documents from the database.
Meanwhile, Wang Federal Systems is supplying work-flow software, an image viewer and automated redaction tools to help reviewers manage the declassification process. John Flynn, vice president for business development with Wang, said one unusual aspect of the project is that images "are an input and an output product."
In many imaging systems, Flynn said, the main purpose is to capture data from documents so that it can be used on its own, apart from the images. In this case, however, users will search the text captured from the scanned documents to find the records they need and then retrieve the complete page images.
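The "input and output" pattern Flynn describes can be illustrated with a small inverted index: OCR text is what gets searched, but what comes back is the page image. This is a minimal sketch under stated assumptions; the class name, page IDs and file paths are hypothetical and do not describe how Excalibur EFS or the Wang software actually store their data.

```python
from collections import defaultdict
import re


def tokenize(text: str) -> list[str]:
    # Lowercase the OCR text and split it into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())


class ImageIndex:
    """Hypothetical index: searches OCR text, returns page image paths."""

    def __init__(self) -> None:
        # word -> set of page IDs whose OCR text contains that word
        self.postings: dict[str, set[str]] = defaultdict(set)
        # page ID -> path of the scanned page image
        self.images: dict[str, str] = {}

    def add_page(self, page_id: str, image_path: str, ocr_text: str) -> None:
        # Keep the image as the output product; index its text as the input.
        self.images[page_id] = image_path
        for word in tokenize(ocr_text):
            self.postings[word].add(page_id)

    def search(self, query: str) -> list[str]:
        # Return image paths of pages whose OCR text contains every query word.
        words = tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(self.images[p] for p in hits)


idx = ImageIndex()
idx.add_page("p1", "scans/p1.tif", "Unit moved to assembly area at 0300")
idx.add_page("p2", "scans/p2.tif", "Routine supply convoy report")
print(idx.search("assembly area"))  # ['scans/p1.tif']
```

A reviewer never reads the index directly; the text only serves to locate pages, and the scanned image is what gets routed onward for review.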
High Throughput Rate a Concern
One major technical challenge facing the system's developers is maintaining a high throughput rate. President Clinton's original executive order required the task force to report by the end of this year, which means the Center for Military History must scan and index 20,000 pages a day, Budak said, a rate that would tax the OCR capabilities of most imaging systems.
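For scale, the figures cited above work out as follows. The hourly figure assumes, purely for illustration, that the scanners run around the clock; the article does not say how many hours a day the system operates.

```latex
\frac{17{,}000{,}000\ \text{pages}}{20{,}000\ \text{pages/day}} = 850\ \text{days},
\qquad
\frac{20{,}000\ \text{pages/day}}{24\ \text{h/day}} \approx 833\ \text{pages/h}
```

At that rate, processing every one of the 17 million pages would take roughly 2.3 years, which fits Budak's estimate that the system will be in use for at least two years.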
Full-text indexing can slow an imaging system "down to a crawl," said Jeff Meshinsky, vice president of sales and marketing with Intrafed. "We're daisy-chaining a bunch of OCR systems together to get designated throughput." Because most documents will not be captured with complete accuracy, and much of the data will not be "cleaned" before reviewers see it, the Army is relying on the "fuzzy logic" search capabilities of Excalibur's text-retrieval software to compensate.
"It costs more to clean up information than all the hardware and software for the entire imaging system," said Tom Polivka, Excalibur's director of federal operations. Software that takes into account misspellings or other errors lets users "start searching very quickly without organizing and cleaning up the data."
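Error-tolerant searching of uncleaned OCR text can be sketched with Levenshtein edit distance, which counts the single-character insertions, deletions and substitutions separating two words. This is a stand-in technique chosen for illustration, not Excalibur's actual matching method, which the article does not describe; the function names, the `max_errors` threshold and the sample words are all hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance computed with a rolling one-row DP table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_find(term: str, ocr_words: list[str], max_errors: int = 1) -> list[str]:
    # Return OCR'd words within max_errors edits of the search term,
    # so misrecognized characters do not hide a relevant page.
    term = term.lower()
    return [w for w in ocr_words if edit_distance(term, w.lower()) <= max_errors]


words = ["battal1on", "bartalion", "battery", "convoy", "BATTALION"]
print(fuzzy_find("battalion", words))  # ['battal1on', 'bartalion', 'BATTALION']
```

Matching within a small error budget lets a search for "battalion" still reach pages where the OCR software read an "i" as "1" or a "t" as "r", without anyone correcting the text first.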
Similar systems are being developed throughout the government to help agencies respond to Freedom of Information Act requests and comply with a Clinton order to declassify millions of historical documents. That executive order on declassification, separate from the Gulf War order, would require the release by 2000 of all documents more than 25 years old unless an agency determines they should remain secret.
As with the Gulf War records, all information declassified under that order must be placed in a publicly accessible database. Jeanne Schauble, director of records declassification with the National Archives and Records Administration, said NARA plans to develop policies for agencies to follow as they build these databases. In addition, a study Mitre Corp. conducted for NARA a little more than a year ago recommended a common architecture for agencies to use in declassification reviews.