System automates archivists' decisions
- By William Matthews
- Mar 12, 2000
Archivists and public-service lawyers have spent years in federal courts
trying to determine which documents are "records" and how they should be
preserved. Now there is a software system designed to decide that instantly.
The new system is able to analyze electronic documents as they are being
written and determine whether they are records. If so, the system decides
where the records should be routed for long-term storage. "You no longer
need to rely on your end users to file and properly classify valuable documents,"
said David Warner, a sales manager for Provenance Systems Inc., which makes
the records management system.
But government records management experts are skeptical. Generally it
still takes intervention by human beings to determine which documents are
worth saving as records, said Michael Miller, director of modern records
programs at the National Archives and Records Administration.
Increasingly, however, software developers are trying to create programs
that can make decisions that previously required human intelligence. Smart
search engines, for example, are being designed to use past searches to
help direct future searches. And office software soon is expected to be
able to compile profiles of its users' work habits, anticipating information
they might need for work.
The new recordkeeping system, called AutoRecords, is a step in that direction.
It works by analyzing the digital patterns in a document. Based on the patterns
it finds, it determines what category of record the document is and where
it should be filed, Warner said. Provenance partnered with Autonomy Inc.,
which developed the analysis software, called a "dynamic reasoning engine,"
used in AutoRecords.
Initially, the reasoning engine is "trained" to sort records by analyzing
sample documents in known categories. Then it compares patterns in the unknown
documents with patterns in known documents, and the system is able to classify
the documents, Warner said. "The more it reads, the smarter it becomes," he said.
The reasoning engine was developed for use by intelligence agencies
that comb through massive amounts of digital data for useful information.
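
The train-then-compare process Warner describes can be illustrated with a small sketch. The article does not say how Autonomy's reasoning engine works internally, so the code below substitutes a simple, well-known stand-in: it builds a word-count profile for each known category and assigns a new document to the category whose profile it most resembles (cosine similarity). The categories, sample texts and function names are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(samples):
    """Build one word-count profile per category from (category, text) pairs."""
    profiles = defaultdict(Counter)
    for category, text in samples:
        profiles[category].update(tokenize(text))
    return dict(profiles)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(profiles, text):
    """Return the category whose profile is most similar to the text."""
    words = Counter(tokenize(text))
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(profiles), key=lambda cat: cosine(words, profiles[cat]))


# Hypothetical training samples; categories and wording are invented
# for illustration and do not come from AutoRecords.
SAMPLES = [
    ("invoice", "invoice number amount due payment terms net thirty days"),
    ("invoice", "please remit payment for the invoice amount shown"),
    ("contract", "this agreement is made between the parties and the contractor"),
    ("contract", "the contractor shall deliver services under this agreement"),
]

profiles = train(SAMPLES)
print(classify(profiles, "Payment is due on the attached invoice"))
print(classify(profiles, "The parties signed the agreement yesterday"))
```

Because the comparison is purely mechanical, the same document always lands in the same category, whether or not that category is the right one. Adding more sample documents to a category enriches its profile, which is one plausible reading of "the more it reads, the smarter it becomes."
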
When AutoRecords makes mistakes, it will do so consistently, Warner
said. If files in a certain category are misfiled, they all will be misfiled
in the same place, making them easy to find. Humans are more likely to misfile
records "all over the place," he said.
The software is designed to help large organizations cope with the virtual
landslide of documents generated by e-mail, contracts, correspondence, invoices,
employee files and other records.
The ability to automatically classify and store records may respond
to a genuine need, said Steven Aftergood, director of the Federation of
American Scientists' Project on Government Secrecy. Because the number of
records is growing dramatically, "records management is becoming increasingly
sophisticated and burdensome, and as a result, it often simply gets deferred
or ignored," Aftergood said.
But systems like AutoRecords are unlikely to help solve some of the
knottier records problems with which government agencies are wrestling,
including whether original electronic records or paper copies must be saved,
and how electronic records can be stored with assurance that they can be
retrieved decades in the future.
Automatic document sorting systems may be unable to differentiate between
similar documents — drafts of a document, for example — that do not need
to be saved and final versions that do, Aftergood said. "You can't judge
everything about a record by the contents."
And while employees may like being freed from the burden of classifying
and filing records, there is a potential downside.
There is an element of "Big Brother" to the system, Warner said. The
system is capable of analyzing and categorizing documents such as e-mail
messages and information downloaded from World Wide Web sites. Thus it could
be used to track how employees are using their computers.