System automates archivists' decisions

Archivists and public-service lawyers have spent years in federal courts trying to determine which documents are "records" and how they should be preserved. Now there is a software system designed to decide that instantly.

The new system is able to analyze electronic documents as they are being written and determine whether they are records. If so, the system decides where the records should be routed for long-term storage. "You no longer need to rely on your end users to file and properly classify valuable documents," said David Warner, a sales manager for Provenance Systems Inc., which makes the records management system.

But government records management experts are skeptical. Generally, it still takes human intervention to determine which documents are worth saving as records, said Michael Miller, director of modern records programs at the National Archives and Records Administration.

Increasingly, however, software developers are trying to create programs that can make decisions that previously required human intelligence. Smart search engines, for example, are being designed to use past searches to help direct future searches. And office software is soon expected to be able to compile profiles of its users' work habits, anticipating information they might need for work.

The new recordkeeping system, called AutoRecords, is a step in that direction. It works by analyzing the digital patterns in a document. Based on the patterns it finds, it determines what category of record the document is and where it should be filed, Warner said. Provenance partnered with Autonomy Inc., which developed the analysis software, called a "dynamic reasoning engine," that is used in AutoRecords.

Initially, the reasoning engine is "trained" to sort records by analyzing sample documents in known categories. Then it compares patterns in unknown documents with patterns in the known documents, which allows the system to classify the unknown documents, Warner said. "The more it reads, the smarter it becomes," he said.
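
The train-then-compare approach Warner describes can be illustrated with a minimal sketch. This is not Autonomy's reasoning engine, whose internals the article does not describe; it is a toy word-count classifier, and the class name, category names, sample texts and storage destinations below are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class PatternClassifier:
    """Toy classifier (hypothetical): one word-count profile per category."""

    def __init__(self):
        self.profiles = defaultdict(Counter)

    def train(self, category, text):
        """Add a sample document of a known category to that category's profile."""
        self.profiles[category].update(tokenize(text))

    def classify(self, text):
        """Return (category, score) for the closest profile, or (None, 0.0)."""
        words = Counter(tokenize(text))
        best, best_score = None, 0.0
        for category, profile in self.profiles.items():
            score = cosine(words, profile)
            if score > best_score:
                best, best_score = category, score
        return best, best_score


# Hypothetical routing table: category -> long-term storage destination.
ROUTES = {"invoice": "finance-archive", "contract": "legal-archive"}

clf = PatternClassifier()
clf.train("invoice", "Invoice number 1001. Amount due: 250 dollars. Payment due in 30 days.")
clf.train("contract", "This agreement is made between the parties. The contractor agrees to the terms.")

# Classify an unseen document and decide where it should be filed.
category, score = clf.classify("Please find attached invoice 1002; payment of the amount is due.")
destination = ROUTES.get(category, "needs-human-review")
print(category, destination)
```

Each additional training sample enriches a category's profile, which is one simple reading of "the more it reads, the smarter it becomes." Documents that match no profile fall through to human review in this sketch, a design choice made here for illustration rather than a documented feature of AutoRecords.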

The reasoning engine was developed for use by intelligence agencies that comb through massive amounts of digital data for useful information.

When AutoRecords makes mistakes, it will do so consistently, Warner said. If files in a certain category are misfiled, they all will be misfiled in the same place, making them easy to find. Humans are more likely to misfile records "all over the place," he said.

The software is designed to help large organizations cope with the virtual landslide of documents generated by e-mail, contracts, correspondence, invoices, employee files and other records.

The ability to automatically classify and store records may respond to a genuine need, said Steven Aftergood, director of the Federation of American Scientists' Project on Government Secrecy. Because the number of records is growing dramatically, "records management is becoming increasingly sophisticated and burdensome, and as a result, it often simply gets deferred or ignored," Aftergood said.

But systems like AutoRecords are unlikely to help solve some of the knottier records problems with which government agencies are wrestling, including whether original electronic records or paper copies must be saved, and how electronic records can be stored with assurance that they can be retrieved decades in the future.

Automatic document sorting systems may be unable to differentiate between similar documents, such as drafts that do not need to be saved and final versions that do, Aftergood said. "You can't judge everything about a record by the contents."
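
Aftergood's concern can be shown with a small sketch. It assumes two hypothetical documents, a draft and a final version, whose text differs by one word, and it uses Python's standard `difflib` module as a stand-in for whatever content comparison a sorting system might perform. The `status` field is an invented piece of metadata, not something the article attributes to AutoRecords.

```python
from difflib import SequenceMatcher

# Hypothetical draft and final versions of the same policy sentence.
draft = {
    "status": "draft",
    "text": "The agency will retain all e-mail records for seven years.",
}
final = {
    "status": "final",
    "text": "The agency will retain all e-mail records for ten years.",
}

# Content-only comparison: a ratio near 1.0 means the texts are almost identical.
similarity = SequenceMatcher(None, draft["text"], final["text"]).ratio()
print(f"content similarity: {similarity:.2f}")

# The contents alone do not reveal which version must be kept;
# only the (hypothetical) status metadata distinguishes them.
must_keep = [doc["status"] for doc in (draft, final) if doc["status"] == "final"]
print(must_keep)
```

A classifier that looks only at the text would very likely place both versions in the same category, so the decision about which one is the record has to come from information outside the contents.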

And while employees may like being freed from the burden of classifying and filing records, there is a potential downside.

There is an element of "Big Brother" to the system, Warner said. The system is capable of analyzing and categorizing documents such as e-mail messages and information downloaded from World Wide Web sites. Thus it could be used to track how employees are using their computers.
