NARA electronic archive has fundamental flaw in search, IG says

Editor's note: This story was modified after its original publication to clarify information.

People trying to search the text of documents through the National Archives and Records Administration’s $430 million Electronic Records Archive are going to be disappointed, according to the agency’s inspector general.

Under the currently deployed system, users can search only by metadata. That typically includes tags for information such as the name of the original publication, the date of publication, the agency that originated the document and a small number of keywords. Users who hope to locate a document by a word or phrase that isn't part of the metadata will be unable to do so.
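To make the distinction concrete, here is a minimal sketch of the two kinds of search. It is not how the ERA is implemented; the records, field names and function names are all hypothetical, chosen only to illustrate why a word that appears in a document's body but not in its tags cannot be found by a metadata-only search.

```python
import re
from collections import defaultdict

# Hypothetical records: every field name and value here is illustrative only.
RECORDS = [
    {
        "id": 1,
        "metadata": {
            "title": "Annual Budget Report",
            "agency": "Department of Examples",
            "date": "2009-03-01",
            "keywords": ["budget", "appropriations"],
        },
        "body": "The report discusses levee repairs funded after the flood.",
    },
    {
        "id": 2,
        "metadata": {
            "title": "Infrastructure Memo",
            "agency": "Office of Samples",
            "date": "2010-07-15",
            "keywords": ["infrastructure"],
        },
        "body": "This memo covers the budget for bridge inspections.",
    },
]


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def metadata_search(records, term):
    """Return ids of records whose metadata tags contain the term."""
    term = term.lower()
    hits = []
    for rec in records:
        meta = rec["metadata"]
        fields = [meta["title"], meta["agency"], meta["date"], *meta["keywords"]]
        if any(term in tokenize(field) for field in fields):
            hits.append(rec["id"])
    return hits


def build_full_text_index(records):
    """Build an inverted index mapping each body word to the record ids containing it."""
    index = defaultdict(set)
    for rec in records:
        for word in tokenize(rec["body"]):
            index[word].add(rec["id"])
    return index


def full_text_search(index, term):
    """Return sorted ids of records whose body text contains the term."""
    return sorted(index.get(term.lower(), set()))
```

In this toy data set, a metadata search for "levee" finds nothing because the word appears only in the body of record 1, while a full-text search over the index built by `build_full_text_index(RECORDS)` finds that record. The inverted index is one common way to support full-text lookup; whether something similar could be added to the ERA is exactly the open question.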




The public’s ability to use the ERA is likely to be hampered because of the lack of a full text-based search capability, which would be similar to what is available on Google.com or other commercial search engines, NARA Inspector General Paul Brachfeld said in an interview on Oct. 26.

Lack of full text search “is one of the profound problems with the ERA at this point,” Brachfeld said. “Metadata alone does not tell the story of what is in the documents.”

Brachfeld recently released online copies of two management letters he wrote in January and May 2011 to Archivist of the United States David Ferriero on the inadequacy of search tools in the ERA.

The agency has acknowledged some of the limitations of the ERA. It completed its $430 million system development contract with Lockheed Martin in September and did not extend it for an optional year. It has hired IBM to maintain and operate the ERA under an annually renewable contract valued at $243 million over 10 years if all options are exercised.

Under the new contract, the agency will encourage IBM to try to enhance the system with text search capability, but it is not clear whether the current system architecture would permit that capability or whether the cost of adding it would be prohibitive, Brachfeld said. Furthermore, he said, adding full text search at this time may interfere with protections for personally identifiable data.

“It is built into the contract to try to address the full text search capability,” Brachfeld said. “I am not sure what they can do.”

Lack of text searchability “is an important weakness, and I am not sure it can be corrected,” he said.

Brachfeld said the flawed system was poorly designed by a succession of managers, many of whom have left government. "The program has had problems since its inception under then-Archivist of the United States John Carlin," he said. There have been three successive U.S. archivists in charge since Carlin's tenure, he added.

Throughout the multi-year program, the inspector general’s office continued to ask about search capability, he said.

The office asked “fundamental questions of ERA program managers, employees, contractors and senior NARA officials. The most basic being, ‘At full operational capability, will the common citizen be able to effectively access and research the electronic records they are entitled access to over the Internet?’” Brachfeld wrote in the May 4 management letter. “We believe the answer, with limited caveats, is no.”

Brachfeld also warned that because of limited search capabilities, severe bottlenecks are likely to develop in screening documents for entry into the ERA because of the need to identify and remove classified information and personally identifiable information.

While agencies are not supposed to send classified information to the ERA, it is likely that screening will be needed to ensure that classified information does not appear, and that may cause slowdowns of the system, he suggested.

“If one imagines ERA as a busy six-lane highway moving an immense amount of traffic, this part of the ingest procedure is akin to closing five lanes for a stretch. While the rest of the highway remains capable of transporting all the traffic, the back-up or bottleneck caused by that one stretch makes it impractical to use the road,” Brachfeld wrote.

About the Author

Alice Lipowicz is a staff writer covering government 2.0, homeland security and other IT policies for Federal Computer Week.
