NARA electronic archive has fundamental flaw in search, IG says

Editor's note: This story was modified after its original publication to clarify information.

People trying to search the text of documents through the National Archives and Records Administration’s $430 million Electronic Records Archive are going to be disappointed, according to the agency’s inspector general.

Under the currently deployed system, users can search only by metadata. That typically includes tags for information such as the name of the original publication, the date of publication, the agency that originated the document, and a small number of keywords. Users who hope to locate a document by a word or phrase that isn't part of the metadata will be unable to do so.

The public’s ability to use the ERA is likely to be hampered by the lack of a full-text search capability similar to what Google.com and other commercial search engines offer, NARA Inspector General Paul Brachfeld said in an interview on Oct. 26.

Lack of full-text search “is one of the profound problems with the ERA at this point,” Brachfeld said. “Metadata alone does not tell the story of what is in the documents.”

Brachfeld recently posted online two management letters he wrote to Archivist of the United States David Ferriero in January and May 2011 about the inadequacy of the ERA's search tools.

The agency has acknowledged some of the ERA's limitations. In September, it completed its $430 million system development contract with Lockheed Martin and did not extend it for an optional year. It has hired IBM to operate and maintain the ERA under an annually renewable contract valued at $243 million over 10 years if all options are exercised.

Under the new contract, the agency will encourage IBM to try to add full-text search to the system, but it is not clear whether the current system architecture would permit that capability or whether its cost would be prohibitive, Brachfeld said. He also cautioned that introducing full-text search at this stage could interfere with protections for personally identifiable information.

“It is built into the contract to try to address the full text search capability,” Brachfeld said. “I am not sure what they can do.”

Lack of text searchability “is an important weakness, and I am not sure it can be corrected,” he said.

Brachfeld said the flawed system was poorly designed by a succession of managers, many of whom have since left government. "The program has had problems since its inception under then-Archivist of the United States John Carlin," he said. Three U.S. archivists have served since Carlin's tenure, he added.

Throughout the multi-year program, the inspector general’s office continued to ask about search capability, he said.

The office asked “fundamental questions of ERA program managers, employees, contractors and senior NARA officials. The most basic being, ‘At full operational capability, will the common citizen be able to effectively access and research the electronic records they are entitled access to over the Internet?’" Brachfeld wrote in the May 4 management letter. “We believe the answer, with limited caveats, is no.”

Brachfeld also warned that the limited search capabilities are likely to create severe bottlenecks in screening documents for entry into the ERA, because classified and personally identifiable information must be identified and removed.

Although agencies are not supposed to send classified information to the ERA, screening will likely still be needed to ensure that none appears, and that could slow the system, he suggested.

“If one imagines ERA as a busy six-lane highway moving an immense amount of traffic, this part of the ingest procedure is akin to closing five lanes for a stretch. While the rest of the highway remains capable of transporting all the traffic, the back-up or bottleneck caused by that one stretch makes it impractical to use the road,” Brachfeld wrote.

About the Author

Alice Lipowicz is a staff writer covering government 2.0, homeland security and other IT policies for Federal Computer Week.