Future electronic records archives bet on XML

The survivability of digital information is the biggest challenge the National Archives and Records Administration faces, particularly as agencies rely less on paper and more on electronic formats. How can it ensure that documents remain readable for future generations when the software used to store those files will likely no longer be in use?

NARA appears to have found an answer in the electronic archives program, an estimated $130 million project that will use XML technology to help ensure that all documents can be read without the software program that produced them. "With XML, it doesn't matter where the technology goes," said Kenneth Thibodeau, director of NARA's Electronic Records Archives Program. "The XML tools are simple enough that future computers should be able to deal with [the data]."

As part of the electronic archives program, electronic documents will be converted into XML and given XML tags that describe elements of the document such as a name or Social Security number. Document type definitions will describe the content and structure of a document, and style sheets will describe how a particular document is to be formatted. The records will then be stored on a tape cartridge, which in turn will be stored in a type of data warehouse.
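The tagging step can be pictured with a minimal Python sketch. It is illustrative only and does not describe NARA's actual pipeline: the element names (`record`, `name`, `ssn`, `body`), the helper functions and the sample values are all assumptions made for the example. It shows that a tagged record can be read back with nothing but a generic XML parser.

```python
import xml.etree.ElementTree as ET


def tag_record(name: str, ssn: str, body: str) -> str:
    """Wrap the parts of a document in descriptive XML tags.

    The element names are hypothetical; a real archive would take them
    from an agreed document type definition.
    """
    record = ET.Element("record")
    ET.SubElement(record, "name").text = name
    ET.SubElement(record, "ssn").text = ssn
    ET.SubElement(record, "body").text = body
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(record, encoding="unicode")


def read_field(xml_text: str, field: str):
    """Recover one tagged element using only a generic XML parser."""
    root = ET.fromstring(xml_text)
    node = root.find(field)
    return node.text if node is not None else None


# Placeholder values, not real data
xml_text = tag_record("Jane Doe", "000-00-0000", "Benefit claim received.")
print(xml_text)
print(read_field(xml_text, "name"))
```

Because the tags travel with the text, the reader of the record needs no knowledge of the word processor or database that originally produced it.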

NARA plans to take XML even a step further. "As it turns out, to do what the Archives needs to do to deliver authentic records over time, researchers determined that document type definitions and style sheets are not sufficient," Thibodeau said. "What they're exploring is XML topic maps as a way to represent the knowledge we have and need to communicate."

XML topic maps will help the Archives connect records with agencies' business processes and search and mine the data later. "To the extent you're keeping government records, you need to be able to link the records with the original activity," Thibodeau said. "You can impose any number of topic maps on the same body of information. We know that the ability to mine the records using this technology will be very helpful for us in producing the descriptions the citizens use to find out [which] government records might have government information in them."
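The idea of laying several topic maps over one body of records can be sketched in a few lines of Python. This is a deliberately simplified model, not the XML Topic Maps format itself: the record identifiers, topic names and helper functions are hypothetical, and a real topic map also carries associations between topics that this sketch leaves out.

```python
from collections import defaultdict


def build_topic_map(assignments):
    """Invert {record_id: [topics]} into {topic: {record_ids}}."""
    topic_map = defaultdict(set)
    for record_id, topics in assignments.items():
        for topic in topics:
            topic_map[topic].add(record_id)
    return dict(topic_map)


def find_records(topic_map, topic):
    """Return the records filed under a topic, in a stable order."""
    return sorted(topic_map.get(topic, set()))


# Two independent maps over the same three hypothetical records:
# one organized by business activity, one by originating agency.
by_activity = build_topic_map({
    "rec-001": ["benefit claims"],
    "rec-002": ["benefit claims", "appeals"],
    "rec-003": ["procurement"],
})
by_agency = build_topic_map({
    "rec-001": ["Agency A"],
    "rec-002": ["Agency B"],
    "rec-003": ["Agency A"],
})

print(find_records(by_activity, "benefit claims"))
print(find_records(by_agency, "Agency A"))
```

The records themselves are never altered; each map is a separate index, so a new view can be added later without touching the archived data.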

Thibodeau anticipates that agencies will already be using XML for business when the electronic archives project is operating, about four years from now.
