Archives to scale volumes of snapshots

When the National Archives completes the task of collecting "snapshots" of all federal Web sites, it will have to figure out how to store and search through 21 terabytes of digital information.

"We're not scaled to do that now. We will have to build up the capacity to handle it," said Mike Miller, director of the Archives' modern records program.

The snapshots were ordered by the outgoing Clinton administration to preserve archival copies of federal Web sites as they existed Jan. 20. Senior Clinton officials said they wanted a record of the electronic government developed during their watch.

To archivists, the snapshots have a less specific but perhaps greater worth.

"We save these things for one reason and find that people find tons of ways to use them," Miller said. He said, for example, that accounting records captured during the collapse of Nazi Germany sat largely unused for about half a century, but in recent years they have become valuable for tracing looted gold and treasure.

The snapshots of government Web sites are also certain to prove valuable, he said.

Some agencies may find them useful in settling legal disputes. Researchers will no doubt find them valuable for tracing the early development of electronic government.

"We felt we would be kicking ourselves if we did not" take the snapshots, Miller said. So far, 38 agencies, mainly small ones, have sent Web snapshots to the Archives, Miller said Feb. 16. There are at least three times that many federal agencies. The deadline is March 20.

Agencies must capture the Web site as it appeared Jan. 20, complete with working links between the site's pages and layers. Snapshots are being sent to the Archives on CD-ROMs or tape and eventually are to be transferred to digital linear tape for long-term storage.

If printed on paper, the 21 terabytes of Web data would be roughly double the amount of information contained in the Library of Congress' collection of 20 million volumes.

Because of the volume of data involved, the Archives does not want to make a practice of periodically collecting agency Web site snapshots. "We want to get this on a more regularized basis," Miller said. The record-keeping agency hopes to have new guidelines in place next month instructing agency Web managers on how to routinely preserve Web site records.
