Making Web data permanent
- By David Perera
- Nov 04, 2004
U.S. government information is supposed to be permanent, but Web links can break. Now, a federal group says Uniform Resource Names (URNs) can solve the problem.
Officials at the Categorization of Government Information Working Group issued draft recommendations on interoperable standards for searchable identifiers and a proposed definition of government information. Draft recommendations on open and interoperable standards for the categorization of government information are set to begin a public comment period Nov. 9.
The working group is a subcommittee of the Interagency Committee on Government Information, which was created by the E-Government Act of 2002.
Web pages are identified by URLs. Although it's technically possible to rely on URLs to preserve government information online permanently, it's impractical, say members of the working group.
"If we're going to go to the effort of categorizing [government information], then we need to be able to persistently identify it," said working group chairman Eliot Christian. Poorly maintained sites too often have broken links that result in "Not Found" Web errors, according to the group's recommendations.
Proponents believe URNs should be adopted because they create unique identifiers that let users view documents no matter where they are stored on the Web. Today's Web browsers are not URN-compatible, but that could change if government officials send "a clear signal to industry that such development and integration is essential," according to the report.
As an intermediate step, the group recommends that government officials adopt Global Handle Registry standards by the end of fiscal 2006. Those standards are compatible with the URN framework but rely on the widely used HTTP to locate documents. Group members estimate that name space management and operation would likely cost between $300,000 and $1 million per year. They identify the Defense Information Systems Agency and the General Services Administration as logical choices to assign and maintain unique identifiers.
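To make the idea concrete, here is a minimal sketch of how a persistent name can stay fixed while the document behind it moves. It is an illustration only, not the working group's design or the Handle System's actual interface: the `NameRegistry` class, the `urn:example:` name and the `example.gov` addresses are all hypothetical.

```python
class NameRegistry:
    """Toy in-memory registry mapping persistent names to current URLs.

    Hypothetical illustration only; a real resolver would be a
    networked service with its own protocol and access controls.
    """

    def __init__(self):
        self._locations = {}

    def register(self, name, url):
        """Assign a persistent name to a document's current location."""
        self._locations[name] = url

    def move(self, name, new_url):
        """Record that the document moved; the name itself never changes."""
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_url

    def resolve(self, name):
        """Return the current location for a persistent name."""
        return self._locations[name]


registry = NameRegistry()
name = "urn:example:agency:report-2004-01"  # hypothetical identifier
registry.register(name, "http://old.example.gov/reports/2004-01.html")

# The agency reorganizes its site; only the registry entry is updated.
registry.move(name, "http://new.example.gov/archive/2004-01.html")

print(registry.resolve(name))
```

Anyone who cited the name rather than the original URL would still reach the document after the move, which is the property the group is after.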
Attached to every unique identifier would be metadata allowing keyword searches of the name space index, said James Erwin, primary author of the searchable identifier recommendations. As a result, standard government metadata definitions would be useful, he added. "Certainly it would work more effectively if you had standards, at least a standard for a particular genre of document."
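A rough sketch of what keyword searching over identifier metadata might look like follows. The field names (`title`, `genre`), the sample records and the helper functions are assumptions made for illustration; the recommendations described here do not specify a metadata schema.

```python
from collections import defaultdict


def build_keyword_index(records):
    """Map each lowercase word in any metadata value to the identifiers using it."""
    index = defaultdict(set)
    for identifier, metadata in records.items():
        for value in metadata.values():
            for word in value.lower().split():
                index[word].add(identifier)
    return index


def search(index, keyword):
    """Return identifiers whose metadata contains the keyword, sorted by name."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical records: identifier -> metadata attached to it.
records = {
    "urn:example:doc-1": {"title": "Annual Budget Report", "genre": "policy"},
    "urn:example:doc-2": {"title": "Records Management Policy", "genre": "policy"},
}

index = build_keyword_index(records)
print(search(index, "budget"))  # ['urn:example:doc-1']
print(search(index, "policy"))  # ['urn:example:doc-1', 'urn:example:doc-2']
```

The sketch also hints at why Erwin wants standard definitions: the search only works well if agencies fill in the same fields with comparable terms.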
Officials continue to debate which government information merits universal identifiers. The group defines government information as "any information product, regardless of form or format, that a U.S. federal agency discloses, publishes, disseminates or makes available to the public, as well as information produced for administrative or operational purposes, that is of public interest or public value."
Asked whether, for example, a unique identifier would need to be assigned to FirstGov.gov every time the site's Webmasters post an update, Erwin replied that he had thought more about assigning identifiers to policy documents. "We're still working out how these various pieces fit together," he added.
David Perera is a special contributor to Defense Systems.