EPA builds a better search

A keyword search of the Environmental Protection Agency's Web pages used to yield a mishmash of results. Typing "water quality" into the search engine, for example, might have returned links to high-level overviews of water quality issues alongside documents that merely mentioned water quality.

"The relevancy ranking of our search engine couldn't really say, 'Here's a general thing about water quality that could get you started,' " said Richard Huffine, program manager for the EPA's National Library Network. So EPA officials modified the search engine.

Now the engine ranks documents using data stored in metadata fields. It gives the highest priority to documents in which the query term appears in the subject field, followed in descending order by those in which it appears in the title, the description and the text.
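The field-priority ordering described above can be sketched in a few lines of Python. This is only an illustration, not the EPA's actual implementation: the Document class, the field_rank and search functions and the simple substring match are all assumptions made for the example, and a production engine would likely blend field priority with other relevance signals.

```python
from dataclasses import dataclass

# Fields in descending priority, following the order given in the article.
FIELD_PRIORITY = ("subject", "title", "description", "text")


@dataclass
class Document:
    """Hypothetical record holding the four fields the ranking looks at."""
    subject: str = ""
    title: str = ""
    description: str = ""
    text: str = ""


def field_rank(doc, query):
    """Return the index of the highest-priority field containing the query.

    0 means the subject matched, 3 means only the text matched,
    and None means the query appears in none of the fields.
    """
    needle = query.lower()
    for rank, name in enumerate(FIELD_PRIORITY):
        if needle in getattr(doc, name).lower():
            return rank
    return None


def search(docs, query):
    """Return matching documents, best field match first.

    Ties keep their original order because the input position
    is used as the second sort key.
    """
    hits = []
    for position, doc in enumerate(docs):
        rank = field_rank(doc, query)
        if rank is not None:
            hits.append((rank, position, doc))
    hits.sort(key=lambda hit: (hit[0], hit[1]))
    return [doc for _, _, doc in hits]


if __name__ == "__main__":
    # Made-up sample data for demonstration only.
    sample = [
        Document(title="Annual report", text="One chapter mentions water quality."),
        Document(subject="Water quality", title="Getting started"),
    ]
    for doc in search(sample, "water quality"):
        print(doc.title)
```

Under this scheme, a general overview tagged with the subject "water quality" outranks a report that merely mentions the phrase in its text, which is the behavior Huffine said the old engine lacked.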

Draft recommendations, written in part by Huffine and issued by members of the Categorization of Government Information Working Group, call for adoption of similar metadata standards governmentwide. The working group is a subcommittee of the Interagency Committee on Government Information, a creation of the E-Government Act of 2002.

The metadata recommendations are part of group members' larger effort to preserve government information in digital formats and make it permanently available. The problem is that, although the federal government is permanent, individual agencies may not be, and documents stored digitally on one server can be moved to another. Such moves result in the all-too-common message "404 error: file not found."

Although it is technically possible to continually update databases to reflect changes as documents are moved, it is impractical, according to working group members. A URL records where a document is stored, so it breaks when the document moves. Instead of relying on URLs to locate digital information, members recommended that federal officials develop search schemes based on uniform resource names (URNs), which identify a resource by a persistent name rather than by its location.

Federal officials would assign a unique identifier to each piece of government information, including policy documents, Web sites, photos, maps and other digital materials. A searchable index would link users to a citation containing a minimum set of standardized metadata fields, such as subject, agency creator, title and publication date.
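The idea of resolving a persistent identifier to a citation, rather than following a URL straight to a file, can be sketched as follows. This is a simplified illustration under stated assumptions, not the working group's proposed system: the Registry class, its method names, the identifier "urn:example:gov-doc-0001", the ".example" addresses and the sample metadata values are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Citation:
    """Minimum metadata fields named in the article, plus a current location."""
    subject: str
    agency_creator: str
    title: str
    publication_date: str
    location: str  # where the item lives today; expected to change over time


class Registry:
    """Hypothetical in-memory index mapping identifiers to citations."""

    def __init__(self):
        self._records = {}

    def register(self, identifier, citation):
        """Assign an identifier once; reusing one is an error."""
        if identifier in self._records:
            raise ValueError(f"identifier already assigned: {identifier}")
        self._records[identifier] = citation

    def move(self, identifier, new_location):
        """Record a new location without changing the identifier."""
        current = self._records[identifier]
        self._records[identifier] = replace(current, location=new_location)

    def resolve(self, identifier):
        """Return the citation for an identifier (KeyError if unknown)."""
        return self._records[identifier]


if __name__ == "__main__":
    registry = Registry()
    doc_id = "urn:example:gov-doc-0001"  # hypothetical identifier
    registry.register(
        doc_id,
        Citation(
            subject="Water quality",
            agency_creator="Example Agency",   # placeholder value
            title="Water Quality Overview",    # placeholder value
            publication_date="2004-01-01",     # placeholder value
            location="https://old-server.example/wq.html",
        ),
    )
    # The document moves to another server; only the registry entry changes.
    registry.move(doc_id, "https://new-server.example/docs/wq.html")
    print(registry.resolve(doc_id))
```

The design choice illustrated here is that anyone who saved the identifier still reaches the same citation after the move, because only the registry entry is updated. A link that pointed directly at the old server address would have broken.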

"If, for example, the identifier resolves to a book, then you get a citation for the book," said Eliot Christian, manager of data and information systems at the U.S. Geological Survey and chairman of the working group.

Combining URNs and a standardized metadata scheme would open the door to new possibilities for analysis, said James Erwin, primary author of the group's URN recommendations and director of information science and technology at the Defense Technical Information Center. "People can take that metadata and our identifier and put it into their database, their index, and they can use that for discovery," he said.

Information collected by one agency can prove useful long after it was gathered. Government surveys conducted in the 1780s in the Northwest Territory, for example, are being used by Interior Department officials today to assess changes in vegetation patterns in Michigan and Ohio.

Deciding which types of information merit universal identifiers, however, is still a matter of debate. The group's members define government information as "any information product, regardless of form or format, that an agency discloses, publishes, disseminates or makes available to the public, as well as information produced for administrative or operational purposes, that is of public interest or public value."

All data in its place

This month, members of the Categorization of Government Information Working Group issued draft recommendations for defining, categorizing, indexing and searching government information on the Web.

After a period of public comment ending Dec. 5, the group's members will send final recommendations to Office of Management and Budget officials, who will have a year to fashion a policy for making government information more accessible.

The draft recommendations call for federal officials to assign a unique identifier to each piece of government information online so that users can find information without depending on URLs.

The working group's members recommend that government officials adopt by the end of fiscal 2006 an interim identification scheme published by the Internet Engineering Task Force.

The members estimate that the management and operation of that scheme, called a Global Handle Registry, would cost between $300,000 and $1 million a year.

They recommend that Defense Information Systems Agency and General Services Administration officials assign and maintain unique identifiers for information online.

— David Perera

About the Author

David Perera is a special contributor to Defense Systems.
