EPA builds a better search

A keyword search of the Environmental Protection Agency's Web pages used to yield a mishmash of results. Typing, say, "water quality" into the search engine might have returned links to high-level overviews of water quality issues mixed in with documents that merely mentioned water quality.

"The relevancy ranking of our search engine couldn't really say, 'Here's a general thing about water quality that could get you started,' " said Richard Huffine, program manager for the EPA's National Library Network. So EPA officials modified the search engine.

Now the engine ranks results using data stored in metadata fields. It gives priority, in descending order, to documents in which the search query term appears in the subject, title, description and text.
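The field-priority ranking described above can be sketched roughly as follows. This is an illustration only, not the EPA's implementation: the `Document` class, the `score` and `rank` functions and the numeric weights are all assumptions. Only the order of the fields (subject, title, description, text) comes from the article.

```python
from dataclasses import dataclass

# Hypothetical weights. Only the ordering is taken from the article.
# Each weight exceeds the sum of all lower ones, so a match in a
# higher-priority field always outranks matches in lower fields.
FIELD_WEIGHTS = {"subject": 8, "title": 4, "description": 2, "text": 1}


@dataclass
class Document:
    subject: str = ""
    title: str = ""
    description: str = ""
    text: str = ""


def score(doc: Document, query: str) -> int:
    """Sum the weights of every field that contains the query term."""
    q = query.lower()
    return sum(
        weight
        for field_name, weight in FIELD_WEIGHTS.items()
        if q in getattr(doc, field_name).lower()
    )


def rank(docs: list, query: str) -> list:
    """Return matching documents, best match first; drop non-matches."""
    scored = [(score(d, query), d) for d in docs]
    matches = [(s, d) for s, d in scored if s > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in matches]
```

Under these assumed weights, a general overview whose subject is "water quality" sorts ahead of a report that only mentions the phrase in its body text.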

Draft recommendations, written in part by Huffine and issued by members of the Categorization of Government Information Working Group, call for adoption of similar metadata standards governmentwide. The working group is a subcommittee of the Interagency Committee on Government Information, a creation of the E-Government Act of 2002.

The metadata recommendations are part of group members' larger effort to preserve government information in digital formats and make it permanently available. The problem is that, although the federal government is permanent, individual agencies may not be. Documents stored digitally on one server can be moved to another, and links that still point to the old location then fail. Such moves result in the all-too-common message "404 error: file not found."

Although it is technically possible to continually update databases to reflect changes as documents are moved, it is impractical, according to working group members. Instead of relying on URLs to locate digital information, members recommended that federal officials develop search schemes based on uniform resource names (URNs).

Federal officials would assign unique identifiers to each piece of government information — policy documents, Web sites, photos, maps and other digital materials. A searchable index would link users to a citation containing a minimum set of standardized metadata fields, such as subject, agency creator, title and publication date.
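A minimal sketch of the idea, assuming a simple in-memory index: a persistent identifier resolves to a citation record, and moving the document changes only the stored location, never the identifier. The `Citation` and `Resolver` names, the `urn:example:` identifier and the URLs are hypothetical. The four metadata fields are the ones named in the draft recommendations.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Citation:
    # Minimum metadata fields named in the draft recommendations.
    subject: str
    agency_creator: str
    title: str
    publication_date: str
    # Current locations of the item; these may change over time.
    locations: List[str] = field(default_factory=list)


class Resolver:
    """Hypothetical index that maps persistent identifiers to citations."""

    def __init__(self) -> None:
        self._records: Dict[str, Citation] = {}

    def register(self, identifier: str, citation: Citation) -> None:
        if identifier in self._records:
            raise ValueError(f"identifier already assigned: {identifier}")
        self._records[identifier] = citation

    def move(self, identifier: str, old_url: str, new_url: str) -> None:
        """Record that the item moved; the identifier stays the same."""
        locations = self._records[identifier].locations
        locations[locations.index(old_url)] = new_url

    def resolve(self, identifier: str) -> Citation:
        return self._records[identifier]


resolver = Resolver()
resolver.register(
    "urn:example:epa:0001",  # hypothetical identifier
    Citation(
        subject="Water quality",
        agency_creator="EPA",
        title="Water Quality Overview",
        publication_date="2004-11-01",
        locations=["https://old.example.org/wq.pdf"],
    ),
)
# The file moves to another server; users keep the same identifier.
resolver.move(
    "urn:example:epa:0001",
    "https://old.example.org/wq.pdf",
    "https://new.example.org/wq.pdf",
)
print(resolver.resolve("urn:example:epa:0001").locations)
```

Because users hold the identifier rather than the URL, a server move requires one update in the index instead of a fix to every link that points at the document.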

"If, for example, the identifier resolves to a book, then you get a citation for the book," said Eliot Christian, manager of data and information systems at the U.S. Geological Survey and chairman of the working group.

Combining URNs and a standardized metadata scheme would open the door to new possibilities for analysis, said James Erwin, primary author of the group's URN recommendations and director of information science and technology at the Defense Technical Information Center. "People can take that metadata and our identifier and put it into their database, their index, and they can use that for discovery," he said.

Information collected by officials at one agency can remain relevant long afterward. Government surveys of the Northwest Territory from the 1780s, for example, are being used by Interior Department officials today to assess changes in vegetation patterns in Michigan and Ohio.

Deciding which types of information merit universal identifiers, however, is still a matter of debate. The group's members define government information as "any information product, regardless of form or format, that an agency discloses, publishes, disseminates or makes available to the public, as well as information produced for administrative or operational purposes, that is of public interest or public value."

All data in its place

This month, members of the Categorization of Government Information Working Group issued draft recommendations for defining, categorizing, indexing and searching government information on the Web.

After a period of public comment ending Dec. 5, the group's members will send final recommendations to Office of Management and Budget officials, who will have a year to fashion a policy for making government information more accessible.

The draft recommendations call for federal officials to assign unique identifiers to each piece of government information online so that users can find information independent of URLs.

The working group's members recommend that government officials adopt by the end of fiscal 2006 an interim identification scheme published by the Internet Engineering Task Force.

The members estimate that the management and operation of that scheme, called a Global Handle Registry, would cost between $300,000 and $1 million a year.

They recommend that Defense Information Systems Agency and General Services Administration officials assign and maintain unique identifiers for information online.

— David Perera

About the Author

David Perera is a special contributor to Defense Systems.
