Rescuing missed information

Cutting-edge commercial wares give agencies a whole new outlook on searching for information

About a decade ago, federal agencies started to grapple with the specter of storing and sharing mass quantities of information. A governmentwide search standard seemed like the right solution at the time. It would ensure that important records would remain accessible regardless of which agency or system housed them.

But since then, a steady stream of new information retrieval technology, popularized by user-friendly Web sites such as Google and Yahoo, has hooked legions of users and attracted the attention of agency executives. Many of them resist the government search standard's burdensome requirements.

The overhaul of the FirstGov Web portal is providing a high-profile example of the potential of new search technologies for government. Experts believe agencies will follow industry and adopt cutting-edge search technologies such as metasearch, clustering and topic maps. Those techniques promise to dig deeper into the government's online knowledge base and to make search results much easier to use.

Several federal Web sites already use metasearch and clustering features, which allow searches to span disparate systems, data types and agencies, a challenge that the governmentwide search standard was supposed to address. Others are experimenting with topic maps to help employees find relevant information that traditional search tools might overlook.

Dave Goebel, president of the Goebel Group, a consulting firm, said federal users have become so comfortable with commercial search engines that they now expect similar performance from government search tools.

"That's forcing the agencies to proactively seek out commercial solutions," he said.

Meanwhile, search vendors continue to expand their products' capabilities in response to demands for better, faster retrieval tools. For example, Google sells enterprise search products, apart from its Web search site, that agencies can use with topic maps, metasearch and clustering tools.

"We are making our search appliances more and more open," said Rajen Sheth, product manager in Google's enterprise group. "We've built [application programming interfaces] to integrate search with a variety of different types of data and applications."

Another priority for vendors is helping users make more sense of search results that can list hundreds and even thousands of hits.

"The ongoing problem is that just about anything you type in [a search form] will lead to an overabundance of information," said Raul Valdes-Perez, co-founder of Vivisimo, which runs the clustering search site Clusty.com, and an adjunct associate professor of computer science at Carnegie Mellon University.

GILS: Interoperable search

The government first attempted to tackle search issues 10 years ago when it created the Global Information Locator Service standard. GILS responds to searches that reference information by title, subject, author, date and location. For GILS to work effectively, federal employees need to index all public government information by assigning those five labels to electronic records.

Based on the International Organization for Standardization's (ISO) 23950 specification for information search and retrieval, GILS would allow a user on one system to search and retrieve information from other GILS-compliant systems. The library community, smitten with the notion of an electronic card catalog of human knowledge, embraced the GILS standard.

Eliot Christian, who created GILS and manages data and information systems at the U.S. Geological Survey, said GILS would eventually allow users to search for government information with the search engine of their choice -- Google, Microsoft's MSN Search or FirstGov, for example. The standard would then direct users to the proper information on their first search attempt. In addition to convenience for current users, GILS would guide researchers to the proper information sources in decades to come.

At an April meeting of the Industry Advisory Council's eGovernment Shared Interest Group, Christian urged the government to be tougher about requiring agencies to specify GILS compatibility for new acquisitions.

The Office of Management and Budget's Circular A-130 and the Paperwork Reduction Act of 1995 mandate GILS-compliant software, but most departments have ignored those mandates because, they say, manually coding records is too complicated and time-consuming.

Christian said that by adopting GILS, agencies could reduce the costs of managing older systems and implementing new search technologies, which frequently change.

However, government officials find the evolution of commercial products more appealing.

National Institute of Standards and Technology officials recently proposed withdrawing GILS as a mandatory federal standard because, they say, modern search technology has eclipsed it. In another blow to GILS, General Services Administration officials decided not to require GILS compliance when they awarded a contract to revamp FirstGov's search engine.

Former federal officials say that if they had known robust commercial search tools were on the horizon, they would not have pushed GILS as a governmentwide standard. Dan Chenok, former branch chief of information policy and technology at OMB, said officials wrote the policies when search engines were in their infancy.

"Most agencies have been implementing the goals that underlie GILS and the requirements of the Paperwork Reduction Act with search engine technology that did not exist 10 years ago," said Chenok, now a vice president at SRA International.

Some experts say GILS and the new crop of search technologies can coexist.

Kevin McCook, federal sales director at search tools vendor Verity, said GILS helped streamline federal records management.

"Not everyone has complied, but there has been effective guidance available that has, at least, limited the potential chaos," he said.

McCook said Verity supports GILS as a stabilizing force across government, but he endorses other technologies for deep and complex probes. He said the intelligence world and some scientific organizations need specialized, more advanced techniques that surpass GILS' simplistic tagging conventions.

Metasearch: Expanding frontiers

Most government Web sites still rely on traditional search technology, which suffers from a serious shortcoming when working with different information types.

Traditional search engines use automated software called a crawler, which reads information on static Web pages and builds a central index with links to the original sources. The engines compare search queries against this index and quickly generate lists of links.
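The crawl-then-index approach described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the page addresses and text are hypothetical stand-ins for what a crawler would have fetched, and the function names `build_index` and `search` are assumptions made for the example.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return URLs containing every query word, sorted for stable output."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Hypothetical pages standing in for content a crawler has already read.
PAGES = {
    "example.gov/a": "toxicology reports on smoking",
    "example.gov/b": "smoking and cancer research",
    "example.gov/c": "federal job listings",
}
INDEX = build_index(PAGES)
```

Because queries run only against the prebuilt index, a page the crawler never read cannot appear in the results, however relevant it is.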

But online information storage increasingly favors structured database systems instead of static Web pages. Those database systems present their information on Web pages only when users request it. Search engines that use traditional crawlers never see all the information stored in databases, which include valuable government resources such as PubMed and USAJobs.

Metasearch, also known as federated search, can eliminate this blind spot. A single search triggers multiple simultaneous queries of selected databases, the Web and site-specific search engines, such as NASA.gov. The metasearch tool then collects and combines the search results, eliminates redundancies and presents the finished product as one list.
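In rough terms, the federated technique amounts to sending one query to several sources at once and merging what comes back. The Python sketch below shows that idea under stated assumptions: the three source functions and their addresses are hypothetical placeholders, not real agency or vendor interfaces, and duplicates are detected simply by matching addresses.

```python
from concurrent.futures import ThreadPoolExecutor

def metasearch(query, sources):
    """Query every source concurrently, then merge and de-duplicate by URL."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # pool.map keeps the order of `sources`, so the merge is deterministic.
        result_lists = list(pool.map(lambda source: source(query), sources))
    merged, seen = [], set()
    for results in result_lists:
        for title, url in results:
            if url not in seen:  # drop hits another source already returned
                seen.add(url)
                merged.append((title, url))
    return merged

# Hypothetical stand-ins for a database, a site search and a Web engine.
def database_source(query):
    return [(f"{query} record", "db.example.gov/1")]

def site_source(query):
    return [(f"{query} page", "site.example.gov/x"),
            (f"{query} record", "db.example.gov/1")]

def web_source(query):
    return [(f"{query} overview", "web.example.com/y")]

SOURCES = [database_source, site_source, web_source]
```

The call returns only after every source has answered, so the slowest source sets the overall response time.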

According to a recent study by the free Web metasearch service Dogpile.com, only 1.1 percent of the first-page results from the four leading commercial search sites match. Dogpile uses the federated search technique to launch simultaneous queries of those four sites -- Ask Jeeves, Google, MSN Search and Yahoo -- and report results in a consolidated hit list.

Programmers who work with metasearch tools say the federated search technique produces more reliable results than traditional search engines.

"People who assume that Google has everything…really miss relevant items," said Tamas Doszkocs, a computer scientist at the National Library of Medicine (NLM). He has been working for almost a decade on a metasearch engine called ToxSeek, which scours toxicology and environmental health databases at government agencies. The site, accessible during its beta-testing phase, is scheduled to launch later this year.

In addition to metasearch capabilities, ToxSeek also uses clustering, another new search technique. With clustering, algorithms sort search results into groups based on textual and linguistic similarities.

For example, a ToxSeek user could search for "cancer" and "smoking," and the system would return results categorized by a variety of subheads, including the information's source, topic and type.
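The grouping step can be illustrated with a deliberately simple word-overlap method. The sketch below is not the algorithm used by ToxSeek or any commercial product; it is an assumed stand-in that places each result title in the first group whose vocabulary it sufficiently shares, measured by Jaccard similarity. The `threshold` value and function names are hypothetical choices for the example.

```python
def jaccard(a, b):
    """Share of words two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(titles, threshold=0.25):
    """Greedily group titles whose word sets overlap at or above `threshold`."""
    clusters = []  # each entry: {"words": set of words, "items": list of titles}
    for title in titles:
        words = set(title.lower().split())
        for group in clusters:
            if jaccard(words, group["words"]) >= threshold:
                group["items"].append(title)
                group["words"] |= words  # widen the group's vocabulary
                break
        else:
            # No existing group was similar enough, so start a new one.
            clusters.append({"words": set(words), "items": [title]})
    return [group["items"] for group in clusters]
```

Production clustering engines weigh far more linguistic evidence than shared words, but the outcome has the same shape: a handful of labeled groups rather than one long ranked list.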

Clustering lets users see results that would otherwise appear near the end of ranked lists, and they can survey the information landscape before digging in.

One of the earliest adopters of clustering in the government is the Homeland Security Digital Library. The library, maintained by the Homeland Security Department and Naval Postgraduate School, deployed a version of ToxSeek more than six months ago.

The search tool, named SeekOnce, short for "Seek Once, Retrieve Many," spans a variety of resources, such as research studies, theses, white papers, legislation, journal articles and commercial databases. It can read plain text documents, PowerPoint presentations, multimedia files, images and spreadsheets. SeekOnce accesses about 50 databases and may eventually extend to as many as 250.

GSA officials gave metasearch and clustering tools a public vote of confidence last month, when they selected a new search engine for the FirstGov Web portal. Contract winner Vivisimo will work with Microsoft to provide metasearch and clustering capabilities to FirstGov users.

The move will expand the reach of the portal's search engine into a greater variety of government-related content and make it easier for users to navigate search results by clustering hits according to subject matter.

Meanwhile, the most established government metasearch tool is Science.gov, an interagency product hosted by the Energy Department's Office of Scientific and Technical Information (OSTI).

Since 2002, a dozen agencies, including the Defense and Agriculture departments and NLM, have contributed to the portal. Science.gov, like ToxSeek, can query selected databases, such as PubMed, MedlinePlus and DefenseLINK, but it does not cluster results.

OSTI Director Walter Warnick said science teachers are using the portal often and giving it positive reviews. The mayor of Oak Ridge, Tenn., home of the Science.gov portal, used it to help his child with homework.

When a big science story breaks, such as a tsunami or hurricane, people can look for context at Science.gov. Warnick added that it is also popular with college students looking for science internships and fellowships.

"We think we have 98 percent of all the federal research and development budget represented in Science.gov," Warnick said.

That desire for comprehensiveness illuminates one of the primary shortcomings of metasearch tools, however. "There are limits on how many subordinate databases you can do at one time," he said. "The more databases you have, the slower the response."

Search tools that use crawlers don't have that problem because their search queries are run only against the central index that the crawler created.

For this reason, OSTI posted a lesson plan on its education site, ScienceLab, which advises students to use commercial engines in combination with Science.gov for a more productive search experience.

"We see metasearch as not a competition with Google but as complementary," Warnick said.

Topic maps: Making connections

Traditional search engines, while increasingly precise and expansive, cannot think like a human being. For example, what happens when two sets of electronic documents use different words and vocabularies to discuss a related topic? A traditional search engine might miss the link between the two sets, because it can match only words, not the meaning of the ideas discussed within them.

The still-emerging area of topic maps can help educate search engines.

Like metasearch, topic map techniques do not replace traditional search tools. They can work in conjunction with them, however, to provide more powerful search navigation. For example, a NASA topic map could be set up so that when a person enters "Pathfinder" into a search form, the topic map guides the user to related items, such as "Mars lander" and "evidence suggesting liquid water was once a stable presence on Mars."

Several federal agencies, including Energy, the Defense Intelligence Agency and the Internal Revenue Service, have started to add topic maps to traditional search technology.

In 2001, when IRS officials wanted to improve customer service on their tax assistance hot line, they developed an internal topic map that would help call center operators find relevant information more easily.

The IRS topic map identified many of the descriptors that callers might use and then programmed links between related terms. For example, the terms "abandonment" and "disposition of property" are different ways of referring to something that has similar tax implications. With topic maps working alongside the search engine, a query for either term would direct a call center operator to the relevant online information.
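At its simplest, the linking of related terms can be pictured as a lookup from many caller phrases to one shared topic. The Python sketch below is an assumption-laden illustration, not the IRS system: the topic name, the document title and the function names are hypothetical, and a real topic map following the ISO standard carries much richer associations than a flat table.

```python
# Hypothetical topic map: each topic lists the terms callers might use.
TOPICS = {
    "property-disposition": {"abandonment", "disposition of property"},
}

# Hypothetical documents filed under each topic.
DOCUMENTS = {
    "property-disposition": ["Hypothetical guidance page on disposing of property"],
}

def build_term_lookup(topics):
    """Invert the topic map so any known term points to its topic."""
    lookup = {}
    for topic, terms in topics.items():
        for term in terms:
            lookup[term.lower()] = topic
    return lookup

def find_documents(query, lookup, documents):
    """Return documents for the topic the query term belongs to, if any."""
    topic = lookup.get(query.strip().lower())
    return documents.get(topic, []) if topic else []

LOOKUP = build_term_lookup(TOPICS)
```

Either term resolves to the same topic and therefore the same documents; a term the subject-matter experts never entered returns nothing, which is why the human effort matters.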

Michel Biezunski, a consultant at Coolheads Consulting who worked on the IRS project and is co-editor of the ISO standard for topic maps, said the old call center setup forced operators to flip through numerous resources, including manuals and multiple technical Web sites, to answer caller inquiries.

"There was too much information and not enough time because they were on the phone," Biezunski said. Now the topic map guides hot line operators to the most helpful information faster.

Topic map implementation requires more elbow grease than search appliance installation. Unlike traditional search engines, most topic maps require human and artificial intelligence. A computer does not know that "abandonment" and "disposition of property" are related in a tax scenario. A person, typically a government subject-matter expert, has to teach the computer to recognize the relationship.

Many topic map applications include a search engine to help users find a starting place in the knowledge network. In some cases, topic maps are not necessary for simple searches.

"If you can use the usual stuff, then do it," Biezunski said. "Google is fine if what you have to do is relatively shallow. If you are really trying to explore a domain, then Google must be frustrating."

Intelligence agencies, which want to share information but must first translate one another's jargon, are candidates for topic map experimentation, experts say.

George Kondrach, executive vice president at Innodata Isogen, an information management consulting firm, has been helping the Office of Naval Intelligence and DIA with their topic map projects for about a year.

"People like the CIA and DIA and the ONI, they don't even speak different dialects of the same language, they speak different languages," he said. Secrecy is one reason for the differences, but the agencies need to share some information, he added.

"This [topic map] overlay transcends the semantics of each agency," he said.

Some topic map consultants would like the Government Accountability Office to encourage all government agencies to convert their vocabularies into topic maps.

"I'd be real interested in seeing the GAO get excited about something like this. It would make their job so much easier," said Patrick Durusau, a private consultant who is co-editor of the Topic Maps Reference Model and chairman of the U.S. National Technical Advisory Group to the ISO committee that developed the topic map standard.

Integrating departments' topic maps would create greater transparency and accountability in government, he said.

"Topic maps enable individual agencies to retain not only their traditional nomenclature but also their information systems," Durusau said. "A topic map sits as a wrapper around such resources and provides the means to reliably merge data from different agencies into a single coherent view."

It might give the expression "cutting through government red tape" a whole new meaning.

