Power search

The federal government is beginning to see what the private sector has already discovered: Search technology could be the answer to all its information management problems.

A recent request for information, issued jointly by the General Services Administration and the Office of Management and Budget, asks whether search technology is powerful enough to replace some government standards for information management.

"Does current search technology perform to a sufficiently high level to make an added investment in metadata tagging unnecessary in terms of cost and benefit?" the Sept. 15 RFI asks. Responses are due by Oct. 21.

The notice will likely lead to and shape procurements in the next decade, according to supplementary information on the Federal Business Opportunities Web site. Some people say existing technologies that can fulfill the request are ready and waiting for the government to notice them.

Suggested approaches must meet the wide-reaching aim of identifying the most cost-effective means to search for, locate, retrieve and share information. The notice lists seven scenarios to provide context.

For example, the government is looking for information on how to help a physician search multiple databases and Web sites for treatments for a defense contractor's unexplained illness. The doctor might not know which agencies provide information on unexplained or service-related illnesses. He or she would also need a way to search nongovernmental sources, and some of the information might not be easily accessible through traditional Internet search engines.

In addition to tackling information sharing, vendors' suggested approaches must address the problem of access.

The RFI appears at a time when popular commercial search engines -- such as Google, Yahoo and Microsoft's MSN Search -- are about to retire a 10-year-old government search standard intended as an electronic card catalog of public government information.

The National Institute of Standards and Technology wants to withdraw the Government Information Locator Service because the agency considers the search standard obsolete. A July 15 Federal Register notice states that recalling the standard, also known as the International Organization for Standardization's ISO 23950, seems justified because most agencies now use commercial search tools to help people locate government information.

Accordingly, the RFI seeks approaches that could avoid the use of government-mandated standards.

Alternatively, the notice asks vendors to explain why they believe government standards are not necessary or cost-effective.

Some government computer programmers say they are impressed by GSA's and OMB's foresight in issuing the notice.

Tamas Doszkocs, a computer scientist at the National Library of Medicine, has been working on the metasearch and clustering engine ToxSeek for almost a decade.

The RFI "is a very good way of taking a look at an extremely complex array of problems and solutions and trying to elicit feedback from major contractors who would be able to address this whole complex issue," he said. "It indicates a keen awareness of the complexity of the problems."

Doszkocs said only piecemeal solutions now exist in industry and government.

"There is nobody that could address and provide solutions to all of the concerns and problem sets," he said. "But there are certainly companies that have formidable technologies that could team up."

As OMB moves forward in soliciting help from industry, the agency is also seeking guidance from federal stakeholders, government officials say.

Last year, for instance, the Interagency Committee on Government Information submitted draft recommendations to OMB on adopting open, interoperable standards. Those standards would help agencies catalog information so that people can search any government system using terms that allow information to be identified electronically. Section 207 of the E-Government Act of 2002 required agencies to develop those recommendations.

The committee's report calls for the federal government to implement a searchable identifier standard that would provide long-term access to digital information. The paper states that the standard should be flexible enough to remain viable as technology changes and specific enough to provide authoritative access to government information.

OMB officials say they are considering the committee's ideas as they develop policies to foster better public access to government information. They will issue the policies to agencies by Dec. 17.

Karen Evans, OMB's administrator of e-government and information technology, said the RFI language asking whether search technology should replace government standards does not conflict with the E-Government Act.

"The question the RFI asks in no way suggests avoiding the use of standards when such are necessary," she said. "Moreover, it most certainly does not suggest noninteroperable searching. Rather, it seeks to identify where metadata tagging or other formal -- and costly -- advanced information preparation mechanisms achieve the goal of making information more easily accessible to interested parties."

In the three years since the E-Government Act was enacted, she added, improvements in commercial search technologies have altered the Bush administration's attitude toward business information retrieval solutions. The RFI seeks to ensure that the public benefits from commercial advances when it seeks government information, Evans said.

The Government Printing Office is also involved in the information retrieval and sharing initiative.

GPO, the agency responsible for distributing government publications, has assigned several employees to OMB during the past year. One of those employees will soon return to GPO for work on a new digital distribution system capable of verifying and tracking all versions of official government documents.

GPO officials say the system's design will ensure authenticity of government information and permanent public access to that information.

"The RFI will help our efforts since we are working closely with the community that generated the RFI and [that] is developing enhanced search tools," GPO spokeswoman Veronica Meter said.

By July 2007, GPO officials expect to have an operational system that will support Web browsing, downloading and printing. It will also have search tools and redundant data warehouses.

Vendors say intelligence agencies have already succeeded with endeavors similar to what OMB and GSA are looking for.

"The tools and products are already available to support this initiative," said Paul Norcini, federal channels manager at search tools supplier Verity.

Verity's solutions can index data formats from disparate repositories into searchable collections. Other tools then categorize the data based on concepts, metadata and highlighted information.

Indexing facilitates information sharing, while highlighting helps with retrieval, Norcini said.
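To make the distinction between indexing and highlighting concrete, here is a minimal sketch in Python. It is not Verity's implementation. The sample documents, repository names and function names are all hypothetical, and a production tool would also handle file formats, ranking and concept-based categorization.

```python
import re
from collections import defaultdict

# Hypothetical sample documents drawn from two unrelated repositories.
DOCS = {
    "agency-a/report1": "Unexplained illness reported among contractors",
    "agency-b/memo7": "Treatment options for unexplained illness",
    "agency-b/memo8": "Budget guidance for fiscal planning",
}


def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


def highlight(text, query):
    """Wrap each query word found in the text in square brackets."""
    words = set(tokenize(query))
    if not words:
        return text
    pattern = r"\b(" + "|".join(re.escape(word) for word in words) + r")\b"
    return re.sub(pattern, r"[\1]", text, flags=re.IGNORECASE)


INDEX = build_index(DOCS)
RESULTS = search(INDEX, "unexplained illness")
```

In this sketch, one index covers documents from both hypothetical repositories, so a single query reaches all of them. Highlighting then marks the matching words in each hit so a reader can judge relevance at a glance.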

Agency workers can also simultaneously search government and nongovernment systems with existing technology.

Norcini said OMB and GSA need to consider how their programs will detect patterns and connections among pieces of information.

"There is more to information sharing than just search," he said.

One global consortium is working with foreign governments on a massive information retrieval and sharing project that could influence the U.S. government's path.

Earlier this month, groups from industry, government, academia and nonprofit organizations announced plans to provide online versions of books, academic papers, video and audio to the world. The Internet Archive, a nonprofit entity that offers access to historical collections in digital format, will host the Open Content Alliance (OCA). The National Archives of the United Kingdom has already contributed to the effort.

The OCA "may significantly help the [U.S.] government in doing their public access mission," Internet Archive co-founder Brewster Kahle said. "The OCA is an almost unprecedented collaboration between nonprofits, libraries, government institutions and commercial search engines to bring to life the treasures that are currently locked up in independent collections."

Kahle said he has been talking to GPO officials for the past year about joining the alliance. The alliance will unveil a technology Oct. 25 that performs nondestructive scans of book pages at high resolutions for 10 cents a page. Those cost savings could appeal to GPO and its Federal Depository Library Program, he said.

Anyone will be able to search and download works from the alliance's repository for free. Yahoo will provide the search engine, but all content will be available for other major search engines to index.

"The combination of large digital archives and the Internet could allow us to take all the U.S. government information and make it available through technologies such as commercial search engines," Kahle said. "We hope that the government considers the OCA as a way of achieving its aims."

Setting out the scenario

Can search technology replace government information standards? A request for information, issued by the General Services Administration and the Office of Management and Budget, seeks to address that question. Any approach that the government takes will need to identify the most cost-effective means for locating, retrieving and sharing information.

The RFI lays out a number of scenarios to provide context for responses.

Scenario 1: Researching unexplained illnesses among defense contractors.

A physician needs to perform a fairly exhaustive search for government information across the range of federal agencies, some state and local governments, and various commercial and academic resources. The information exists in a wide variety of formats, including handwritten forms that have been digitized. Some of those information resources are not easily accessible from typical Internet search engines -- sometimes called Deep Web resources. The physician needs to aggregate, analyze and manipulate the information relevant to the topic and also correlate data geospatially. The physician will publish a scholarly paper on the completed findings, including citations to e-government records. Those cited resources are expected to be obtainable in the future. The physician also wants to receive automatic notification whenever new information concerning unexplained military service-related illnesses is published.

Scenario 2: Searching for experts.

The government wants to identify experts to study an urgent, complex and relatively obscure technical issue. The experts could come from the federal, state, local or tribal governments or the private sector, especially academic and nonprofit organizations. Because the technical issue is relatively obscure, human resources and personnel management systems have not likely captured the related skills. The best way to identify experts is likely through an analysis of subject-matter work products and agencies' Web sites. But some of the relevant works may be within federal government information systems, outside the government or otherwise not readily accessible through Internet search engines.

Scenario 3: Performing academic research.

For a report on Poland's involvement in the Cold War, a student needs to locate and analyze information resources. This requires an ability to identify all relevant government and other resources, focusing more on primary sources -- such as reports, photos, maps and military unit histories -- than secondary sources -- such as textbooks and encyclopedias. The student must also translate resources into English as necessary, rank information resources by relevance and extract relevant facts, summaries and text passages from some of those resources. The assignment also requires the student to find maps of various Cold War hot spots and add information from other sources to those maps. Finally, the student must organize the information through an analysis of the relevant resources and publish the work as a paper and Web site.

Scenario 4: Tracing information audit trails.

An organization must track the flow of electronic information on a specific topic among government agencies to understand how and where the information was processed. It also must identify the accuracy, relevancy, timeliness and completeness of the information. The need for the information could be for application filings, environmental findings, historical research or more authoritative sources.

Scenario 5: Sharing law enforcement information across jurisdictional boundaries.

Police searching an apartment obtain handwritten notes in a foreign language, an apparent ledger of financial transactions, fingerprints and photos of unfamiliar graffiti on the apartment walls. They digitize the information and post it to the appropriate law enforcement information-sharing exchange. After receiving a notice about the posting, an investigator then translates and interprets the documents and photos, analyzes the materials, and correlates the information with other relevant information obtained from various law enforcement organizations at the federal, state, local and tribal levels.

Scenario 6: Tracking down forged identities.

A credit card company discovers that someone fraudulently established a series of accounts. The credit card company must notify the victims. The victims in turn must notify all financial organizations they use, including all governmental agencies from which they currently or potentially receive services.

Scenario 7: Allowing citizens to access government information on a specific topic.

Someone is searching for all available federal information on a particular topic, including information located on government Web sites. A successful search will help the person avoid using the complex, lengthy and potentially costly Freedom of Information Act process. Agencies cannot determine an individual's interest in advance, but invariably the same, similar or related government information is located at more than one federal agency and comes in various types of online information. Some of those information resources are Deep Web or hidden Web assets and are not easily accessible using typical Internet search engines.

International information retrieval

The Open Content Alliance, a new worldwide collaboration of cultural, technology, nonprofit and governmental organizations, would like to help the U.S. government build a searchable, permanent archive of online government information.

Here is more information about the Open Content Alliance.

  • The alliance, hosted by the nonprofit Internet Archive, will be a digital repository of global content for universal access.
  • The online warehouse will offer digital versions of books, academic papers and video and audio files.
  • Yahoo will power the collection's search engine, but all content will be available for other major search engines to index.
  • Metadata for all content will be freely available to the public through formats such as the Open Archives Initiative Protocol for Metadata Harvesting and RSS.
  • Current participants include Adobe, Hewlett-Packard Labs, the United Kingdom's National Archives, O'Reilly Media, Prelinger Archives, all libraries at the University of California campuses and the University of Toronto.
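
As an illustration of the metadata harvesting mentioned in the list above, the following Python sketch builds a request URL for the Open Archives Initiative Protocol for Metadata Harvesting. The endpoint address and the function name are hypothetical, because the alliance's actual service address is not given here. The `ListRecords` verb, the `metadataPrefix`, `from` and `set` arguments, and the `oai_dc` (Dublin Core) format come from the protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the alliance's real harvesting address is not given.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    metadata_prefix: record format to request; oai_dc is the Dublin Core
        format that every OAI-PMH repository must support.
    from_date: optional lower bound (YYYY-MM-DD) for selective harvesting.
    set_spec: optional set name that limits the harvest to one collection.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Request all Dublin Core records added or changed since Oct. 1, 2005.
EXAMPLE_URL = list_records_url(BASE_URL, from_date="2005-10-01")
```

A harvester would fetch that URL, parse the XML response and repeat the request with the resumption token the repository returns until no records remain.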

-- Aliya Sternstein
