Energy launches megasearch effort

Internet users will be able to retrieve scientific research from four major portals by entering one query on one Web site.

A year from now, Internet users will be able to retrieve scientific research from four major portals – Science.gov, a portal with millions of pages of pre-publication research findings, a search engine for science conference proceedings and a database of international research on energy – by entering one query on one Web site.

The megaportal initiative, dubbed Global Discovery, is a program of the Energy Department’s Office of Scientific and Technical Information. OSTI already hosts Science.gov, the most established tool for searching across disparate systems, data types and agencies. Since 2002, a dozen agencies have contributed to the portal, offering one-stop access to PubMed, MedlinePlus and DefenseLINK, among other databases.

Search engines such as Science.gov that can query multiple databases with a single search are called metasearch engines. Traditional search engines, such as Google, never see all the information stored in databases.

Metasearch, also known as federated search, has one drawback that is complicating OSTI’s Global Discovery effort. It takes a lot of time for any technology to trigger multiple simultaneous queries of databases, the Web and site-specific search engines, and then combine the search results.
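The fan-out-and-merge pattern behind federated search can be illustrated with a minimal Python sketch. It is not OSTI’s or Deep Web Technologies’ software: the source names, the sample titles and the functions `search_source` and `federated_search` are all hypothetical, and small in-memory lists stand in for remote databases.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote databases; a real source would be a network call.
SOURCES = {
    "eprints": ["Fusion plasma confinement", "Solar cell efficiency"],
    "conferences": ["Solar cell efficiency", "Wind turbine blade fatigue"],
    "energy_db": ["Fusion reactor materials"],
}


def search_source(name, query):
    """Return (source, title) pairs whose title contains the query, ignoring case."""
    q = query.lower()
    return [(name, title) for title in SOURCES[name] if q in title.lower()]


def federated_search(query):
    """Query every source concurrently, then merge results and drop duplicate titles."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(search_source, name, query) for name in SOURCES]
        # Collecting results blocks until the slowest source has answered.
        batches = [future.result() for future in futures]

    merged, seen = [], set()
    for batch in batches:
        for source, title in batch:
            if title not in seen:
                seen.add(title)
                merged.append((source, title))
    return merged
```

In this sketch the merge step cannot start until every source has responded, so each added source is one more chance for a slow response to hold up the whole search, which hints at why scaling the approach is difficult.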

“There’s limitations on the scalability of these metasearch techniques,” OSTI Director Walter Warnick said today. “The idea is to integrate, in effect, all kinds of databases, ours plus the others.” He announced the Global Discovery initiative last week at the American Association for the Advancement of Science’s annual meeting in St. Louis.

Today Warnick said the combination of these four portals is only the beginning of a long process. Analysts estimate that the majority of Web-accessible scientific documents are in databases, inaccessible to traditional search engines. To simultaneously search these databases and then integrate the results, software programmers cannot simply combine repositories, because the formats and sizes of the databases vary widely.

OSTI, in collaboration with Science.gov’s participating agencies and the search software company Deep Web Technologies, is working to enhance the precision of search results from distributed resources and overcome the scalability constraints.

A year from now, Warnick expects to offer a portal that can search Science.gov, two of OSTI’s other search portals and the International Energy Agency’s Energy Technology Data Exchange, which contains a growing collection of more than 3,675,000 bibliographic records and more than 152,000 full-text documents.

OSTI’s E-print Network search portal scours 20,000 Web sites and databases worldwide for e-prints, which are pre-publication scientific articles and scholarly papers that are circulated electronically, so that colleagues can share early research findings. The office’s new Science Conferences portal aggregates the conference proceedings from 16 databases run by scientific societies, Energy facilities and national labs.

Global Discovery’s goal is to accelerate scientific advancement by diffusing new research.

“The spread of new ideas in science is mathematically similar to the spread of disease, even though one produces positive results, the other negative,” Warnick said at the AAAS meeting. “Our goal is to foster epidemics of new knowledge by speeding the diffusion of new ideas.”
