Software conducts Deep Web searches
By Colleen O'Hara
Jun 30, 2003
Officials at BrightPlanet Corp. believe the three-year-old company has what the government ordered: technology that will help agencies locate information stored on Web servers that is not found by traditional search engines or Web crawlers.
Technologies that provide access to data stored on the so-called Deep Web and help agencies manage and store digital data should serve agencies well in the era of information overload, said Michael Bergman, chief technology officer of the Sioux Falls, S.D.-based company.
"What's easy to find is the doorstep into these sites," he said. "What's impossible to find is their content." The amount of content on the Deep Web is 500 times greater than that visible to conventional search engines, he wrote in a September 2001 white paper.
Bergman also reported that the 60 largest known Deep Web sites contain about 750 terabytes of data. Many of those are government sites, including Government Printing Office Access, the National Climatic Data Center at the National Oceanic and Atmospheric Administration, and NASA's Earth Observing System Data and Information System.
BrightPlanet has two main offerings designed to help users locate and manage this data. Deep Query Manager 2 enables agencies to find content on Web sites, including Deep Web and government sites, and to store and manage it as files that can then be archived, organized, shared and compared.
The Automated Information Portal (AIP), geared toward existing Web portals and offered as a hosted service, organizes Web documents, Deep Web databases and existing Web content into a searchable portal.
BrightPlanet opened a sales office in Washington, D.C., in March to target Fortune 100 customers and federal agencies, said John Fry, director of strategic account sales at that office.
The company has federal and state customers of its own, among them South Dakota Live, the official site for all South Dakota government documents, which uses AIP. It is in discussions with others, including the Homeland Security Department, but is also interested in partnering.
Lt. Cmdr. Andrew Chester, NATO open source intelligence program coordinator, said the alliance's intelligence organization has been using Deep Query Manager for about two years.
NATO likes Deep Query Manager because it resides on the server. "Our analysts can get access to it anywhere on Earth," Chester said. It also allows a distributed workforce to work collaboratively, he said. "I can save results in Norfolk [Va.] that can be viewed by colleagues in Brussels [Belgium]."
Needle in the online haystack
Deep Web sites contain dynamic content served up in real time from a database in response to a direct query. In contrast, "surface" Web pages are static and linked to other pages.
BrightPlanet Corp. has technology that can help agencies find information stored on Deep Web sites. Government sites that are considered Deep Web include the National Climatic Data Center at the National Oceanic and Atmospheric Administration and Government Printing Office Access.
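The distinction drawn above between static, linked "surface" pages and query-driven Deep Web content can be illustrated with a toy model. The Python sketch below is only an illustration under stated assumptions: the page paths, the records and the `crawl` and `query` functions are all hypothetical, and it does not represent how BrightPlanet's products work. It shows a link-following crawler reaching the search page (the "doorstep") but never the records behind it, which appear only in response to a direct query.

```python
from collections import deque

# Hypothetical "surface" pages: static documents, each listing the pages it links to.
SURFACE_PAGES = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # the search form: a doorstep with no links to individual records
}

# Hypothetical Deep Web records (made-up values), returned only for a direct query.
DATABASE = {
    "rainfall 1998": "Station 42: 31.2 in",
    "rainfall 1999": "Station 42: 28.7 in",
}


def crawl(start):
    """Breadth-first link crawl: discovers only pages reachable by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in SURFACE_PAGES.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def query(term):
    """Simulate submitting the search form: content is looked up per request."""
    return DATABASE.get(term)


if __name__ == "__main__":
    # The crawler finds the search page but none of the records behind it.
    print(sorted(crawl("/")))
    # A direct query retrieves a record that no link points to.
    print(query("rainfall 1998"))
```

In this toy model the crawler's result set contains the three static pages and nothing from `DATABASE`; the records surface only when `query` is called with a matching term.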