Schwartz: Focusing on searchability
When it comes to online searches, people are concerned with what they aren't finding.
- By Ari Schwartz
- Jan 02, 2008
In March 2007, I turned to Google and Yahoo search engines to find a proposed government rule on whether polar bears should be on the endangered species list. Although I found old news releases from major environmental groups, I was surprised to find no hits that led me to a U.S. government Web site.
Those familiar with e-government would certainly expect a hit on Regulations.gov. Alas, nothing.
At least I should have found a hit on the Fish and Wildlife Service’s Web site leading me to the proposed rule. Yet, nothing.
Sadly, my experience isn’t a fluke.
Google estimates that more than 2,000 U.S. government data sources are invisible to users of its search engine. The Pew Internet Project showed that commercial search engines are by far the most popular means of finding government information.
Because of this finding, my organization and colleagues at OMB Watch felt the missing data situation highlighted critical gaps in online access to government information.
Our organizations released a study, “Hiding in Plain Sight,” which found that vast amounts of government information are invisible to the industry’s major search engines. The amount of hidden information is as troubling as the quality of hidden information.
For example, we found that the following agency resources had information obscured:
- Federal Emergency Management Agency databases. This includes a Flood Map Modernization project at FEMA, which shows flood hazards.
- Other Homeland Security Department databases. This includes topics such as environmental radiation monitoring.
- Federal Business Opportunities Web site database. This list has about 200 government business opportunities in the field of telecommunications.
- Central Contractor Registration database. The database lists who does business with, and receives money from, the federal government.
- Federal Procurement Data Services database. This has data on all government contracts, including all telecom contracts.
- Smithsonian Institution resources. This covers many online content collections, including the Smithsonian Institution Research Information System.
We believe that most of this information is not available because of relatively minor technical obstacles that the agencies could — and should — quickly remedy.
In particular, these sites are either not publishing site maps of their data using the industry-standard, XML-based Sitemap protocol, or they are putting the data in directories listed in robots.txt files, which ask search engines to voluntarily ignore certain areas of the site.
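To make those two obstacles concrete, here is a minimal sketch in Python of how one might check a robots.txt file for blocked paths and produce a bare-bones site map. The host agency.example.gov, the /databases/ directory and the function names are hypothetical and chosen only for illustration; nothing here reflects any agency's actual configuration.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /databases/
Sitemap: https://agency.example.gov/sitemap.xml
"""

BASE_URL = "https://agency.example.gov"  # assumed host, not a real site


def blocked_paths(robots_txt, base_url, paths, agent="*"):
    """Return the paths that a well-behaved crawler would skip."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, base_url + p)]


def declared_sitemaps(robots_txt):
    """Return any site map URLs announced in robots.txt (Python 3.8+)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


def build_sitemap(urls):
    """Build a minimal Sitemap-protocol XML document listing the given URLs."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    paths = ["/databases/floodmaps", "/news/release.html"]
    print(blocked_paths(ROBOTS_TXT, BASE_URL, paths))
    print(declared_sitemaps(ROBOTS_TXT))
    print(build_sitemap([BASE_URL + "/news/release.html"]))
```

In this sketch, removing the Disallow line from robots.txt, or listing database pages in the site map, is the kind of relatively minor change that could let search engines find the information.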
It is unclear whether these agencies know that their information is not publicly searchable and have not taken adequate steps to change their practices, or whether they do not know that search engines are not indexing important information.
For agencies to solve the problem, they must first acknowledge it, and in this case, making information available is the only way to do that.
Schwartz ([email protected]) is the deputy director of the Center for Democracy and Technology.