XML project centralizes agency stats

Anyone who wants to quickly access and search the Census Bureau's Statistical Abstract or the country profiles produced by the Central Intelligence Agency can now go to a collaborative government Web site that relies on XML to collect the information and make it available to the public.

The FedStats Web site, named by the Federal Interagency Council on Statistical Policy, provides a gateway to a wealth of statistics from more than 100 federal agencies. The site offers information on economic and population trends, health care costs, aviation safety, foreign trade, energy use, farm production and more, much of which has until now been locked away on paper or in obscure electronic formats.

FedStats is using a new XML protocol and a new XML language from NextPage Inc. to enable it to build a true XML content network, said Brand Niemann, a computer scientist with the Center for Environmental Information and Statistics at the Environmental Protection Agency and a member of the FedStats Interagency Task Force.

FedStats is using NextPage's NXT 3 software, which includes the Content Network Protocol and Extensible Indexing Language, to power its network.

"The new protocol is used for sending XML messages between servers so you can have a series of content on a group of computers that looks centralized on one server when it really is distributed," Niemann said. "EIL integrates a hierarchal table of contents on a number of servers for distributed searching using XML for a list of results. XML in the Content Network Protocol and EIL are the glue that ties everything together."

In addition to the NextPage tools, FedStats is using StatServer from Mathsoft Inc. and FileMaker from FileMaker Inc. for different components of the new site.

FedStats has already made six federal statistical reports available through its site and has plans to add more in the future, with the goal of having several hundred reports from myriad agencies accessible and searchable from one place, Niemann said. The key is that the reports can be written and maintained by an agency on its own systems, but still added to the FedStats content network.

Users can go to www.fedstats.net/index.htm and perform searches using the new technology.

"What we've done and are going to do more of is make major federal statistical reports part of our content network," Niemann said. "The chapters and tables can reside wherever authored and then be integrated into a larger report seamlessly as if it was on one machine — distributed authoring and content [collection] rather than the manual aggregation they used to do."

FedStats has already applied the XML-based system to the Census Bureau's Statistical Abstract, which contains more than 1,500 tables of data from throughout the United States. Now, anyone searching this immense report is searching not only the text in the document but also the text in the tables, Niemann said.
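
One way to picture a search that covers table text as well as body text is the Python sketch below. It assumes a report stored as XML with hypothetical `para` and `cell` elements and placeholder values; the real markup used for the Statistical Abstract is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical report fragment. Element names and values are placeholders,
# not actual Statistical Abstract markup or data.
REPORT = """
<report>
  <chapter title="Population">
    <para>Resident population grew during the decade.</para>
    <table title="Example table">
      <row><cell>Region A</cell><cell>100</cell></row>
      <row><cell>Region B</cell><cell>250</cell></row>
    </table>
  </chapter>
</report>
"""


def search(xml_text, term):
    """Return (element tag, text) pairs whose text contains the term.

    Both paragraphs and table cells are scanned, so a hit can come
    from running text or from inside a table.
    """
    root = ET.fromstring(xml_text)
    term = term.lower()
    matches = []
    for elem in root.iter():
        if elem.tag in ("para", "cell") and elem.text:
            if term in elem.text.lower():
                matches.append((elem.tag, elem.text.strip()))
    return matches


print(search(REPORT, "region a"))    # match found inside a table cell
print(search(REPORT, "population"))  # match found in body text
```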

The technology also allows reports and databases that were previously available only on CD-ROMs to be put online and added to the content network.

Word has spread about the project. "We're incorporating XML much more rapidly, where most agencies are just getting started," Niemann said. "This not only showed a new technology, but also got more agencies interested. The goal is to have as many agencies with their own content network nodes as possible."