Agencies team on biodata project
- By Brian Robinson
- Mar 14, 1999
Several agencies are combining efforts on a program that could lead to one of the most ambitious data-gathering projects ever attempted, with the potential to become a global project.
The goal of the U.S. program, which its backers hope eventually might be funded at $40 million a year, is to build a "second generation" National Biological Information Infrastructure (NBII-2). It would use leading-edge software tools and technology to automatically gather and collate much of the data produced in the United States on ecosystems and biodiversity and make that data instantly available to researchers over a high-speed computer network.
Bringing together this kind of biological information is not a simple case of co-locating the databases at a particular site on the Internet, according to Meredith Lane, a professor at the University of Kansas and executive director of a study conducted by the President's Committee of Advisors on Science and Technology (PCAST).
Plans for NBII-2 are the result of a report issued in March 1998 from the PCAST study. Titled "Teaming With Life: Investing in Science to Understand and Use America's Living Capital," the report was the outcome of a request by President Clinton for studies of science and technology issues that had not been given their due in the administration's first term.
"Biodiversity and ecosystem information is inherently more complex than other kinds of biodata," Lane said. "And it's far more complex than chemical or physical data that can be based on math and numbers. A lot of biodiversity data is verbal, for example, so that poses new challenges for computer professionals."
The current NBII was created several years ago as a way to make previously remote and hard-to-access federal, state and private databases available to researchers. Its existence was put in doubt when Republicans in Congress challenged the validity of the National Biological Service (NBS) several years ago. The NBS eventually was folded into the U.S. Geological Survey.
The current NBII allows access to only one database at a time, and collating and correlating data from multiple databases takes many hours of human involvement. Because most scientific investigators and managers have limited time for that work, NBII-2 will need to do all these data manipulations automatically to be most effective.
That, in turn, will require the use of advanced software tools and the application of new metadata standards. Metadata is essentially data about data - that is, high-level information that serves as a guide to a larger, more complex set of data.
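To make the idea of metadata concrete, the following is a minimal sketch in Python, not a description of how NBII-2 is or will be built. It assumes each database publishes a small descriptive record for every data set it holds, so that a single search can run across several catalogs without opening the underlying data. The `MetadataRecord` fields, the `find_datasets` function and the sample records are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical metadata record: describes a data set without containing it."""
    title: str
    source: str                      # which database holds the actual data
    keywords: set = field(default_factory=set)
    region: str = ""


def find_datasets(catalogs, keyword):
    """Search the metadata of every catalog at once and return matching records."""
    keyword = keyword.lower()
    return [
        record
        for catalog in catalogs
        for record in catalog
        if keyword in {k.lower() for k in record.keywords}
    ]


# Illustrative records only; these are not real NBII holdings.
state_catalog = [
    MetadataRecord("Wetland bird counts", "state-db", {"birds", "wetlands"}, "Kansas"),
]
federal_catalog = [
    MetadataRecord("Migratory bird survey", "federal-db", {"birds", "migration"}, "U.S."),
    MetadataRecord("Stream fish inventory", "federal-db", {"fish", "streams"}, "Oregon"),
]

# One query spans both catalogs; only the short metadata records are examined.
matches = find_datasets([state_catalog, federal_catalog], "birds")
titles = [m.title for m in matches]
```

In this sketch the search touches only the short descriptive records, which is the sense in which metadata acts as a guide to a larger, more complex set of data.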
"It's about bringing data from a...range of sources together and making it more useful to people," said Gladys Cotter, assistant chief biologist for information at USGS. "We need to have it all interlinked and to use some kind of expert system that will make it more available to users."
One essential component of NBII-2 already is well-developed. A "biological profile" that will be used to describe all types of biological data sets and information is expected to be included this year as an extension of the Federal Geographic Data Committee geospatial metadata standard.
A fundamental part of the plans for NBII-2 is construction of a national five-node, high-speed backbone network. Ideally, the nodes would be located at institutions that already have high-end computing facilities, such as supercomputer centers, and where researchers already are working on the kinds of leading-edge informatics issues (that is, issues in biological data analysis) applicable to NBII-2. The links between the nodes "would have the highest bandwidth connections possible," Cotter said.
USGS has asked for an initial increase of $1 million in its fiscal 2000 budget proposal to begin work on this part of NBII-2. "That would at least allow us to begin installation of up to two nodes - one on the East Coast and one on the West Coast," said Anne Frondorf, the national program manager for NBII. "We would try to identify specific locations and institutions where we could start the nodes and try to create some focused NBII-2 effort there."
Other agencies, such as the National Science Foundation, are expected to help fund the program, but so far only USGS has earmarked dollars for NBII-2.
Cotter and James Edwards, the deputy assistant director for biological sciences at NSF, co-chair the Biodiversity and Ecosystem Informatics Working Group of the Subcommittee on Ecological Systems, which is part of the White House's Committee on Environment and Natural Resources. The working group will produce a model for government agencies to help them build into their future budgets items that relate to NBII.
A PCAST subcommittee also has been formed to promote the recommendations of the 1998 report throughout agencies.
The profile of NBII-2 also could be raised when ministers of the member countries of the Organization for Economic Cooperation and Development (OECD) meet in June. One of the items on their agenda will be a proposal to form the Global Biodiversity Information Facility (GBIF), which will link various nations' biological database networks in a global system. In the United States, information from these foreign databases would be made available to researchers through NBII.
If just five OECD members vote to go ahead with the GBIF - something people close to the program believe is almost certain - then each of the members would be levied a $600,000 fee, and a secretariat would be formed by June 2000.
-- Robinson is a free-lance writer based in Portland, Ore.