Building the digital library
- By Elana Varon
- Aug 23, 1998
While the World Wide Web has given federal agencies unprecedented access to historical, scientific and reference data, it has also created a new challenge: finding the tools that make sense of that information. Whether they are homing in on key intelligence data or supplying educational materials for teachers, agencies ultimately want the capability to pull text, audio, video and images from virtual shelves anywhere in the world as easily as one picks books from a library shelf today.
The Web "is what most people consider their digital libraries,'' said Michael Lesk, author of the recent book Practical Digital Libraries and director of the National Science Foundation division that manages the interagency Digital Libraries Initiative (DLI). Now curators of online collections are looking for ways to mine this information explosion.
"One of the classic functions of a library as an organization is they collect and acquire, they organize and make accessible, and they preserve,'' said Clifford Lynch, executive director of the Coalition for Networked Information, an interest group that represents universities and research libraries on technology issues. "If you look at many Web sites, they're about publishing information but not really so concerned about long-term retention and organization.''
More than a decade ago, librarians in the Map and Imagery Laboratory at the University of California at Santa Barbara (UCSB) began looking for ways to offer greater accessibility to their holdings of more than 5 million topographical maps, satellite data, photographs and other resources. Back then, "no one even knew what we were talking about,'' said Larry Carver, the lab's director, when the library's staff said they needed a way to catalog, search and distribute their data online. "The technology was not there yet.''
With help from federal grants, the university last month took the first step toward opening its holdings via the Internet. UCSB's Alexandria Digital Library (ADL) became the first link in an effort throughout California to provide electronic research materials on the Web, first within the state university system and, after a year of testing, to the public, including agencies, universities and private companies around the world.
Other government-backed projects, at universities and within federal agencies, are chasing related ends. Through the DLI, during the past four years, NSF, the Defense Advanced Research Projects Agency and NASA jointly have poured $24.4 million into ADL and five other academic projects that aim to make digital libraries as user-friendly as their physical counterparts.
Any organized collection of electronic documents that is set up "for human beings to use'' can be a digital library, according to NSF's Lesk, and some technologies to support those collections are well-established. Databases, software for capturing images, tools for searching and retrieving text, and CD-ROMs and networks for distributing data are widely used by federal agencies today.
But technologies for tapping sound and video files, or for parsing data in multiple formats, have begun to emerge only recently. Developing search tools, including better ways to tag and index data, has been a major focus of digital libraries research.
"I need to have ways to describe what I'm looking for,'' said Nand Lal, who oversees a NASA research project, called Digital Library Technology, that is separate from the joint effort with NSF. That means making satellite imagery and other space data currently organized for in-house use "intelligible'' on the Web for scientists and the general public. "It's the whole idea of universal access in the sense of being able to deliver things that are of interest to the user, not necessarily to the producer, using facilities and language and terminology that are adapted to the user,'' Lal said.
"Search and retrieval is no longer about text,'' said Mark Demers, director of marketing and corporate communications with Excalibur Technologies, which this week is releasing software for searching video archives. Demers thinks the software could be used by federal agencies to set up libraries of training materials, surveillance tapes and historical records. "It's about all assets everywhere, and it's even about metadata,'' which is the text or software codes used to index online materials. Robust, easy-to-use search tools are "an enabling technology that is almost at the core of a digital library system,'' Demers said.
"An ideal goal for these technologies would be to make them disappear,'' said Stephen Griffin, the DLI program director, who is preparing to award a new round of grants, totaling $40 million to $50 million, beginning this fall. "If these technologies were invisible to the user and the user could work directly with the [information],'' then the user could more easily create, and learn from, new virtual environments.
Meanwhile, agencies are building basic digital libraries with software already on the market. Brand Niemann, digital librarian with the Environmental Protection Agency's new Center for Environmental Information and Statistics (CEIS), recently collected more than five dozen links for the center's Web site, links that enable users to access reports and data about local, national and international environmental conditions. Users can search these links, or only a portion of them, using the Topic search engine from Verity Inc.
One feature that makes the site a digital library, Niemann said, is that it offers visitors a single query form to search many sources, even documents hosted on other agencies' Web sites. Niemann also has helped the U.S. Geological Survey develop a "Web-connected CD-ROM'' that gives users a set of documents they can use offline but that contains links to a Web site where users can obtain updates. The USGS application is based on digital publishing software from Folio Corp. that allows access to documents in different formats through a common interface.
Funding: The Biggest Barrier
Niemann said funding, more than technology, has limited what CEIS has been able to include in its online collection. "The content is endless. You'll never have it [all], so just like building the Web, you've got to get so many people involved.''
According to NSF's Lesk, economic constraints, together with legal obstacles faced by agencies that want to distribute copyrighted data, form the main barriers to setting up digital libraries.
"Government libraries hold lots of materials to which they don't have the intellectual-property rights,'' said Bob Zich, director of electronic programs with the Library of Congress' National Digital Library program. Legislation pending in Congress aims to set rules governing copyrights online, but librarians and researchers, including an LOC official, have testified that these proposals could hamper public access to electronic materials in library collections.
Accessibility is another hurdle that agencies face. LOC started distributing copies of historic photographs and documents on disc eight years ago and now makes about 500,000 images, maps, film clips, audio files and texts available through its American Memory Web site. The goal of the $60 million project, which is funded mainly through private donations, is to provide broader public access to historic and cultural artifacts from library collections around the country, collections that otherwise would be accessible only by visiting the places where they are stored.
But although software such as RealAudio and QuickTime theoretically put sound and video clips within reach of anyone with Internet access, Zich noted that unless someone has a high-speed link, he might not have the time or the patience to download these files.
"We are waiting for when millions of people will have real wideband access," Zich said. "We have some films of the San Francisco earthquake that are 100M."
Digital librarians also want more robust tools behind their collections' home pages. One such tool, under development by the NSF-backed San Diego Supercomputer Center (SDSC) and IBM Corp., offers a method for retrieving documents from different storage platforms. Much scientific data is stored as flat files, said Chaitanya Baru, senior principal scientist for enabling technologies at SDSC. "We haven't seen too many people trying to address this issue of trying to get to data on heterogeneous storage devices,'' he added.
The project, part of a test bed for a paperless system to apply for patents, integrates IBM's High Performance Storage System, which is used to manage large data files for supercomputing applications, with a relational database. To find files, users query a database that holds a metadata directory, and a middleware application that the team has developed communicates the query to remote storage systems.
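The lookup pattern described above, in which a query goes to a metadata directory and middleware then fetches the matching files from whichever storage system holds them, can be illustrated with a minimal sketch. This is not the SDSC/IBM software: the catalog schema, the backend names and the fetch functions below are hypothetical stand-ins, and an in-memory SQLite table plays the role of the relational database.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for remote storage systems. A real deployment
# would contact an archival store or a file server here.
def fetch_from_archive(path: str) -> bytes:
    return f"archive:{path}".encode()

def fetch_from_flat_files(path: str) -> bytes:
    return f"flatfile:{path}".encode()

# The "middleware" routing table: backend name -> function that fetches a file.
BACKENDS: Dict[str, Callable[[str], bytes]] = {
    "archive": fetch_from_archive,
    "flatfile": fetch_from_flat_files,
}

def build_catalog() -> sqlite3.Connection:
    """Create an in-memory metadata directory (illustrative schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE catalog (title TEXT, keyword TEXT, backend TEXT, path TEXT)"
    )
    conn.executemany(
        "INSERT INTO catalog VALUES (?, ?, ?, ?)",
        [
            ("Patent drawing 1", "patent", "archive", "/hpss/patents/0001.tif"),
            ("Ocean temperature grid", "climate", "flatfile", "/data/sst/1997.dat"),
            ("Patent abstract 1", "patent", "flatfile", "/data/patents/0001.txt"),
        ],
    )
    return conn

def retrieve(conn: sqlite3.Connection, keyword: str) -> List[Tuple[str, bytes]]:
    """Query the metadata directory, then fetch each hit from its own backend."""
    rows = conn.execute(
        "SELECT title, backend, path FROM catalog WHERE keyword = ? ORDER BY title",
        (keyword,),
    ).fetchall()
    return [(title, BACKENDS[backend](path)) for title, backend, path in rows]

if __name__ == "__main__":
    catalog = build_catalog()
    for title, data in retrieve(catalog, "patent"):
        print(title, data)
```

The user only ever queries the catalog. Which storage system holds a given file is recorded in the metadata, so adding a new kind of storage means registering one more fetch function rather than changing the query interface.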
There is no single technology that appears to be driving the deployment of digital libraries today. But researchers and industry experts agree on the ultimate goal: that people must be able to find what they seek.
"Things can be put in formats that are lost forever,'' said Eugene Miya, a NASA electronics engineer who is reviewing DLI grant proposals. "If you don't have the protocols and formats [to retrieve it], your data is just about useless."
AT A GLANCE
Web Info on Digital Libraries
* NSF/DARPA/NASA Digital Libraries Initiative www.cise.nsf.gov/iis/dli_home.html (includes links to the Alexandria Digital Library and other current research projects)
* Digital Libraries Initiative II program announcement www.nsf.gov/pubs/1998/nsf9863/nsf9863.htm
* Library of Congress American Memory Project/National Digital Library Program lcweb.loc.gov
* Library of Congress Internet Resource Page lcweb.loc.gov/loc/ndlf/digital.html (includes links to other digital library information)
* EPA Center for Environmental Information and Statistics www.epa.gov/ceis