Web research turns new page

Group readies protocol to improve sharing of Web-based research information

Categorizing documents

The Object Reuse and Exchange protocol is designed to provide a standard way to describe aggregations of online resources in all forms. Such aggregations include:

  • Multipage HTML documents in which the pages are linked together by hyperlinks.
  • Semantically linked groups of cellular images.
  • Journals that aggregate multiple scholarly publications.
  • Information available from popular online social networking sites, such as YouTube and MySpace.

– Ben Bain

A new protocol for organizing electronic archives could make it easier for organizations to search and exchange Web-based resources.

The Open Archives Initiative (OAI) has already developed a similar standard that helps libraries, educators and scholars share electronic copies of research papers.

The new protocol, named Object Reuse and Exchange (OAI-ORE), could serve as a model for other communities that want an automated way to categorize vast collections of Web-based resources, including text, images, data and video.

The group’s original standard, the OAI Protocol for Metadata Harvesting (OAI-PMH), released in 1999, is widely used by government, research and education institutions to electronically disseminate and share information about research papers. OAI officials hope the group’s latest efforts will have a similar effect for Web-based research resources.

“A criticism of the 1999 work was that it was sort of library-centric, not Web-focused enough,” said Simeon Warner, a technical adviser to OAI and one of the new standard’s editors. “So one of the things we’ve moved to for the newer work is to really place it within a Web framework, so we [are] adopting pretty broadly used standards.”

Warner said the new protocol seeks to go deeper than the Web’s high-level uniform resource identifiers by providing a standard way to sort through aggregations, or groupings, of Web resources. There is currently no structured way to describe the compound digital objects that are part of an online collection, he said.
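
To make the idea of a described aggregation concrete, here is a minimal sketch in Python. It assumes a simplified model in which one identifier stands for the aggregation and an "aggregates" relation points to each member resource. The class name, the example.org addresses and the tuple output are illustrative assumptions, not the serialization defined by the OAI-ORE documents.

```python
from dataclasses import dataclass, field

# Term from the OAI-ORE vocabulary that links an aggregation to its members.
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"


@dataclass
class Aggregation:
    """Hypothetical, simplified stand-in for a compound digital object."""
    uri: str                                        # identifier of the aggregation itself
    resources: list = field(default_factory=list)   # URIs of the member resources

    def triples(self):
        """Return (subject, predicate, object) statements, one per member."""
        return [(self.uri, ORE_AGGREGATES, r) for r in self.resources]


# Placeholder example: a three-page HTML document treated as one object.
article = Aggregation(
    uri="https://example.org/article/1#aggregation",
    resources=[
        "https://example.org/article/1/page1.html",
        "https://example.org/article/1/page2.html",
        "https://example.org/article/1/page3.html",
    ],
)

for statement in article.triples():
    print(statement)
```

The point of the sketch is only that the grouping gets its own identifier, so software can refer to the whole document rather than to three unrelated pages.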

The group’s previous work with the OAI-PMH standard has made it much easier for institutions to exchange information electronically, said Jeff Given, the information technology manager for the Energy Department’s Office of Scientific and Technical Information. OSTI disseminates U.S. and international scientific, technical, and research and development findings.

Before OAI-PMH was developed, OSTI and other research agencies had to call one another to identify information of interest, then agree on the format and manner for sharing the data. Given said OSTI and others now use OAI-PMH to automate that entire process.
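
As a rough illustration of what that automation involves, an OAI-PMH harvester sends ordinary HTTP requests whose query string names a protocol verb, such as ListRecords, and a metadata format. The Python sketch below only builds such a request address and does not contact any server. The base address is a placeholder rather than a real OSTI endpoint, and the helper name is made up for this example.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    # "verb" and "metadataPrefix" are required arguments for ListRecords.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Optional selective-harvesting arguments defined by the protocol.
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder repository address, used only for illustration.
url = build_list_records_url("https://repository.example.org/oai",
                             from_date="2008-01-01")
print(url)
```

A harvester would fetch that address on a schedule and parse the XML it returns, which replaces the phone calls and one-off format agreements described above.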

Warner heads the arXiv e-print archive based at Cornell University. Metadata gathered through the OAI-PMH framework is used to help share the arXiv project’s roughly 500,000 submissions in physics, mathematics, computer science, quantitative biology and statistics, he said.

Standards such as OAI-PMH and the new OAI-ORE are helping to shift the focus from building the most comprehensive repository to making repositories more interoperable.

“One of the ways we see repositories evolving is that people aren’t trying to build the be-all, end-all, one great portal,” he said. “The goal of putting up a repository of data now is not only to provide a direct way for users to get at it; it’s to interact with all of the other systems that are out there.” 

About the Author

Ben Bain is a reporter for Federal Computer Week.

