Digital standards come first

Before agencies digitize their records, LOC group must develop standards

There are no governmentwide standards for digitizing books, records, photos, maps, films or other analog materials. But federal agencies are working together to create standards for bringing millions of creative works into the digital world.

Representatives from the Library of Congress, the Government Printing Office, the National Archives and Records Administration, the Transportation Department and other organizations are establishing guidelines for a massive digitization project.

The Federal Digitization Standards Working Group of the National Digital Strategy Advisory Board (NDSAB) is developing governmentwide standards or guidelines that will help agencies preserve and share documents and other works.

The board is part of the Library of Congress’ National Digital Information Infrastructure and Preservation Program (NDIIPP), which aims to foster governmentwide collaboration and public/private consensus on standards for creating new digital works. Standards for digitizing various works would benefit librarians, archivists, researchers and businesses, said Michael Stelmach, who leads NDSAB’s Federal Digitization Standards Working Group. However, for an agency such as NARA, which is trying to digitize 9 billion federal records, best practices for digitizing paper documents are the most pressing need.

Digitizing a document requires making decisions about the type of equipment to use, the appearance of the digital representation and the format in which the digital document will be stored. Those decisions affect the usability, integrity and longevity of the digital document, officials say.

Records managers also must create technical metadata (the information describing a document’s technical specifications) and basic descriptive metadata about each document, said Amanda Wilson, DOT’s NDSAB working group representative and director of the agency’s digital National Transportation Library. “We are hoping to coordinate with state [and local] Department of Transportation libraries and other transportation agencies throughout the country.”

The standards that NDSAB develops most likely will resemble a menu offering choices among different specifications and types of equipment, she added.

NDSAB will post the draft standards for public comment.

“Basically there will be a platform of standards that says, ‘If you are going to come in and scan our stuff, here is where the bar is set,’ ” Stelmach said.

Agencies are only beginning what will be a long and expensive digitization project, but standardization will be good for businesses and agencies, said Scott Christensen, vice president of electronic production at iArchives. That company scanned newspapers for the Library of Congress’ National Digital Newspaper Program, a searchable public database of newspapers that had been available on microfilm.

“If it’s in a common format, it makes it that much quicker and easier to obtain,” Christensen said. Governmentwide guidelines would lower costs for vendors and open up the marketplace, he added.

The usability of digital documents depends largely on their metadata, but some organizations also use optical character recognition (OCR) software so that digitized documents can be searched for key words. An OCR accuracy rate of at least 90 percent is reasonable and in most cases sufficient for good search results, Stelmach said.
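The article does not say how that 90 percent figure is measured. One common definition, assumed here, is character accuracy: 1 minus the edit distance between the OCR output and a verified transcription, divided by the length of the transcription. A minimal Python sketch, using a made-up sample string:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Assumed metric: share of the verified text reproduced correctly by OCR."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    errors = levenshtein(ground_truth, ocr_output)
    # Clamp at 0 in case the OCR output contains many spurious characters.
    return max(0.0, 1 - errors / len(ground_truth))


truth = "National Digital Newspaper Program"   # hypothetical verified transcription
ocr = "Nati0nal Digita1 Newspaper Pr0gram"     # hypothetical OCR output, 3 wrong characters
acc = character_accuracy(truth, ocr)
print(f"{acc:.1%}", "meets 90% bar" if acc >= 0.90 else "below 90% bar")
# prints "91.2% meets 90% bar"
```

The same sample shows why the threshold matters for keyword search: three wrong characters out of 34 still clear the 90 percent character bar, yet three of the four words would no longer match an exact search.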

Metadata describing the structural characteristics of an original document, such as type of work and volume number, are used for cataloging and searching. Many librarians use the Metadata Encoding and Transmission Standard (METS) to locate digital materials.
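To make “structural metadata” concrete, the sketch below builds a bare-bones METS skeleton with Python’s standard ElementTree module: a fileSec listing one image per scanned page, and a structMap that records the type of work and volume number and points each page at its image file. It is illustrative only and not a complete, validated METS record (it omits the header and descriptive sections). The TYPE values “book”, “volume” and “page”, the title and the example.org file URLs are hypothetical choices, since METS leaves those labels to the implementer.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(title: str, work_type: str, volume: int, page_files: list[str]) -> ET.Element:
    """Build a minimal METS skeleton for one scanned volume (illustrative only)."""
    root = ET.Element(m("mets"))

    # fileSec: one <file> entry per scanned page image.
    file_grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"), USE="master")
    for i, href in enumerate(page_files, start=1):
        f = ET.SubElement(file_grp, m("file"), ID=f"FILE{i:04d}", MIMETYPE="image/tiff")
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})

    # structMap: structure of the original work (type of work -> volume -> pages),
    # with each page pointing back to its image file via <fptr FILEID=...>.
    smap = ET.SubElement(root, m("structMap"), TYPE="physical")
    work = ET.SubElement(smap, m("div"), TYPE=work_type, LABEL=title)
    vol = ET.SubElement(work, m("div"), TYPE="volume", ORDER=str(volume))
    for i in range(1, len(page_files) + 1):
        page = ET.SubElement(vol, m("div"), TYPE="page", ORDER=str(i))
        ET.SubElement(page, m("fptr"), FILEID=f"FILE{i:04d}")
    return root


# Hypothetical usage: volume 3 of a two-page report.
doc = build_mets("Annual Report", "book", 3,
                 ["https://example.org/scans/v3-p1.tif", "https://example.org/scans/v3-p2.tif"])
print(ET.tostring(doc, encoding="unicode"))
```

Keeping the file inventory and the logical structure in separate sections, linked by file IDs, is what lets a catalog present “volume 3, page 2” while the images themselves live wherever the archive stores them.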

“From the point of view from an organization like the Library of Congress where [officials] are dealing with huge amounts of content, having standards for how materials are actually digitized is incredibly important,” said Jerry McDonough, one of METS’ creators and assistant professor of library science at the University of Illinois at Urbana-Champaign.

“The sad reality is that we can make our own decision in the library community about how we want to digitize our analog material, but there are publishers out there who are going to have their own standards,” McDonough said. “They’re not driven by the same sort of preservation and access concerns as the library community is.”

About the Author

Ben Bain is a reporter for Federal Computer Week.
