Archive-friendly PDF in the works

Two of the largest bankruptcy filings in U.S. history — Enron Corp. and Global Crossing — produced a record number of PDF documents, which federal courts must figure out how to archive and preserve.

The archival challenges those bankruptcies created explain why Stephen Levenson, judiciary records officer for the Administrative Office of the U.S. Courts, is spending much of his time these days working with colleagues on a new international standard for archiving PDF documents.

The open-standard PDF, created by Adobe Systems Inc., has become a widely used format for distributing documents on the Internet because it preserves their original look and makes copying and editing them difficult. Now a modified version called PDF-Archive (PDF-A), which Levenson is helping to develop, most likely will become an international standard early next year. Companies are expected to immediately offer archiving aids based on the new standard.

Sitting around the table at PDF-A Committee meetings are representatives from companies such as Eastman Kodak Co., Global Graphics Software Ltd., IBM Corp., PDF Sages Inc. and Xerox Corp., said Melonie Warfel, director of worldwide standards at Adobe. But equally involved are representatives from federal agencies such as the Internal Revenue Service, the Library of Congress, and the National Archives and Records Administration.

The PDF-A standard will be a slimmed-down version of PDF, Levenson said. It will be useful for formatting document files that contain multiple pages of text, raster images or vector graphics. However, it will not be suitable for archiving music and video files, he said.

Among federal archivists and records managers, PDF-A is viewed as one of two leading data format candidates for preserving future access to electronic records and documents. The other is Extensible Markup Language. The proposed PDF-A standard specifies what should be stored in an archived file by prohibiting, for example, proprietary encryption schemes and embedded files such as executable scripts. "We don't want embedded files that can do mischief inside our records collection," Levenson said.

PDF-A is based on PDF 1.4, a version of the published and freely available PDF specification that is only slightly outdated. Adobe is at PDF Version 1.6 in its development of the specification. "We'll catch up if we need to," Levenson said. "But in this business of archival preservation, we don't need to go too fast."

Unlike a PDF, a PDF-A will contain type fonts to ensure that electronic documents will look the same in the future as they did when they were created, said Charles Dollar, an electronic records consultant who is chairman of the Standards Board of the Association for Information and Image Management, a nonprofit trade group.

"Typically, the type fonts exist independently of the PDF document," Dollar said. But with PDF-A, they will be embedded in the document. "That's going to increase the storage requirements," he said. But it is a price that must be paid to ensure that type fonts are available when they are needed for reading scientific notation, for instance.

As with any new standard, there is always a risk that too few companies will use it to create new software, but Dollar said he doubts that will be the case with PDF-A.

***

What's new with PDF-A

The PDF standard is popular throughout the federal government for electronic documents, but it is not suitable for archiving permanent records. For that purpose, officials at many federal agencies expect to use a new electronic document format called PDF-Archive (PDF-A). Here's how the two compare:

PDF

Nonarchival format.

Text, raster images, vector graphics, music, video, etc.

International Organization for Standardization (ISO) standard.

Encryption and executable scripts permitted.

No type fonts included.

PDF-A

Archival format.

Text, raster images and vector graphics only.

Future ISO standard.

Encryption and executable scripts not permitted.

Type fonts included.
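For readers who want a concrete sense of these differences, the following is a rough Python sketch of how a records office might flag PDF files that would raise the concerns listed above. It is a heuristic only, not a PDF-A validator and not part of the proposed standard. The function names and the `MARKERS` table are hypothetical. The byte strings are dictionary keys from the published PDF specification (`/Encrypt`, `/JavaScript`, `/JS`, `/EmbeddedFile`, `/FontFile`), and the sketch assumes they appear uncompressed in the file, which is not guaranteed.

```python
# Heuristic sketch (assumption: the relevant dictionary keys appear as
# plain, uncompressed bytes in the file). Not a PDF-A conformance check.

# Hypothetical lookup table: feature name -> PDF dictionary keys that hint at it.
MARKERS = {
    "encryption": (b"/Encrypt",),
    "scripts": (b"/JavaScript", b"/JS"),
    "embedded_files": (b"/EmbeddedFile",),
    # /FontFile also matches /FontFile2 and /FontFile3.
    "embedded_fonts": (b"/FontFile",),
}


def scan_pdf_markers(data: bytes) -> dict:
    """Report which feature markers occur anywhere in the raw PDF bytes."""
    return {
        name: any(token in data for token in tokens)
        for name, tokens in MARKERS.items()
    }


def archival_concerns(data: bytes) -> list:
    """List the features that would conflict with the PDF-A column above."""
    found = scan_pdf_markers(data)
    concerns = [
        name
        for name in ("encryption", "scripts", "embedded_files")
        if found[name]
    ]
    # PDF-A expects fonts inside the file, so their absence is a concern.
    if not found["embedded_fonts"]:
        concerns.append("no_embedded_fonts")
    return concerns


if __name__ == "__main__":
    import sys

    # Usage: python scan_pdf.py some_document.pdf
    with open(sys.argv[1], "rb") as handle:
        print(archival_concerns(handle.read()))
```

An empty list means only that none of these markers were spotted. A file could still fail a real conformance check for other reasons, and a compressed or obfuscated file could hide the markers entirely.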
