Archive-friendly PDF in the works

Two of the largest bankruptcy filings in U.S. history — Enron Corp. and Global Crossing — produced a record number of PDF documents, which federal courts must figure out how to archive and preserve.

The archival challenges those bankruptcies created explain why Stephen Levenson, judiciary records officer for the Administrative Office of the U.S. Courts, is spending much of his time these days working with colleagues on a new international standard for archiving PDF documents.

The open-standard PDF, created by Adobe Systems Inc., has become a widely used format for distributing documents on the Internet because it preserves their original look and makes copying and editing them difficult. Now a modified version called PDF-Archive (PDF-A) will most likely become an international standard early next year, and Levenson is committed to it. Companies are expected to offer archiving aids based on the new standard as soon as it is approved.

Sitting around the table at PDF-A Committee meetings are representatives from companies such as Eastman Kodak Co., Global Graphics Software Ltd., IBM Corp., PDF Sages Inc. and Xerox Corp., said Melonie Warfel, director of worldwide standards at Adobe. But equally involved are representatives from federal agencies such as the Internal Revenue Service, the Library of Congress, and the National Archives and Records Administration.

The PDF-A standard will be a slimmed-down version of PDF, Levenson said. It will be useful for formatting document files that contain multiple pages of text, raster images or vector graphics. However, it will not be suitable for archiving music and video files, he said.

Among federal archivists and records managers, PDF-A is viewed as one of two leading data format candidates for preserving future access to electronic records and documents. The other is Extensible Markup Language. The proposed PDF-A standard specifies what should be stored in an archived file by prohibiting, for example, proprietary encryption schemes and embedded files such as executable scripts. "We don't want embedded files that can do mischief inside our records collection," Levenson said.

PDF-A is based on PDF 1.4, a version of the published and freely available PDF specification that is only slightly outdated. Adobe is at PDF Version 1.6 in its development of the specification. "We'll catch up if we need to," Levenson said. "But in this business of archival preservation, we don't need to go too fast."

Unlike an ordinary PDF file, a PDF-A file will contain the type fonts it uses, ensuring that electronic documents will look the same in the future as they did when they were created, said Charles Dollar, an electronic records consultant who is chairman of the Standards Board of the Association for Information and Image Management, a nonprofit trade group.

"Typically, the type fonts exist independently of the PDF document," Dollar said. With PDF-A, they will be embedded in the document. "That's going to increase the storage requirements," he said. But that is a price that must be paid to ensure that type fonts are available when they are needed, for instance, to read scientific notation.

As with any new standard, there is always a risk that too few companies will use it to create new software, but Dollar said he doubts that will be the case with PDF-A.


What's new with PDF-A

The PDF standard is popular throughout the federal government for electronic documents, but it is not suitable for archiving permanent records. For that purpose, officials at many federal agencies expect to use a new electronic document format called PDF-Archive (PDF-A). Here's how the two compare:


PDF:

Nonarchival format.

Text, raster images, vector graphics, music, video, etc.

International Organization for Standardization (ISO) standard.

Encryption and executable scripts permitted.

No type fonts included.


PDF-A:

Archival format.

Text, raster images and vector graphics only.

Future ISO standard.

Encryption and executable scripts not permitted.

Type fonts included.
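
As a rough illustration of the PDF-A rules compared above, the following Python sketch scans the raw bytes of a file for PDF name tokens that hint at encryption, scripts, embedded files and embedded font programs. The marker lists and the names `PROHIBITED_MARKERS`, `FONT_PROGRAM_MARKERS` and `scan_pdf_bytes` are assumptions made for this example and are not taken from the proposed standard. It is a heuristic only, not a conformance check: tokens hidden inside compressed streams or written with escaped characters would be missed.

```python
import re

# PDF name tokens that hint at features the article says PDF-A prohibits.
# This mapping is an illustrative assumption, not the standard's own rule list.
PROHIBITED_MARKERS = {
    "encryption": [b"/Encrypt"],
    "executable script": [b"/JavaScript", b"/JS"],
    "embedded file": [b"/EmbeddedFile", b"/EmbeddedFiles"],
}

# Name tokens that point to a font program stored inside the file.
FONT_PROGRAM_MARKERS = [b"/FontFile", b"/FontFile2", b"/FontFile3"]


def _has_name(data: bytes, name: bytes) -> bool:
    """Return True if `name` occurs as a whole PDF name token in `data`."""
    # The lookahead rejects longer names, e.g. /Encrypt vs. /EncryptMetadata.
    pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
    return re.search(pattern, data) is not None


def scan_pdf_bytes(data: bytes) -> dict:
    """Heuristically report prohibited features and embedded font programs."""
    prohibited = [
        label
        for label, names in PROHIBITED_MARKERS.items()
        if any(_has_name(data, n) for n in names)
    ]
    has_font_program = any(_has_name(data, n) for n in FONT_PROGRAM_MARKERS)
    return {"prohibited": prohibited, "has_font_program": has_font_program}


# Hand-written fragments that stand in for real files (hypothetical data).
archive_like = (
    b"%PDF-1.4\n1 0 obj << /Type /Font /FontDescriptor 2 0 R >> endobj\n"
    b"2 0 obj << /FontFile2 3 0 R >> endobj\n"
)
risky = (
    b"%PDF-1.4\n4 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
    b"trailer << /Encrypt 5 0 R >>\n"
)

print(scan_pdf_bytes(archive_like))
print(scan_pdf_bytes(risky))
```

A records office would want a full parser and the published standard for real validation; the sketch only shows how the prohibitions and the font requirement translate into things a program can look for.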
