Archivists praise PDF/A
- By Florence Olsen
- Mar 10, 2004
NEW YORK -- An emerging PDF standard for archiving is a bright spot on the horizon for federal records managers, who said here this week that it could become an international standard by early next year.
The open standard PDF, created by Adobe Systems Inc., has become a widely used format for distributing documents on the Internet because it preserves their original look and makes it harder to copy and edit them. Now a modified version, called PDF-Archive (PDF/A), is viewed as one of two leading data-format candidates for preserving future access to electronic records and documents, according to a panel of federal records managers who spoke March 9 at the Association for Information and Image Management Expo, a trade show for the content and document management industry. Panelists said PDF/A gives archivists more control over how a document is stored than is possible with the regular PDF format.
The other contender as a viable format for electronic data archiving, the panelists said, is the Extensible Markup Language format, XML.
Both formats have their strengths and weaknesses for electronic data archiving. But PDF/A is especially promising for documents that must be preserved for litigation and case law because it preserves the visual appearance of the formatted document, said panel member Stephen Levenson, the judiciary records officer for the Administrative Office of the U.S. Courts.
For records of legal proceedings, the position of paragraphs and footnotes on the printed page is crucial because attorneys rely on page-based references when they present their arguments, Levenson said. The PDF/A format derives from Adobe's PDF version 1.4 specification, to which the company has relinquished all proprietary rights in perpetuity. "That's a big step for that company to make this sort of commitment," Levenson said.
The PDF-Archive Committee, which Levenson heads, has avoided tackling the separate problem of long-term media storage. "Issues as to where we're going to store this stuff are unsettled at this point, although there are some very good groups that are starting to deal with that," he said.
The proposed PDF/A standard specifies what can and cannot be stored in a PDF/A file. It excludes proprietary encryption schemes and embedded files such as executable scripts. "We don't want embedded files that can do mischief inside our records collection," Levenson said.
Levenson's committee hopes PDF/A will become a widely accepted standard that the industry will use in building software tools for archiving documents. Users could, for example, have a choice between saving a file as a PDF or as a PDF/A document, Levenson said. If they selected PDF/A, the software would then examine the file for elements that would need to be removed.
Levenson said it would be too much to expect users to go through all the steps necessary for archiving a PDF file. "This will essentially be an automated process," he said, "either submitted in batch for large-volume translations, or it will be interactive for the individuals who are composing content at the desktop."
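The automated examination Levenson describes might look roughly like the following Python sketch. It is only an illustration under stated assumptions, not the committee's method or a PDF/A validator: the function names and the marker list are hypothetical, and the check is a naive byte scan for three PDF name tokens (/Encrypt, /EmbeddedFile, /JavaScript) that would miss anything hidden inside compressed or otherwise encoded data.

```python
# Hypothetical pre-archive check: a naive byte scan, NOT a PDF/A validator.
# The marker list is an assumption chosen to mirror the exclusions named in
# the article (encryption, embedded files, executable scripts).
FLAGGED_MARKERS = {
    b"/Encrypt": "encryption dictionary",
    b"/EmbeddedFile": "embedded file",
    b"/JavaScript": "embedded script",
}


def find_flagged_features(pdf_bytes: bytes) -> list[str]:
    """Return descriptions of flagged markers found in the raw PDF bytes."""
    return [label for marker, label in FLAGGED_MARKERS.items() if marker in pdf_bytes]


def check_batch(documents: dict[str, bytes]) -> dict[str, list[str]]:
    """Batch mode: map each document name to the features it would need removed."""
    return {name: find_flagged_features(data) for name, data in documents.items()}


if __name__ == "__main__":
    # Tiny made-up byte strings standing in for real files.
    sample = {
        "clean.pdf": b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n",
        "scripted.pdf": b"%PDF-1.4\n1 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n",
    }
    for name, findings in check_batch(sample).items():
        print(name, findings or "no flagged features")
```

The same function serves both modes Levenson mentions: an interactive tool could call find_flagged_features on a single document at save time, while a bulk converter could pass a whole collection to check_batch.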