Text removal made fast and simple
- By Paul Ferrill
- Sep 08, 2003
Many government agencies are searching for an automated solution to help them deal with new regulations covering the release of information. The problem gets even more complex when the documents are in formats other than Microsoft Corp.'s Word and include things such as rotated text or images.
Fortunately, Appligent Inc.'s Redax 3.5 specifically addresses the problem of removing selected text and images from Adobe Systems Inc.'s Acrobat PDF files.
After you install the program, a new Redax menu appears at the top of the Adobe Acrobat screen. Before using the plug-in, you must go through a setup process to configure the default settings. You must also create a text file of the words to be redacted from documents. This file lists the specific words to look for along with a code denoting the type of redaction to apply. Redax ships with two code lists derived from the Freedom of Information Act and the U.S. Privacy Act.
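To make the word-list idea concrete, here is a minimal Python sketch of how a list of terms paired with codes could drive a text search. It is not Redax's implementation, and the file layout (one tab-separated term and code per line), the sample terms and the function names `load_terms` and `find_matches` are all assumptions made for illustration; the article does not describe Redax's actual file format.

```python
import re

# Hypothetical word list: "term<TAB>code", one entry per line.
# The layout and the sample entries are assumptions, not Redax's format.
WORD_LIST = "Social Security\t(b)(6)\nJohn Doe\t(b)(7)(C)\n"


def load_terms(text):
    """Parse the word list into a {term: code} dictionary."""
    terms = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        term, code = line.split("\t", 1)
        terms[term] = code
    return terms


def find_matches(page_text, terms):
    """Return sorted (start, end, term, code) tuples for whole-word hits."""
    hits = []
    for term, code in terms.items():
        # Case-insensitive match on word boundaries.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(page_text):
            hits.append((m.start(), m.end(), term, code))
    return sorted(hits)


if __name__ == "__main__":
    terms = load_terms(WORD_LIST)
    for hit in find_matches("Contact John Doe about his social security number.", terms):
        print(hit)
```

Each hit carries its code along with its position, which mirrors the idea of recording a reason for every removal.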
There is also a manual method of text removal involving searching the document in Adobe Acrobat and marking text you wish to have excluded from the released version. This process uses a pop-up palette of exemption codes, making it possible to associate a specific reason for removal with each marked section.
Redax will not detect words from scanned documents or images in PostScript format. The Find Text Areas and Find Image Areas tools help identify these potential problems so they don't go undetected. Once found, they can be manually marked for exclusion.
Redax templates make it easy to deal with forms, where information appears in the same location on every copy. A template can be applied across any number of forms to remove a specific field on each one. To create a template, simply mark the areas to exclude from one document and then choose the Export Redax Template option from the menu. To apply a template to other documents, use the Import Redax Template option. The only downside is that you must either combine or manually load all of the pages that need to be processed into a single document.
Generating a detailed report of each item removed from the document is a useful way to determine how many items were identified. The Report option generates a tab-delimited file containing the page number, creation date and time, color, exemption code, author as defined in Redax's preferences and any note associated with it.
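Because the report is tab-delimited, it is easy to tally with a short script. The Python sketch below counts redactions per exemption code and per page. The column order, the absence of a header row, the single date-and-time column and the sample rows are all assumptions based on the field list above; check them against a real report before relying on this.

```python
import csv
import io
from collections import Counter

# Assumed column order, taken from the field list in the article.
FIELDS = ["page", "created", "color", "code", "author", "note"]

# Made-up sample rows for illustration only.
SAMPLE = (
    "1\t2003-09-08 10:15\tblack\t(b)(6)\tReviewer\tname\n"
    "1\t2003-09-08 10:16\tblack\t(b)(6)\tReviewer\taddress\n"
    "3\t2003-09-08 10:20\tblack\t(b)(4)\tReviewer\t\n"
)


def summarize_report(tsv_text):
    """Count redactions by exemption code and by page number."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        fieldnames=FIELDS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in notes as plain text
    )
    by_code = Counter()
    by_page = Counter()
    for row in reader:
        by_code[row["code"]] += 1
        by_page[int(row["page"])] += 1
    return by_code, by_page


if __name__ == "__main__":
    by_code, by_page = summarize_report(SAMPLE)
    print("By code:", dict(by_code))
    print("By page:", dict(by_page))
```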
One caveat from the documentation encourages users to "check each redaction individually" for missed exemptions caused by typographical errors, hyphenation or other irregularities. That could be tedious for large documents.
The documentation warning highlights the point that no computer program is 100 percent accurate in performing the redaction process. Human intervention is still necessary to give the final product a thorough examination. Redax provides the tools necessary to help automate the release process as much as possible and to document what information was removed. The final results will depend a lot on the person operating the tool.
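The hyphenation and typo problem is easy to reproduce outside any particular product. The Python sketch below shows how an exact search misses a word split across a line break or misspelled, and how a reviewer might flag such cases for a closer look. The helper names `dehyphenate` and `near_misses` and the 0.8 similarity cutoff are illustrative assumptions, not features of Redax.

```python
import difflib
import re


def dehyphenate(text):
    """Rejoin words split by an end-of-line hyphen, e.g. "docu-\\nment".

    Caution: this also joins genuine compounds that happen to break
    at a line end ("well-\\nknown" becomes "wellknown").
    """
    return re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)


def near_misses(text, term, cutoff=0.8):
    """List words that resemble the term but do not match it exactly."""
    words = set(re.findall(r"[A-Za-z]+", text.lower()))
    words.discard(term.lower())  # exact matches are not near misses
    return difflib.get_close_matches(term.lower(), sorted(words), n=5, cutoff=cutoff)


if __name__ == "__main__":
    text = "The confiden-\ntial memo names the confidental source."
    print("confidential" in text)               # exact search on the raw text
    print("confidential" in dehyphenate(text))  # after rejoining the split word
    print(near_misses(dehyphenate(text), "confidential"))
```

A check like this only produces candidates for a person to review; it does not replace the individual inspection the documentation recommends.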
Ferrill, based in Lancaster, Calif., has been writing about software for almost 15 years. He can be reached at [email protected].
Using Appligent Inc.'s Redax, users can:
* Remove text and images.
* Replace removed text with text characters and images with black pixels or blank space.
* Customize removal.
* Comply with Freedom of Information Act and Privacy Act rules.