GPO hunts fugitives
- By Florence Olsen
- May 20, 2004
Government Printing Office officials are looking for so-called fugitive documents and plan to send a Web crawler out to find them.
As more federal agencies publish government information on Web sites without notifying GPO, important documents that should be indexed, catalogued and preserved for public access in the Federal Depository Library Program have instead become "fugitive" documents, according to GPO officials.
Their answer to the problem is to use Web crawler and data-mining technologies to find those documents. In a recent solicitation for bids, GPO officials asked companies with those technologies to submit proposals by June 2 for services they describe as "Web harvesting."
They are seeking a harvesting service that can locate and capture fugitive government publications in all possible formats, including HTML, PDF, and Microsoft Corp. Word and Excel. The service must be able to capture only those documents that conform to GPO's criteria and eliminate duplicates of documents that the agency already has in its databases, the solicitation states.
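The two requirements in the solicitation, capturing only documents that meet set criteria and discarding duplicates of documents already held, can be illustrated with a minimal Python sketch. It is not GPO's method or any vendor's implementation. The accepted file extensions, the function names and the use of a SHA-256 content hash to detect duplicates are all assumptions made for illustration; GPO's actual selection criteria are not described in the article.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed stand-in for the selection criteria: accept by file extension only.
ACCEPTED_EXTENSIONS = {".html", ".htm", ".pdf", ".doc", ".xls"}


def content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest used as a fingerprint of the document bytes."""
    return hashlib.sha256(data).hexdigest()


def select_new_documents(candidates, known_digests):
    """Filter crawled documents (hypothetical helper).

    candidates: iterable of (url, content_bytes) pairs found by a crawler.
    known_digests: digests of documents already held in the collection.
    Returns the URLs of documents in an accepted format whose content has
    not been seen before, in the order they were encountered.
    """
    seen = set(known_digests)
    kept = []
    for url, data in candidates:
        # Compare extensions case-insensitively, ignoring any query string.
        extension = PurePosixPath(urlparse(url).path).suffix.lower()
        if extension not in ACCEPTED_EXTENSIONS:
            continue
        digest = content_digest(data)
        if digest in seen:
            continue  # byte-identical to a document already held or kept
        seen.add(digest)
        kept.append(url)
    return kept
```

A byte-level hash catches only exact copies. The same publication saved in two formats, or with a changed header, would pass as two distinct documents, so a real harvesting service would likely need fuzzier matching than this.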
In a related effort to preserve access to government information, GPO officials are working with the Association of Research Libraries and other library groups on a plan to digitize printed government documents and, eventually, those stored on microfiche.
GPO officials have surveyed libraries for recommendations about which government document titles and series should be given priority in the digitization plan. Next month, they will give library officials a consolidated list of documents and ask them to rank the documents in the order they would like to see them digitized.