Crawling for content

Government Printing Office officials are investigating the use of Web-crawler and data-mining technologies to capture government information published on the Web.

They need technology that can:

Find and capture government information on the Web in any format.

Examine file content and any metadata associated with the file.

Follow rules for capturing government information and avoid capturing information that fails to conform to the rules.

Tolerate rule changes as GPO officials gain a better understanding of the types of electronic information they need to preserve.

Perform automated comparisons between newly captured government information and information already stored in GPO's electronic repository to eliminate duplication.

Source: Government Printing Office
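
As a rough illustration of the third through fifth requirements, the sketch below applies a changeable list of capture rules and then uses a content hash to skip material already held in a repository. It is not GPO's system: the Document and Repository classes, the two sample rules, and the choice of SHA-256 are all assumptions made for this example. Exact hashing only catches byte-identical copies, so a real comparison step would likely need something more tolerant.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List
from urllib.parse import urlparse


@dataclass
class Document:
    """Hypothetical captured item: source URL, raw bytes, and file metadata."""
    url: str
    content: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


# A rule is any function that returns True when a document may be captured.
Rule = Callable[[Document], bool]


def is_gov_host(doc: Document) -> bool:
    """Sample rule (assumed): only accept documents served from .gov hosts."""
    host = urlparse(doc.url).hostname or ""
    return host.endswith(".gov")


def has_content(doc: Document) -> bool:
    """Sample rule (assumed): reject empty files."""
    return len(doc.content) > 0


class Repository:
    """Hypothetical in-memory store keyed by the SHA-256 digest of the content."""

    def __init__(self, rules: List[Rule]) -> None:
        # Rules live in a plain list so they can be added or removed later.
        self.rules = list(rules)
        self._by_digest: Dict[str, Document] = {}

    def capture(self, doc: Document) -> str:
        # Reject anything that fails at least one rule.
        if not all(rule(doc) for rule in self.rules):
            return "rejected"
        # Compare against stored content by digest to avoid duplicates.
        digest = hashlib.sha256(doc.content).hexdigest()
        if digest in self._by_digest:
            return "duplicate"
        self._by_digest[digest] = doc
        return "stored"
```

Because the rules are ordinary functions held in a list, a new one can be appended as requirements change, for example a check on a metadata field, without touching the capture logic.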
