Crawling for content

Government Printing Office officials are investigating the use of Web-crawler and data-mining technologies to capture government information published on the Web.

They need technology that can:

• Find and capture government information on the Web in any format.

• Examine file content and any metadata associated with the file.

• Follow rules for capturing government information and avoid capturing information that fails to conform to the rules.

• Tolerate rule changes as GPO officials gain a better understanding of the types of electronic information they need to preserve.

• Perform automated comparisons between newly captured government information and information already stored in GPO's electronic repository to eliminate duplication.
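To illustrate how the rule-following and de-duplication requirements might fit together, here is a minimal Python sketch. It is not GPO's design or any vendor's product. The rule functions (`is_gov_domain`, `not_marked_draft`), the `Capturer` class and the metadata field `status` are hypothetical. Rules are plain predicates kept in a list so they can be swapped as policy changes, and duplicates are detected by comparing SHA-256 digests of file content against digests already in the repository.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import urlparse

# Hypothetical model: a capture rule is a predicate over a URL and its metadata.
Rule = Callable[[str, dict], bool]


def is_gov_domain(url: str, metadata: dict) -> bool:
    """Hypothetical rule: accept only hosts under the .gov domain."""
    host = urlparse(url).hostname or ""
    return host.endswith(".gov")


def not_marked_draft(url: str, metadata: dict) -> bool:
    """Hypothetical rule: reject files whose metadata marks them as drafts."""
    return metadata.get("status") != "draft"


@dataclass
class Capturer:
    # Rules live in an ordinary list so they can be changed at runtime.
    rules: list[Rule]
    # SHA-256 digests of content already stored in the repository.
    repository_digests: set[str] = field(default_factory=set)

    def should_capture(self, url: str, metadata: dict) -> bool:
        # A file is captured only if it satisfies every current rule.
        return all(rule(url, metadata) for rule in self.rules)

    def capture(self, url: str, content: bytes, metadata: dict) -> bool:
        """Return True if the file is newly captured, False if rejected."""
        if not self.should_capture(url, metadata):
            return False
        digest = hashlib.sha256(content).hexdigest()
        if digest in self.repository_digests:
            return False  # byte-identical copy already in the repository
        self.repository_digests.add(digest)
        return True


if __name__ == "__main__":
    capturer = Capturer(rules=[is_gov_domain])
    print(capturer.capture("https://www.gpo.gov/report.pdf", b"%PDF-1.4 ...", {}))
    print(capturer.capture("https://mirror.gpo.gov/report.pdf", b"%PDF-1.4 ...", {}))
    capturer.rules.append(not_marked_draft)  # a rule change takes effect immediately
    print(capturer.capture("https://www.gpo.gov/plan.pdf", b"plan", {"status": "draft"}))
```

Exact hashing only catches byte-identical copies; spotting the same document republished in a different format or with minor edits would require content-aware comparison, which this sketch does not attempt.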

Source: Government Printing Office
