Crawling for content

Government Printing Office officials are investigating the use of Web-crawler and data-mining technologies to capture government information published on the Web.

They need technology that can:

Find and capture government information on the Web in any format.

Examine file content and any metadata associated with the file.

Follow rules for capturing government information and avoid capturing information that fails to conform to the rules.

Tolerate rule changes as GPO officials gain a better understanding of the types of electronic information they need to preserve.

Perform automated comparisons between newly captured government information and information already stored in GPO's electronic repository to eliminate duplication.

Source: Government Printing Office
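As a rough illustration of how the rule-following and duplicate-elimination requirements listed above might fit together, here is a minimal Python sketch. It is not GPO's system: the names `Document`, `CaptureFilter`, `is_gov_host` and `has_content` are hypothetical, the ".gov" host check is only an assumed example of a capture rule, and an in-memory set of SHA-256 digests stands in for a comparison against the electronic repository.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set
from urllib.parse import urlparse


@dataclass
class Document:
    """A fetched file plus whatever metadata came with it (hypothetical type)."""
    url: str
    content: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


# A rule is any function that says whether a document may be captured.
Rule = Callable[[Document], bool]


def is_gov_host(doc: Document) -> bool:
    """Assumed example rule: only capture files served from a .gov host."""
    host = urlparse(doc.url).hostname or ""
    return host.endswith(".gov")


def has_content(doc: Document) -> bool:
    """Assumed example rule: skip empty files."""
    return len(doc.content) > 0


class CaptureFilter:
    """Applies the current rules, then rejects byte-for-byte duplicates."""

    def __init__(self, rules: List[Rule]) -> None:
        # Rules live in a plain list so they can be changed at any time.
        self.rules: List[Rule] = list(rules)
        # Stand-in for the repository: digests of content already captured.
        self.seen_digests: Set[str] = set()

    def should_capture(self, doc: Document) -> bool:
        # Reject anything that fails to conform to every rule.
        if not all(rule(doc) for rule in self.rules):
            return False
        # Compare against previously captured content by SHA-256 digest.
        digest = hashlib.sha256(doc.content).hexdigest()
        if digest in self.seen_digests:
            return False
        self.seen_digests.add(digest)
        return True


capture_filter = CaptureFilter([is_gov_host, has_content])
original = Document("https://www.example.gov/report.pdf", b"report text")
mirror = Document("https://mirror.example.gov/report.pdf", b"report text")
outside = Document("https://www.example.com/report.pdf", b"other text")

decisions = [capture_filter.should_capture(d) for d in (original, mirror, outside)]
```

Because the rules are ordinary functions held in a list, a rule can be added or dropped without touching the crawler itself, which is one way to tolerate rule changes. Hashing the raw bytes catches only exact copies; the same information republished in a different format would need a looser comparison.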
