Crawling for content

Government Printing Office officials are investigating the use of Web-crawler and data-mining technologies to capture government information published on the Web.

They need technology that can:

Find and capture government information on the Web in any format.

Examine file content and any metadata associated with the file.

Follow rules for capturing government information and avoid capturing information that fails to conform to the rules.

Tolerate rule changes as GPO officials gain a better understanding of the types of electronic information they need to preserve.

Perform automated comparisons between newly captured government information and information already stored in GPO's electronic repository to eliminate duplication.

Source: Government Printing Office
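The rule-following and duplicate-elimination requirements listed above can be illustrated with a minimal sketch. It is not GPO's system or any vendor's product: the `Document`, `Repository`, `is_gov_host` and `has_content` names are hypothetical, the rules are invented for illustration, and duplicates are detected only by an exact SHA-256 match on file content, which would miss near-duplicates such as reformatted copies.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable
from urllib.parse import urlparse


@dataclass
class Document:
    """A captured file plus whatever metadata came with it (hypothetical type)."""
    url: str
    content: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


# A rule is any function that says whether a document may be captured.
Rule = Callable[[Document], bool]


def is_gov_host(doc: Document) -> bool:
    """Illustrative rule: accept only hosts under the .gov domain."""
    host = urlparse(doc.url).hostname or ""
    return host == "gov" or host.endswith(".gov")


def has_content(doc: Document) -> bool:
    """Illustrative rule: skip empty files."""
    return len(doc.content) > 0


class Repository:
    """In-memory stand-in for a repository, keyed by content digest."""

    def __init__(self, rules: Iterable[Rule]) -> None:
        # Rules live in a plain list so they can be changed later
        # without touching the capture logic.
        self.rules = list(rules)
        self._url_by_digest: Dict[str, str] = {}

    def capture(self, doc: Document) -> str:
        """Return 'rejected', 'duplicate' or 'stored' for the document."""
        if not all(rule(doc) for rule in self.rules):
            return "rejected"
        digest = hashlib.sha256(doc.content).hexdigest()
        if digest in self._url_by_digest:
            return "duplicate"  # identical bytes are already stored
        self._url_by_digest[digest] = doc.url
        return "stored"
```

With `repo = Repository([is_gov_host, has_content])`, a first call to `repo.capture(...)` for a .gov file returns `"stored"`, a second file with identical bytes returns `"duplicate"`, and a file from a non-.gov host returns `"rejected"`. Because the rules are ordinary functions held in a list, appending or removing one is enough to change what gets captured.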
