DHS updates on data mining

The Department of Homeland Security uses software tools to extract insights from its vast troves of data. Under federal law, DHS must make an annual report to Congress on its use of data mining to allay concerns about possible privacy violations.

The latest report, released publicly April 20, said that "no decisions about individuals are made based solely on data mining results" and that DHS investigators "apply their own judgment and expertise to bear in making determinations about individuals initially identified through data mining activities."

The data mining report, the associated privacy reports and record system notices together provide updates on how DHS is integrating its data systems across all its component agencies and on its progress toward a centralized "data lake" for investigators.

As of October 2016, DHS had wrangled 17 datasets into the DHS Data Framework. Among them are some of the large travel and immigration databases, such as the I-94 system for foreign visitors, the Electronic System for Travel Authorization and the Passenger Name Record system.

The Framework is divided into two related systems -- a data lake called Neptune and a classified query system called Cerberus, which is used for counterterrorism probes. In 2016, according to the report, DHS tapped Cerberus to "facilitate bulk information sharing with U.S. government partners." In this context, "bulk" refers to data that isn't selected based on specific identifiers or other search terms "reasonably likely to exclude any intelligence or information not relevant to the need giving rise to the recipient's request."

DHS also noted that it was looking to replace an interim solution that allows Framework users to make classified queries identifying terror suspects linked to ISIS, al-Qaida and their affiliates, a capability aimed at the risk of "foreign fighters" entering the U.S. According to the report, DHS "defined a set of operational requirements that the Data Framework must meet in order to fully replace the interim process."

A key goal of the Framework was to apply the "One DHS" policy to integrate and manage data across all sources. However, familiar issues of interoperability hamper the integration of systems. One planned feature -- keeping the data in the Framework coordinated with the data in the source systems -- had to be postponed. DHS "discovered that the source IT systems are not always able to accommodate" delete notifications from source systems, "due to a number of constraints, such as resources, legacy systems, and disruptions to operational support."

As a result, the report said, an update to the Framework's data retention policies will be addressed in a forthcoming privacy assessment.

The report also identified two new data mining systems: the Socrates pilot, administered by Customs and Border Protection, and the Fraud Detection and National Security Data System, under the control of U.S. Citizenship and Immigration Services. The Socrates pilot is being operated in conjunction with the Johns Hopkins University Applied Physics Laboratory and involves analyzing large international trade datasets to identify patterns of tariff avoidance, importation of counterfeit merchandise and other illicit trade activity. The longstanding Fraud Detection and National Security Data System, which tracks fraud in immigration applications, has added analytical capacity.

About the Author

Adam Mazmanian is executive editor of FCW.

Before joining the editing team, Mazmanian was an FCW staff writer covering Congress, government-wide technology policy and the Department of Veterans Affairs. Prior to joining FCW, Mazmanian was technology correspondent for National Journal and served in a variety of editorial roles at B2B news service SmartBrief. Mazmanian has contributed reviews and articles to the Washington Post, the Washington City Paper, Newsday, New York Press, Architect Magazine and other publications.


