Clarifications sought on data mining

Clarification is needed for the definition of data mining and the rules governing it, civil libertarians and academics said today.

Several experts at a Homeland Security Department conference on implementing privacy protections in government data mining expressed concerns that the meaning of data mining was misunderstood, or had not been fully explained, thus leading to confusion or potential violations of privacy rights.

In the legislation that established DHS, Congress required the department to “establish and utilize…a secure communications and information technology infrastructure, including data mining and other advanced analytical tools, in order to access, receive and analyze data.” However, some experts said confusion over what constitutes data mining has led to misperceptions.

Some experts worried that the lack of an agreed-upon definition, and of specific rules governing different types of data mining, including the use of commercial data, increases the risk of privacy violations.

“What’s important here is that we not reflexively say that data mining is bad…but we need to have in place the rules of the road here…about when data can be collected, how it can be used,” said Barry Steinhardt, director of the Technology and Liberty Project at the American Civil Liberties Union. “We have not really had that discussion about what the rules of the road are.”

In its 2007 annual report to Congress on the department’s data mining activities, DHS' privacy office said that “it is important to note that no consensus exists on what constitutes ‘data mining.’”

“In colloquial use, data mining generally refers to any predictive, pattern-based technology,” the report said, adding that different government reports have used different definitions, some narrower than others.

However, the Data Mining Reporting Act, passed as part of a major anti-terrorism law in 2007, defines the term as “a program involving pattern-based queries, searches, or other analyses of one or more electronic databases” with a series of caveats. In February, DHS released a letter report on its use of data mining using this definition.

“Data mining means many things to many different people,” said David Jensen, an associate professor of computer science at the University of Massachusetts.

Jensen said definitions that portray data mining as a process of filtering or extraction are easy to understand but also easy to misinterpret. He said more useful definitions explain that data mining is a process that involves making inferences based on probability ratings.

Christopher Slobogin, a law professor at Vanderbilt University, said even though data mining did not usually involve physical intrusion, if used incorrectly it could harm individuals' privacy rights.

Slobogin said people could be hurt by data mining if authorities used data-mining techniques in good faith but used bad information, or if they intentionally used data mining to do harm.

Fred Cate, director of Indiana University’s Center for Applied Cybersecurity Research, said another danger of using data mining incorrectly is that it could waste limited resources.

“The big challenge here is moving from the scientific world, where data mining is used all the time with enormous effectiveness, into the political world, the reality in which we live,” he said.

About the Author

Ben Bain is a reporter for Federal Computer Week.
