Clarifications sought on data mining
The definition of data mining and the rules governing it need clarification, civil libertarians and academics said today.
Several experts at a Homeland Security Department conference on implementing privacy protections in government data mining expressed concerns that the meaning of data mining was misunderstood, or had not been fully explained, thus leading to confusion or potential violations of privacy rights.
In the legislation that established DHS, Congress required the department to “establish and utilize…a secure communications and information technology infrastructure, including data mining and other advanced analytical tools, in order to access, receive and analyze data.” However, some experts said confusion over what constitutes data mining has led to misperceptions.
Some experts were worried that the lack of an agreed-upon definition and specific rules governing different types of data mining, including the use of commercial data, increases the risk of privacy violations.
“What’s important here is that we not reflexively say that data mining is bad…but we need to have in place the rules of the road here…about when data can be collected, how it can be used,” said Barry Steinhardt, director of the Technology and Liberty Project at the American Civil Liberties Union. “We have not really had that discussion about what the rules of the road are.”
In its 2007 annual report to Congress on the department’s data mining activities, DHS' privacy office said that “it is important to note that no consensus exists on what constitutes ‘data mining.’”
“In colloquial use, data mining generally refers to any predictive, pattern-based technology,” the report said, adding that different government reports have used different definitions, some narrower than others.
However, the Data Mining Reporting Act, passed as part of a major anti-terrorism law in 2007, defines the term as “a program involving pattern-based queries, searches, or other analyses of one or more electronic databases” with a series of caveats. In February, DHS released a letter report on its use of data mining using this definition.
“Data mining means many things to many different people,” said David Jensen, an associate professor of computer science at the University of Massachusetts.
Jensen said definitions that portray data mining as a process of filtering or extraction are easy to understand but also easy to misinterpret. He said more useful definitions explain that data mining is a process that involves making inferences based on probability ratings.
Christopher Slobogin, a law professor at Vanderbilt University, said even though data mining did not usually involve physical intrusion, if used incorrectly it could harm individuals' privacy rights.
Slobogin said people could be hurt by data mining if authorities used data-mining techniques in good faith but used bad information, or if they intentionally used data mining to do harm.
Fred Cate, director of Indiana University’s Center for Applied Cybersecurity Research, said another danger of using data mining incorrectly is that it could waste limited resources.
“The big challenge here is moving from the scientific world, where data mining is used all the time with enormous effectiveness, into the political world, the reality in which we live,” he said.
Ben Bain is a reporter for Federal Computer Week.