The mosaic effect and big data
- By Adam Mazmanian
- May 13, 2014
The proliferation of government data sets is providing developers with ample fodder for writing useful and potentially profitable applications around census, weather, health, energy, business, agricultural and other information. But as the government makes more and more data discoverable and machine-readable, there is a risk that disparate threads can be pieced together in a way that reveals information that is supposed to be private.
This kind of analysis through the combination of big data sets is called the mosaic effect. And it isn't necessarily bad, Marion Royal, director of Data.gov at the General Services Administration, said at a May 13 FOSE session. He noted, for example, that the combination of big data sets can supply clues on the paths of seasonal flu outbreaks. But there is also the potential for a bad guy to, say, use transportation data and energy production data to figure out where oil and gas are moving on trains and trucks.
The White House publicly released its Open Data Action Plan on May 9, the one-year anniversary of President Barack Obama's executive order that made open data the default setting of the federal government. According to Royal, the government has found "very few instances of agencies putting up data with sensitivities."
The action plan aggregates planned release schedules for agency data sets, including information on health, climate, small business and manufacturing opportunities, crime, education, and public domain information on the federal workforce.
While the government is taking steps to limit the exposure of personally identifiable information and to reduce security threats, the lingering problem is that it is impossible to scope out all the potential future uses of government data sets in advance, said David E. McClure, who works on open data at the National Oceanic and Atmospheric Administration.
"We know there's undiscovered value and unrecognized threats," McClure said. "We need to have some way to manage it and the short answer is, I don't know how to."
Royal suggested that the model of preserving privacy through individual consent might be obsolete when so much data is passively captured by sensors. The abundance of social media and search data collected by private companies makes anonymization "virtually impossible," he said. "Privacy as a concept is becoming less clear as technology increases and big data becomes more prevalent, and available."
Adam Mazmanian is a staff writer covering Congress, the FCC and other key agencies. Connect with him on Twitter: @thisismaz.