To filter or not to filter: That's a big question

The promise of big data techniques is their ability to help organizations manage and sift through huge volumes of data quickly.

If you take the phrase “big data” at face value, and consider that tools now exist to sift through huge amounts of data and extract information from them, the natural assumption is that the more data you can store, the better the system works.

However, that’s a matter of some debate. The traditional approach has been to filter the data in some way, because storing and managing data has been expensive and time-consuming. Does big data change that?

“I do believe that agencies need to closely scrutinize what value may be in their data,” said Matthew Martin, a solutions architect at Merlin International, an IT solutions provider. “With big data, however, the approach should be to keep everything because that analytical significance may only emerge over time.”

In the past, doing that was both costly and time-consuming, he said. Therefore, filtering data was a decent tactic to follow when the goals could be clearly identified.

“But we’re looking at a different paradigm now,” Martin said, “because we now have a way to derive value from the full data stream in a cost-effective way.”

Dale Wickizer, chief technology officer for the U.S. Public Sector at NetApp Inc., sees it differently. Ultimately, part of the intelligence of big analytics is filtering the data, and agencies need to be more proactive about what they let into the big data process, he said. That’s particularly important if predictions of a 50-fold increase in the amount of data by 2020 come true, because the same IDC survey that forecast that rise predicted only a 50 percent increase in IT staff over the same period.

“How do you reconcile that?” Wickizer asked. “NetApp has been effective in working with its customers to figure out how to keep the same headcount and manage 20, 30 or 40 times the amount of data. But at some point, it’s going to fall over.”

Most people are afraid to throw things away for compliance or security reasons, he said, but eventually, they’ll have to be more judicious. “I think that’s going to have to be part of the big analytics market,” he added.

The dilemma organizations are facing is a real one, said Peter Doolan, group vice president and chief technologist for Oracle Public Sector. Traditionally, getting value out of data has been a laborious process: people had to extract it via a data warehouse and then look for trends. As a result, only a tiny percentage of the actual data was used.

But big data techniques turn that equation on its head. They allow questions to be asked of the raw data, and the fidelity of the information derived is much greater.

As it stands now, Doolan said, he doesn’t believe many of his government customers understand the real value of the data they have, and he expects the big data movement to progress in discrete stages. In the first stage, agencies will put the infrastructure together to capture data; once they do that, they will begin to recognize the value of what they are doing.

“Then, as they develop the skills they need for big data, they’ll open up more of this framework to capture the data that today they are not” capturing, he said. “But we’re several years away from our customers even realizing the scope of this. They’re just getting started.”

In the end, it will likely come down to how agencies decide to use big data technologies and techniques. If, as most people expect, they use the tools as a complement to the data warehousing and data analysis they already have in place, the decision to filter or not to filter will depend on their goal.

“I think there’s room for both approaches because each has equal value” depending on how big data is used, said Cameron Chehreh, chief enterprise engineer in General Dynamics Information Technology’s Intelligence Solutions Division. “What it comes down to is the mission requirement and use case. That’s what really drives strategy in how you look at data.”

About this Report

This report was commissioned by the Content Solutions unit, an independent editorial arm of 1105 Government Information Group. Specific topics are chosen in response to interest from the vendor community; however, sponsors are not guaranteed content contribution or review of content before publication.