Pro tips for using big data

Ignore fads and focus on what the agency is looking to achieve, says Recovery Accountability and Transparency Board CIO Shawn Kingsberry.


As federal agencies begin incorporating substantial big-data capabilities into their organizations, they're grappling with some of the nitty-gritty details of how to do it. A couple of pro tips: Focus on the mission, and mix and match information from different sources to find innovative ways to use data.

Shawn Kingsberry, CIO at the Recovery Accountability and Transparency Board, advised federal agencies to "ignore the buzzword soup" of technology and focus on what they want to achieve with big-data applications.

"Technology can divert attention from the business needs," he said during a big-data conference in Washington on Feb. 25. "At the end of the day, you know the problem you're trying to solve, but sometimes we can't focus on that because we're worried about the latest buzzword."

A tight focus can lead to innovative thinking that yields useful big-data solutions without spending money on new technology. For instance, Kingsberry said his agency combined the Justice Department’s fraud indictment information with audits of big recipients of federal assistance to find data that indicated possible criminal activity.
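In practice, that kind of cross-referencing amounts to joining two datasets on a shared identifier. The sketch below is a hypothetical illustration, not the board's actual process; the file names, field names and matching rule (doj_fraud_indictments.csv, defendant_name and so on) are assumptions made for the example.

```python
import pandas as pd

# Hypothetical inputs: a DOJ indictment extract and an audit extract of
# large recipients of federal assistance. Column names are illustrative.
indictments = pd.read_csv("doj_fraud_indictments.csv")   # defendant_name, case_id
audits = pd.read_csv("recipient_audits.csv")              # recipient_name, award_id, finding

# Normalize names before matching so formatting differences don't hide overlaps.
indictments["key"] = indictments["defendant_name"].str.upper().str.strip()
audits["key"] = audits["recipient_name"].str.upper().str.strip()

# Inner join: recipients of federal assistance who also appear in fraud indictments.
flagged = audits.merge(indictments, on="key", how="inner")
print(flagged[["recipient_name", "award_id", "case_id"]])
```

A real pipeline would add fuzzier matching and human review, but the core idea is the same: two existing datasets, one join, no new technology purchased.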

Similarly, mixing and matching big databases helped the Social Security Administration develop datasets for verifying disability claims, said Herb Strauss, SSA's deputy CIO.

He said the agency combines its deep pool of information with outside databases such as LexisNexis to match property ownership records against the information supplied by claimants.
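That check is a discrepancy test rather than an overlap join: does what the claimant declared disagree with what an outside source shows? A minimal sketch of the idea, assuming made-up file and column names that are not SSA's actual schema:

```python
import pandas as pd

# Hypothetical extracts: claimant-supplied asset declarations and an outside
# property-ownership feed (a LexisNexis-style source). Columns are illustrative.
claims = pd.read_csv("disability_claims.csv")            # claimant_id, declared_real_property
property_records = pd.read_csv("property_records.csv")   # claimant_id, parcel_id

# Claimants whom the outside data shows as property owners.
owners = set(property_records["claimant_id"])

# Flag claims that declared no real property but match an ownership record,
# routing them to human review rather than automatic action.
no_declared = claims["declared_real_property"].astype(str).str.lower().isin(["no", "false", "0"])
mismatch = claims[no_declared & claims["claimant_id"].isin(owners)]
print(mismatch[["claimant_id"]])
```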

Agencies can learn a great deal by sharing information with one another and mining external data sources, Strauss added.

He said SSA has been working in a big-data environment since its origins in the 1930s, although the technological capabilities were quite different when the agency was first tasked with assigning and maintaining accounts for every American. That responsibility eventually expanded to include tracking survivor and disability benefits and other duties that increased the amounts of data SSA monitored.

Today, applying big-data technology and techniques is an ongoing process that requires continued attention. "It can't be like a cat fight, with 10 seconds of intense activity followed by a five-year pause," Strauss said.

Although SSA has been sifting data since the Great Depression, Kingsberry's agency has been at work only since the Great Recession.

He detailed the construction of Recovery.gov, the site that tracks spending under the American Recovery and Reinvestment Act of 2009, which prompted audience members to ask how to track unstructured data. Such information can present problems because some systems cannot process it uniformly, and that question sparked further discussion about the difficulties of sharing data, unstructured or otherwise, across agencies.

Kingsberry said his agency took responsibility for data input from the beginning to ensure that it was presented in a uniform manner.
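Enforcing uniform input typically means validating each submission against a fixed schema before it is accepted. The sketch below illustrates that idea only; the field names and rules are hypothetical, not the board's actual reporting schema.

```python
# Hypothetical intake check: enforce one uniform format at the point of data entry.
REQUIRED_FIELDS = {"award_id", "recipient_name", "amount", "report_quarter"}

def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record is accepted."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    if "amount" in record:
        try:
            float(record["amount"])
        except (TypeError, ValueError):
            problems.append("amount is not numeric")
    return problems

# Usage: send malformed submissions back before they enter the published dataset.
print(validate_record({"award_id": "A-123", "recipient_name": "Acme",
                       "amount": "10000", "report_quarter": "2009Q4"}))  # []
print(validate_record({"award_id": "A-124", "amount": "n/a"}))           # two problems
```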

He added that when he works with other agencies to access their data, he sets up memoranda of understanding that explicitly state what each agency expects and what its responsibilities are.