Big Data

How agencies can put Hadoop to work

Genome research is one of several areas where the big-data tool Hadoop is proving itself. (Stock image)

Can the big-data tool Hadoop help rescue victims during the next big disaster, or steer health officials toward the cancer treatment that is a patient's best bet?

It’s beginning to happen, according to Dante Ricci, director of federal innovation at SAP. Pre-packaged solutions based on Hadoop already exist that federal, state and local agencies could use to deliver critical insights to researchers and emergency responders in real time, he said.

"Hadoop is a powerful technology that can meet some of the needs of big data, but not all the different use cases of big data," Ricci said. "Government employees and citizens want a simple interface, but they don’t have that combination of tools that allow end-users to search and understand the different capabilities in the data that are there for finding correlations."

Explaining Hadoop

For those still trying to wrap their heads around big data, Hadoop is an even bigger enigma. Here's the (very) short version:

The Hadoop software framework makes it possible to work with massive amounts of unstructured data by spreading the load across a large number of servers. It is an open-source implementation of the MapReduce programming model, which search-engine giant Google developed for its own web-indexing and searching efforts.

Hadoop is an Apache Software Foundation project. A more detailed description is available at http://hadoop.apache.org/

Hadoop is an open-source framework used to manage and process vast amounts of structured and unstructured data (see sidebar). Companies like Facebook, Twitter and Yahoo use it to take enormous volumes of low-value information from web servers, such as link clicks, and turn it into useful data, according to Sid Probstein, CTO of Attivio, a Massachusetts-based enterprise software company.
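To make the link-click example concrete, here is a minimal sketch of the map, shuffle and reduce steps in plain Python. It is an illustration only: it runs in a single process, it does not use Hadoop's own APIs, and the tab-separated log format and the function names are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def map_clicks(log_line):
    """Map step: emit (url, 1) for one log line.

    Assumes a hypothetical 'user_id<TAB>url' log format.
    """
    _user, url = log_line.rstrip("\n").split("\t")
    yield url, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_clicks(url, counts):
    """Reduce step: collapse one key's values into a total."""
    return url, sum(counts)


def count_clicks(partitions):
    """Run map, shuffle and reduce over a list of log partitions.

    Each partition stands in for the slice of the logs that one
    server would process; here they are handled one after another.
    """
    mapped = chain.from_iterable(
        map_clicks(line) for part in partitions for line in part
    )
    grouped = shuffle(mapped)
    return dict(reduce_clicks(url, counts) for url, counts in grouped.items())
```

Calling count_clicks with two small partitions, for example [["u1\t/home", "u2\t/news"], ["u3\t/home"]], gives a per-link total. In a real cluster the map and reduce steps would run in parallel on many servers, which is where the framework earns its keep.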

But useful summaries require that Hadoop be layered with other applications that might, for example, mine data or provide visualization to an end-user.

"Hadoop is a solid technology, it’s very effective at solving the volume problem, but most interesting output has to be merged with other data," Probstein said. "You have to put that data in order for real people to consume it."

Some companies have turned to pre-packaged solutions that do just that, combining Hadoop’s data processing capabilities with other tools like in-memory technology to provide end-users with real-time insights gleaned from big piles of data.

Ricci cited MKI, a biotechnology company based in Japan, as an example of where Hadoop-based technologies are headed.

MKI uses Hadoop in conjunction with SAP’s in-memory HANA system and the open-source statistical program R to create what Ricci called a "real-time big data platform" that cuts down the time it takes doctors to sequence genomes.

In the system, Hadoop handles data pre-processing and high-speed storage, R does the data mining and SAP’s HANA system performs real-time analysis of patient data for MKI.

The end result is that doctors can compare the genome data of a cancer patient with that of healthy individuals, delivering analysis "before the patient leaves the hospital," Ricci said.

The National Institutes of Health could use such a system in its cancer research efforts, Ricci said. And because the technology stack can combine independent real-time data feeds, such as 911 calls and search-and-rescue reports, and display the information visually, it could prove useful to agencies responding to natural disasters or major emergencies.
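One small piece of that idea, combining independent feeds into a single time-ordered stream, can be sketched in a few lines of Python. This is not how SAP's stack or Hadoop does it. The Event record, the feed names and the merge_feeds helper are hypothetical, and each feed is assumed to arrive already sorted by timestamp.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    """One hypothetical record from an emergency data feed."""
    timestamp: float  # seconds since some shared reference time
    source: str       # e.g. "911" or "search_and_rescue"
    message: str


def merge_feeds(*feeds: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge feeds into one stream ordered by timestamp.

    Each input feed must already be sorted by timestamp.
    """
    return heapq.merge(*feeds, key=lambda event: event.timestamp)
```

Because heapq.merge is lazy, events can be consumed one at a time as a display or dashboard asks for them, rather than loading every feed into memory first.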

"Hadoop with in-memory technology would allow emergency responders to have a holistic view of what’s going on," Ricci said. "How do you gather all that information up, make sense of it all so you can coordinate from a central or multiple locations and make sure it’s done efficiently and nobody’s left out? That’s what we’re trying to do."

Ricci said that Hadoop alone is a useful tool for some, but he suggested its best bet in the evolution of big data is to work in concert with other tools to "bring information together in a better user-interface."

"Hadoop does not fit all – it has limitations – and it’s not easy for a business person or someone to go in quickly and garner insight without help from technologists," Ricci said. "The evolution of big data comes down to making the information available to all stakeholders."

About the Author

Frank Konkel is a former staff writer for FCW.

Reader comments

Thu, Jan 31, 2013 Guest

Not to be too critical, but there is NOT ONE example listed here of how a government agency can / should practically apply Hadoop. I'm disappointed. To be clear, any rigorous conversation on this topic should tackle how Hadoop aligns with "Cloud First". Ready? Go!
