How agencies can put Hadoop to work

Genome research is one of several areas where the big-data tool Hadoop is proving itself. (Stock image)

Can the big-data tool Hadoop help rescue victims during the next big disaster, or steer health officials toward the cancer treatment that is a patient's best bet?

It’s beginning to happen, according to Dante Ricci, director of federal innovation at SAP. Pre-packaged, Hadoop-based solutions already exist that federal agencies (and state and local ones) could use to deliver critical insights to researchers and emergency responders in real time, he said.

"Hadoop is a powerful technology that can meet some of the needs of big data, but not all the different use cases of big data," Ricci said. "Government employees and citizens want a simple interface, but they don’t have that combination of tools that allow end-users to search and understand the different capabilities in the data that are there for finding correlations."

Explaining Hadoop

For those still trying to wrap their heads around big data, Hadoop is an even bigger enigma. Here's the (very) short version:

The Hadoop software framework makes it possible to work with massive amounts of unstructured data by spreading the load across a large number of servers. It is an open-source implementation of MapReduce, the programming model Google developed for its own web-indexing and searching efforts.
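To make the MapReduce idea a little more concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names and sample data are hypothetical. Each inner list stands in for the slice of data one server would hold, a "map" step emits key/value pairs, a "shuffle" groups them by key, and a "reduce" step combines each group. On a real cluster the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def run_job(partitions: list[list[str]]) -> dict[str, int]:
    # Each partition stands in for the data stored on one (hypothetical) server.
    intermediate: list[tuple[str, int]] = []
    for partition in partitions:  # a real cluster would process these in parallel
        for line in partition:
            intermediate.extend(map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


partitions = [
    ["big data needs big tools"],
    ["hadoop spreads big jobs", "across many servers"],
]
print(run_job(partitions)["big"])  # 3
```

The value of the model is that the map and reduce functions know nothing about how many servers exist; the framework handles splitting the data and moving intermediate results between machines.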

Hadoop is an Apache Software Foundation project. A more detailed description is available at http://hadoop.apache.org/.

Hadoop is an open-source framework for managing and processing vast amounts of structured and unstructured data (see sidebar). Companies like Facebook, Twitter and Yahoo use it to take enormous volumes of low-value information from web servers, such as link clicks, and turn it into useful data, according to Sid Probstein, CTO of Attivio, a Massachusetts-based enterprise software company.

But producing useful summaries requires layering Hadoop with other applications that might, for example, mine the data or present visualizations to end users.

"Hadoop is a solid technology, it’s very effective at solving the volume problem, but most interesting output has to be merged with other data," Probstein said. "You have to put that data in order for real people to consume it."
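Probstein's distinction between solving the volume problem and merging the output with other data can be sketched in a few lines of Python. This is an illustration only: the click log, page identifiers and page titles are hypothetical, and in practice the summarizing step would run on a cluster while the reference data would live in a separate system.

```python
from collections import Counter

# Hypothetical raw click log: (user_id, page_id) pairs, the kind of
# low-value, high-volume records a batch job would summarize.
clicks = [("u1", "p1"), ("u2", "p1"), ("u1", "p2"), ("u3", "p1")]

# Hypothetical reference data kept in some other system.
pages = {"p1": "Home", "p2": "Contact"}


def summarize(clicks):
    # Volume step: collapse raw events into per-page counts.
    return Counter(page for _, page in clicks)


def enrich(counts, pages):
    # Merge step: attach human-readable context, most-clicked pages first.
    return sorted(
        ((pages.get(page, "unknown page"), n) for page, n in counts.items()),
        key=lambda row: row[1],
        reverse=True,
    )


for title, n in enrich(summarize(clicks), pages):
    print(f"{title}: {n}")
# Home: 3
# Contact: 1
```

The counts alone ("p1: 3") mean little to a non-specialist; it is the join with other data that turns them into something a person can act on.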

Some companies have turned to pre-packaged solutions that do just that, combining Hadoop’s data processing capabilities with other tools like in-memory technology to provide end-users with real-time insights gleaned from big piles of data.

Ricci cited the case of MKI, a biotechnology company based in Japan, as evidence of where Hadoop-based technologies are headed.

MKI uses Hadoop in conjunction with SAP’s in-memory HANA system and the open-source statistical program R to create what Ricci called a "real-time big data platform" that cuts down the time it takes doctors to sequence genomes.

In the system, Hadoop handles data pre-processing and high-speed storage, R does the data mining and SAP’s HANA system performs real-time analysis of patient data for MKI.

The end result is that doctors can compare the genome data of a cancer patient with healthy individuals, delivering analysis "before the patient leaves the hospital," Ricci said.

The National Institutes of Health could use such a system in its cancer research efforts, Ricci said. And because the technology stack can combine independent real-time data feeds, such as 911 and search-and-rescue feeds, and display the information visually, it could prove useful to agencies responding to natural disasters or major emergencies.
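One basic ingredient of combining independent feeds into a single picture is interleaving their events by time. The sketch below is a generic Python illustration, not a description of SAP's or MKI's stack: the `Event` type, the feed contents and the shared-clock assumption are all hypothetical. It uses the standard library's `heapq.merge`, which lazily interleaves iterables that are each already sorted.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds on a clock shared by all feeds (assumed)
    source: str       # which feed produced the event
    detail: str


# Hypothetical feeds; each one is already sorted by timestamp.
calls_911 = [Event(1.0, "911", "caller reports flooding"),
             Event(4.0, "911", "caller trapped on roof")]
rescue = [Event(2.5, "search-and-rescue", "team A deployed"),
          Event(5.0, "search-and-rescue", "team A reaches roof")]


def combined_view(*feeds):
    # heapq.merge interleaves already-sorted iterables lazily, so the
    # feeds never need to be concatenated and re-sorted as a whole.
    return heapq.merge(*feeds, key=lambda e: e.timestamp)


for e in combined_view(calls_911, rescue):
    print(f"{e.timestamp:>4}: [{e.source}] {e.detail}")
```

Because `heapq.merge` accepts any sorted iterables, including generators, the same function would work if each feed were read incrementally rather than held in a list.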

"Hadoop with in-memory technology would allow emergency responders to have a holistic view of what’s going on," Ricci said. "How do you gather all that information up, make sense of it all so you can coordinate from a central or multiple locations and make sure it’s done efficiently and nobody’s left out? That’s what we’re trying to do."

On its own, Hadoop is a useful tool for some, Ricci said, but he suggested its best bet in the evolution of big data is in concert with other tools to "bring information together in a better user-interface."

"Hadoop does not fit all – it has limitations – and it’s not easy for a business person or someone to go in quickly and garner insight without help from technologists," Ricci said. "The evolution of big data comes down to making the information available to all stakeholders."

About the Author

Frank Konkel is a former staff writer for FCW.