TheConversation


Defining big data


In a comment on FCW's April 15 article, "Sketching the big picture on big data," a reader offered a definition of the term: An easily scalable system of unstructured data with accompanying tools that can efficiently pull structured datasets.

Frank Konkel responds: While I do not disagree with your definition, I believe some people might add bits to it or subtract bits from it. Your definition wisely includes "easily scalable," which actually answers one question that some big data definitions seem to (conveniently?) leave out: how big the big data actually is. The phrase "easily scalable" tells the user that there really isn't a limit on size here – if it is scalable, we'll get there.

However, I'm not sure I agree that big data has to be unstructured. For example, the National Oceanic and Atmospheric Administration, an agency within the U.S. Department of Commerce, uses pools of structured data from different sources (including satellites and ground-based observatories) in its climate modeling and weather forecasting. These data troves are large – terabytes and bigger – and in some cases, like weather prediction, high-end computers spit out storm models in real time, on the order of several times per day. Is that big data? Depending on who you ask, it might be.

What about at the United States Postal Service? USPS' supercomputing facilities in Minnesota process and detect fraud on 6,100 mail pieces per second, or about 528 million each day. The time it takes to scan one piece at a post office and compare the data against a database of 400 billion objects? Less than 100 milliseconds. Is that big data? Again, it might depend on who you ask.
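As a rough sanity check on those USPS figures, the per-second rate can be scaled up to a full day. This is only a sketch: it assumes the 6,100-pieces-per-second rate is sustained around the clock, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the USPS throughput figures quoted above.
# Assumption: the per-second rate holds constant for all 24 hours.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def pieces_per_day(pieces_per_second: int) -> int:
    """Scale a sustained per-second rate up to a daily total (hypothetical helper)."""
    return pieces_per_second * SECONDS_PER_DAY


daily_total = pieces_per_day(6_100)
print(f"{daily_total:,} mail pieces per day")
```

Under that assumption the total comes to roughly 527 million pieces a day, in line with the "about 528 million" figure once rounding in the quoted rate is allowed for.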

In addition, while I agree it's nice to pull structured datasets from unstructured data, I feel like one thing missing from most big data definitions is the "why" factor. You're structuring this data – hopefully – for a purpose: to develop actionable insights. Why else would you be doing big data, right? Yet only some definitions seem to include the "value" aspect, one of the "v" words that also include volume, veracity, variety and velocity.

Teradata's Bill Franks, who recently authored a book on big data, argues that value is the single most important factor in all of big data. Is it not reasonable to think that aspect might be outlined in any big data definition?

Because big data is relatively new on the IT scene, I suspect ambiguity regarding its definition and uses will persist for a while. But just like cloud computing, its definition, along with its practical uses, will be cemented in the years to come.

Posted by Frank Konkel on Apr 22, 2013 at 12:10 PM



Reader comments

Tue, Apr 23, 2013 John Schutz denver

That was my description from the 15th. Another description could be: "A Hadoop-based system" ... which is even more to the point and true. A large RDBMS is not big data, no matter the size, as big data is contained in a Hadoop system, afaict.
