The Conversation

Defining big data

In a comment on FCW's April 15 article, "Sketching the big picture on big data," a reader offered a definition of the term: an easily scalable system of unstructured data with accompanying tools that can efficiently pull structured datasets.

Frank Konkel responds: While I do not disagree with your definition, I believe some people might add to it or subtract from it. Your definition wisely includes "easily scalable," which answers one question that some big data definitions seem to (conveniently?) leave out: how big the big data actually is. The phrase "easily scalable" tells the user that there really isn't a limit on size here – if it is scalable, we'll get there.

However, I'm not sure I agree that big data has to be unstructured. For example, the National Oceanic and Atmospheric Administration, an agency within the U.S. Department of Commerce, uses pools of structured data from different sources (including satellites and ground-based observatories) in its climate modeling and weather forecasting. These data troves are large – terabytes and bigger – and in some cases, such as weather prediction, high-end computers generate storm models in near real time, on the order of several runs per day. Is that big data? Depending on whom you ask, it might be.

What about at the United States Postal Service? USPS' supercomputing facilities in Minnesota process 6,100 mail pieces per second – about 528 million each day – and screen them for fraud. The time it takes to scan one piece at a post office and compare the data against a database of 400 billion objects? Less than 100 milliseconds. Is that big data? Again, it might depend on whom you ask.
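The per-second and per-day figures above are consistent with each other, as a quick back-of-the-envelope check shows (a sketch using only the numbers cited in this post, not any actual USPS system):

```python
# Sanity-check the USPS throughput figures cited above:
# 6,100 mail pieces per second, claimed to be about 528 million per day.
pieces_per_second = 6_100
seconds_per_day = 24 * 60 * 60  # 86,400 seconds in a day

pieces_per_day = pieces_per_second * seconds_per_day
print(f"{pieces_per_day:,} pieces per day")  # 527,040,000 – roughly the 528 million cited
```

At less than 100 milliseconds per lookup against 400 billion objects, the interesting part is less the raw volume than the latency at that scale.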

In addition, while I agree it's nice to pull structured datasets from unstructured data, I feel that one thing missing from most big data definitions is the "why" factor. You're structuring this data – hopefully – for a purpose: to develop actionable insights. Why else would we be doing big data, right? Yet only some definitions seem to include the "value" aspect, one of the "v" words alongside volume, veracity, variety and velocity.

Teradata's Bill Franks, who recently authored a book on big data, argues that value is the single most important factor in all of big data. Is it not reasonable to think that aspect might be outlined in any big data definition?

Because big data is relatively new on the IT scene, I suspect its definition and uses will remain ambiguous for a while. But just like cloud computing, its definition, along with its practical uses, will be cemented in the years to come.

Posted by Frank Konkel on Apr 22, 2013 at 12:10 PM

