The Conversation


Defining big data


In a comment on FCW's April 15 article, "Sketching the big picture on big data," a reader offered a definition of the term: "an easily scalable system of unstructured data with accompanying tools that can efficiently pull structured datasets."

Frank Konkel responds: While I do not disagree with your definition, I believe some people might add to or subtract from it. Your definition wisely includes "easily scalable," which actually answers one question that some big data definitions seem to (conveniently?) leave out: how big the big data actually is. The phrase "easily scalable" tells the user that there really isn't a limit on size here – if it is scalable, we'll get there.

However, I'm not sure I agree that big data has to be unstructured. For example, the National Oceanic and Atmospheric Administration, an agency within the U.S. Department of Commerce, uses pools of structured data from different sources (including satellites and ground-based observatories) in its climate modeling and weather forecasting. These data troves are large – terabytes and bigger – and in some cases, like weather prediction, high-end computers generate storm models in near real time, several times per day. Is that big data? Depending on who you ask, it might be.

What about at the United States Postal Service? USPS' supercomputing facilities in Minnesota process 6,100 mail pieces per second – about 528 million each day – and screen them for fraud. The time it takes to scan one piece at a post office and compare the data against a database of 400 billion objects? Less than 100 milliseconds. Is that big data? Again, it might depend on who you ask.
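The throughput figures above hang together: at a sustained rate of 6,100 pieces per second, a day's volume works out to roughly the 528 million cited. A quick sanity check:

```python
# Sanity check on the USPS throughput figures: per-second rate scaled to a day.
pieces_per_second = 6_100
seconds_per_day = 24 * 60 * 60  # 86,400

pieces_per_day = pieces_per_second * seconds_per_day
print(f"{pieces_per_day:,} pieces per day")  # roughly the 528 million cited
```

That multiplication gives 527,040,000 – consistent with the article's round number.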

In addition, while I agree it's nice to pull structured datasets from unstructured data, I feel like one thing missing from most big data definitions is the "why" factor. You're structuring this data – hopefully – for a purpose: to develop actionable insights. Why else would we be doing big data, right? Yet only some definitions seem to include the "value" aspect, one of the "v" words alongside volume, veracity, variety and velocity.
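The idea of "pulling structured datasets" from unstructured data can be made concrete with a small sketch. Everything here is hypothetical – the log lines, field names, and the flagging rule are illustrations, not drawn from any system mentioned in this post – but it shows the shape of the task: free text goes in, records with named fields come out, and the "why" is a query you couldn't run on the raw text.

```python
import re

# Hypothetical unstructured input: free-text scan logs.
raw_lines = [
    "2013-04-22 12:10:03 scan station=MSP-01 weight=12.4oz status=ok",
    "2013-04-22 12:10:04 scan station=MSP-02 weight=3.1oz status=flagged",
]

# Extraction tool: a pattern that turns each line into a structured record.
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:]+) scan "
    r"station=(?P<station>\S+) weight=(?P<weight>[\d.]+)oz status=(?P<status>\w+)"
)

records = [m.groupdict() for line in raw_lines if (m := pattern.match(line))]

# The "value" step: an actionable query over the now-structured data.
flagged = [r for r in records if r["status"] == "flagged"]
print(f"{len(records)} records parsed, {len(flagged)} flagged for review")
```

At real big data scale the regex would be replaced by a distributed pipeline, but the structure-then-query pattern is the same.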

Teradata's Bill Franks, who recently authored a book on big data, argues that value is the single most important factor in all of big data. Is it not reasonable to think that aspect might be outlined in any big data definition?

Because big data is relatively new on the IT scene, I suspect its definition and uses will remain ambiguous for a while. But just like cloud computing, its definition, along with its practical uses, will be cemented in the years to come.

Posted by Frank Konkel on Apr 22, 2013 at 12:10 PM

