
Defining big data


In a comment on FCW's April 15 article, "Sketching the big picture on big data," a reader offered a definition of the term: "an easily scalable system of unstructured data with accompanying tools that can efficiently pull structured datasets."

Frank Konkel responds: While I do not disagree with your definition, I believe some people might add or subtract bits from it. Your definition wisely includes "easily scalable," which answers one question that some big data definitions seem to (conveniently?) leave out: how big the big data actually is. The phrase "easily scalable" tells the user that there really isn't a limit on size here – if it is scalable, we'll get there.

However, I'm not sure I agree that big data has to be unstructured. For example, the National Oceanic and Atmospheric Administration, an agency within the U.S. Department of Commerce, uses pools of structured data from different sources (including satellites and ground-based observatories) in its climate modeling and weather forecasting. These data troves are large – terabytes and bigger – and in some cases, like weather prediction, high-end computers spit out storm models in real time, several times per day. Is that big data? Depending on who you ask, it might be.

What about the United States Postal Service? USPS' supercomputing facilities in Minnesota process – and screen for fraud – 6,100 mail pieces per second, or about 528 million each day. The time it takes to scan one piece at a post office and compare the data against a database of 400 billion objects? Less than 100 milliseconds. Is that big data? Again, it might depend on who you ask.
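
For readers who like to check the arithmetic, here is a quick back-of-envelope sketch of those figures. The rates are the ones cited above; the calculation itself is only an illustration, not a description of USPS systems:

```python
# Back-of-envelope check of the throughput figures cited in the post.
pieces_per_second = 6_100
seconds_per_day = 24 * 60 * 60            # 86,400 seconds in a day

pieces_per_day = pieces_per_second * seconds_per_day
print(f"{pieces_per_day:,} pieces per day")   # 527,040,000 -- roughly the 528 million cited

# Each scan is compared against a database of 400 billion objects
# in under 100 milliseconds, per the figures quoted above.
lookup_budget_ms = 100
print(f"lookup budget: under {lookup_budget_ms} ms per piece")
```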

In addition, while I agree it's nice to pull structured datasets from unstructured data, I feel like one thing missing from most big data definitions is the "why" factor. You're structuring this data – hopefully – for a purpose: to develop actionable insights. Why else would we be doing big data, right? Yet only some definitions seem to include the "value" aspect, one of the "v" words alongside volume, veracity, variety and velocity.
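
As a toy illustration of that idea – pulling a structured dataset out of unstructured text and then using it for a purpose – here is a minimal, hypothetical sketch. The sample notes, field names and threshold are invented for the example and are not drawn from any agency system:

```python
import re
from collections import Counter

# Unstructured input: free-form observation notes (invented sample data).
raw_notes = [
    "2013-04-18 sensor A7 reported wind 42 mph at station KDSM",
    "2013-04-18 sensor B2 reported wind 17 mph at station KMSP",
    "2013-04-19 sensor A7 reported wind 55 mph at station KDSM",
]

# Step 1: impose structure -- extract (date, sensor, wind, station) records.
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) sensor (?P<sensor>\w+) "
    r"reported wind (?P<wind>\d+) mph at station (?P<station>\w+)"
)
records = [m.groupdict() for note in raw_notes if (m := pattern.search(note))]

# Step 2: the "why" -- turn structured records into an actionable insight,
# e.g. which stations logged high-wind readings worth a closer look.
high_wind_stations = Counter(r["station"] for r in records if int(r["wind"]) >= 40)
print(high_wind_stations)   # Counter({'KDSM': 2})
```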

Teradata's Bill Franks, who recently authored a book on big data, argues that value is the single most important factor in all of big data. Is it not reasonable to think that aspect might be outlined in any big data definition?

Because big data is relatively new on the IT scene, I suspect its definition and uses will remain ambiguous for a while. But just like cloud computing, its definition, along with its practical uses, will be cemented in the years to come.

Posted by Frank Konkel on Apr 22, 2013 at 12:10 PM

