Big data today will be the norm tomorrow

Author Bill Franks says organizations that capitalize on big data in the technology's infancy will seize an early competitive advantage.

Big data can seem like a complex and potentially expensive investment, leaving many executives to ponder whether they should jump aboard the latest analytics boom or wait for the next big thing in IT.

Bill Franks, author of "Taming the Big Data Tidal Wave" and chief analytics officer at Teradata, believes that public- and private-sector organizations that capitalize on big data in the technology's infancy will seize an early competitive advantage. Those that wait for answers to early questions about big data -- and there are many -- will almost certainly lag behind early adopters in formulating intelligent IT initiatives.

For businesses, that means decreased profits. For federal agencies, it means continued inefficiencies and missed opportunities to benefit taxpayers.

Franks' focus is on helping the reader understand rapidly evolving technologies and how organizations can use big data to frame decisions and create an organizational culture around analytics.

What "big" is and why it is changing

Franks is less concerned with a proper definition than with conveying the idea that big data will change.

Today it is typically considered to be data whose size or complexity is beyond what traditional database software tools can capture, store and analyze. But despite the "big" in big data, size isn't the only challenge. Structure -- or how the data is mashed together -- is also important to analysts hoping to generate insights from data troves.

"Big data is often described as unstructured and traditional data as structured," Franks wrote, "but the lines aren't as clean as such labels suggest."

An analyst has no control over truly unstructured datasets. Text, video and audio files fall into that category. There is no flow or format to datasets such as a collection of social media feeds gathered for sentiment analysis. People speak and communicate differently, which makes analysis of such bulk information challenging.

Semi-structured data has a flow and format but lacks user-friendliness. Web logs look messy and complex, but each piece of information does "serve a purpose of some sort," Franks wrote.
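To make the point about web logs concrete, here is a minimal sketch of how one messy-looking log line breaks down into purposeful fields. It assumes the widely used Common Log Format; the sample line, the pattern and the function name are illustrative and do not come from Franks' book.

```python
import re

# Hypothetical log line in Common Log Format (illustrative, not from the book).
LOG_LINE = (
    '203.0.113.7 - - [10/Oct/2012:13:55:36 -0700] '
    '"GET /products/42 HTTP/1.1" 200 2326'
)

# Each named group captures one piece of information in the line.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return the named fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


record = parse_log_line(LOG_LINE)
print(record)
```

The line has a fixed flow and format, so a single pattern can recover every field, even though the raw text is not friendly to read.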

Structured data falls into the realm of big data based on sheer size. Handling terabytes or petabytes of structured information, such as the transaction records of a bank, can be quite challenging for traditional relational databases, even though the information is neatly organized in columns and rows.

Combining unstructured data with some variant of traditional data has led to successful results for early leaders in big data, especially retailers. Combining web-search data and customers' online habits with prior transactions can give a company a competitive advantage.
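As a rough illustration of that kind of combination, the sketch below attaches web-search queries to customers who already have transaction records. The records, field names and the `combine` function are all hypothetical; a real retailer would work at far larger scale and with different tooling.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
transactions = [
    {"customer_id": 1, "item": "tent", "amount": 120.0},
    {"customer_id": 2, "item": "stove", "amount": 45.0},
]
web_searches = [
    {"customer_id": 1, "query": "sleeping bag"},
    {"customer_id": 1, "query": "hiking boots"},
    {"customer_id": 3, "query": "kayak"},
]


def combine(transactions, web_searches):
    """Build one profile per purchasing customer from both data sources."""
    # Index the search queries by customer so they can be looked up quickly.
    searches_by_customer = defaultdict(list)
    for search in web_searches:
        searches_by_customer[search["customer_id"]].append(search["query"])

    profiles = {}
    for txn in transactions:
        cid = txn["customer_id"]
        profile = profiles.setdefault(
            cid,
            {
                "total_spent": 0.0,
                "items": [],
                "recent_searches": searches_by_customer.get(cid, []),
            },
        )
        profile["total_spent"] += txn["amount"]
        profile["items"].append(txn["item"])
    return profiles


profiles = combine(transactions, web_searches)
print(profiles)
```

Customers who searched but never bought are left out here by design; whether to keep them is a business decision rather than a technical one.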

But size and complexity shouldn't drive potential big-data users away. Franks said the point is not to collect all data in a database but to identify which pieces matter, standardize processes and let the rest of the data slip by.

"The fact is that most big data just doesn't matter," wrote Franks, who acknowledges that "throwing data away will take some getting used to."

His other point is that big data won't be a "wild west of crazy formats, unconstrained streams and lack of definition" forever. As big-data technology continues to evolve, "what's big data today won't be considered big data tomorrow any more than what was considered big a decade ago is considered big today."

Framing the questions

Franks' book purposely does not delve deeply into the technical specifications of big data, but it does outline a number of the evolving technologies that make big data possible and hint at future capabilities.

Franks describes the convergence of analytic and data storage environments, massively parallel processing architecture, cloud and grid computing, and MapReduce. The book defines those technologies, but more important, it gives decision-based context to questions that IT executives pondering a big-data initiative will ask.
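For readers unfamiliar with MapReduce, the following is a minimal single-machine sketch of the programming model, using the customary word-count example. It is not taken from the book and is not how a production framework is implemented; the function names are assumptions, and a real system would spread the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big insight", "data matters"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can in principle run in parallel, which is what makes the model a fit for very large datasets.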

For example, does it make sense to run a certain big-data application in a public or private cloud -- or in a different architecture altogether? What kinds of problems are best suited to a big-data solution, and what other enabling technologies (such as MapReduce) will be required? Is big data a one-time investment or will it require further investment over time? Whom do I hire to get this done?

Franks' approach to answering those and other questions provides enough information for a reader to grasp the technologies, but his focus continually returns to helping the reader understand effective analysis of big data, how to frame decisions about its potential and possibilities, and how to build an analytic culture within an organization.