Bookshelf

Nate Silver on big data's future: It's about attitude


"The Signal and the Noise," by Nate Silver, discusses various ways to use data in forecasting.

When statistician and predictive analytics expert Nate Silver speaks, people pay attention.

Since he correctly predicted the outcome of the presidential election in 49 of 50 states in 2008 and all 50 in 2012, his star has grown almost as fast as the unread messages in his e-mail inbox -- more than 99,000 in a recent image he posted to Twitter.

Media outlets crowned Silver the real winner in November, citing the success of his data-based predictive model in the face of criticism from go-by-the-gut pundits who, he had claimed all along, weren't much more accurate than a coin flip.

Silver, an East Lansing, Mich., native, made the same assertion in his latest book, "The Signal and the Noise: Why So Many Predictions Fail – but Some Don't," which quickly became a New York Times bestseller after its September 2012 publication.

In it, he describes a range of forecasters, including enigmatic poker players and the supercomputer-based big-data models used by the National Oceanic and Atmospheric Administration that recently nailed the path of Hurricane Sandy five days before it struck land.

Somewhere along the way, Silver has helped make data cool. Political junkies hailed it, news outlets lauded it and some pundits weren't too happy about it. Yet data is everywhere – IBM says 2.5 quintillion bytes of it are created every day – even if corporations and federal agencies are still in the early stages of making sense of "big data," the massive data sets too large to process via traditional methods.

Hidden in these vast data sets are insights that could help agencies solve big problems, but in an interview with FCW, Silver said the "big data" era will only be successful if the government is willing to evolve with it.

"In some cases, the government has the best data in the world, but not always the ability to use it," Silver said, speaking with FCW prior to delivering a keynote speech at the Feb. 12 Adobe Government Assembly.

"The governments that are willing to evolve with it will benefit, certainly," Silver said. "But there is no end-point to big data."

If big data is to become a standard tool by which the government operates, aging legacy IT systems will have to modernize. This is no small feat, given that 70 percent of the federal government's $79 billion IT budget in 2011 was spent on maintaining existing systems. Many federal systems still run Common Business-Oriented Language (COBOL), a programming language developed in 1959, ten years before NASA put a man on the moon.

But perhaps the biggest obstacle to overcome before big data becomes a staple in the government's IT arsenal is people's "goals and attitudes," Silver said.

Old-hat policies need to change – Silver later told feds at the conference that "bureaucracy is the enemy of imagination," to smiles and knowing glances around the room. Changing policies means changing people's beliefs about what big data is and what it might do.

Big data is not a cure-all, and it is inherently filled with noise and uncertainty, but it does have tremendous potential if people approach it the right way. "The world is not lacking for techniques, it's more about the right goals and right attitudes," Silver said.

Data-based statistical prediction has been changing the way people think since Bayes' Theorem – named after English mathematician Thomas Bayes – was published in 1763, two years after his death. As Silver notes in his book, the expression forever linked science, probability and prediction. It basically states the world follows predictable laws and that the more data you have, the closer you are to a full picture of reality – and better predictions – which contrasted with most people's views of divinity at the time.

But since we can collect more data than ever before, shouldn't big data reduce uncertainty?

Not likely, Silver said.

Quotable

There isn't any more truth in the world than there was before the Internet or the printing press. Most of the data is just noise, as most of the universe is filled with empty space. -- Nate Silver in "The Signal and the Noise."

Generating trillions of relationships between variables does not necessarily generate any more meaningful relationships between them. If you flush your toilet in Washington, D.C. and it rains in Australia, you can use supercomputers to look for connections and you might even find some, but it doesn't mean those events are linked in any meaningful way.
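The point about hunting for connections can be illustrated with a small simulation. The sketch below is not from Silver's book or NOAA's models; it simply generates series of pure random noise and counts how many pairs look strongly correlated by chance. The function names, the 20-point series length and the 0.6 cutoff are all assumptions chosen for illustration.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_strong_pairs(num_series, length=20, threshold=0.6, seed=0):
    """Count pairs of independent noise series whose |correlation| >= threshold.

    Every series is unrelated random noise by construction, so any
    "strong" pair found here is a coincidence, not a real link.
    Returns (number of strong pairs, total number of pairs checked).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    series = [[rng.gauss(0, 1) for _ in range(length)]
              for _ in range(num_series)]
    pairs = list(itertools.combinations(range(num_series), 2))
    strong = sum(
        1 for i, j in pairs
        if abs(pearson(series[i], series[j])) >= threshold
    )
    return strong, len(pairs)


if __name__ == "__main__":
    # More variables means more pairs to compare, and more chance matches.
    for num_series in (10, 100, 400):
        strong, total = count_strong_pairs(num_series)
        print(f"{num_series} series: {strong} of {total} pairs look strongly related")
```

Because the number of pairs grows roughly with the square of the number of variables, the count of coincidental "relationships" tends to climb quickly as more data is added, even though none of the series has anything to do with any other.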

"It's okay to fail," said Silver, who recently flopped himself, erroneously picking the San Francisco 49ers as the winners over the Baltimore Ravens in the 2013 Super Bowl. "It's okay to fail as long as you learn."

Big data predictions may be more apt to fail than those based on smaller data sets, but just like hitting the lottery, the payoffs can be enormous.

In late October, NOAA's big-data-based supercomputers took in data from polar-orbiting satellites, weather buoys and Gulfstream-IV and P-3 jets to churn out high-resolution computer models of Hurricane Sandy's path every six hours for several days before the storm made landfall. Residents in the storm's path had ample warning, and though more than 100 people died, without big data the loss of life would likely have been far higher.

"If Hurricane Sandy happened 20 years back, it would almost certainly have been a disaster without much warning," Dr. Sundararaman Gopalakrishnan, a senior meteorologist at NOAA's Atlantic Oceanographic and Meteorological Laboratory in Miami, told FCW at the time.

Even in its infancy, big data is changing the way agencies perform. The Department of Health and Human Services is changing the way it responds to disasters, and the Department of Defense's global shared service center has implemented a big-data-based business activity monitoring software tool that has so far detected $4 billion in improper payments.

Silver's rising star as a data whiz has helped show the masses the importance of data in predicting and shaping the future. While he is likely to continue political predictions at his New York Times-affiliated FiveThirtyEight.com through the 2016 election, Silver said he plans to expand its focus to other categories.

Sports for sure, he said – Silver first made a name for himself developing a sabermetric system for monitoring Major League Baseball player performance in the early 2000s. (A sabermetric system is an analytics tool that uses statistics to measure baseball players' performance.) He'd also like to "broaden the scope a bit" by delving into topics like economics, and perhaps health and education. Even entertainment is fair game -- on Feb. 22 Silver made predictions for the Oscar awards that were announced Feb. 24, and went four-for-six on his picks.

Where the federal government goes with big data is perhaps less clear, but tough choices will have to be made. There will never be a time, Silver said, when "pressing a button solves all our problems."

"One thing data doesn't do is tell you what your goals ought to be," Silver said. "You still have to think about what to accomplish. Data is not a substitute for judgments you have to make."


Reader comments

Wed, Feb 27, 2013 Phil Simon

I am appalled that 70% of the budget goes towards maintaining legacy systems. That's far too high and leaves comparatively little room for true innovation and efficiency improvements.

Tue, Feb 26, 2013 DToad

"Many federal systems still run Common Business-Oriented Language (COBOL), a programming language developed in 1959, ten years before NASA put a man on the moon." is a meaningless and pejorative statement. COBOL has evolved since then. It's just as meaningful as "And the astronauts also ride in cars that were developed at the beginning of the last century." So what? Our cars and COBOL are better today. Tell us why COBOL is bad or is it the systems?
