What not to do with your data

The federal government has long produced data by the truckload, and the open-data initiatives of the Obama administration have put more of it in the public eye than ever before. And although many agencies have moved beyond spreadsheets and CSV files to offer dashboards, maps and other visualization tools, the vast majority of those presentations are not very good.

Nathan Yau is trying to change that.

His book, "Data Points: Visualization that Means Something," does not focus on agencies in particular, though federal data is discussed and used in dozens of sample charts and graphs. Whether it is census data or a chart comparing the cost of cable television vs. Netflix and other "cord-cutting" options, the challenge remains the same: how to make a visual presentation clear enough to be easily comprehensible, yet informative enough to tease out real insights. Effective visualization is hard, Yau stressed, and requires a mix of math and design skills that few individuals possess.

"Data Points" is not a technical how-to guide — though Yau has written that, too, with his 2011 book, "Visualize This." His goal this time is to walk would-be data visualizers through the process of design and analysis, from the ground rules of statistics and visual aesthetics to proven best practices for storytelling and common errors to avoid.

Want to know whether to use a pie chart or a bar chart for a particular dataset, and what signals a map's color palette sends to the audience? "Data Points" has the answers. Curious about how to explore and display the correlation between two variables? Yau plots education data from all 50 states 18 ways and shows how different visuals can uncover very different patterns in a single dataset.

With a mix of hard rules, best-practice examples, and data-visualization history that dates back to William Playfair and Florence Nightingale, Yau seeks to impart a mindset as much as a skill set. "The mark of a good graph is not only how fast you can read it," he wrote, quoting statistician William Cleveland, "but also what it shows. Does it enable you to see what you could not see before?"

Kaiser Fung's new book, meanwhile, dispenses with questions of aesthetics and visual storytelling entirely, instead drilling into the dangers of datasets themselves. In "Numbersense: Using Big Data to Your Advantage," Fung warns that "people in industry who wax on about Big Data take it for granted that more data begets more good.... [But] when more people are performing more analyses more quickly, there are more theories, more points of view, more complexity, more conflicts and more confusion. There is less clarity, less consensus and less confidence."

In Fung's view, the core problem is not that the creators of a dataset are trying to mislead — though there are plenty of examples of that as well, many of which he has documented over the years on his "Junk Charts" blog. Rather, he said, most consumers of data are essentially innumerate and do not understand basic statistics or the countless judgment calls that go into developing a dataset.

To fill those knowledge gaps, "Numbersense" presents eight chapter-length case studies. The consumer price index and monthly unemployment reports are placed under Fung's microscope, as are law school rankings, Groupon's economics, fantasy football stats and multiple firms' marketing efforts. Even the dieter's dreaded body mass index gets deconstructed.

So although Fung praises the Bureau of Labor Statistics for the "impressive accuracy" of its payroll survey, he shows how the definition of unemployment is at least as important as the tallying process. When does an out-of-work individual slip out of the workforce? Do you have any idea what the "seasonal adjustment" entails? And what happens when an employer simply skips that month's survey? As Fung notes, "Statisticians have a cautionary saying: Absence of evidence is not evidence of absence."

At its core, Fung's warning boils down to the dictum popularized by Mark Twain that there are three kinds of lies: lies, damned lies and statistics. Yet a basic understanding of data and some healthy skepticism can go a long way, Fung promises. Know where the numbers come from and what assumptions were made in crunching them, and you'll avoid the lion's share of confusion and mischief. As Fung succinctly put it, "The key isn't how much data is analyzed, but how."

About the Author

Troy K. Schneider is editor-in-chief of FCW and GCN.

Prior to joining 1105 Media in 2012, Schneider was the New America Foundation’s Director of Media & Technology, and before that was Managing Director for Electronic Publishing at the Atlantic Media Company. The founding editor of, Schneider also helped launch the political site in the mid-1990s, and worked on the earliest online efforts of the Los Angeles Times and Newsday. He began his career in print journalism, and has written for a wide range of publications, including The New York Times,, Slate, Politico, National Journal, Governing, and many of the other titles listed above.

Schneider is a graduate of Indiana University, where his emphases were journalism, business and religious studies.

Connect with Schneider on Twitter: @troyschneider.
