Rising to the challenge of mapping health data

Brand Niemann offers inspiration and tips for analyzing and integrating the reams of health data available online.

Brand Niemann is senior data scientist at Semanticommunity.net and former senior enterprise architect and data scientist at the Environmental Protection Agency.

To build on my recent series of articles on data science, I decided to make HealthData.gov my latest exploration.

Recently, Sonny Bhagowalia, deputy associate administrator of the Office of Citizen Services and Innovative Technologies at the General Services Administration, wrote in a tweet that Data.gov's move to the cloud would yield more and better mashups of government data. Meanwhile, Todd Park, chief technology officer at the Health and Human Services Department, announced the new Health Indicators Warehouse and welcomed us to HealthData.gov. And Health 2.0 announced two new developer challenges: Healthy People 2020 and Go Viral to Improve Health. In addition, George Thomas, an enterprise architect at HHS, is working on Clinical Quality Linked Data on HealthData.gov to help achieve Linked Open Government Data goals.

So there are now five major sites with health data that can be integrated (i.e., mashed up): HealthyPeople.gov, Health2Challenge.org, HealthIndicators.gov, HealthData.gov and Data.Medicare.gov. I inventoried the resources and datasets at those five sites in several spreadsheets and looked for opportunities to analyze them individually and collectively. I also entered the Healthy People 2020 and Go Viral to Improve Health challenges. Previously, I had built a health data indicators warehouse in the cloud as part of the Health Data Visualization Challenge of 2010, so this data science project was not completely new to me.

I started with Spotfire’s library of U.S. state and county boundaries because I knew I would be doing interactive maps with the spatial data at those five sites. Then I imported the spreadsheet data and created a separate tab in Spotfire for each major site as follows:

  • HealthyPeople.gov: Inventory to understand contents and apply business intelligence and analytics.
  • Health2Challenge.org: Inventory to understand contents and build on previous work.
  • HealthIndicators.gov: New interface to catalog and data to support business intelligence and analytics.
  • HealthData.gov: New data catalog to expedite discovery and download for business intelligence and analytics.
  • Data.Medicare.gov: Inventory of datasets to expedite discovery and download for hospital selection example.
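For readers who want to try the inventory step without Spotfire, here is a minimal Python sketch of the same idea: one list of inventory rows, grouped into one "tab" per site, with a filter for the datasets that carry state or county geography and so can be mapped. The dataset titles, the `geography` field, and the helper names are all hypothetical placeholders for illustration; they are not entries from the actual spreadsheets.

```python
from collections import defaultdict

# Hypothetical inventory rows. The site names come from the article; the
# dataset titles and geography levels are placeholders, not real catalog entries.
inventory = [
    {"site": "HealthyPeople.gov", "dataset": "placeholder dataset A", "geography": "state"},
    {"site": "Health2Challenge.org", "dataset": "placeholder dataset B", "geography": "none"},
    {"site": "HealthIndicators.gov", "dataset": "placeholder dataset C", "geography": "county"},
    {"site": "HealthData.gov", "dataset": "placeholder dataset D", "geography": "state"},
    {"site": "Data.Medicare.gov", "dataset": "placeholder dataset E", "geography": "hospital"},
]


def group_by_site(rows):
    """Group inventory rows by source site, mirroring one tab per site."""
    tabs = defaultdict(list)
    for row in rows:
        tabs[row["site"]].append(row)
    return dict(tabs)


def mappable(rows, levels=("state", "county")):
    """Keep only rows whose geography can be joined to state or county boundaries."""
    return [row for row in rows if row["geography"] in levels]


tabs = group_by_site(inventory)
for site, rows in tabs.items():
    print(f"{site}: {len(rows)} dataset(s), {len(mappable(rows))} mappable")
```

Keeping the inventory as one flat table and deriving the per-site views from it makes it easy to look at the sites both individually and collectively.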

The focus of the challenge was to extract the goals and objectives from the state-specific Healthy People 2010 and 2020 plans, map them, and integrate them with the databases above.
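The integration step amounts to joining the extracted objectives to indicator data on a shared geographic key. The sketch below assumes state FIPS codes as that key, which is a common choice but not necessarily the one used in this project; the objective text and indicator values are made-up placeholders, and `integrate` is a hypothetical helper.

```python
# Hypothetical objectives extracted from state plans, keyed by state FIPS code
# (24 = Maryland, 51 = Virginia, 11 = District of Columbia).
objectives = [
    {"state_fips": "24", "objective": "placeholder objective 1"},
    {"state_fips": "51", "objective": "placeholder objective 2"},
    {"state_fips": "11", "objective": "placeholder objective 3"},
]

# Hypothetical indicator values by state FIPS code; the numbers are illustrative only.
indicators = {"24": 10.0, "51": 12.5}


def integrate(objectives, indicators):
    """Left-join objectives to indicator values on the state FIPS code.

    States with no matching indicator get None, so gaps stay visible on a map.
    """
    return [
        {**obj, "indicator": indicators.get(obj["state_fips"])}
        for obj in objectives
    ]


merged = integrate(objectives, indicators)
for row in merged:
    print(row["state_fips"], row["objective"], row["indicator"])
```

A left join keeps every state's objectives in the result even when no indicator matches, which helps show where the databases do not yet cover a plan.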

All that work is documented on the wiki page and its attachments so others can check it and produce their own integrations. The Healthy People 2020 challenge entry was submitted March 7, and the Go Viral to Improve Health challenge entry is due April 27. The Go Viral entry includes work with more community-level data sources, such as the Pellucid Health Care Transparency tables and the data sources in the book “Visualizing Data Patterns with Micromaps” by Dan Carr of George Mason University and Linda Williams Pickle, formerly with the National Cancer Institute. It also links to the recent work to build VIVO, an open-source Semantic Web application, in the cloud for the National Institutes of Health's Workshop on Value Added Services for VIVO.

I hope this article has piqued your interest in taking the challenge to analyze health databases and makes it easier for you to get started.
