AI success in federal health agencies starts with effective data management


It is hard to overstate just how much and how fast artificial intelligence is embedding itself in the work of federal health and health care agencies. We are currently seeing this on impressive display with the global response to the coronavirus (COVID-19) pandemic. In just the few months since the virus erupted onto the world stage, governments and organizations have employed AI to track outbreaks and predict where future outbreaks will occur; research and develop treatments; diagnose the disease from CT lung scans; model the protein structures of the SARS-CoV-2 virus that causes COVID-19, which may reveal clues for a vaccine; and better understand the origins and potential future variations of the virus.

Likewise, a new report by the National Academy of Medicine, entitled Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril, explains how AI is revolutionizing health care, noting its potential to "address known challenges in health care delivery and achieve the vision of a continuously learning health system, accounting for personalized needs and preferences."

The report explains how new AI-enabled capabilities like natural language processing and chatbots are being used for disease prevention and management (e.g., Clara, CDC's recently launched "coronavirus self-checker" bot), how machine learning is being used to support medical diagnoses, and how deep learning is being used to identify at-risk populations in public health. The Centers for Disease Control and Prevention, the Defense Health Agency, the National Institutes of Health, the Centers for Medicare & Medicaid Services, the Departments of Veterans Affairs and Health and Human Services, and other agencies are researching, testing, developing and deploying a wide variety of AI tools to advance their missions. In fact, HHS is set to announce a new data platform called Protect Now, which will leverage AI to gather data from government and public sources to help officials track and respond to the coronavirus.

Yet the odds of implementation success in health care organizations are not necessarily promising. Consider these estimates from Andrew White, vice president at Gartner, an industry advisory firm:

  • Only 15% of use cases with AI, edge and internet-of-things environments will succeed.
  • Through 2020, 80% of AI projects "will remain alchemy, run by wizards whose talents will not scale in the organization."

So how can government health care and public health organizations overcome these odds and realize AI's vast potential to improve productivity, patient care, and public health and safety?

The foundation of any successful, enterprisewide AI deployment is effective data management. First, organizations must obtain data of sufficient quantity and quality, and they must be able to classify, secure and manage it effectively so they can "feed" it to "data-hungry" AI algorithms. When possible, the data management approach should ensure that both structured and unstructured data from diverse sources can be used. This can include detailed genomic data, clinical data from electronic health records, claims data from payers, images (e.g., X-ray images), and many other sources. Health data relied on for analysis also needs to be well documented in terms of its composition, collection, preprocessing, distribution and maintenance processes.

It is also critically important to identify the broad array of data available from public sources, published papers, visualizations, maps and APIs, especially given the rapid pace of analysis during the coronavirus outbreak. To that end, Booz Allen created a data resource list, called covid resources, using GitHub. This resource provides information and links to a wide range of vetted sources with well-documented data that can be accessed by anyone at any time.

Federal health agencies can learn important AI data management lessons from the longer-standing experiences of the defense and intelligence sectors. Like those sectors, federal health agencies could gradually improve their efforts by:

  • Automating the access and collection of diverse data (including unstructured data) in a central repository for access by AI data modelers.
  • Expanding collaboration and data sharing by securing data at the most granular level. This can better enable appropriate access by individuals based on their role and project criteria. Such an approach can help balance the often-competing demands of data protection and information sharing.
  • Deploying natural language processing to be able to systematically analyze "free text," which usually represents 80% of data captured during clinician-patient encounters.
  • Investigating multimodal AI methods using unstructured data such as text, signals, and images, which are used to augment and inform the meaning contained in structured data.
  • Developing a sound data-tracking strategy for data creators and users to document a data source's pedigree over its lifetime for reproducibility and transparency.
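
As one illustration of the last point, a data-tracking record can be as simple as a log of who did what to a data source, paired with a fingerprint of the data at each step. The Python sketch below is a minimal example under stated assumptions, not a description of any agency's or vendor's system: the class names, field names and sample data are all hypothetical, and a real deployment would also need secure storage, access controls and a governance process.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageEvent:
    """One step in a data source's history (hypothetical schema)."""
    actor: str           # who touched the data
    action: str          # e.g. "collected", "preprocessed"
    content_sha256: str  # fingerprint of the data after this step
    timestamp: str       # UTC time the step was logged


@dataclass
class DataSourceRecord:
    """Pedigree record for a single data source (hypothetical schema)."""
    name: str
    origin: str
    events: list = field(default_factory=list)

    def log(self, actor: str, action: str, content: bytes) -> LineageEvent:
        """Append an event that fingerprints the current state of the data."""
        event = LineageEvent(
            actor=actor,
            action=action,
            content_sha256=hashlib.sha256(content).hexdigest(),
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self.events.append(event)
        return event

    def matches_latest(self, content: bytes) -> bool:
        """Check whether the data in hand is the version last documented."""
        if not self.events:
            return False
        digest = hashlib.sha256(content).hexdigest()
        return self.events[-1].content_sha256 == digest

    def to_json(self) -> str:
        """Serialize the full record so it can travel with the data."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    record = DataSourceRecord(name="claims_extract", origin="hypothetical payer feed")
    raw = b"id,code\n1,A10\n"
    record.log("collector", "collected", raw)
    record.log("analyst", "preprocessed", raw.lower())
    print(record.to_json())
```

Because each event stores a hash of the data, a later user can confirm that the file in hand is the same version the record describes, which is the kind of check that supports reproducibility.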

Effective data management is the foundation of successful, enterprisewide AI deployments. New and innovative approaches such as these address the many facets of data management and will help federal and defense health care agencies improve their odds of success on their AI journeys.

About the Authors

Joachim Roski, Ph.D., MPH, is a principal at Booz Allen Hamilton and a contributor to the National Academy of Medicine report, Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril.

Catherine Ordun, MBA, MPH, is a senior data scientist at Booz Allen Hamilton. She’s a co-creator of Booz Allen’s data resource list, called covid resources, using GitHub.

