DARPA seeks innovative ways to make sense of data

If big data represents the search for a needle in a haystack, DARPA aims to develop technology that explains why the needle is in the stack in the first place.


Big data is everywhere, and although the collection of huge datasets from sensors, machines and device-wielding humans has become increasingly automated, understanding causes and connections is still left to people.

The Defense Advanced Research Projects Agency wants to change that.

DARPA's Big Mechanism program will provide up to $45 million in contracts, grants and cooperative agreements for as many as 12 award winners that demonstrate "innovative approaches that enable revolutionary advances in science, devices or systems," according to a newly released solicitation.

In other words, if big data is the search for a needle in a haystack of data, DARPA aims to develop technology that explains why the needle is there.

"Big Mechanisms are causal, explanatory models of complicated systems in which interactions have important causal effects," the solicitation states. "The collection of Big Data is increasingly automated, but the creation of Big Mechanisms remains a human endeavor made increasingly difficult by the fragmentation and distribution of knowledge. To the extent that we can automate the construction of Big Mechanisms, we can change how science is done."

DARPA said new research will be necessary in numerous fields related to how machines and humans process massive amounts of data, including statistical and knowledge-based natural language processing, curation and ontology, systems and mathematical biology, and representation and reasoning.

As the solicitation points out, some of the systems that matter most to the Defense Department are complex, but they are studied in a fragmented, inconsistent fashion that makes it difficult to build complete, accurate models.

For its first project, DARPA plans to use the Big Mechanism program to address cancer pathways, which are molecular interactions that can cause some cells to turn cancerous.

Officials are seeking an automated way to extract useful information and causal mechanisms from abstracts and research papers about cancer biology. Computers should then assemble the fragments of data into "complete pathways of unprecedented scale and accuracy" to determine cause-and-effect relationships and how those relationships could be altered or manipulated to prevent or control the disease.
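To make the idea concrete, the extract-then-assemble process described above can be sketched in a few lines of Python. This is only a toy illustration, not DARPA's or any contractor's method: the two-verb pattern, the function names and the placeholder protein names (ProteinA, ProteinB, ProteinC) are all assumptions invented for the example, and real biomedical text would require far more sophisticated language processing.

```python
import re
from collections import defaultdict, deque

# Assumption: causal claims look like "<Name> activates|inhibits <Name>".
# Real papers use much richer language; this pattern is purely illustrative.
PATTERN = re.compile(
    r"\b([A-Z][A-Za-z0-9]*) (activates|inhibits) ([A-Z][A-Za-z0-9]*)\b"
)


def extract_relations(sentences):
    """Pull (source, verb, target) fragments out of free text."""
    relations = []
    for sentence in sentences:
        relations.extend(PATTERN.findall(sentence))
    return relations


def assemble_pathway(relations):
    """Merge fragments into one directed graph, dropping duplicates."""
    graph = defaultdict(list)
    for source, verb, target in relations:
        if (verb, target) not in graph[source]:
            graph[source].append((verb, target))
    return graph


def downstream(graph, start):
    """List everything causally downstream of `start`, breadth-first."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for _verb, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Invented sentences standing in for fragments from separate papers.
sentences = [
    "ProteinA activates ProteinB.",
    "In other cells, ProteinB inhibits ProteinC.",
    "ProteinA activates ProteinB in most samples.",
]
pathway = assemble_pathway(extract_relations(sentences))
print(downstream(pathway, "ProteinA"))
```

No single sentence here links ProteinA to ProteinC; the connection appears only once the fragments are assembled, which is the kind of inference the program hopes to automate at scale.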

"The language of molecular biology and the cancer literature emphasize mechanisms," said Paul Cohen, a DARPA program manager, in a press release. "Papers describe how proteins affect the expression of other proteins and how these effects have biological consequences. Computers should be able to identify causes and effects in cancer biology papers more easily than in, say, the literatures of sociology or economics."

He added: "Unfortunately, what we know about big mechanisms is contained in enormous, fragmentary and sometimes contradictory literatures and databases, so no single human can understand a really complicated system in its entirety. Computers must help us."

If DARPA's effort with cancer research shows promise, the implications would be far-reaching. Any article or bit of published research about any topic of interest could become part of a computer-maintained and computer-examined causal model of a larger system -- what DARPA calls the Big Mechanism.

"Causal models are needed to predict how systems will respond to interventions -- how a patient or an economy will respond to a drug or a new tax -- and to understand why systems behave as they do," Cohen said. "By emphasizing causal models and explanation, Big Mechanism may be the future of science."