The semantics of disinformation

By Derek B. Johnson

August 26, 2019

DARPA thinks it can detect automated disinformation campaigns across a range of media by focusing on common machine-generated errors.


The modern media landscape is awash in false, misleading or tampered information designed to look real, from networks of bots amplifying false content on social media to the looming threat posed by altered video and audio.

The Defense Advanced Research Projects Agency is looking for technology to detect evidence of manipulation in text, audio, image and video, releasing a broad agency announcement Aug. 23 for a project called SemaFor.

The program, short for Semantic Forensics, is designed to focus on the small but common errors produced by automated systems that manipulate media content. One example the agency provides is a generative adversarial network that draws on a database of real headshot photographs to create a synthetic person, but gives that person mismatched earrings.

Methods for detecting fake media that rely purely on statistical detection do exist, but they are quickly becoming insufficient as manipulation technologies advance and find new ways to fool such models. DARPA believes that because current media manipulation tools rely heavily on ingesting and processing large amounts of data, they are prone to making semantic errors that can be spotted with the right algorithm.

"These semantic failures provide an opportunity for defenders to gain an asymmetric advantage," DARPA writes. "A comprehensive suite of semantic inconsistency detectors would dramatically increase the burden on media falsifiers, requiring the creators of falsified media to get every semantic detail correct, while defenders only need to find one, or a very few, inconsistencies."
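The asymmetry DARPA describes can be illustrated with a minimal sketch. It is not drawn from the announcement: the `Media` record, the two detector functions and the attribute names are all hypothetical stand-ins, and real detectors would work on the media itself rather than on pre-extracted fields. The point is only that a single detector firing is enough to flag an item, while a falsifier has to pass every check.

```python
from typing import Callable, Dict, List

Media = Dict[str, str]              # hypothetical: attributes extracted from one media item
Detector = Callable[[Media], bool]  # returns True when it finds an inconsistency


def mismatched_earrings(media: Media) -> bool:
    """Flag a portrait whose two earrings differ (the agency's GAN example)."""
    left, right = media.get("left_earring"), media.get("right_earring")
    return left is not None and right is not None and left != right


def location_mismatch(media: Media) -> bool:
    """Flag a caption naming a different place than the image metadata (hypothetical check)."""
    caption, image = media.get("caption_location"), media.get("image_location")
    return caption is not None and image is not None and caption != image


DETECTORS: List[Detector] = [mismatched_earrings, location_mismatch]


def find_inconsistencies(media: Media, detectors: List[Detector] = DETECTORS) -> List[str]:
    """Return the names of every detector that fires on this item."""
    return [d.__name__ for d in detectors if d(media)]


def is_suspect(media: Media) -> bool:
    """One hit is enough to flag the item; the falsifier must pass every check."""
    return len(find_inconsistencies(media)) > 0
```

Adding a detector to the list raises the bar for the falsifier without changing anything else, which is the "burden" the agency refers to.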

The project is split up into four technical areas: detection; attribution and characterization; explanation and integration; and evaluation and challenge curation. DARPA wants to make sure any algorithm developed from the project will outperform comparable manual processes and also be able to demonstrate how it reached its conclusions.

The agency also wants to keep a tight lid on some of the technical details of the project, saying it will treat program activities as controlled technical information (CTI). That means that even though such details are not classified, contractors would be barred from sharing or releasing them to other parties because they could "reveal sensitive or even classified capabilities and/or vulnerabilities of operating systems."

The base algorithm itself will not be categorized as CTI, as DARPA says it will "constitute advances to the state of the art" and would only potentially fall under the definition after it had been trained for a specific DOD or governmental purpose.

"A key goal of the program is to establish an open, standards-based, multisource, plug-and-play architecture that allows for interoperability and integration," the announcement states. "This goal includes the ability to easily add, remove, substitute, and modify software and hardware components in order to facilitate rapid innovation by future developers and users."
