What it takes to review 650,000 emails


On Oct. 28, FBI Director James Comey told Congress that bureau investigators had found and would analyze additional emails that may have relevance to the investigation into Hillary Clinton's private email server.

By Nov. 6, the FBI had shared with Congress its conclusion that the newly discovered emails did not affect the agency's original conclusion that Clinton committed no criminal wrongdoing.

"I am very grateful to the professionals at the FBI for doing an extraordinary amount of high-quality work in a short period of time," Comey wrote.

But for critics of Comey's findings, the apparently compressed time frame of this second investigation was suspect.

At a campaign stop in Michigan later that day, presidential candidate Donald Trump said the FBI's expedited analysis simply isn't possible: "You can't review 650,000 new emails in eight days," he said. "You can't do it, folks."

Retired Gen. Michael Flynn, formerly head of the Defense Intelligence Agency and a prominent Trump supporter, tweeted, "It took 1 year to review 60K and 8 days to review 650K? Smart machines or not, something does not jive. Thoughts?"

But it is possible, and researchers, lawyers and cyber-forensic experts have been doing it for years. Just look at the Enron case, said Ben Shneiderman, a computer science professor at the University of Maryland. More than a million emails were released to the public during the Enron investigation in 2003. Since then, researchers have used that database of emails to study how people use email, he said.

"The capacity for people to explore and visualize these kinds of datasets is a great success story of the research field," he said.

So when the FBI was asked to look into these emails, it wasn't being asked to do anything revolutionary. It's a fairly standard cyber-forensic skill, according to Mark Lanterman, the CTO of Computer Forensic Services and former senior computer forensic analyst for the U.S. Secret Service Electronic Crimes Task Force.

The first myth that needs to be dispelled, Lanterman said, is that the software used by the FBI is "special."

"I saw in the media when the story first broke -- a number of references to 'special software' that the FBI is using to do this," he said. "They just use commercially available software like just about anyone else."

Based on his prior experience with federal law enforcement, Lanterman said the FBI likely would have used EnCase, Forensic Toolkit or dtSearch software to help analyze the email data.

In any forensic case, the investigators first create a perfect copy, or forensic image, of the hard drive. The software then makes a searchable index of everything on the hard drive, he said.
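The "searchable index" step can be illustrated with a toy inverted index, the basic data structure behind keyword search: it maps each word to the set of documents that contain it, so a query looks up a few entries instead of rereading every message. This is a minimal sketch in Python, not a description of how EnCase, Forensic Toolkit or dtSearch work internally; it assumes the email bodies have already been extracted from the forensic image as plain text, and the message IDs and texts are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document IDs that contain it.

    `documents` is a dict of {doc_id: text}; the IDs are hypothetical
    placeholders for messages extracted from a forensic image.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every word in the query (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical sample messages for illustration only.
docs = {
    "msg-001": "Meeting notes for the quarterly budget review",
    "msg-002": "Lunch on Friday?",
    "msg-003": "Revised budget attached for review",
}
index = build_index(docs)
print(sorted(search(index, "budget review")))  # ['msg-001', 'msg-003']
```

The index is built once, after which each query costs roughly one lookup per search term, which is why indexing comes before searching in a forensic workflow.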

Using keywords and timelines, the software can filter the dataset based on a number of criteria. It will also flag duplicates, which it finds using an identifier known as a hash, a fingerprint computed from a file's contents.
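Hash-based de-duplication and keyword-and-date filtering can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the `Email` record and its fields are hypothetical, SHA-256 is an arbitrary choice of hash function, and treating sender, date and body as the definition of a duplicate is an assumption rather than any commercial tool's actual rule. Because identical inputs always produce identical hashes, exact copies collapse to a single entry; a near-copy with even one changed character gets a different hash and would not be caught this way.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Email:
    # Hypothetical minimal record; real forensic tools extract many more fields.
    sender: str
    sent: date
    body: str


def content_hash(email):
    """Hash the fields assumed to define a duplicate: sender, date and body."""
    key = f"{email.sender}\n{email.sent.isoformat()}\n{email.body}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(emails):
    """Keep the first copy of each email whose hash has not been seen before."""
    seen = set()
    unique = []
    for email in emails:
        digest = content_hash(email)
        if digest not in seen:
            seen.add(digest)
            unique.append(email)
    return unique


def filter_emails(emails, keywords, start, end):
    """Keep emails sent within [start, end] whose body contains any keyword."""
    lowered = [k.lower() for k in keywords]
    return [
        e for e in emails
        if start <= e.sent <= end and any(k in e.body.lower() for k in lowered)
    ]


# Hypothetical mailbox for illustration only.
mailbox = [
    Email("a@example.com", date(2013, 3, 1), "Draft schedule attached"),
    Email("a@example.com", date(2013, 3, 1), "Draft schedule attached"),  # exact copy
    Email("b@example.com", date(2014, 6, 9), "Schedule update"),
    Email("c@example.com", date(2012, 1, 5), "Holiday photos"),
]
unique = deduplicate(mailbox)
print(len(unique))  # 3
hits = filter_emails(unique, ["schedule"], date(2013, 1, 1), date(2014, 12, 31))
print([e.sender for e in hits])  # ['a@example.com', 'b@example.com']
```

Running de-duplication before filtering means each distinct message is examined only once, which is one reason a large raw email count can shrink considerably before anyone reads a single message.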

Depending on the state of the email, it's not surprising that it took eight days, Lanterman said. It's common for these cases to take longer, and he said he was surprised it didn't take at least a week more.

Shneiderman also noted that the FBI was likely not just working on searching databases and de-duplicating data.

"Looking through it is one thing," he said. "The political decision of deciding what's important and the legal decision of deciding what is potentially a violation of law potentially took longer."

The FBI declined to comment for this article, instead referring to Comey's letter.

A version of this article originally appeared in GCN, a sibling publication of FCW.

About the Author

Matt Leonard is a reporter/producer at GCN.

Before joining GCN, Leonard worked as a local reporter for The Smithfield Times in southeastern Virginia. In his time there he wrote about town council meetings, local crime and what to do if a beaver dam floods your back yard. Over the last few years, he has spent time at The Commonwealth Times, The Denver Post and WTVR-CBS 6. He is a graduate of Virginia Commonwealth University, where he received the faculty award for print and online journalism.

Leonard can be contacted at or follow him on Twitter @Matt_Lnrd.
