What it takes to review 650,000 emails


On Oct. 28, FBI Director James Comey told Congress that bureau investigators had found and would analyze additional emails that may have relevance to the investigation into Hillary Clinton's private email server.

By Nov. 6, the FBI had shared with Congress its conclusion that the newly discovered emails did not affect the agency's original finding that Clinton committed no criminal wrongdoing.

"I am very grateful to the professionals at the FBI for doing an extraordinary amount of high-quality work in a short period of time," Comey wrote.

But for critics of Comey's findings, the apparently compressed time frame of this second investigation was suspect.

At a campaign stop in Michigan later that day, presidential candidate Donald Trump said the FBI's expedited analysis simply isn't possible: "You can't review 650,000 new emails in eight days," he said. "You can't do it, folks."

Retired General Michael Flynn, formerly head of the Defense Intelligence Agency and a prominent Trump supporter, tweeted, "It took 1 year to review 60K and 8 days to review 650K? Smart machines or not, something does not jive. Thoughts?"

But it is possible, and researchers, lawyers and cyber-forensic experts have been doing it for years. Just look at the Enron case, said Ben Shneiderman, a computer science professor at the University of Maryland. More than a million emails were released to the public during the Enron investigation in 2003. Since then, that database of emails has been used by researchers to study how people use email, he said.

"The capacity for people to explore and visualize these kinds of datasets is a great success story of the research field," he said.

So when the FBI was asked to look into these emails, it wasn't being asked to do anything revolutionary. It's a fairly standard cyber-forensic skill, according to Mark Lanterman, the CTO of Computer Forensic Services and former senior computer forensic analyst for the U.S. Secret Service Electronic Crimes Task Force.

The first myth that needs to be dispelled, Lanterman said, is that the software used by the FBI is "special."

"I saw in the media when the story first broke -- a number of references to 'special software' that the FBI is using to do this," he said. "They just use commercially available software like just about anyone else."

Based on his prior experience with federal law enforcement, Lanterman said the FBI would likely have used EnCase, Forensic Toolkit or dtSearch software to help analyze the email data.

In any forensic case, the investigators first create a perfect copy, or forensic image, of the hard drive. The software then makes a searchable index of everything on the hard drive, he said.
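As a rough illustration of what a "searchable index" means, the sketch below builds a simple inverted index in Python: a map from each word to the messages that contain it. It is not how EnCase, Forensic Toolkit or dtSearch are implemented, and the names build_index and search and the sample messages are hypothetical, chosen only for this example.

```python
import re
from collections import defaultdict


def build_index(messages):
    """Map each lowercase word to the set of message ids that contain it."""
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(msg_id)
    return index


def search(index, *keywords):
    """Return the ids of messages that contain every keyword."""
    hits = [index.get(word.lower(), set()) for word in keywords]
    return set.intersection(*hits) if hits else set()


# Hypothetical sample data, not taken from any real case.
messages = {
    1: "Quarterly budget review attached",
    2: "Budget meeting moved to Friday",
    3: "Lunch on Friday?",
}

index = build_index(messages)
print(search(index, "budget", "friday"))
```

Once the index exists, each keyword query is a lookup rather than a rescan of every message, which is part of why a large mailbox can be searched quickly.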

Using keywords and timelines, the software can filter the dataset based on a number of criteria. The software will also flag duplicates, which it finds using an identifier known as a hash.
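The duplicate detection and timeline filtering described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which hash function or which message fields the FBI's tools use, so SHA-256 over the sender, subject and body is an assumption, and fingerprint, split_duplicates, in_window and the sample emails are hypothetical.

```python
import hashlib
from datetime import date


def fingerprint(email):
    """Hash the normalized sender, subject and body (assumed fields)."""
    parts = (email["sender"], email["subject"], email["body"])
    normalized = "\n".join(part.strip().lower() for part in parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def split_duplicates(emails):
    """Separate first-seen emails from later copies with the same hash."""
    seen = set()
    unique, duplicates = [], []
    for email in emails:
        digest = fingerprint(email)
        if digest in seen:
            duplicates.append(email)
        else:
            seen.add(digest)
            unique.append(email)
    return unique, duplicates


def in_window(emails, start, end):
    """Keep only emails sent between start and end, inclusive."""
    return [e for e in emails if start <= e["sent"] <= end]


# Hypothetical sample data, not taken from any real case.
emails = [
    {"sender": "a@example.com", "subject": "Schedule", "body": "See attached.", "sent": date(2012, 3, 1)},
    {"sender": "A@example.com ", "subject": "schedule", "body": "See attached.", "sent": date(2012, 3, 1)},
    {"sender": "b@example.com", "subject": "Travel", "body": "Flight at noon.", "sent": date(2013, 6, 15)},
]

unique, duplicates = split_duplicates(emails)
recent = in_window(unique, date(2013, 1, 1), date(2013, 12, 31))
print(len(unique), len(duplicates), len(recent))
```

Because identical content always produces the same hash, copies of messages that investigators have already reviewed can be set aside without being read again, which shrinks the pile that needs human attention.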

Depending on the state of the email, it's not surprising that it took eight days, Lanterman said. It's common for these cases to take longer, and he said he was surprised it didn't take at least a week more.

Shneiderman also noted that the FBI was likely not just working on searching databases and de-duplicating data.

"Looking through it is one thing," he said. "The political decision of deciding what's important and the legal decision of deciding what is potentially a violation of law potentially took longer."

The FBI declined to comment for this article, instead referring to Comey's letter.

A version of this article originally appeared in GCN, a sibling publication of FCW.

About the Author

Matt Leonard is a reporter/producer at GCN.

Before joining GCN, Leonard worked as a local reporter for The Smithfield Times in southeastern Virginia. In his time there he wrote about town council meetings, local crime and what to do if a beaver dam floods your back yard. Over the last few years, he has spent time at The Commonwealth Times, The Denver Post and WTVR-CBS 6. He is a graduate of Virginia Commonwealth University, where he received the faculty award for print and online journalism.

Follow Leonard on Twitter @Matt_Lnrd.
