When government websites fail

The FCC’s site went down last year not because of a DDoS attack, but because it couldn’t handle the traffic spike. So who’s responsible for making sure that doesn’t happen?


Between May 7 and May 8, 2017, the Federal Communications Commission’s comment system buckled under a deluge of comments on its proposal to undo the net neutrality rules.

At the time, agency tech officials assured FCC leadership that the crash was caused by a cyberattack -- specifically, a distributed denial of service (DDoS) attack in which bots overran the application programming interface that allows bulk submissions to the agency's commenting system.

But a June 20 FCC inspector general report, released publicly in August, found no evidence to support those claims and concluded instead that the commenting system failed under the volume of incoming comments sparked by a segment on John Oliver's HBO program.

FCC Chairman Ajit Pai placed the blame on tech officials, saying he was "surprised and disappointed" that the OIG findings clashed with the record offered by former CIO David Bray and his senior staff.

Congress is looking into the matter. An Aug. 16 Senate Commerce Committee hearing is expected to touch on the controversy. Four Democrats on the House Energy and Commerce Committee sent a letter to Pai on Aug. 14 demanding answers on when he learned that the claims of a DDoS attack were inaccurate.

But beyond the political contretemps and finger-pointing is the question of what happens when a government system can’t do what it was built for.

Daniel Castro, vice president at the Information Technology and Innovation Foundation and director of its Center for Data Innovation, said system failures are common in both the public and private sectors and often stem from poor design and a lack of upkeep.

"You have these large spikes in traffic," he said, referring to the IRS system failure crash on tax day this year. "They know going into it that there is going to be a period of time where everyone is going to file their taxes."

And as predictable as the behavior was, the system couldn't handle it.

"It’s not that they haven’t tried to update their sites," Castro said, "Trying to adjust for large volumes is difficult. You have multiple potential points of failure -- overloaded servers, browsers -- and have to create enough resiliency in them."

Those challenges are exacerbated for government agencies, many of which have systems that weren’t designed or tested for significant traffic surges.

The federal government has more than 4,500 websites across some 400 domains, according to a November 2017 ITIF report -- and 91 percent failed to perform adequately in at least one area: mobile-friendliness, speed, security or accessibility.

While the FCC’s site was down, commenters had alternatives, such as paper filing. But the commenting function simply wasn’t built to handle several million filings -- a perfect opportunity, Castro suggested, for commercial cloud use.

“When the system was designed, volume wasn’t considered,” Castro pointed out, adding that the FCC is a smaller agency that routinely churns out rulings that often go unnoticed by the general public and late-night TV comedians.

People v. Technology

The FCC isn’t alone in facing scaling challenges. In addition to the agency’s 2014 system failure over the same policy issue and the IRS crash, the Securities and Exchange Commission’s site crumbled in 2010 due to a flash software failure, and NASDAQ's trading system snagged Facebook’s initial public offering in 2012.

But for Rebecca Piazza, vice president of program delivery at Nava and former executive director of 18F, the root of system failures goes back to government’s investment -- or lack thereof -- in its workforce.

"If systems aren't able to perform the functions needed, it undermines the public’s trust in government and in democracy," Piazza said. "It's bigger than the technology itself."

Piazza said system failures "present as technology issues" but can be traced back to personnel.

"We really need to be looking at whether we're giving the people in government the tools they need to build and buy the right systems," she said. "I don’t see a single source of these failures. It’s everything from procurement, to the way budgets are allocated, team structures."

So who’s accountable?

Castro said that while the government is ultimately responsible for system function and performance, the IG report missed an opportunity to scrutinize contractors.

“The IG report is unfortunate because it doesn’t give the details of the contractors used, where performance is tied to particular contractors, to have accountability,” he said. "Poor performance because of design specifications is different than poor performance because of delivery."

But accountability for smooth-running government IT must also sit with Congress and agency heads, Castro said, simply because “bad government websites are unacceptable.”

“Every site should have a lifecycle,” Castro said, with an initial authority to operate granted by the CIO or agency head for a maximum of three years, after which the site is either taken down or renewed and updated for content, security, mobile-friendliness and usability.

But ultimately, government IT has to be maintained and sustained in a way that allows for quick, creative solutions to prevent failure.

Government agencies often can’t afford to "fail fast" because their systems serve millions, Piazza said, but incentivizing risk-taking so employees can buy and build scalable, resilient technology could be a solution.

"I think there’s a lot of fear" surrounding trying different approaches to procurement, so "if something that goes wrong, there are the consequences of failure," she said. "Start small, a subcomponent of a larger system where you can demonstrate success but doesn’t put the core mission at risk."