Rainy-day lessons about resiliency
The Internal Revenue Service revamped its continuity-of-operations plans following a damaging flood that shut down its headquarters building in Washington, D.C., for more than five months
When his beeper went off before dawn June 26 last year, Brian Downs began to realize Washington, D.C., wasn’t experiencing just another summer downpour. Later that day, as floodwaters swamped the Internal Revenue Service’s headquarters at 1111 Constitution Ave., Downs, an IRS operations section chief, met on-site with colleagues.
“At that point I knew we were in trouble,” Downs recalled. “I started to put my disaster recovery plan into place, which was to get as much equipment out of the building as quickly as possible.”
Before the rains stopped, water rose more than 20 feet in the subbasement, submerging electrical and maintenance equipment. For safety, emergency crews cut electricity to the building.
With the help of employees from Apogen Technologies, a network services contractor, Downs and his staff scrambled to keep the IRS’ Office of Chief Counsel’s operations intact. Officials in other departments throughout the headquarters initiated similar continuity-of-operations (COOP) moves to protect the equipment and data associated with 2,200 employees, including the IRS commissioner.
“This was a classic case of the head taking the hit,” said Ira Hobbs, who retired as chief information officer at the Treasury Department, the IRS’ parent agency, at the end of 2006. “How does the body continue to function with the dysfunctionality that occurred to the head?”
The answer came in a set of actions — some planned as part of a formal COOP strategy, some improvised on the spot to cope with unexpected challenges. In the end, the flood cost the IRS and the General Services Administration a combined $54 million for cleanup, building repairs, equipment replacement and temporary office rent, and the headquarters remained shuttered for more than five months.
An audit by the Treasury Inspector General for Tax Administration (TIGTA) released earlier this year gave the IRS a generally positive review for its response, saying that “due to preparatory and responsive actions, the IRS adequately protected sensitive data.”
But the emergency also highlighted some breakdowns that led to adjustments in the IRS’ COOP planning and provided reminders for other agencies that COOP plans must address multiple challenges: How to relaunch critical operations quickly and sustain them in COOP mode for weeks or months.
Tracking assets
Before the flood, the IRS had a comprehensive COOP plan in place that it practiced regularly each year. One scenario included a headquarters lockdown following a dirty-bomb attack.
“But had we planned for a flood of the nature that happened in Washington in June? No, we hadn’t,” said John Dalrymple, deputy commissioner of operations support at the IRS at the time of the flood. He retired shortly before the headquarters building reopened in December and is now a director in Deloitte and Touche USA’s federal practice. IRS CIO Richard Spires declined comment for this story, citing scheduling conflicts.
Fortunately, the agency’s practice of housing critical data in a facility in Martinsburg, W.Va., meant the long headquarters shutdown didn’t severely affect work at numerous IRS field offices. “We didn’t have to worry about corporate data, but we had to make arrangements for our core leadership team,” Dalrymple said.
The existing contingency plan called for moving the team to offices in the U.S. Mint building, which became command central. “Then we had to decide how we were going to bring all the employees back online as quickly as possible,” Dalrymple said.
Many employees were working again within days at temporary offices and some home offices. All staff members had access to IRS computers by late July. But not everything went smoothly, the audit states. For example, in the immediate aftermath of the storm, 104 computers left headquarters before a formal asset-tracking system went live five days after the flood struck, according to the audit. Computer-tracking procedures weren’t originally part of the IRS COOP plan.
Although moving the equipment showed initiative, the quick actions left the agency vulnerable to equipment and data losses. The outside-the-lines responses illustrate that COOP plans can’t always anticipate how emergencies play out. “Even though you’ve got a plan and you practice it, there are some tweaks that you have to make based on whatever the incident is and its severity,” Hobbs said.
Since the flood, the IRS has incorporated an emergency asset-tracking form and related training into its COOP plan.
Focus on telework
IRS officials also have a renewed interest in expanded telework operations. During the headquarters shutdown, the agency worked with AT&T to establish a secure virtual private network to enable some employees to work from home offices. However, the TIGTA audit recommended that the IRS do more to develop a telework business case to help promote greater reliance on telework among department managers.
The lack of a more extensive telework infrastructure created hardships for IRS employees during the building closure, according to the audit, which reported that 1,700 staff members, almost 80 percent of the headquarters’ workforce, were on administrative leave in the week after the flood.
The audit opinion bolsters other research, including a study done for GSA by Booz Allen Hamilton that estimated a hypothetical $15.6 million telework investment at a 50,000-person agency could yield $31.1 million in potential cost benefits.
But for telework to be effective in situations such as the IRS flood, security must be a prime consideration, said Dave Jerome, a principal at Booz Allen Hamilton. “Telework isn’t a matter of just having enough bandwidth to connect everybody. You’ve got to build in security from the very beginning,” he said. “It’s much easier and cheaper to [establish] the proper security protocols upfront than it is to develop a system and then have security thrown on top of it.”
Part of telework design requires deciding which jobs are best performed from home offices and which are best accommodated by grouping people at satellite offices, he added. The advantages of common telework hotels are greater control of the work environment, including security technology and procedures, and access to office resources, such as conference rooms.
In response to the audit’s recommendations, IRS officials said they plan to “advocate the consideration and use of telecommuting” for COOP.
New lessons
Current and former IRS employees said they learned other lessons beyond what the audit outlined. Downs developed a deeper appreciation for the collaboration necessary between the information technology and mission staff members for successful COOP planning.
“It was obvious during this particular situation that the [COOP plans] weren’t closely tied together,” Downs said. “If it wasn’t for us IT people knowing the [chief counsel] business as well as we did, I don’t believe we would have pulled this off.”
Hobbs said that close coordination should include commercial service providers and partners. “You have to reach out across government and industry to make sure [COOP] happens in a smooth way that is transparent to the users and clients of the organization,” he said.
Dalrymple said the IRS could have done more to ensure consistent data backup procedures. “In some instances, people hadn’t backed things up on local servers as frequently as they should have. Things like that caused some blips,” he said. “Now the IRS has an ability to back those [resources] up corporately on a regular basis.”
The emergency also reinforced the need for adequate system redundancy and procedures to quickly make the failover technology available to staff members. However, high availability can come at a steep financial price, said Shawn McCarthy, director of research for government vendor programs at Government Insights. “A lot of agencies have plans. Whether they have the funding and the manpower to enact those plans is a different situation,” he said.
To balance high availability and budgets, Jerome recommended that agencies prioritize critical functions on a continuum that ranges from resources that must never go down to those that can be idle for hours or days without harming the agency. Focus spending on “what you are required by law to provide under any type of circumstances,” Jerome said. Then protect top priorities with expensive redundant backup technology with immediate failover capabilities, he said.
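One way to picture the continuum Jerome describes is as a simple ranking exercise. The sketch below is illustrative only and does not come from the IRS or Booz Allen Hamilton: the function names, the downtime figures, the tier labels and the 24-hour cutoff are all hypothetical assumptions chosen to show the idea of putting legally required, zero-downtime functions first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgencyFunction:
    """A business function and how long it can be idle (hypothetical model)."""
    name: str
    max_downtime_hours: float  # assumed tolerable outage before harm is done
    legally_required: bool     # must be provided under any circumstances


def tier(fn: AgencyFunction) -> str:
    """Place a function on the continuum; the cutoffs are illustrative assumptions."""
    if fn.legally_required or fn.max_downtime_hours <= 0:
        return "immediate-failover"      # candidates for costly redundant backup
    if fn.max_downtime_hours <= 24:
        return "restore-within-hours"
    return "restore-within-days"


def prioritize(functions: list[AgencyFunction]) -> list[AgencyFunction]:
    """Order functions so legal obligations and the least tolerant come first."""
    return sorted(
        functions,
        key=lambda f: (not f.legally_required, f.max_downtime_hours),
    )


# Hypothetical inventory, invented purely for illustration.
inventory = [
    AgencyFunction("internal newsletter", 72, False),
    AgencyFunction("statutory filing intake", 0, True),
    AgencyFunction("staff email", 8, False),
]

for fn in prioritize(inventory):
    print(f"{fn.name}: {tier(fn)}")
```

In practice an agency would replace the made-up inventory with its own list of functions and set the cutoffs to match its legal obligations and budget.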
Finally, agencies need to spend more time testing their existing COOP plans to guard against surprises in the midst of a real emergency, Jerome said. “Constantly make sure that people understand their responsibilities. Each time you test the plan you are going to find things that didn’t work exactly the way that you thought they would. COOP is a living plan that has to be updated on a periodic basis.”
Joch is a business and technology writer based in New England. He can be reached at [email protected].