Continuity of Operations

Should you trust disaster recovery to the cloud?


Natural and man-made disasters can be devastating to stored data, and cloud computing can offer a useful disaster recovery solution. Shown here is a satellite image of Hurricane Irene making landfall in New York in 2011. (Image: NASA)

Implementing a disaster recovery plan can be like eating vegetables, getting enough fiber and sleeping at least eight hours a night. Most people understand why these things are important, but few do them religiously.

The problem is that traditional disaster recovery methods call for re-creating the full IT environment at a separate off-site facility to keep agencies safe from unplanned IT outages. The investment in redundant resources pays off if a server gets fried, some stealthy malware takes down a storage system, or a hurricane forces a data center evacuation.

But on most days, when disasters don’t strike, all that duplicate hardware and software is running in standby mode and not contributing meaningfully to the agency’s daily operations. That is a tough expense to justify, particularly in times of tight IT budgets.

And so a growing number of IT managers are considering a way to change the equation: cloud-based disaster recovery, also known as DR-as-a-service (DRaaS). With this option, agencies subscribe to a third-party cloud service to avoid the upfront costs of buying, installing and managing the necessary hardware and software. Instead, they pay a monthly fee for storing duplicate copies of data and applications at an off-site location.

“You’re only going to pay for what you need rather than for an entire duplicate of everything that’s sitting idle waiting for a disaster,” said Chuck Riddle, CIO at the Government Printing Office. He said his department is actively evaluating cloud-based disaster recovery but has not made the move yet. “Done correctly, it opens up a lot of options for doing disaster recovery better than in the past, but the devil’s always in the details when it comes to how you actually move forward.”

Why it matters

Because disaster recovery investments have been difficult to justify, some organizations have attempted to do it on the cheap, said Rachel Dines, a senior analyst at Forrester Research. For example, they might buy only enough duplicate resources to protect mission-critical applications, leaving second-tier but still valuable systems vulnerable to extended outages.

But the economies of scale offered by clouds could mitigate those trade-offs. New data from Forrester shows an increasing interest in cloud solutions for disaster recovery. The firm approached IT managers whose organizations have already adopted infrastructure as a service and asked how much access to improved disaster recovery had factored into their decision. Almost half said it was very important, and another 28 percent ranked it high on the importance scale, Dines said.

Paying only for the resources you need, and only when you need them, is not the only appeal, analysts say. Another potential benefit is faster recovery times. The classic benchmarks of effectiveness are recovery time objectives (RTOs) and recovery point objectives (RPOs). The former is the target for how quickly critical resources must be returned to normal after a disaster, while the latter defines the point in time to which data will be restored: for example, the moment the failure occurred or the previous night’s backup.
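To make the two benchmarks concrete, here is a minimal Python sketch that checks a single outage against an RTO and an RPO. The function name, the timestamps and the four-hour and one-hour targets are all hypothetical, chosen only for illustration; real objectives would come from an agency's own SLA.

```python
from datetime import datetime, timedelta


def meets_objectives(failure_time, last_recovery_point, service_restored, rto, rpo):
    """Return (rto_met, rpo_met) for one outage.

    Downtime is measured from the failure to the restoration of service.
    The data-loss window is measured from the last usable copy of the data
    to the failure.
    """
    downtime = service_restored - failure_time
    data_loss_window = failure_time - last_recovery_point
    return downtime <= rto, data_loss_window <= rpo


# Hypothetical outage: failure at 2 p.m., last copy from a 2 a.m. nightly backup,
# service restored at 5:30 p.m.
failure = datetime(2014, 3, 10, 14, 0)
last_backup = datetime(2014, 3, 10, 2, 0)
restored = datetime(2014, 3, 10, 17, 30)

rto_met, rpo_met = meets_objectives(
    failure,
    last_backup,
    restored,
    rto=timedelta(hours=4),  # assumed target: back online within four hours
    rpo=timedelta(hours=1),  # assumed target: lose no more than one hour of data
)
print(f"RTO met: {rto_met}, RPO met: {rpo_met}")
```

In this made-up case the 3.5-hour outage falls inside the recovery time target, but a nightly backup leaves a 12-hour gap that misses the recovery point target, which is the kind of shortfall more frequent replication is meant to close.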

“Many of the clients we talk to who are interested in recovery as a service are looking for improvement in their RTOs and RPOs,” said Kevin Knox, a research director at Gartner.

DRaaS can also help IT managers sleep better at night because regular testing is written into the solution’s service-level agreement (SLA). By contrast, testing can fall through the cracks in traditional environments because it disrupts daily operations, Riddle said.

But IT managers must weigh a number of pros and cons when they consider DRaaS. “DR in a cloud is by no means a slam dunk,” said Yogesh Khanna, vice president and chief technology officer of IT infrastructure solutions for CSC’s North American Public Sector.

One of the biggest challenges remains the lack of industry standards regarding what deliverables should be included in a DRaaS package. “Because the space is still very new, I wouldn’t take anything for granted when you are negotiating SLAs,” Dines said.

Another potential stumbling block is the need to sort out complex interconnections in existing IT systems before duplicating them in the cloud. “Sometimes it’s not clear what all the interdependencies are for applications you’ve been running for the last 20 years,” Riddle said.

The fundamentals

What should you consider before trusting the cloud for disaster recovery? The first step is deciding on the right cloud model — public, private or a hybrid of the two. Moving to a public cloud service is best for agencies that have relatively homogeneous infrastructures — namely, virtualized x86 servers rather than a mix of Unix and mainframe servers, Knox said.

IT organizations with mixed platforms should consider a private or hybrid cloud strategy instead. “In larger enterprises, people aren’t asking, ‘How am I going to recover my mainframe in the cloud?’” he said. “The more heterogeneous the environment, the more complex [disaster recovery] gets because of different types of hardware and platforms, recovery times, recovery points, and tiers of applications.”

Technological diversity is not the only consideration. Agencies should also carefully evaluate the kind of data they might be sending to the cloud, Khanna said. For security reasons, mission-critical applications or those that hold classified data should remain in a private cloud or a shared government cloud. Less critical resources could be protected by a public DRaaS solution.

Questions for providers

A lot rides on disaster recovery. Ask these questions before signing a contract.

1. Is the service provider certified under the Federal Risk and Authorization Management Program?

2. What are the service provider’s financial condition, track record and length of time in the market?

3. Where will my data be physically stored when it is in the cloud, and will that conflict with any of my agency’s internal policies or federal regulations?

4. Will the recovery site be far enough away from the production facility that both won’t be affected by the same regional disaster?

5. What penalties will result if the service provider fails to meet the recovery time and recovery point objectives spelled out in the service-level agreement?

6. What are the base fees for data replication in a non-disaster situation, what additional fees will arise during a recovery, and will these “declaration” charges be one-time or daily?

7. Are the wide-area network connections to the cloud sufficient to ensure adequate performance when sending data between the main and backup facilities?

8. How frequently will the service provider conduct tests of the disaster recovery capabilities, and what will be the agency’s role and responsibilities during testing?
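Question 4 lends itself to a rough first check. The Python sketch below uses the haversine formula to estimate the great-circle distance between two sites and compares it with a minimum separation. The 500-kilometer threshold and the choice of cities are assumptions for illustration only, not a federal requirement, and distance alone says nothing about shared power grids, flood zones or network paths.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def sites_separated(production, recovery, min_km):
    """True if the two (lat, lon) sites are at least min_km apart."""
    return distance_km(*production, *recovery) >= min_km


# Approximate city-center coordinates, used here as stand-ins for data centers.
washington_dc = (38.9072, -77.0369)
baltimore = (39.2904, -76.6122)
denver = (39.7392, -104.9903)

MIN_SEPARATION_KM = 500  # hypothetical policy threshold

print(sites_separated(washington_dc, baltimore, MIN_SEPARATION_KM))
print(sites_separated(washington_dc, denver, MIN_SEPARATION_KM))
```

A recovery site an hour's drive from the production facility fails this test, while one halfway across the country passes.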

“Not all applications and data are classified or top secret — even in intelligence agencies and the [Defense Department],” Khanna said. “So they absolutely could go into a public cloud.”

Other security considerations stem from how data will be protected as it is being transferred to and from the recovery site, and while it is housed in the cloud. Encryption and two-factor access controls are a must, he said.

Khanna also said agencies should decide what RTOs each application requires and let that guide deployment decisions. “If I go to a public cloud, I may be riding on a public infrastructure and whatever SLA I can negotiate,” he said. “So I may get better RTOs from a private cloud.”

The hurdles

Planning and a needs analysis alone won’t guarantee success, experts say. IT managers should also prepare for some common challenges associated with DRaaS.

Fees can be a shock if they’re not clearly defined during the SLA negotiation process. Analysts said many DRaaS solutions charge a basic monthly fee to cover daily data replications and the cloud resources necessary to prepare for a disaster. But agencies should also be prepared for additional, so-called declaration fees, the costs that kick in when a customer “declares” that a crisis is unfolding and recovery mode is launched. Declaration fees might be levied for each day the agency is in recovery mode.
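The arithmetic is simple, but writing it down shows how quickly declaration fees can add up. The following Python sketch totals a subscription under an assumed fee structure; the parameter names and every dollar figure are hypothetical, and an actual contract may combine one-time and daily charges differently.

```python
def draas_cost(monthly_fee, months, declaration_fee_per_day=0,
               recovery_days=0, one_time_declaration_fee=0):
    """Total cost of a DRaaS subscription over a period.

    monthly_fee covers routine replication and standby resources.
    Declaration charges apply only if the agency spends time in recovery mode.
    """
    standby = monthly_fee * months
    recovery = declaration_fee_per_day * recovery_days
    if recovery_days > 0:
        recovery += one_time_declaration_fee
    return standby + recovery


# Hypothetical figures: $5,000 a month, plus $2,000 for each day in recovery mode.
quiet_year = draas_cost(monthly_fee=5000, months=12)
disaster_year = draas_cost(monthly_fee=5000, months=12,
                           declaration_fee_per_day=2000, recovery_days=5)

print(quiet_year)     # 60000
print(disaster_year)  # 70000
```

Under these assumptions, five days in recovery mode adds $10,000 to a $60,000 year, which is why question 6 in the provider checklist asks whether declaration charges are one-time or daily.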

Other pricing confusion comes about because some service providers use their own models rather than an industry-accepted standard. For example, one provider might set prices according to the number of virtual machines being protected, while another might use the number of processors as the benchmark.

“It’s been hard to make apples-to-apples comparisons,” Knox said.

Fortunately, there are signs that the situation is changing. A recent industry trend is to base pricing on a combination of connection costs, memory, disk space and the number of virtual machines. “We are starting to see some standardization around those four core areas for pricing,” Knox said.

Another potential snag: Cloud providers frequently oversubscribe their services by signing up more customers than can be accommodated if disaster strikes them all at the same time. That approach is not inherently bad, Dines said, because it helps bring down subscription costs. But agencies should question a potential service provider about how it will keep from becoming overwhelmed.

“I would ask what safeguards they have put in place to make sure that there will never be resource conflicts at time of declaration,” she said. “That might be as simple as making sure that they’ve got customers from a wide geographic range so it’s unlikely that they’d all be declaring at the same time.”

Finally, agencies should avoid the temptation to view DRaaS as a set-it-and-forget-it solution.

“I’ve met organizations that say, ‘I’m sending DR to the cloud; I’m not going to think about it again,’” Dines said. “I’ve seen organizations lose focus because they’ve moved DR to the cloud.”

But even with a cloud solution, agencies must continue to perform the duties that go along with a disaster recovery program, including conducting business impact assessments, risk analyses and tests with internal staff.

Some vegetables you just can’t avoid eating.
