Don't let downtime get you down
- By John Moore
- Jun 16, 2003
Sometimes the hardest problem to identify is the one staring you in the face.
And so it goes with business continuity. Information technology organizations gird themselves for all manner of disasters — flood, fire and terrorism to name a few. But a much more commonplace threat exists in the form of backup sessions, hardware/software upgrades and storage capacity changes.
Indeed, planned downtime accounts for the majority of system and network outages. Yet planned downtime is often overlooked, while IT managers parry the various horsemen of the data center apocalypse.
Dave Purdy, director of business continuity at EMC Corp., said 87 percent of downtime is associated with planned outages, and 13 percent stems from unplanned events. Of that 13 percent, disasters cause less than 1 percent of downtime, he said. He bases those numbers on various market studies and EMC's own research.
"The fraction of 1 percent has dominated headlines" since the terrorist attacks of Sept. 11, 2001, Purdy said. But the reduction of planned downtime carries the biggest payback, he added.
Organizations have a number of paths for reaching that potential payback. Disk mirroring creates redundancies that shrink planned outages and protect against headline-grabbing disasters. Switched-fabric architectures, such as those found in storage-area networks (SANs) and emerging technologies such as InfiniBand, play a role as well. Specific features in storage devices, such as upgradable firmware in SAN switches, also have the ability to reduce planned downtime.
For some government applications, any variety of downtime has become unacceptable. Public safety systems and e-government initiatives have made government an around-the-clock enterprise. Accordingly, the Office of Management and Budget looks for a continuity-of-operations strategy when it reviews an agency's IT project business plan.
"Agencies are being asked to explain how they deliver on their commitment to the end-user community," said Howard Stern, senior vice president at Federal Sources Inc.
A Mirror Image
Mirroring, in which an organization continuously replicates data from one disk array to another, can be enlisted to trim planned downtime. Data is replicated from a primary production site (server and storage) to a secondary target site.
This approach lets organizations perform upgrades or maintenance on the production side, while the mirrored site handles the workload.
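The idea can be illustrated with a toy model. The sketch below is not any vendor's product; the class name `MirroredVolume` and its methods are hypothetical, and it assumes every write is applied to both sites before it is acknowledged. It shows only why a workload can move to the mirror while the production side is being serviced.

```python
class MirroredVolume:
    """Toy model of a primary/secondary mirror (hypothetical, for illustration)."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.active = "primary"  # site currently serving the workload

    def write(self, key, value):
        # Every write is applied to both copies so the two sites stay identical.
        self.primary[key] = value
        self.secondary[key] = value

    def read(self, key):
        # Reads are served by whichever site is currently active.
        site = self.primary if self.active == "primary" else self.secondary
        return site[key]

    def switch_to(self, site):
        # Redirect the workload, for example before maintenance on the other site.
        if site not in ("primary", "secondary"):
            raise ValueError(f"unknown site: {site}")
        self.active = site


volume = MirroredVolume()
volume.write("record-1", "payroll data")
volume.switch_to("secondary")        # the primary can now be upgraded
result = volume.read("record-1")     # served from the mirrored copy
print(result)
```

A real mirror also has to resynchronize the serviced site before it rejoins, which this sketch leaves out.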
The Los Angeles Unified School District started mirroring data after a drill two years ago found holes in the organization's recovery plan. "Everything is backed up on a mirror-image server instantaneously, all the time," said Donald Davis, the school district's associate general counsel. Los Angeles operates four servers in its production environment and mirrors to four servers.
In addition to mirroring, the school system backs up to tape every day and has a tertiary level of backup in EVault Inc. EVault provides data protection and recovery services from its company-managed data centers.
In the government arena, database consolidation is sparking increased interest in mirroring. Purdy said consolidation increases the value of the information and makes planned downtime a more costly proposition. Agencies, he noted, have "become interested in replicating" to one or more sites.
That's the case for the Army's Civilian Personnel Regionalization project office, which this month plans to consolidate nine personnel databases into one. After consolidation, the office's next step will be to mirror the database via a SAN to a continuous operations site, said Sam Chung, a systems engineer with the Army's Program Executive Office for Enterprise Information Systems.
The Army is using Network Appliance Inc. (NetApp) for its SAN mirroring chores. Chung said hardware or operating system upgrades are no problem in such an environment.
On the storage subsystem side, EMC, Hitachi Data Systems and Storage Technology Corp. (StorageTek) also offer mirroring products. Mirroring can also reside in the SAN fabric, as in products such as FalconStor Software Inc.'s IPStor network storage software.
Replication may also take place at the application level or for general-purpose applications and files at the host level, said Bob Guilbert, vice president of marketing and business development at NSI Software. NSI's Double-Take solution represents the latter category. The product takes advantage of a customer's existing IP network, as opposed to more expensive fiber cabling used in some mirroring solutions. In addition, Double-Take replicates data changes at the byte level, rather than the block level, so mirroring takes less bandwidth.
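A rough calculation shows why the granularity of replication matters for bandwidth. The sketch below is an illustration only and does not describe how Double-Take or any other product works internally. The 4,096-byte block size and both function names are assumptions, the two versions of the data are assumed to be the same length, and per-change bookkeeping overhead is ignored.

```python
BLOCK_SIZE = 4096  # hypothetical block size in bytes


def block_level_bytes(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Bytes sent if every block containing a change is shipped whole."""
    if len(old) != len(new):
        raise ValueError("this sketch assumes equal-length data")
    sent = 0
    for start in range(0, len(new), block_size):
        if old[start:start + block_size] != new[start:start + block_size]:
            sent += len(new[start:start + block_size])
    return sent


def byte_level_bytes(old: bytes, new: bytes) -> int:
    """Bytes sent if only the bytes that differ are shipped."""
    if len(old) != len(new):
        raise ValueError("this sketch assumes equal-length data")
    return sum(1 for a, b in zip(old, new) if a != b)


old = bytes(16384)          # four blocks of zeros
changed = bytearray(old)
changed[10] = 1             # one changed byte in the first block
changed[5000] = 2           # one changed byte in the second block
new = bytes(changed)

block_cost = block_level_bytes(old, new)
byte_cost = byte_level_bytes(old, new)
print(block_cost, byte_cost)
```

In this contrived case two small edits touch two blocks, so the block-level approach ships both blocks in full while the byte-level approach ships only the changed bytes.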
The redundant systems required with mirroring can be expensive. An alternative to mirroring is what vendors call a snapshot. A snapshot is a virtual image of a dataset as it existed at a point in time, as opposed to an exact, physical copy of the whole collection, Purdy said.
Thus, snapshots are easier on storage capacity. Snapshots can be used to ease the downtime caused by backups. This technique lets organizations "use disk technology to create an image instantly and do backup off-line so it doesn't interfere with daily operations," according to Purdy.
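One common way to build such a virtual image is copy-on-write, in which the snapshot stores only the blocks that have been overwritten since it was taken. The article does not say which technique any particular product uses, so the sketch below is an assumption for illustration, and the `Volume` class and its methods are hypothetical.

```python
class Volume:
    """Toy block volume with copy-on-write snapshots (hypothetical)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        # A new snapshot starts empty: no data is copied when it is created.
        snap = {}
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Preserve the old block in each snapshot the first time it is overwritten.
        for snap in self.snapshots:
            snap.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap, index):
        # Unchanged blocks are read from the live volume.
        return snap.get(index, self.blocks[index])


vol = Volume(["a", "b", "c"])
snap = vol.snapshot()
vol.write(1, "B")                       # production keeps changing

frozen = [vol.read_snapshot(snap, i) for i in range(3)]
print(frozen, vol.blocks, len(snap))
```

The snapshot still presents the data as it stood at the point in time, yet it holds just one preserved block, which is why the approach is lighter on capacity than a full copy. A backup job can read from the snapshot while production continues to write.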
EMC's TimeFinder performs the snapshot chore for its Symmetrix storage arrays, and SnapView handles the same task for Clariion devices. Many other vendors including NetApp also offer snapshot products.
Jay Desai, senior manager of data protection and business continuity at NetApp, said he sees considerable activity with snapshot products in the price-sensitive market.
In the Army's case, the SAN serves in a high-availability role. Organizations interested in deploying SANs in such a role can take steps to ensure their subnets are up to the task. One approach is to create a dual-fabric SAN in which two host bus adapters are installed on each server in the SAN. This method provides two paths between servers and storage devices.
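The benefit of two paths can be shown in a few lines. The sketch below is a simplified illustration, not a multipath driver: the function `pick_path` and the path names `fabric_a` and `fabric_b` are hypothetical, and path health is assumed to be known in advance.

```python
def pick_path(paths):
    """Return the name of the first healthy path, or raise if none remain."""
    for name, healthy in paths.items():
        if healthy:
            return name
    raise RuntimeError("no healthy path to storage")


# One entry per host bus adapter, each attached to its own fabric.
paths = {"fabric_a": True, "fabric_b": True}

first = pick_path(paths)
paths["fabric_a"] = False   # for example, a switch in fabric A is down for maintenance
fallback = pick_path(paths)
print(first, fallback)
```

With a single fabric, taking that switch down would cut the server off from its storage. With two, traffic moves to the surviving path.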
The approach is catching on in the government sector. Derek Granath, director of product marketing at Brocade, said government entities implementing the company's SAN switches "almost without exception have dual-fabric environments."
The Air Force's 45th Space Wing is taking that approach. The 45th Space Wing operates clustered servers and dual-fabric SANs at Patrick Air Force Base at Cape Canaveral, Fla. The clusters maintain continuity of service in the case of server failure, and the SANs facilitate backup, said Glenn Exline, manager of IT development at Patrick Air Force Base. The SANs have shrunk a 12-hour backup window to two-and-a-half hours.
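For a sense of scale, the backup window figures can be turned into a percentage. The two durations come from the figures above; the variable names are only for illustration.

```python
before_hours = 12.0   # backup window before the SANs
after_hours = 2.5     # backup window with the SANs

reduction_pct = round((before_hours - after_hours) / before_hours * 100, 1)
speedup = round(before_hours / after_hours, 1)

print(f"window reduced by {reduction_pct} percent, about {speedup} times shorter")
```

That works out to a window roughly 79 percent smaller.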
But the 45th Space Wing is looking for additional capabilities that could further ease downtime. "What we are working toward is the ability to do cross-site data replication and cross-site data backups," Exline said.
The Air Force SAN environment includes Brocade's SilkWorm 3800 switches, Computer Associates International Inc.'s BrightStor SAN Manager and Enterprise Backup, and EMC's Clariion storage arrays.
Behind the Scenes
Individual products working within the broader architectures also contribute to continuous operation. Product features such as upgradable firmware reduce planned downtime. This capability enables organizations to upgrade the software embedded in SAN switches, for example, without disrupting operations.
Brocade's Granath said upgradable firmware has become a factor in purchasing decisions. Brocade's SilkWorm 12000 director-class switch is a case in point. Until recently, a software upgrade for this product would result in a 20- to 30-second interruption of data flow through the switch. Granath said this lag, albeit brief, was an issue for customers who wanted to "load code nondisruptively."
The latest version of SilkWorm 12000, however, allows users to upgrade systems without taking them off-line.
Moore is a freelance writer based in Syracuse, N.Y.
Store it in a vault
One way to deal with downtime issues is to have someone else worry about them.
Companies such as AmeriVault, EVault Inc. and LiveVault Corp. offer to perform backup and recovery chores at their sites. The Naval Reserve Association opted for the off-site approach, turning over backup duty to AmeriVault.
"We went to this kind of system so we didn't have to worry about" backup, said Bob Lyman, the association's chief financial officer. "It's completely hands-off."
Having an outside firm handle backup functions was essential because the association has only three full-time technology employees, according to Lyman.