Forensics meets time travel

In backup they trust. Many organizations have depended on a fail-safe layer of storage technology to recover information in the event of data loss or corruption, and commercial solutions abound, with offerings that stretch from enterprise servers to individual systems. Since last September's terrorist attacks, government agencies and corporations have invested more in this area because they realize how much is at stake.

But in this new era of homeland security, some data loss is likely to occur from malicious attacks, and some researchers believe current backup technology cannot provide the necessary level of assurance.

A June report from the National Academies identifies data backup and reconstitution as an area ripe for research.

Most "normal" backup methods — usually involving storage on tape or disk — were developed under the assumption of benign and uncorrelated failure.

However, in the wake of a malicious attack, so-called reconstitution requires a decontamination step that distinguishes between the damaged and "clean" portions of a system, according to the National Academies report.

The key issue is determining when data was corrupted and restoring the most recent backup files created before that point. It's a delicate task that calls for a mix of forensics and time travel. Today's backup technology offers help in this regard, but further refinements are under development. Other related research topics include system survivability and the backup needs of telecommuters.

Several initiatives aim to improve the security of key national infrastructures, such as electrical utilities. The Idaho National Engineering and Environmental Laboratory, for example, recently unveiled its plans for the Critical Infrastructure Protection Test Station.

The station will explore the recovery and reconstitution of attacked infrastructures, said Steve Fernandez, manager of the Infrastructure Protection Systems division at the Idaho lab. The lab operates a sizable power grid that researchers will use to locate vulnerabilities and test countermeasures.

Work on the test station is scheduled to begin in fiscal 2003, Fernandez said.

Other efforts hit closer to home. Larry Rogers, a senior member of the technical staff at Carnegie Mellon University's CERT Coordination Center, said backup and data integrity issues "extend to people at home and can be compounded by more people working at home."

A telecommuter, he said, could introduce corrupt files into the main office's work environment. Rogers added that he has just started looking into telecommuters' effects on security systems as a research subject.

The National Academies' report, "Making the Nation Safer: The Role of Science and Technology in Countering Terrorism," puts this challenge in a five- to nine-year research time frame. But many efforts are under way to get ever closer to the target solutions.

Where We Are Now

The task of recovering data from a corrupted system requires two elements. First, an organization must determine when the attack occurred.

"One of the biggest challenges you find is that [data corruption] can go undetected for a period of time" that could span months, said Marco Coulter, vice president of storage solutions at Computer Associates International Inc. (CA), which sells backup software.

"Attackers who really want to do a lot of damage, and be creative in how they do that, may try to slip in corruption in a way that it is not detected for a long period of time," said Kevin Walter, vice president of product management for information protection at Legato Systems Inc., another backup vendor.

Storage vendors offer several tools to help isolate the problem. CA, for example, offers a file change recorder with its BrightStor High Availability products. The utility constantly tracks changes to files, creating a log of sorts that can help organizations determine when an event occurred.
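CA's file change recorder is proprietary, but the underlying idea can be sketched in a few lines. The following Python snippet is a hypothetical illustration, not CA's implementation: it polls a directory tree, hashes each file, and appends a timestamped entry to a log whenever something is added, deleted or modified. An investigator could later scan that log for the first suspicious change.

```python
import hashlib
import json
import os
import time

def snapshot_tree(root):
    """Map every file under root to a digest of its contents."""
    state = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    state[path] = hashlib.sha256(f.read()).hexdigest()
            except OSError:
                pass  # file vanished between listing and reading
    return state

def record_changes(root, log_path, interval=60):
    """Poll the tree and log a timestamped entry for each change."""
    previous = snapshot_tree(root)
    while True:
        time.sleep(interval)
        current = snapshot_tree(root)
        for path in current.keys() | previous.keys():
            before, after = previous.get(path), current.get(path)
            if before != after:
                entry = {
                    "time": time.time(),
                    "path": path,
                    "change": ("added" if before is None else
                               "deleted" if after is None else "modified"),
                }
                with open(log_path, "a") as log:
                    log.write(json.dumps(entry) + "\n")
        previous = current
```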

Antivirus software and intrusion-detection systems may also come into play. "From a different perspective, security, you need distinct audit logs so that you can go back and ask, 'At what time did this all start?'" Coulter said.

Once the time frame is established, the second element of recovery kicks in: a series of backups conducted over time.

"You need to have what we call a line of images," Coulter said. An organization that knows when an event occurred and maintains such a collection of images can turn back the clock to a clean slate. Storage executives refer to this approach as "point-in-time recovery."

An agency may achieve point-in-time recovery through a series of tape backups.

Complete backups may occur weekly or monthly, with incremental daily backups to provide a base level of protection, Walter said. This approach, however, could result in hours or even days of data loss.
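The restore chain under such a schedule is the newest full backup taken on or before the recovery point, plus every incremental between that full and the point; anything written after the last usable incremental falls into the data-loss window Walter describes. A sketch, assuming each backup is identified only by its timestamp:

```python
from datetime import datetime

def restore_plan(fulls, incrementals, recovery_point):
    """Newest full backup at or before the recovery point, then every
    incremental taken after that full, up to the recovery point."""
    base = max(t for t in fulls if t <= recovery_point)  # ValueError if none
    chain = sorted(t for t in incrementals if base < t <= recovery_point)
    return base, chain

fulls = [datetime(2002, 9, 1), datetime(2002, 9, 8)]
incs = [datetime(2002, 9, 9), datetime(2002, 9, 10), datetime(2002, 9, 11)]
base, chain = restore_plan(fulls, incs, datetime(2002, 9, 10, 17, 0))
# base: the Sept. 8 full; chain: the Sept. 9 and 10 incrementals.
# Everything written after the Sept. 10 incremental is the data-loss window.
```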

Organizations that "can live with a 24-hour data loss may stick with traditional backup," Walter said. But for those that cannot deal with much data loss, a technique known as snapshotting could be the answer.

A snapshot is an efficient way to keep previous versions of files on hand by tracking and recording only how the data changed over time. Snapshots of data can be taken more frequently — and less invasively — than full backups.
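One common way to implement this idea is copy-on-write deltas, though actual products vary: each snapshot layer stores only the blocks written since the previous snapshot, and a read is resolved by searching from the chosen snapshot back toward the base image. A toy Python model of that design:

```python
class SnapshotVolume:
    """Toy copy-on-write volume: each snapshot layer holds only the
    blocks written since the previous snapshot."""

    def __init__(self, blocks):
        self.snapshots = [dict(blocks)]  # snapshot 0 is the full base image

    def take_snapshot(self):
        self.snapshots.append({})        # new, empty delta layer

    def write(self, block_id, data):
        self.snapshots[-1][block_id] = data  # change lands in the top layer

    def read(self, block_id, snapshot=None):
        """Read a block as of a snapshot, searching back toward the base."""
        top = len(self.snapshots) - 1 if snapshot is None else snapshot
        for layer in reversed(self.snapshots[:top + 1]):
            if block_id in layer:
                return layer[block_id]
        raise KeyError(block_id)

vol = SnapshotVolume({0: b"clean data"})
vol.take_snapshot()                      # snapshot 1 begins
vol.write(0, b"corrupted")               # damage recorded in layer 1 only
assert vol.read(0) == b"corrupted"       # current view
assert vol.read(0, snapshot=0) == b"clean data"  # roll-back view
```

Because each layer records only what changed, frequent snapshots stay cheap, and the roll-back recovery Walter describes amounts to reading through an earlier layer.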

Legato's NetWorker PowerSnap for Network Appliance Inc. (NetApp) products enables administrators to establish a policy on how frequently snapshots are taken — as often as once every 30 minutes.

If something happens, "you can do a roll-back recovery to that last good snapshot," Walter said. The product works on Oracle Corp. databases saved on NetApp filer storage appliances.

A number of vendors offer snapshot products, including EMC Corp. at the enterprise level, and Symantec Corp., which offers Ghost 7.0 for incremental PC backups.

Snapshots, although useful, have their limitations. Michelle Butler, technical program manager for the storage-enabling technologies group at the National Center for Supercomputing Applications, said the technology she has encountered performs snapshots for individual storage volumes. But large file systems may need to span multiple volumes, and "file systems are growing astronomically," she said.

What's Next?

Members of industry, government and academia are researching the next developments in backup technology.

The Defense Advanced Research Projects Agency is working on an innovative approach using the concept of self-healing systems. CA's vision is to tighten the links between backup, storage resource management and security components. This will provide more efficient backup and make it more likely that organizations will be able to "cleanse" their data, Coulter said.

According to Coulter, integrated backup and storage resource management distributes data to the most appropriate and cost-effective storage medium: Data that requires fast recovery is routed to disk backup, while less important data goes to tape.
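A tiering policy of that kind reduces, in essence, to a routing rule keyed on how quickly the data must come back. A deliberately simplified sketch, in which the threshold and field names are invented for illustration:

```python
def route_backup(dataset):
    """Invented tiering rule: recovery-critical data goes to disk,
    everything else to tape. Threshold and fields are illustrative."""
    if dataset.get("recovery_time_objective_hours", 24) <= 4:
        return "disk"
    return "tape"

print(route_backup({"name": "orders-db", "recovery_time_objective_hours": 1}))  # disk
print(route_backup({"name": "old-archives"}))                                   # tape
```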

Better backup-to-security integration, meanwhile, will help organizations more readily determine the time of data corruption.

Legato's research aims to integrate its Automated Availability Manager for Microsoft Corp. Exchange with virus-detection software, Walter said. This integration will enable Legato's product to "automatically detect certain scenarios" — such as viruses — "and respond to them in a programmatic way."

Legato offers virus-detection integration on a custom basis through its professional services arm. "In the future, we would look to provide some turnkey integration," Walter said.

Beyond product integration, technologists hope to fine-tune the specific task of locating damaged data in time and space.

In cases of intentional damage, "recovering to a clean state has been a 'completely repaint the room' process instead of just painting over the scratch," Coulter said. Industry isn't currently equipped to "discover what a person touched and redo the files. That would take too long."

Today, point-in-time recovery means rebuilding an entire file system. But CA's research and development aims to pinpoint and recover a specific block or file that has been altered, Coulter said.
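Block-level recovery of that kind could work by comparing checksums of the live volume against the last clean image and rewriting only the blocks that differ. The following is a hypothetical sketch of the idea, not CA's actual design:

```python
import hashlib

def corrupted_blocks(live, clean):
    """Compare per-block digests (the clean image may be remote, so
    digests travel more cheaply than blocks) and return differing ids."""
    digest = lambda block: hashlib.sha256(block).hexdigest()
    return [i for i, (now, then) in enumerate(zip(live, clean))
            if digest(now) != digest(then)]

def repair(live, clean):
    """Rewrite only the damaged blocks: paint over the scratch
    rather than repainting the whole room."""
    for i in corrupted_blocks(live, clean):
        live[i] = clean[i]
    return live

live = [b"aaaa", b"XXXX", b"cccc"]   # block 1 was tampered with
clean = [b"aaaa", b"bbbb", b"cccc"]  # last known-good image
assert repair(live, clean) == clean
```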

Granularity, or how finely a system can isolate and restore affected data, is also central to the National Academies' take on backup. The group's report envisions a process of "distinguishing clean system state (unaffected by the intruder) from the portions of infected system state, and eliminating the causes of those differences."

Meanwhile, the research continues — as does the growth of file systems requiring backup.

Moore is a freelance writer based in Syracuse, N.Y.
