Storage efficiency products make the most of available disk space while offering additional benefits

By Cara Garretson

Disk-drive technology continues to advance, but eventually the realization hits that only so much data can fit on a drive. Taking a new approach to the problem of where to store data, a number of technologies have emerged that make the most of available disk space with little if any impact on performance. Known as storage efficiency products, these technologies shrink the amount of disk space that data consumes, using techniques such as deduplication, cloning, replication, and thin provisioning. The result is cost savings and less storage administration.

Such technologies have become particularly useful to federal agencies, which carry the burden of serving as the official record.

“As part of their primary mission, whether civilian or defense, the government must be the official record; the citizenry demands that official records be maintained, so that forces the government to retain much more information than the average business,” says Mike O’Donnell, vice president of EMC’s Data Domain backup and recovery systems division. “The path of drives getting bigger and spinning faster is not keeping up with demand.”

Deduplicating Data
One of these storage-efficiency technologies is data deduplication, which eliminates redundant copies of data so that less of it needs to be stored. Data Domain of Santa Clara, Calif., was an early pioneer of the technology, with products based on intellectual property developed by former Princeton professor Kai Li in 2001. The company was acquired by storage giant EMC in 2009.

Data Domain’s deduplication technology works below the file level. It breaks data into small segments and uses fingerprinting to find and remove duplicates, which yields dramatic reductions, says O’Donnell. “Deduplication is a technology that’s not focused on that kind of insanity [of trying to squeeze more space from disk drives]; instead, it reduces the amount of data from its source,” he says. “We’re the guys who put 100TB in a 5TB bag.”
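To make the fingerprinting idea concrete, here is a minimal sketch. It is not Data Domain’s implementation. For simplicity it assumes fixed-size chunks and SHA-256 fingerprints, while production systems often use variable-length segments. The class and method names (`DedupStore`, `put`, `get`, `stored_bytes`) are hypothetical. Each chunk is hashed, and a chunk whose fingerprint is already known costs no additional space. The file is remembered as an ordered “recipe” of fingerprints so it can be rebuilt.

```python
import hashlib


class DedupStore:
    """Hypothetical toy store: each unique chunk is kept once, keyed by its fingerprint."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest (fingerprint) -> chunk bytes

    def put(self, data: bytes) -> list:
        """Store data and return its recipe: the ordered list of chunk fingerprints."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fingerprint, chunk)  # a known fingerprint costs nothing
            recipe.append(fingerprint)
        return recipe

    def get(self, recipe: list) -> bytes:
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes actually held, after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore(chunk_size=4)
first = store.put(b"ABCDABCDABCD")  # three chunks, only one of them unique
second = store.put(b"ABCDWXYZ")     # only "WXYZ" is new
print(store.stored_bytes())         # 8
```

The 12 logical bytes of the first write occupy 4 physical bytes. The second write adds only its one unseen chunk.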

According to market researcher The Enterprise Strategy Group (ESG), this reduction of required disk translates into significant cost savings.

“In theory, with data deduplication solutions, companies backing up a 10TB file system may only need 1TB of media,” reads an ESG white paper published in April 2009. “When a customer can reduce its backup capacity by 90 percent, the savings add up fast – and this is why ESG believes companies will increase the amount of backup data on disk in the next few years.”
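The ESG figures reduce to a simple ratio. Storing 10TB of logical backup data on 1TB of media is a 10:1 reduction, which is the same as a 90 percent capacity saving. The hypothetical helpers below make that conversion explicit and apply it to O’Donnell’s “100TB in a 5TB bag,” which works out to 20:1, or 95 percent.

```python
def dedup_ratio(logical_tb: float, physical_tb: float) -> float:
    """Logical data protected per unit of physical disk (10.0 means 10:1)."""
    return logical_tb / physical_tb


def percent_saved(logical_tb: float, physical_tb: float) -> float:
    """Capacity avoided, relative to storing the data without deduplication."""
    return 100 * (logical_tb - physical_tb) / logical_tb


for logical, physical in [(10, 1), (100, 5)]:
    print(f"{logical}TB -> {physical}TB: "
          f"{dedup_ratio(logical, physical):.0f}:1, "
          f"{percent_saved(logical, physical):.0f}% saved")
# 10TB -> 1TB: 10:1, 90% saved
# 100TB -> 5TB: 20:1, 95% saved
```

Because the saving equals 1 - 1/ratio, going from 10:1 to 20:1 adds only five percentage points. Most of the capacity benefit arrives at the first order of magnitude.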

By using Data Domain’s technology, which works inline, in real time, organizations can significantly reduce the physical amount of data that must be transferred over the network and kept in deep archives, says O’Donnell. For example, suppose a manager sends a PowerPoint presentation to his staff, realizes he left a word out of Slide 1, and sends the entire presentation again. Only the portion of the second copy that changed is stored: that one word.
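The resent-presentation example depends on how the data is split. With fixed-offset chunks, inserting one word shifts every later chunk boundary, so the rest of the file looks new. A common remedy in deduplicating systems is content-defined (variable-length) chunking. Boundaries are placed wherever a rolling hash of the last few bytes matches a bit pattern, so they move with the content and realign shortly after an edit.

The sketch below is not Data Domain’s algorithm. It assumes a Rabin-Karp rolling hash, SHA-256 fingerprints, and the window, mask, and size constants shown. All of these names and values are hypothetical. Random bytes stand in for the slide deck.

```python
import hashlib
import random

WINDOW = 16                     # bytes covered by the rolling hash (hypothetical)
BASE, MOD = 257, (1 << 61) - 1
MASK = 0x3F                     # cut where the low 6 hash bits are all 1: ~1 cut per 64 bytes
MIN_SIZE, MAX_SIZE = 32, 1024   # bounds on chunk length (hypothetical)


def cdc_chunks(data: bytes) -> list:
    """Split data at content-defined boundaries using a Rabin-Karp rolling hash."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # add the newest byte
        length = i + 1 - start
        if (length >= MIN_SIZE and (h & MASK) == MASK) or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class InlineDedup:
    """Fingerprints chunks as data arrives and stores only chunks it has not seen."""

    def __init__(self):
        self.index = {}  # SHA-256 digest -> chunk

    def ingest(self, data: bytes) -> int:
        """Return how many bytes actually had to be written to disk."""
        new_bytes = 0
        for chunk in cdc_chunks(data):
            fingerprint = hashlib.sha256(chunk).digest()
            if fingerprint not in self.index:
                self.index[fingerprint] = chunk
                new_bytes += len(chunk)
        return new_bytes


rng = random.Random(0)
deck = bytes(rng.getrandbits(8) for _ in range(20_000))  # stand-in for the slide deck
resent = deck[:10_000] + b"quarterly" + deck[10_000:]    # same deck, one word added

store = InlineDedup()
first = store.ingest(deck)
second = store.ingest(resent)
print(first, "bytes stored for the first send")
print(second, "bytes stored for the resend")
```

On the resend, only the chunk or two surrounding the inserted word should be new. Every other chunk matches a fingerprint already in the index, so only a small fraction of the deck is written again.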

“By focusing on the data, we actually change the equation – now we’re reducing data even before it has to be stored,” he says.

Added Benefits
Having less data on disk means less storage to manage. And since Data Domain’s technology works inline, in real time, it also reduces the bandwidth used when transferring data across the network, says O’Donnell.

There’s also a data-protection benefit, one that the U.S. Army realized by implementing Data Domain’s DDX Arrays and Replicator software. The Army’s Southwest Asia data center team deployed these products, along with TCP acceleration and forward error correction technology from Juniper Networks, to ensure continuous integrity checking and instantaneous backup verification in support of communications for more than 100,000 troops in the Middle East.

“Although data deduplication is still a relatively new capability, customers are deploying it in both data centers and remote offices, indicating that the benefits are real,” reads the ESG white paper.

Other Approaches
Data deduplication is one of a handful of ways to maximize storage space. Data cloning creates a copy of a database, which is particularly helpful when multiple copies are required, yet the clone consumes only the space of the original plus any changes. Replication is another technique: it makes copies, or ‘snaps,’ of data, but once the baseline copy is made, only changed blocks are transferred. And thin provisioning lets administrators improve storage utilization by maintaining a common pool of free storage that all applications can draw on, with capacity allocated only as needed.
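All three techniques in that paragraph can be illustrated with one toy block-level model. This is a minimal sketch, not how any particular array works. The names `ThinPool`, `Volume`, `clone`, and `changed_blocks` are hypothetical.

The model works as follows:

- A volume advertises a large logical size but takes physical blocks from a shared pool only when it writes (thin provisioning).
- A clone copies only the block map and shares the underlying blocks (cloning).
- Each volume tracks which blocks changed since its last replication pass (changed-block replication).

Writes go to fresh blocks, a redirect-on-write choice assumed here so that blocks shared with a clone are never modified in place.

```python
class ThinPool:
    """Hypothetical shared pool of physical blocks; volumes draw on it only when writing."""

    def __init__(self, physical_blocks: int):
        self.free = physical_blocks
        self.data = {}  # physical block id -> contents
        self.refs = {}  # physical block id -> number of volume maps pointing at it
        self.next_id = 0

    def allocate(self, payload: bytes) -> int:
        if self.free == 0:
            raise RuntimeError("thin pool exhausted")
        self.free -= 1
        pid, self.next_id = self.next_id, self.next_id + 1
        self.data[pid], self.refs[pid] = payload, 1
        return pid

    def release(self, pid: int) -> None:
        self.refs[pid] -= 1
        if self.refs[pid] == 0:  # last reference gone: return block to the pool
            del self.data[pid], self.refs[pid]
            self.free += 1


class Volume:
    def __init__(self, pool: ThinPool, logical_blocks: int):
        self.pool = pool
        self.size = logical_blocks  # advertised size; nothing is reserved up front
        self.map = {}               # logical block -> physical block id
        self.dirty = set()          # logical blocks written since the last replication

    def write(self, lba: int, payload: bytes) -> None:
        if not 0 <= lba < self.size:
            raise IndexError(lba)
        old = self.map.get(lba)
        self.map[lba] = self.pool.allocate(payload)  # new data always lands in a fresh block
        if old is not None:
            self.pool.release(old)
        self.dirty.add(lba)

    def read(self, lba: int) -> bytes:
        pid = self.map.get(lba)
        return b"" if pid is None else self.pool.data[pid]  # unwritten blocks read as empty

    def clone(self) -> "Volume":
        twin = Volume(self.pool, self.size)
        twin.map = dict(self.map)  # share every block; copy only the map
        for pid in twin.map.values():
            self.pool.refs[pid] += 1
        twin.dirty = set(twin.map)  # a replica of the clone needs a full baseline
        return twin

    def changed_blocks(self) -> dict:
        """Blocks to send to a replica: everything the first time, then only changes."""
        delta = {lba: self.read(lba) for lba in sorted(self.dirty)}
        self.dirty.clear()
        return delta


pool = ThinPool(physical_blocks=10)
vol = Volume(pool, logical_blocks=1_000)  # promises 1,000 blocks on a 10-block pool
for lba in range(4):
    vol.write(lba, f"block{lba}".encode())
replica = vol.changed_blocks()            # baseline: all four blocks travel
vol.write(2, b"edited")
replica.update(vol.changed_blocks())      # only block 2 travels
snap = vol.clone()                        # consumes no new physical blocks
snap.write(0, b"clone-only")              # first divergent write costs one block
print(pool.free)                          # 5
```

The volume promises far more capacity than the pool holds, and only written blocks are backed by physical space. The clone costs nothing until it diverges from the original, and the second replication pass ships a single block.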