Center explores upper reaches of SANs

Fibre Channel storage network helps supercomputing center streamline operations

With hundreds of terabytes of storage capacity already spinning and more coming online soon, Michelle Butler has a data storage job on her hands that might give fellow federal information technology managers nightmares.

Before you start to feel sorry for her, also know that she's got the green light to build a solution using the latest storage-area network (SAN) technology that might make that budding sympathy turn to envy.

A SAN lets IT managers pool storage resources on a dedicated high-speed network and make that pool available to multiple servers, rather than hanging individual storage devices off the backs of individual servers in the traditional direct-attached model.
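
To make that contrast concrete, here is a minimal Python sketch of the two models. Every class name, server name and capacity figure below is hypothetical and chosen only for illustration; this is not NCSA's configuration.

```python
# Minimal sketch of direct-attached storage versus a SAN-style shared pool.
# All names and numbers are hypothetical illustrations, not NCSA's systems.

class DirectAttached:
    """Each server owns its own disks; spare space on one box can't help another."""
    def __init__(self):
        self.local_tb = {}  # server name -> capacity bolted onto that server (TB)

    def add_disk(self, server, tb):
        # Growing means buying another small chunk for that one server.
        self.local_tb[server] = self.local_tb.get(server, 0) + tb


class SanPool:
    """One pool on a dedicated storage network, carved up among many servers."""
    def __init__(self, total_tb):
        self.free_tb = total_tb
        self.allocated_tb = {}  # server name -> TB assigned from the pool

    def allocate(self, server, tb):
        if tb > self.free_tb:
            raise RuntimeError("pool exhausted; add arrays to the SAN")
        self.free_tb -= tb
        self.allocated_tb[server] = self.allocated_tb.get(server, 0) + tb

    def release(self, server, tb):
        # Capacity a server no longer needs goes back to the pool
        # instead of sitting idle behind that one machine.
        returned = min(tb, self.allocated_tb.get(server, 0))
        self.allocated_tb[server] = self.allocated_tb.get(server, 0) - returned
        self.free_tb += returned


if __name__ == "__main__":
    pool = SanPool(total_tb=60)           # e.g. a 60 TB pool
    pool.allocate("mail_server", 2)
    pool.allocate("research_cluster", 40)
    pool.release("mail_server", 1)        # a little less here...
    pool.allocate("research_cluster", 1)  # ...a little more there
    print(pool.allocated_tb, pool.free_tb)
```

The point of the sketch is the `release` step: in the pooled model, freed capacity is immediately available to any other server, which is what the direct-attached model cannot do.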

While many IT managers can only dream about what a Fibre Channel SAN could do to streamline and accelerate their enterprise storage systems' performance, Butler and her colleagues at the National Center for Supercomputing Applications (NCSA) in Urbana-Champaign, Ill., are finding out firsthand.

In the year that NCSA has had its initial SAN deployed, IT managers there have learned a lot about this new storage architecture. Those lessons will come in handy as the center starts building its piece of a giant distributed supercomputer called the TeraGrid that will require a SAN four times bigger than NCSA's current storage network — and that's just to get started.

"We had to get our feet wet to be able to build this TeraGrid right," said Butler, technical program manager at NCSA.

Need for Speed

NCSA's move to SANs is happening for a few reasons. Certainly at the top of the list are the demands of the TeraGrid system, which is designed to pool banks of computers across four national research centers. The TeraGrid infrastructure at NCSA will initially consist of 256 IBM Corp. dual-processor servers running Linux on Intel Corp.'s 64-bit Itanium 2 processors. Next year, the grid will grow to more than 1,000 servers and 230 terabytes of storage.

TeraGrid's processing performance — which directly affects how long it takes researchers to complete their scientific computing jobs — depends in part on how fast the servers can write and retrieve data from disk storage.

"The speed of the supercomputer is amazing, so why have the chips sitting there doing nothing but an [input/output] wait?" Butler said.

On this score, Fibre Channel has a decided edge over other technologies. Butler said NCSA's current SAN, a test bed of sorts for the one that will be built for the TeraGrid, is delivering throughput of about 90 megabytes per second per port, two to four times the rate of the center's older direct-attached storage devices, which use the SCSI protocol.
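
As a rough, back-of-the-envelope illustration of what that per-port rate means for I/O wait, the sketch below runs the arithmetic. The 90 MB/sec figure comes from Butler; the 30 MB/sec SCSI rate and the 1-terabyte data set are assumed values chosen to fall inside the two-to-fourfold range she describes, not NCSA measurements.

```python
# Rough transfer-time arithmetic. The 90 MB/sec figure is from the article;
# the 30 MB/sec SCSI rate and the 1 TB data set are assumptions chosen to
# fall inside the "two to four times" range Butler cites.

def transfer_hours(data_gb, mb_per_sec):
    """Hours needed to move data_gb gigabytes at a sustained rate."""
    return (data_gb * 1024) / mb_per_sec / 3600

dataset_gb = 1024  # a hypothetical 1 TB result set

for label, rate_mb_s in [("Fibre Channel SAN port", 90), ("older SCSI path", 30)]:
    print(f"{label}: {transfer_hours(dataset_gb, rate_mb_s):.1f} hours")

# At these assumed rates, the terabyte that keeps a SCSI-attached server in
# an I/O wait for nearly 10 hours moves in a bit over 3 hours through a
# single SAN port.
```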

That advantage alone justifies the premium that Fibre Channel switches and storage arrays command over comparable SCSI equipment, Butler said.

Another factor that supported a move to SANs was the cost of maintaining NCSA's older disk systems. "It's not so much our administrative costs for the person, because even with the SAN someone still needs to take care of the disks," Butler said. "The real killer is to just keep the older systems running. [Vendors] want about $90,000 to keep these old disks spinning, when I could spend that same money and buy three times as much new disk storage as what's spinning without the expensive maintenance."

Still another reason: by creating one pool of storage in a SAN that multiple servers and applications share, NCSA officials can buy new SAN storage in bulk and negotiate lower prices with vendors, confident that the new capacity will be used.

"We were tired of buying lots of little chunks of storage to hook up locally to every system and not be able to move it around once we needed a little more here and a little less there," Butler said.

NCSA's initial SAN, installed in January, is connected to more than 100 servers — increasing soon to more than 200 — that run an assortment of research and general business applications.

Not only does the SAN help NCSA researchers work faster, it's also improving the performance of the center's general business applications, Butler said. For example, data backups finish much faster, and performance has improved significantly for the center's Microsoft Corp. Windows NT-based Exchange e-mail system.

Among the selling points of SANs are their scalability and their support for equipment from different vendors. Although NCSA theoretically could expand its initial SAN to include the storage resources the TeraGrid needs, Butler wants that system to have its own SAN to simplify system management and minimize interoperability issues.

At the same time, Butler said the center will probably use disk arrays from only a few different vendors, even within the same SAN. "Keeping track of all the vendors' [product] management environments would be just too hard," she said.

Indeed, half of the IT managers at midsize to large corporate and government organizations surveyed by Aberdeen Group, a Boston-based consulting firm, said they had more difficulty than they expected getting SAN components to work together, according to Dan Tanner, Aberdeen's research director for storage and storage management.

So NCSA is not alone in limiting the number of products it plugs into its SAN, but that might change as the market matures, according to Steve Beer, director of product marketing at Brocade Communications Systems Inc., which provides the Fibre Channel SAN switches NCSA uses.

"We're now seeing some transition to the next phase of SAN deployment, where heterogeneous SAN storage and components are the norm rather than the exception," Beer said.

***

NCSA's storage-area network:

Solution: The initial storage-area network (SAN) consists of Fibre Channel network gear from Brocade Communications Systems Inc., including eight 16-port SilkWorm 3800 switches and a SilkWorm 12000 Core Fabric switch. It also includes seven DataDirect Networks Inc. Silicon Storage Appliance disk arrays providing 60 terabytes of capacity. QLogic Corp. SANblade 2200 and 2300 Fibre Channel host bus adapters connect the servers to the SAN.

Cost: About $1.5 million