Labs explore Linux super clusters

Clustering's cost advantage has compelled some organizations to migrate away from Unix-based systems and symmetric multiprocessing (SMP) machines. Hewlett-Packard Co.'s Alpha machines (originally from Digital Equipment Corp.) and Silicon Graphics Inc.'s (SGI) boxes are among those being exchanged for clusters.

"They're taking applications that in the past needed big iron to run — SGI, Alpha or whoever — and moving those down," said Paul Barker, vice president of marketing at RLX Technologies Inc., a Linux cluster software vendor.

Linux vendors are hoping to tap into this movement. Red Hat Inc., for example, aims to build a "targeted" Linux distribution adaptable to high-performance computing, according to Brian Stevens, Red Hat's vice president of operating system development.

The national labs are among the key stomping grounds for Linux clustering specialists. "The labs are a good place to be," said Tom Leinberger, vice president of sales at Aspen Systems Inc., which manufactures Linux-based clusters. But Aspen's strategy is to target smaller facilities within such organizations as the National Institutes of Health and the National Institute of Standards and Technology.

Leinberger said the company can assist smaller groups that don't have large numbers of in-house technicians to work on a clustering project.

Nevertheless, the labs are showplaces for high-performance computing models. Linux machines and traditional supercomputers are both put to the test at the Center for Computational Sciences at the Energy Department's Oak Ridge National Laboratory. The center provides a computing test bed for DOE and university scientists.

Oak Ridge recently contracted with SGI for a 256-processor system based on Intel Corp.'s 64-bit Itanium 2 chip. SGI's Altix 3000 runs Linux.

Initially, SGI will deploy a cluster of four Altix machines, each with 64 processors, because SGI currently supports no more than 64 processors in a single system. The company expects eventually to scale the Altix to 256 processors, noted Buddy Bland, director of operations at the center.

Bland said there are pros and cons to both the traditional supercomputing model and clustering. Clustering, he finds, offers a large degree of fault tolerance. On the downside, the "burden of managing all of the nodes individually is a bigger cost," he said.

Bland said DOE and Oak Ridge officials have been working on a number of projects to lower the cost of cluster system management.
