Super challenger

Supercomputing, once the domain of large-scale systems specifically designed to crunch numbers for scientific applications, is becoming more mainstream.

As an alternative to the highly specialized supercomputer system architectures, vendors such as Silicon Graphics Inc. (SGI) and IBM Corp. are developing high-end systems by using commercial symmetric multiprocessing (SMP) as a building block and offering these lower-cost systems to business and scientific users. Basically, these vendors set up SMP systems in clusters, each of which deploys large numbers of CPUs working simultaneously.

At the same time, however, it is becoming more common to see scientists with small budgets building their own clusters from off-the-shelf parts as clustering technology rapidly becomes more sophisticated and easier to use.

For example, a team at Los Alamos National Laboratory assembled a 140-processor cluster based on Digital Equipment Corp. Alpha processors. Called Avalon, the system has performed at 47 billion floating-point operations per second (47 Gflops), according to its developer, Michael Warren. Even as a 68-PC cluster last summer, Avalon already ranked among the 500 fastest computers in the world. Warren, a Los Alamos staff member, said the lab spent $300,000 for processing power that would cost $3 million if commercially packaged.
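A quick back-of-the-envelope calculation, using only the figures Warren cited and treating the 47 Gflops as the relevant performance number for both price tags, illustrates the roughly tenfold price/performance gap:

```python
# Back-of-the-envelope price/performance for Avalon, using the figures
# quoted above; illustrative arithmetic only, not an official benchmark.
avalon_cost = 300_000        # dollars, the lab's outlay cited by Warren
commercial_cost = 3_000_000  # dollars, Warren's estimate for a packaged equivalent
performance_gflops = 47      # Gflops reported for the 140-processor cluster

print(f"Avalon:     ${avalon_cost / performance_gflops:,.0f} per Gflops")
print(f"Commercial: ${commercial_cost / performance_gflops:,.0f} per Gflops")
print(f"Ratio:      {commercial_cost / avalon_cost:.0f}x")
# Avalon:     $6,383 per Gflops
# Commercial: $63,830 per Gflops
# Ratio:      10x
```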

Avalon uses 3Com Corp. Fast Ethernet switches and runs the Linux operating system, which is a version of Unix that can be downloaded from the Internet at no cost.

Energy-Driven

The clustered SMP approach has been accelerated by the Energy Department's $4 billion Accelerated Strategic Computing Initiative. DOE started ASCI as a way to foster the development of a new generation of supercomputing technology capable of handling the complex calculations required for applications such as simulations of underground nuclear tests.

In one near-term product of ASCI, SGI will provide Los Alamos with its Blue Mountain configuration: a cluster of 48 Origin 2000 computers, each with 128 CPUs. This system, with its 6,144 processors, is intended to reach a peak performance of 3 trillion floating-point operations per second (Tflops), according to John Reynders, ASCI's senior project manager for computational science at the laboratory.
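For a sense of scale, the stated peak works out to a little under half a Gflops per CPU. A minimal sketch of the arithmetic, assuming the 3-Tflops figure is the aggregate peak of all processors rather than a sustained rate:

```python
# Arithmetic behind the Blue Mountain figures quoted above. Assumes the
# 3-Tflops number is aggregate peak performance, not sustained throughput.
nodes = 48              # Origin 2000 systems in the cluster
cpus_per_node = 128     # CPUs in each Origin 2000
peak_tflops = 3.0       # stated peak for the full system

total_cpus = nodes * cpus_per_node
per_cpu_mflops = peak_tflops * 1_000_000 / total_cpus  # Tflops -> Mflops

print(total_cpus)                      # 6144 processors
print(f"{per_cpu_mflops:.0f} Mflops")  # roughly 488 Mflops peak per CPU
```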

In another case, IBM this year expects to provide Lawrence Livermore National Laboratory with a 4 Tflops ASCI system called Blue Pacific, based on the company's RS/6000 SP parallel machine. Dave Turek, IBM's director of SP brand management in Poughkeepsie, N.Y., said the company also was chosen to supply the laboratory in 2000 with a machine targeted at 10 Tflops. The SP machine scales up to a 512-way system.

Turek stressed the similarity between science and business problems. Today some business customers have databases running up to tens or hundreds of terabytes and requiring the same kind of processing power as science applications, he noted. And he said business and science users are interested in applications such as data mining, in which a system searches for "nonintuitive connections or patterns" in data.

And there seems to be little immediate limit to the scalability of the clustered SMP approach. Reynders said ASCI set out to accelerate growth "beyond Moore's law," the tenet that the number of transistors that can be packed onto a microchip, and hence its processing power, doubles roughly every 18 months. The ASCI program calls for a system capable of 30 Tflops by the end of 2001 and 100 Tflops by 2004.
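To see what "beyond Moore's law" means in practice, compare a steady 18-month doubling curve against the 2004 target. The 1-Tflops starting point in 1997 is an assumption for illustration only; the article does not give a baseline:

```python
# Compare a steady 18-month doubling curve against the ASCI targets cited
# above. The 1-Tflops / 1997 baseline is an illustrative assumption, not a
# figure from the article.
def doubling_growth(start_tflops, years, doubling_period_years=1.5):
    """Performance after `years` if it doubles every `doubling_period_years`."""
    return start_tflops * 2 ** (years / doubling_period_years)

baseline = 1.0  # assumed Tflops in 1997
projected_2004 = doubling_growth(baseline, 2004 - 1997)
print(f"Doubling every 18 months: about {projected_2004:.0f} Tflops by 2004")
print("ASCI target:              100 Tflops by 2004")
```

At that doubling pace, the assumed 1997 baseline grows to only about 25 Tflops by 2004, well short of the program's 100-Tflops goal, which is what makes the target an acceleration "beyond Moore's law."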

A supercomputer should run more than 100 Gflops today and "at least a [teraflops] in a year or so," said Stephen Elbert, program director for the National Science Foundation's Partnership for Advanced Computational Infrastructure program.

Build-It-Yourself Supercomputers

Although observers have noted that one DOE effort may require up to $500 million a year for nonclassified research using supercomputers, some researchers are looking to reduce their need for such extensive funding. These thrifty scientists are snapping together lower-budget Windows NT and Unix workstations with off-the-shelf interconnections.

The "personal supercomputer" phenomenon is likely to grow in importance because of market economics, said Andrew Chien, a professor at the University of California at San Diego. Chien, known for building a 256-processor "NT supercluster" at the National Computational Science Alliance (NCSA), said he expects that cluster to scale up to 512 workstations in the next three to four months.

Chien estimated that the NT supercluster achieves half the performance per processor of an SGI Origin 2000 at one-fifth to one-tenth the cost. He said he expects professional integrators eventually to start to assemble commodity clusters as well, making low-cost, high-performance computing available to a wider sector.
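Taken at face value, those two ratios imply a price/performance advantage of roughly 2.5 to 5 times for the commodity cluster. A quick sanity check of the arithmetic, treating Chien's figures as rough estimates rather than benchmark results:

```python
# Sanity check of the price/performance claim above, treating Chien's
# estimates as rough ratios rather than measured benchmark results.
perf_per_cpu_vs_origin = 0.5      # NT cluster vs. Origin 2000, per processor
cost_vs_origin = (1 / 10, 1 / 5)  # NT cluster cost as a fraction of Origin cost

for rel_cost in cost_vs_origin:
    advantage = perf_per_cpu_vs_origin / rel_cost
    print(f"At {rel_cost:.2f}x the cost: {advantage:.1f}x better price/performance")
# At 0.10x the cost: 5.0x better price/performance
# At 0.20x the cost: 2.5x better price/performance
```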

Chien and fellow researchers at UC San Diego are now building a Windows NT cluster that mixes Intel and Alpha processors. Chien said researchers need to develop an understanding of "heterogeneous clusters" that include different types of processors because of the difficulty of maintaining the clusters over time. He said the study is important because new processors are constantly emerging and replacing older ones, and users inevitably will end up with mixed clusters. The Intel/Alpha cluster will focus on "data-intensive computing," such as terrain mapping and data mining, he said.

Unlike some observers, Chien believes commodity clusters can scale higher than current levels— well beyond 2,000 nodes. "We don't know of any fundamental limits," he said.

Bill Saphir, a staff scientist at Lawrence Berkeley National Laboratory, said commodity clusters will become competitive for applications besides massive number-crunching. He said a 32-node system built at the lab is being used in part for developing software to run on the lab's SGI/Cray Research T3E, a more traditional supercomputer that runs more than 640 processors.

He estimated that PC clusters cost $5,000 per node, as opposed to the T3E, which costs about $30,000 per node.

Problems at the High End?

Clustered computers based on large SMP machines are more affordable and reliable than the exotic architectures of earlier years, but they may be less capable of solving certain high-end problems efficiently. And the industry innovators may not deliver new technology in time to meet emerging needs, particularly at the high end of the market.

"The market is in pain right now," said Deborah Goldfarb, vice president of workstations and high-performance systems for International Data Corp., Framingham, Mass.

Goldfarb said the overall supercomputing market is growing, but it is driven by tremendous growth in midrange computing, averaging 16 to 32 processors. As a result, vendors tend to invest in development projects that support high-volume product users rather than the unique needs of the top segment of technical computer users.

As SGI "phases out" its legacy T-90 vector computer— the traditional supercomputer technology developed by its Cray subsidiary— "the [outlook for the] high end is not bright at all," Goldfarb said. The company's SV2 scalable node product— basically an Origin machine with "vector characteristics" via hardware enhancements— is not expected until 2002.

Richard Partridge, vice president of parallel open-systems hardware for D.H. Brown Associates, Port Chester, N.Y., was more optimistic. Although the "downturn" of the market for high-priced systems resulting from shrinking government spending has "affected the viability of special-purpose computing," the overall market for machines that meet technical and general-purpose demand "seems to be successful [and] solid," he said.

-- Adams is a free-lance writer based in Alexandria, Va. She can be reached at cbadams@erols.com.

***

The Tera Alternative

The clustered SMP approach has not convinced everyone. Companies such as Tera Computer Co. are trying to apply vector-like approaches to obtain higher processor utilization, smoother scalability and simpler programming.

Vector processors are specialized to perform many simultaneous calculations. To explain vector computing, Mike Vildibill, associate director of the San Diego Supercomputer Center (SDSC), drew an analogy to the task of getting 20 people up a ladder. A conventional processor would put them up one at a time, but a vector computer would line up operations so that all of the people simultaneously take a step up the ladder, getting them all up a lot faster.
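The ladder analogy maps loosely onto the difference between an element-at-a-time loop and a whole-array operation in software. The sketch below illustrates only the programming model, using Python and NumPy as a stand-in; real vector hardware implements this in silicon, not in an interpreter:

```python
# Ladder analogy in code: advance 20 "people" one at a time (scalar style)
# versus all at once with a single whole-array operation (vector style).
# This illustrates the programming model only, not any vendor's hardware.
import numpy as np

heights = np.zeros(20)   # 20 people at the bottom of the ladder

# Scalar style: one person takes a step per loop iteration.
for i in range(len(heights)):
    heights[i] += 1

# Vector style: one operation steps all 20 people up together.
heights += 1

print(heights)   # every element has now taken two steps
```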

SDSC, with funding from the Defense Advanced Research Projects Agency and the Energy Department, is evaluating Tera's approach, which promises vector-like processing efficiency while scaling well beyond what vector machines can reach. The technology, however, is still unproved, and SDSC so far has only a two-processor machine, although a four-processor machine is the target, according to Wayne Pfeiffer, the center's deputy director.

Dick Russell, Tera's vice president of marketing in Seattle, explained that Tera's Multi-Threaded Architecture (MTA) uses complex, proprietary processors, each with up to 128 "hardware threads." Consequently, some threads can keep busy while others wait for data from memory. "It's like having the logic of 128 microprocessors side by side in one processor," he said. Although processor clock speed is a mere 255 MHz, each processor is comparable in performance to a Cray T90 uniprocessor, he said.
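The payoff of many hardware threads can be sketched with a simple utilization model: if each thread computes for a few cycles and then stalls on memory for many cycles, enough threads can keep the processor busy the whole time. The cycle counts below are arbitrary values chosen for illustration, not figures from Tera's specifications:

```python
# Toy latency-hiding model for a multithreaded processor. The cycle
# counts are arbitrary illustrative values, not Tera MTA specifications.
def utilization(threads, compute_cycles=2, memory_latency=200):
    """Fraction of time the processor has ready work, assuming each thread
    alternates `compute_cycles` of work with a `memory_latency` stall."""
    work_available = threads * compute_cycles
    return min(1.0, work_available / (compute_cycles + memory_latency))

for n in (1, 32, 128):
    print(f"{n:3d} threads -> {utilization(n):.0%} of cycles doing useful work")
#   1 threads -> 1% of cycles doing useful work
#  32 threads -> 32% of cycles doing useful work
# 128 threads -> 100% of cycles doing useful work
```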

Vildibill said Tera also boasts a single shared memory with very high-speed memory access. Tera's MTA is not a vector computer, but the results are comparable, he said. "We'd like to scale the machine up to many processors to understand whether it's viable as a large supercomputer," he said.

But Tera has been delayed in delivering the four-processor model to SDSC. Pfeiffer said there have been problems with the network connecting the processors. But he added that this style of multithreaded computing inevitably will catch on, and Tera is in a race with companies such as Digital Equipment Corp. to see whose multithreading product comes to market first.

- Charlotte Adams

***

AT A GLANCE

Status: The Energy Department is driving efforts to test workstation clusters as an alternative to traditional supercomputing. DOE scientists say they have set up clusters that achieve processing power equivalent to that of commercial supercomputers for as little as one-tenth the cost.

Issues: Some observers believe the new focus on clusters may move the market away from systems geared toward the specialized needs of high-end users. And scientists still are working out technical challenges, such as how to maintain clusters that mix different types of processors.

Outlook: Excellent. The growing need for processing power among typical business customers promises an expanding market for workstation clusters.