Multiserver clusters are the new supercomputer stand-in. Here’s how storage has adapted to its important new supporting role
- By John Moore
- May 29, 2006
Storage was straightforward in supercomputing’s monolithic days. Storage subsystems came with supercomputers and were fairly simple to integrate.
Today, high-performance computing often involves clusters of computers. Now 72 percent of the world’s 500 fastest supercomputers are clusters, according to Top500.org, a group that publishes an annual list. The increasingly dominant distributed computing model tests storage in ways centralized supercomputer systems never did.
“It’s infinitely harder to connect storage in a cluster environment,” said Eric Seidman, storage systems manager at Verari Systems, a high-performance computing vendor. The overall objective, Seidman said, is to achieve balanced performance among processors, memory and input/output subsystems, which connect the cluster to storage.
Many vendors now offer storage systems for clusters. Storage experts say, however, that no single solution on the market will fit every supercomputing task. Instead, tiers of solutions are emerging to address the input/output requirements of different high-performance computing environments.
High-end clusters with a heavy input/output demand typically call for features such as parallel file systems and high-speed storage interconnects. But clusters of more modest size may use more familiar storage staples, such as network-attached storage (NAS) systems.
High-performance computing storage differs from the mainstream, enterprise variety in that storage managers need to pursue a holistic approach, said Anne Vincenti, storage director
at Linux Networx, which builds Linux-based supercomputing clusters.
“They need to not only look at the storage but…get into the guts of the system architecture in order to optimize clusters,” Vincenti said.
On the flip side, technology managers who concentrate on the computing side of clusters should also take a broader view, Seidman said. “There’s a lot of focus on the processor type and clock speed…and less focus on the overall architecture,” he added.
The architecture’s storage element has several critical parts, including the storage array, the link between the storage device and the computing cluster, and the file system that sits atop the storage architecture. Altogether, those elements supply data to the scientific and technical applications running on a cluster. Storage systems also house the data that those applications generate.
Customers must tune those various components to maximize performance. Because clusters vary in size and performance and run different applications, they need individualized storage
Lawrence Berkeley National Laboratory’s Scientific Cluster Support program has more than 20 Linux clusters in production. The group “specializes in building individual clusters for specific areas of scientific research,” said Gary Jung, SCS project manager at the lab’s Information Technology Division. “Since we are building [high-performance computing] systems for specific areas of science, we can address the unique storage considerations more directly.”
Storage solutions at SCS and other high-performance computing centers tend to fall into three categories based on cluster size: small, medium and large.
Linux Networx defines the value performance segment as small clusters with as many as 32 nodes. Applications may include crash simulations or smaller computational fluid dynamics jobs, Vincenti said.
Customers in the value camp often can manage with a file server running the standard Network File Server (NFS) protocol or a commercial NAS box, she said. Value customers emphasize manageability and seek solutions “that will deploy and work quickly without having to hack around,” Vincenti said.
Smaller file sizes also characterize the value bracket. Vincenti said smaller files may be 50M or less, while a large file could be bigger than 50G.
At SCS, smaller clusters with modest input/output requirements use some form of Linux-based NFS server with a direct-attached Redundant Array of Independent Disks solution, Jung said. NFS, commonly used in mainstream Unix and Linux environments, provides a distributed file system through which users share files on a network.
Midsize clusters, meanwhile, gravitate toward NAS. The SCS program has deployed Network Appliance’s NAS systems for clusters with moderate input/output requirements, Jung said. He said the systems are reliable and easy to integrate.
The Army Research Laboratory also uses NAS storage. “NAS systems with multiple Gigabit Ethernet connections have been successful for small- to moderate-sized clusters,” said Tom Kendall, chief engineer at the Army Research Laboratory’s Major Shared Resource Center.
The midsize market segment contains clusters with as many as 256 nodes, according to Linux Networx.
Applications in this segment include process integration and design optimization and computational fluid dynamics. Clusters may handle small and large file-streaming applications. Using the latter, scientists can quickly load massive amounts of data into memory for faster execution of jobs.
In the case of top-tier clusters, storage systems aim to address the problem of hundreds
or thousands of data-hungry nodes.
Vincenti said some high-performance computing systems don’t get enough data to take advantage of their computing power. “You have to have enough [input/output] built into the system to keep the [system] fully optimized,” she said.
Large clusters with intensive input/output demands often call for a parallel file system. Parallel file systems are designed for supercomputing tasks in which multiple nodes access the same files simultaneously. In addition, parallel file systems let a cluster’s nodes access data at speeds much faster than what NFS can achieve.
“Parallel file systems are the greatest hope in this area,” Kendall said. The Army lab is deploying clusters with input/output requirements approaching 10 gigabytes/sec. The clusters have “capacities of hundreds of terabytes, which are beyond the capability of nearly all single-server solutions,” he said.
Parallel file systems don’t suit every supercomputing challenge, however.
“There are applications that are by their very nature serial and, therefore, unable to leverage existing parallel file systems beyond the capability” of a single node’s input/output performance, Kendall said.
“Fortunately, only a small percentage of our applications fall into this regime,” he said.
The San Diego Supercomputer Center’s three largest clusters — DataStar, TeraGrid IA-64 and BlueGene — use IBM’s General Parallel File System, said Richard Moore, director of production systems at the center. Similarly, SCS has used Panasas’ ActiveScale Storage Cluster for its larger clusters with intensive input/output requirements, Jung said. ActiveScale Storage Cluster includes Panasas’ parallel file system, PanFS.
Cluster File Systems’ Lustre is another example of a parallel file system. The Lawrence Livermore National Laboratory, the National Center for Supercomputing Applications and Sandia National Laboratories are among the government customers using this open-source file system.
The high-end cluster segment also marks a shift in storage architecture. One approach incorporates the lower-level, block-based data access method of storage-area networks (SANs) alongside the file-based access techniques of NAS. In this approach, a large cluster may reserve a certain number of nodes for input/output. Those nodes feed data to the cluster’s computing nodes. The ratio of input/output nodes to computing nodes varies. The San Diego Supercomputer Center’s BlueGene cluster, for example, operates 1,024 computing nodes and 128 input/output nodes.
Input/output nodes may connect to a cluster’s computing nodes via proprietary interconnects, such as Myrinet and Quadrics, or standard Gigabit Ethernet, storage experts say. Reaching out from clusters to external storage devices, input/output nodes typically use Fibre Channel and attach to storage in a SAN-like manner. Supercomputer SANs may use storage devices from vendors such as DataDirect Networks, Engenio Information Technologies and Verari Systems, which markets Engenio products as an OEM partner.
Vincenti cited interconnect bandwidth and fast and flexible disk controllers as attributes that make SANs a suitable environment for high-performance computing applications such as streaming.
But storage technologists say such storage architectures aren’t strictly SANs. Seidman said input/output nodes are typically NAS-attached to computing nodes and SAN-attached to storage
Some NAS-based approaches can also offer storage to high-end clusters. Vendors such as Panasas offer NAS solutions that let a cluster’s input/output nodes access data in parallel. “We are starting to see a big move toward parallelized NAS,” said Dan Smith, senior enterprise consultant at GTSI.
Panasas’ storage product deploys a small file system — the company’s DirectFlow client — on all the nodes in a Linux cluster. The DirectFlow client lets the nodes access Panasas’ storage devices in parallel.
This parallel NAS solution has a SAN element, however. Panasas uses Internet SCSI over Ethernet as the underlying transport mechanism, said Larry Jones, vice president of marketing at Panasas.
“SAN and NAS get a bit muddy in clusters,” Seidman said.
Buyers have multiple options when it comes to cluster storage. They must weigh factors such as cluster size, applications and throughput requirements before reaching a decision.