Cluster computing has revolutionized High Performance Computing (HPC). Prior to clusters, HPC was dominated by large centralized systems from vendors such as CDC and IBM. Users shared these machines as centralized resources, which reduced costs by concentrating expensive hardware. However, researchers found that these large systems could not keep up with computational demand, which meant that the effective amount of HPC time each researcher received continually declined even as the machines got faster. Moreover, every time a new machine came out, people were forced to rewrite their code to match its architecture. During this time, PCs quietly increased in performance at a faster rate than HPC systems.
For the reasons already stated, in 1993 Don Becker and Tom Sterling decided to investigate combining PCs with open source software to create what you might call a "personal" HPC system, an alternative to the big, centralized HPC systems. The idea was to put HPC-class computing power in the hands of the individual user. Their first system, built from 16 Intel 486 DX4 (100 MHz) boards, channel-bonded 10 Mbps Ethernet, and Linux, proved to be a huge leap in price/performance over the HPC systems of the time. With the "Beowulf" cluster, as they chose to call it, the individual researcher now had HPC levels of performance at their fingertips, but at a much lower price. They could still use the centralized HPC systems for larger jobs, but the Beowulf cluster could handle all of the traditional HPC applications as well. From this point the use of clusters exploded: as of June 2008, 80 percent of the Top500 fastest systems in the world were classified as clusters.
Why has cluster computing been so successful in HPC? If you ask people in HPC you will receive a variety of answers. The general comments I hear are:
- Price/performance: Cluster computing delivers orders of magnitude better price/performance than the old centralized HPC resources.
- Individual HPC resource: This was one of the original goals of the Beowulf clusters—to put more computational power in the hands of the individual.
- Machine adaptability: You can design the machine to fit your code, rather than being forced to redesign your code every time a new HPC system is released.
- Open source: The classic Beowulf cluster was built on Linux and other open source tools, which means that the researchers did not have to invest money into purchasing the software and could also modify the tools as needed to better solve their problems.
- Open standards: People don’t often think of this reason, but building systems on open standards such as x86 processors, Ethernet, InfiniBand, and MPI is a key to the success of cluster computing. With open standards you can write your application to those standards and run that code on any system adhering to them.
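The portability that open standards buy is easiest to see with MPI: the minimal program below compiles and runs unchanged on any cluster with a standards-conforming MPI implementation. This is a sketch, assuming an MPI library such as MPICH or Open MPI is installed; it uses only calls defined by the MPI standard.

```c
/* hello_mpi.c -- minimal MPI program; the same source runs on any
 * standards-conforming MPI implementation (MPICH, Open MPI, etc.).
 * Build:  mpicc hello_mpi.c -o hello_mpi
 * Run:    mpirun -np 4 ./hello_mpi
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);               /* start the MPI runtime        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank (id)     */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes    */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                       /* shut down the MPI runtime    */
    return 0;
}
```

Because the code targets the standard rather than a particular machine, moving it from a GbE cluster to an InfiniBand cluster is a recompile, not a rewrite.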
Anatomy of a Cluster
HPC cluster systems can come in many flavors and colors—that’s one of the beauties of clusters, they can be architected to match your application. Figure 1 provides a rough diagram of the classic "Beowulf" cluster.
Figure 1: Classic Beowulf Cluster
In this layout, the cluster sits on a private network behind a switch, with a master node that bridges between the cluster and the outside network. The master node controls the OS imaging and administration of the cluster and also serves as a login node for users. A set of nodes, labeled compute nodes, is used just for computation. There is also some kind of shared storage that both the master node and the compute nodes can access (mount); at a basic level, this storage is typically NFS.
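That NFS setup usually amounts to two small configuration fragments: an export on the master node and a mount on each compute node. A sketch, assuming (purely for illustration) that the master node's private address is 10.0.0.1 and the compute nodes sit on the 10.0.0.0/24 network:

```
# /etc/exports on the master node: share /home with the private cluster network
/home  10.0.0.0/24(rw,sync,no_subtree_check)

# /etc/fstab entry on each compute node: mount the master's /home at boot
10.0.0.1:/home  /home  nfs  defaults  0  0
```

After editing /etc/exports, running `exportfs -ra` on the master node makes the export active without a reboot.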
Connecting all of these pieces is anywhere from one to three networks; Figure 1 shows two, which is the typical arrangement. There are several types of network traffic within a cluster. The first is traffic devoted to the management of the cluster itself. The second is computational traffic, devoted to passing data between processes (also called message passing). The third type of traffic, and one that many people forget, is storage traffic, which carries any kind of storage I/O within the cluster. In Figure 1 the storage traffic can run over either the computational network or the management network, depending on the I/O requirements and the specific details of the storage (for example, hardware and file system).
The minimum number of networks you need is one, carrying all three types of traffic. In this case the network needs to be TCP based (for example, GbE), because the management traffic typically communicates with the Baseboard Management Controller (BMC) on the compute nodes using IPMI. The computational traffic and the storage traffic share this network as well. This approach saves money, but it can become a bottleneck, depending on your application, because all traffic competes for the same wires. The single network would most likely be Gigabit Ethernet (GbE); at this time, 10 Gigabit Ethernet (10GbE) is not yet a viable solution.
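Management traffic of this kind is what a tool like ipmitool generates when it talks to a node's BMC over the management network. The commands below are a sketch; the address and credentials are placeholders, not real values:

```
# Query the power state of a compute node's BMC (hypothetical address/credentials)
ipmitool -I lanplus -H 10.0.0.101 -U admin -P secret chassis power status

# Power-cycle a hung node remotely, without touching the machine room
ipmitool -I lanplus -H 10.0.0.101 -U admin -P secret chassis power cycle
```

Because IPMI runs on the BMC rather than the host OS, these commands work even when the node itself has crashed, which is exactly why management traffic matters.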
To get around such a bottleneck, people typically put the management traffic on its own TCP network and the computational traffic on a separate network, which can be TCP or something different such as Myrinet or InfiniBand. Because computational traffic can be an important factor in the performance of the system, placing it on a dedicated network can improve the scalability and performance of the application. For larger clusters, many people use a high-speed network, such as Myrinet 10G or InfiniBand, as the computational network. You can still run the storage traffic over the management network or the computational network to save money, but there are times when the storage traffic warrants a dedicated storage network. A dedicated storage network is fairly popular in GbE clusters, since GbE is an inexpensive cluster network. On the other hand, a high-speed computational network typically has bandwidth left over, which can carry the storage traffic; this eliminates the need for a third network and makes the cluster a bit simpler to build and maintain (fewer cables).
The previous section presented a schematic for a classic cluster computing system, showing the major components and how they connect. But I want to dive a little deeper into the various parts to show you that they are no different from those in any other enterprise solution. The links below cover the major components of a cluster; you can click on each link to learn more about that component.
Conceptually, clusters consist of just a few components. The "magic" to clusters comes in how you put them together to run HPC applications and how you manage them. (If you want to learn more about how parallel programs work go to <this link>.)
(Future link - "Advanced Topics")
After you read the sections for each of the links, you can read on about standard HPCC configurations.
Now that you’ve read about the various components of cluster computing systems (you did read those sections, didn’t you?), let’s take a look at some standard configurations. The purpose of these examples is to show how the components are typically combined.