High-Performance Clusters: Out-of-Band Management Approaches
By Yung-Chin Fang, Shukri Zaibak, Monica Kashyap, and Jenwei Hsieh, Ph.D. (Issue 4 2001)
This article describes the primary approaches to the out-of-band management of large numbers of nodes in a high-performance computing cluster. It focuses on different types of physical devices available and assesses their suitability for various environments.
Deployment of a large cluster is typically done in two phases. The first phase requires building a scaled-down cluster to test compatibility and performance of the various layers. The second phase involves building a production-level cluster based on the lessons learned in the first phase. The tiered approach introduces a discontinuity in the management, scalability, installation, and configuration of the two clusters. This article focuses on the management issues for a cluster that contains a large number of nodes.
Because it is readily available, most trial clusters start with shared network management, also called in-band management, in which applications and management tasks share the same intra-cluster network, such as Fast Ethernet. This article does not discuss the widely adopted in-band approach; instead, it covers four out-of-band management approaches.
The four choices for high-performance computing (HPC) cluster out-of-band management are a serial port concentrator, an analog Keyboard Video Mouse (KVM) switch, an advanced KVM switch, and a digital KVM switch. These are called out-of-band methods because they are separate from the intra-cluster interconnect topology.
This article identifies the selection criteria, describes the out-of-band management devices and connections, and compares each approach.
Reasons for out-of-band management
Several reasons for selecting out-of-band management include:
Parallel management. Out-of-band management provides a valuable broadcast function that enables parallel management. Broadcast functions, in particular, facilitate remote installation, updates, and upgrades to the operating system and any packages on all compute nodes. System administrators can use the broadcast function to perform operations on all compute nodes: remotely setting the BIOS boot sequence to Preboot Execution Environment (PXE) boot; editing, copying, and deleting files on all compute nodes in parallel; and rebooting all nodes in the cluster.
Fault detection and recovery. When the intra-cluster network path to a compute node fails, out-of-band management can isolate the cause, recover from the failure, or reduce the impact of the network failure.
Network traffic reduction. Management-related network traffic is moved to the out-of-band route, so that users' applications can fully utilize the network bandwidth.
Security risk reduction. All out-of-band remote management approaches use customized embedded operating systems. These proprietary operating systems provide security features and special-purpose graphical user interfaces (GUIs) or command sets, which reduce the risks of viruses and unauthorized access.
Since many types of out-of-band management devices are available, it is necessary to study and understand the selection criteria and other factors to reduce potential scale-out risks.
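The broadcast function described under parallel management can be illustrated with a minimal sketch. It assumes a hypothetical `send_to_node` helper that stands in for one out-of-band session; a real device would forward the command over its serial or KVM port, whereas this stand-in only echoes what would be sent. The node names and the worker count are also illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def send_to_node(node: str, command: str) -> str:
    """Hypothetical stand-in for one out-of-band session to a node.

    It does not touch any hardware; it only returns the text that
    would be sent to the node.
    """
    return f"{node}: {command}"


def broadcast(nodes: list[str], command: str, max_workers: int = 32) -> dict[str, str]:
    """Send the same command to every node in parallel and collect replies."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so zip pairs each node
        # with its own reply.
        replies = pool.map(lambda node: send_to_node(node, command), nodes)
        return dict(zip(nodes, replies))


nodes = [f"node{i:03d}" for i in range(1, 5)]
results = broadcast(nodes, "reboot")
for node, reply in results.items():
    print(reply)
```

The point of the sketch is only that one administrator action fans out to every compute node at once instead of being repeated node by node.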
Selection criteria for an HPC cluster configuration
Since HPC cluster configurations vary, a cluster architect should investigate the requirements and consider HPC cluster management selection criteria to prevent potential problems. Dell proposes considering the following selection criteria for various out-of-band management devices.
Remote accessibility and manageability. A remote site can connect to a management device through a modem or network. Low-density management devices require many Internet Protocol (IP) addresses and network ports for a large cluster. These extra IPs can introduce unexpected management issues. The ability to manage and operate a cluster remotely provides various features such as remote diagnostics and joint debugging. Several factors to consider include bandwidth, number of remote login sessions, interface usability, port density, and security level.
Runtime environment. Several areas to consider in the runtime environment include operating environment, application characteristics, display mode/resolution/frame-rate support, and bandwidth.
Setup and configuration. Some out-of-band devices have complicated operating environments and require time to learn and set up the device correctly. Although some devices are easy to set up, they may require setting up the BIOS and configuring the operating system for all compute nodes; generally the setup effort is in direct proportion to the cluster size for these types of devices. Other devices require very little effort to set up or configure.
Maintenance and support. Some effort may be required to upgrade the technology and to reconfigure the management device operating environment. Several out-of-band management devices require reconfiguration after flashing the embedded operating system. Some devices are dependent on the compute node's operating system for certain functions. This type of device might require reconfiguring all of the compute nodes so they work with the updated or upgraded operating system.
Operating environment of the user community. The administrator learning curve for the embedded operating environment has two phases: setup and installation, and operation and maintenance. Some devices require a long learning curve for both phases; others require a short learning curve for setting up the device/cluster and a long learning curve for maintenance. Alternatively, the learning curve for some devices is short for both setup and maintenance. Relearning may be needed whenever the embedded operating environment is flashed.
Functionality. Several functionality-related factors to consider are the maximum number of sessions, broadcast capability, user interface design, ability to partition, error handling, ease of diagnostics, and concurrent monitoring of a cluster node.
Cabling management and configuration. Cabling management considerations include configuration flexibility, rack density, cable length availability, cable weight, cable thickness, connector reliability, and rewiring feasibility. The cluster size affects cabling management. Adding, retiring, or moving a rack because of the cluster life cycle or mission is common. Generally it is easy to reconfigure or partition these management routes and devices to form a new configuration.
Centralized capability. Device scalability is an important issue. For example, some devices can manage 256 nodes or more, while others can scale up to only 16 ports per device. Low-density devices can undermine the integrity of centralized management, whereas high-density devices require a well-designed GUI to reduce keystrokes for node identification.
Rack space. Rack space optimization can affect costs. For example, some devices are in 1U form factor with 16 ports per device and can be 0U mount (mounted on the side of a rack). Others have higher port density, but are in a 3U or larger form factor. Adopting large form factors of management devices can lead to extra racks.
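The port-density, IP-address, and rack-space criteria above reduce to simple arithmetic. The following sketch compares two hypothetical devices for a 256-node cluster. The 16-port 1U figure comes from the text; the 64-port 3U device and the assumption of one IP address per device are illustrative only.

```python
import math


def management_footprint(nodes, ports_per_device, rack_units_per_device,
                         ips_per_device=1):
    """Estimate devices, IP addresses, and rack units for a cluster.

    ips_per_device=1 is an assumption; some devices have two network ports.
    """
    devices = math.ceil(nodes / ports_per_device)
    return {
        "devices": devices,
        "ip_addresses": devices * ips_per_device,
        "rack_units": devices * rack_units_per_device,
    }


# Low-density device: 16 ports in 1U (figures from the text).
low_density = management_footprint(256, ports_per_device=16,
                                   rack_units_per_device=1)
# Higher-density device: 64 ports in 3U (hypothetical figures).
high_density = management_footprint(256, ports_per_device=64,
                                    rack_units_per_device=3)

print(low_density)   # 16 devices, 16 IP addresses, 16 rack units
print(high_density)  # 4 devices, 4 IP addresses, 12 rack units
```

Setting `rack_units_per_device=0` models a 0U side-mounted device. The outcome depends entirely on the inputs, so the comparison should be rerun with the actual specifications of the devices under consideration.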
The next sections discuss the configuration topology and cabling for the following out-of-band management devices: serial port concentrator, analog KVM switch, digital KVM switch, and advanced KVM switch.
Serial port concentrators are communication servers
Serial port concentrators are also known as communication servers or Telnet servers. Most serial concentrators have a CPU (for example, MIPS3000 or Intel® 80286), an embedded operating system (OS), memory for running the embedded OS, non-volatile RAM (NVRAM) to record the settings and configuration, one or two network ports for remote access, and many RS-232 ports to connect to compute nodes. In addition to cluster management, serial concentrators are widely used for managing communication devices such as modem pools, PBXs, network switches, routers, and other devices.
The communication standard currently used between the serial port concentrators and the compute nodes is EIA RS-232C. Most concentrators use Category-5 (Cat-5) cables to replace the thick RS-232 full specification serial cables to reduce the thickness that often causes cabling management difficulties.
The Cat-5 cable uses a DB-9 (9-pin) connector that connects to the compute node and an RJ-45 connector that connects to a serial port concentrator. This type of special cable has already become the de facto standard for serial port concentrators. Another implementation uses a dongle that connects the server to the serial concentrator using a Cat-5 cable.
Four ways to connect a master node/management console to a serial port concentrator include (see Figure 1):
- Null modem cable
- Modem
- Network switch
- Crossover cable
Figure 1. Serial port concentrator solution
System administrators can use a terminal emulator to log on to the management console through the serial port concentrator, then use Telnet, Point-to-Point Protocol (PPP), or Secure Shell (SSH) to reach any of the compute nodes and run management tasks. Because of the low data-transfer rate of RS-232, serial port concentrators are generally used for text-mode sessions. Compute nodes must support BIOS- and OS-level console redirection features to fully utilize a serial port concentrator.
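A rough calculation shows why RS-232 links suit text-mode sessions but not graphics. The sketch below assumes a 115,200 baud line with 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bits on the wire per byte), an 80 x 25 text screen at one byte per character, and an uncompressed 1280 x 1024 frame at one byte per pixel. All of these figures are illustrative assumptions, not values from a specific device.

```python
def serial_transfer_seconds(payload_bytes, baud=115200, bits_per_byte=10):
    """Time to push a payload over an asynchronous serial line.

    bits_per_byte=10 models 8N1 framing: 1 start + 8 data + 1 stop bit.
    """
    return payload_bytes * bits_per_byte / baud


text_screen_bytes = 80 * 25          # one byte per character cell
graphics_frame_bytes = 1280 * 1024   # one byte per pixel (8-bit color)

text_time = serial_transfer_seconds(text_screen_bytes)
frame_time = serial_transfer_seconds(graphics_frame_bytes)

print(f"Full text screen: {text_time:.2f} s")     # about 0.17 s
print(f"One graphics frame: {frame_time:.0f} s")  # about 114 s
```

Under these assumptions a full text screen refreshes in a fraction of a second, while a single uncompressed graphics frame would take nearly two minutes.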
KVM switches provide single point of access
KVM (Keyboard Video Mouse) switches enable HPC clusters to have a single point of access to all servers in a cluster. If the compute nodes need video capabilities, a KVM switch is a suitable option. A compute node can be connected to a KVM switch input port via a KVM cable, which consists of keyboard, mouse, and video cables. Three types of KVM switches include:
- Analog KVM switch
- Digital KVM switch, which can cascade with traditional analog KVM switches
- Advanced KVM switch using Cat-5 cables
Analog KVM switches are transparent to BIOS/OS
Analog KVM switches, considered plug-and-play devices, are easy to configure since they require no software installation.
An analog KVM switch can have 2/4/8/16/24/32/64 input ports connected to compute nodes and 1/2/4 output ports connected to a master (management) node. System administrators can manage any of the nodes as if they were connected to the node directly, without BIOS settings or OS configuration. They also have access to both text and graphics modes.
Analog KVM switches are transparent to both the BIOS and the operating system, and they can be cascaded; for example, a switch with 64 ports would provide access to up to 4,096 nodes (64 x 64) by connecting 64 KVM switches to one KVM switch. In this example, each of the 64 switches has 64 compute nodes connected to it, as shown in Figure 2.
Figure 2. Analog KVM switch configuration
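The cascading arithmetic generalizes to any port count and tree depth. This short sketch assumes a fully populated tree in which every switch uses all of its input ports; the function names are illustrative.

```python
def cascade_capacity(ports_per_switch, levels):
    """Maximum compute nodes reachable through a fully populated tree."""
    return ports_per_switch ** levels


def switches_needed(ports_per_switch, levels):
    """Total switches in a fully populated tree.

    Level 0 holds the single top switch, level 1 holds one switch per
    top-level port, and so on.
    """
    return sum(ports_per_switch ** level for level in range(levels))


print(cascade_capacity(64, 2))  # 4096 nodes, matching the example above
print(switches_needed(64, 2))   # 65 switches: 1 top-level plus 64 below it
```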
One disadvantage of using an analog KVM switch in an HPC cluster is the lack of support for remote access, since accessibility is limited by the length of the KVM cable. Digital KVM switches are derived from analog KVM switches to overcome this limitation.
Digital KVM switches offer remote accessibility
Digital KVM switches solve remote accessibility requirements. The digital KVM is also called KVM over IP™. A digital KVM switch has a CPU, an embedded operating system, some RAM for the OS, some NVRAM for settings/configuration, one network port, and many KVM ports.
The connection topology is similar for digital and analog KVM switches. The primary difference is that the top-layer switch of the digital KVM switch is digital rather than analog. The network port of a digital KVM switch connects to the public network.
The KVM ports connect to either an analog KVM switch or a compute node. In a two-layer cascaded configuration (see Figure 3), a compute node connects to an analog KVM switch via a KVM cable; the analog KVM switch connects to a digital KVM switch via a custom KVM cable (smaller form factor, which integrates keyboard/video/mouse cables into one cable).
Figure 3. Digital KVM switch configuration
The digital KVM switch connects to the management node via Ethernet. Because network bandwidth is limited, the digital KVM switch converts the analog KVM signal to a digital signal, then compresses the digital signal to reduce potential traffic. The result is that multiple remote sessions can display 1280x1024x85 Hz graphics screens with acceptable latency and network traffic. This solution can reduce the number of on-site administrators needed for immediate cluster support. Digital KVM switches also support secured connections over the network.
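To see why compression matters, consider the raw bit rate of the display mode mentioned above. The sketch assumes 24 bits per pixel and a 100 Mbps Fast Ethernet link to the management node; both are illustrative assumptions, and real digital KVM switches typically reduce traffic further by sending only the screen regions that change.

```python
def raw_video_bits_per_second(width, height, refresh_hz, bits_per_pixel=24):
    """Uncompressed bit rate of a video mode (bits_per_pixel is assumed)."""
    return width * height * refresh_hz * bits_per_pixel


FAST_ETHERNET_BPS = 100_000_000  # assumed link speed, 100 Mbps

raw_bps = raw_video_bits_per_second(1280, 1024, 85)
reduction_needed = raw_bps / FAST_ETHERNET_BPS

print(f"Raw rate: {raw_bps / 1e9:.2f} Gbps")              # about 2.67 Gbps
print(f"Reduction needed: {reduction_needed:.1f}x")       # about 26.7x
```

Under these assumptions a single uncompressed session would need roughly 27 times the capacity of the link, so supporting multiple sessions depends on aggressive data reduction.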
Advanced KVM switch uses Cat-5 cable
The advanced KVM switch uses a Cat-5 cable and a dongle, which are paired to replace a traditional thick KVM cable. The keyboard/video/mouse port of a compute node connects to the input end of the dongle, while the output end connects to a network connector.
The connector attaches to a Cat-5 network cable with an RJ-45 connector, which connects to the advanced KVM switch. For high-density configuration, the dongle size can cause management inconvenience; if it is improperly installed, the dongle can cause potential cooling problems. The advanced KVM switch connects to a digital console box, and the box connects to a keyboard, a monitor, and a mouse directly.
The digital console box, which replaces a management computer, serves as the management node (see Figure 4). Rack space is a limited resource in a high-density HPC cluster implementation. The Cat-5 cable and dongle pair help to optimize cabling management and rack space.
Figure 4. Advanced KVM switch
Comparing management devices
Figure 5 compares the selection criteria and shows the most frequently used features among the four types of out-of-band management devices.
Figure 5. Comparison of out-of-band management devices
The analog KVM switch using tree topology is effective because it scales well and requires the least amount of setup, learning, and maintenance. The drawback is the lack of remote accessibility.
When frequently adding and moving racks, cable management becomes a concern. In this case, both serial port concentrators and advanced KVM switches are suitable since both use thin Cat-5 cables. Serial port concentrators require more effort than the advanced KVM switch for setting up BIOS and configuring the OS on all compute nodes.
Both serial concentrators and digital KVM switches satisfy the need for remote accessibility. Serial concentrators can manage text-mode applications smoothly. When the system requires remote high-resolution/frame-rate graphics screen monitoring, then the digital KVM switch is the proper choice.
When management is crucial, selecting the proper tool for HPC cluster monitoring can facilitate management tasks, reduce cluster downtime, and make the cluster easier to use, maintain, and expand.
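The comparison in this section can be summarized as a small lookup. The sketch below encodes only the three properties discussed here: remote accessibility, graphics support, and thin Cat-5 cabling. Marking the advanced KVM switch as not remotely accessible and the digital KVM switch as not using thin Cat-5 cabling are assumptions made for illustration, since the text does not state them explicitly.

```python
# Feature flags drawn from the comparison above; entries the text does
# not state explicitly are assumptions (see the lead-in paragraph).
DEVICES = {
    "serial port concentrator": {"remote": True, "graphics": False, "thin_cable": True},
    "analog KVM switch": {"remote": False, "graphics": True, "thin_cable": False},
    "digital KVM switch": {"remote": True, "graphics": True, "thin_cable": False},
    "advanced KVM switch": {"remote": False, "graphics": True, "thin_cable": True},
}


def candidates(remote=False, graphics=False, thin_cable=False):
    """Return the devices that satisfy every requested requirement."""
    needs = {"remote": remote, "graphics": graphics, "thin_cable": thin_cable}
    return [
        name
        for name, features in DEVICES.items()
        if all(features[key] for key, wanted in needs.items() if wanted)
    ]


print(candidates(remote=True))                 # serial concentrator, digital KVM
print(candidates(remote=True, graphics=True))  # digital KVM only
print(candidates(thin_cable=True))             # serial concentrator, advanced KVM
```

A real selection would weigh the remaining criteria as well, such as setup effort, learning curve, and rack space, which this lookup deliberately leaves out.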
Future trends for cluster management
The HPC cluster adoption model begins with creating trial HPC clusters, then implementing and expanding to larger production clusters. In the last few years, the number of HPC commodity systems on the TOP500 Supercomputer Sites list has steadily increased. In 1999 and 2000, only a few clusters made the list. In 2001, more than a dozen commodity clusters gained significant positions on the list. As the scale of these supercomputing clusters continues to grow, out-of-band management becomes necessary.
When an HPC cluster scales out to a certain degree, the scalability, manageability, and usability of out-of-band management devices become critical to the cluster's success. To keep pace with the fast-growing scale-out trend, out-of-band management devices are evolving to enhance scalability, manageability, and usability; provide flexible reconfiguration; support compressed high-bandwidth remote management capability; and move from add-on devices to integrated solutions to reduce the installation and management effort.
Yung-Chin Fang (Yung-Chin_fang@dell.com) is a member of the Scalable Systems Group at Dell. Yung-Chin has an M.S. degree in Computer Science from Utah State University. He is currently working on his Ph.D. in Computer Science at the University of Houston.
Shukri Zaibak (shukri_zaibak@dell.com) is a systems engineer in the Scalable Systems Group at Dell. He focuses on solving high availability and high-performance computing problems in Linux®. Previously Shukri performed a wide range of functions in system management and network administration. Shukri has a B.S. in Electrical Engineering from the University of South Alabama.
Monica Kashyap (monica_kashyap@dell.com) is a systems engineer in the Scalable Systems Group at Dell. She has a B.S. in Applied Science and Computer Engineering from the University of North Carolina at Chapel Hill.
Jenwei Hsieh, Ph.D. (jenwei_hsieh@dell.com) is a member of the Scalable Systems Group at Dell. Jenwei is responsible for developing high-performance clusters. He has published more than 30 technical papers in the areas of multimedia computing and communications, high-speed networking, serial storage interfaces, and distributed network computing. Jenwei has a Ph.D. in Computer Science from the University of Minnesota and a B.E. from Tamkang University in Taiwan.