- *With two PowerEdge C6145s attached to a PowerEdge C410x (Full Sandwich configuration), the best performance achieved with HPL is 2891 GFLOPS (31% of theoretical peak), at a power draw of 5030 watts.*
- *With a single PowerEdge C6145 attached to a C410x (Half Sandwich configuration), the best performance with HPL is 1697 GFLOPS (19% of theoretical peak), at a power draw of 3802 watts.*
- *The measured GFLOPS per watt show that the C6145 and C410x solution converts power to FLOPS up to 1.7X more efficiently than a CPU-only configuration.*
- *GPGPUs offer great potential for improving the performance of suitable HPC applications.*

As shown in Figure 1, a PowerEdge C410x is used with two PowerEdge C6145 hosts. The PowerEdge C410x is an external 3U PCIe expansion chassis with space for 16 GPUs. Compute nodes connect to the C410x through a Host Interface Card (HIC) and an iPASS cable. All connected nodes are mapped to the available GPUs according to a user-defined configuration. The allocation of the 16 GPUs can be reconfigured dynamically through a web GUI, which makes the operation easier and faster. Currently, the available GPU-to-host ratios are 2:1, 4:1 and 8:1, so a single compute node can access up to 8 GPUs. The design of the

| Component | Item | Value |
| --- | --- | --- |
| PowerEdge C410x | GPGPU model | NVIDIA Tesla M2070 |
| | Number of GPGPUs | 16 |
| | iPASS cables | 8 |
| | Mapping | 2:1, 4:1, 8:1 |
| PowerEdge C6145 (compute node) | Processor | 4 x Opteron 6132 HE @ 2.2 GHz |
| | Memory | 128 GB, 1333 MHz |
| | BIOS | 1.7.0 (4/13/11) |
| | BMC FW | 1.02 |
| | PIC FW | [0116] |
| | OS | RHEL 5.5 (2.6.18-194.el5) |
| | CUDA | 4.0 |
| M2070 GPGPU | Number of cores | 448 |
| | Memory | 6 GB |
| | Memory bandwidth | 150 GB/s |
| | Peak performance, single precision | 1030 GFLOPS |
| | Peak performance, double precision | 515 GFLOPS |
| Benchmark | GPU-enabled HPL from NVIDIA | Version 11 |

**Table 1:** Configuration using the C410x
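To make the mapping ratios concrete, the following minimal Python sketch shows how many host connections a fully populated C410x can serve at each ratio. The function names and the contiguous GPU numbering are hypothetical and chosen only for illustration; the real chassis assigns specific slots to specific ports, and that assignment is configured in the web GUI.

```python
TOTAL_GPUS = 16               # GPU slots in one C410x
SUPPORTED_RATIOS = (2, 4, 8)  # GPUs per host connection (2:1, 4:1, 8:1)


def hosts_served(gpus_per_host, total_gpus=TOTAL_GPUS):
    """Number of host connections needed to use every GPU at this ratio."""
    if gpus_per_host not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported ratio {gpus_per_host}:1")
    return total_gpus // gpus_per_host


def assign_gpus(gpus_per_host, total_gpus=TOTAL_GPUS):
    """Split GPUs into contiguous groups, one group per host connection.

    The contiguous numbering is an illustrative assumption, not the
    slot-to-port layout of the actual chassis.
    """
    count = hosts_served(gpus_per_host, total_gpus)
    return {
        host: list(range(host * gpus_per_host, (host + 1) * gpus_per_host))
        for host in range(count)
    }


if __name__ == "__main__":
    for ratio in SUPPORTED_RATIOS:
        print(f"{ratio}:1 ->", hosts_served(ratio), "host connections")
```

At 2:1 all eight iPASS connections are needed to reach the 16 GPUs, while at 8:1 two connections are enough.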

- Inter-node connections through the InfiniBand switch should use the mezzanine (MEZZ) card, which is installed on IOH 1 and shares bandwidth with the GPUs connected on SLOT 3.
- Based on measured bandwidth tests, the best bandwidth utilization is achieved when a single HIC connects to a maximum of two GPUs.
- Using two HICs per compute node is highly recommended with the C6145 and C410x solution.
- Because of the NUMA architecture of the C6145, special attention should be given to process-to-memory mapping. In general, using memory near the GPGPUs gives better performance.
- A single compute node cannot work with more than 12 GPUs because of a system limitation.
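One common way to act on the NUMA recommendation is to launch each GPU-driving process under `numactl`, binding both its CPUs and its memory to the NUMA node closest to that GPU. The sketch below only builds such command lines. The GPU-to-node table is a made-up example (the real topology has to be read from the system), and `./xhpl` is assumed to be the name of the HPL binary.

```python
import shlex

# HYPOTHETICAL topology: GPU index -> nearest NUMA node.
# Replace with the values observed on the actual system.
GPU_TO_NUMA_NODE = {0: 0, 1: 0, 2: 1, 3: 1}


def pinned_command(gpu_id, program, gpu_to_node=GPU_TO_NUMA_NODE):
    """Return a shell command that runs `program` next to one GPU.

    CPUs and memory are both bound to the GPU's NUMA node, and
    CUDA_VISIBLE_DEVICES restricts the process to that single GPU.
    """
    node = gpu_to_node[gpu_id]
    argv = ["numactl", f"--cpunodebind={node}", f"--membind={node}", *program]
    return f"CUDA_VISIBLE_DEVICES={gpu_id} " + shlex.join(argv)


if __name__ == "__main__":
    for gpu in sorted(GPU_TO_NUMA_NODE):
        print(pinned_command(gpu, ["./xhpl"]))
```

Binding memory as well as CPUs matters here: pinning the CPUs alone does not stop the process from touching memory that was allocated on a distant node.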

As shown in Table 1, each M2070 GPGPU has a double-precision peak of 515 GFLOPS, so a fully populated C410x with 16 GPUs has a peak capacity of 8240 GFLOPS. Similarly, the peak compute capacity of a single C6145 compute node is 281.6 GFLOPS, so all four nodes together are rated at 1126.4 GFLOPS. The total double-precision peak of the GPGPU solution shown in Figure 1 is therefore 9366.4 GFLOPS.

Figure 3 shows the improvement in HPL performance due to GPGPU acceleration. As a reference, the blue bars show the measured performance with CPUs only. The red bars show the performance improvement when a total of 16 GPGPUs are used for acceleration. Two C6145s are attached to the C410x, and the mapping per compute node is set to either 4:1 or 2:1. When all four compute nodes of the two C6145s are used with no GPGPUs attached, HPL reaches an efficiency of 72.1% of the CPU peak. With 4 GPUs per node, the performance increases to 2891.0 GFLOPS, which is 3.6X the CPU-only performance. For HPL, using the maximum of 16 GPGPUs is beneficial in both cases. However, a mapping ratio of 2:1 gives 1.6X more HPL performance than a mapping ratio of 4:1.
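The peak and efficiency figures quoted in this article can be reproduced with a few lines of arithmetic. The sketch below uses only numbers stated in the article, with three assumptions that are labeled in the comments: 8 cores per Opteron 6132 HE, 4 double-precision FLOPs per core per cycle, and two compute nodes per C6145 chassis. The CPU-only GFLOPS value it prints is derived from the stated 72.1% efficiency and is not a measured figure from the article.

```python
GPU_DP_PEAK = 515.0      # GFLOPS per M2070, double precision (Table 1)
N_GPUS = 16              # fully populated C410x
NODES_PER_C6145 = 2      # assumption: two C6145s give "all four nodes"

# Per-node CPU peak. Cores per socket and FLOPs per cycle are
# assumptions that reproduce the article's 281.6 GFLOPS.
SOCKETS, CORES, GHZ, FLOPS_PER_CYCLE = 4, 8, 2.2, 4
NODE_CPU_PEAK = SOCKETS * CORES * GHZ * FLOPS_PER_CYCLE


def total_peak(n_chassis):
    """Combined GPU + CPU double-precision peak in GFLOPS."""
    cpu_peak = n_chassis * NODES_PER_C6145 * NODE_CPU_PEAK
    return N_GPUS * GPU_DP_PEAK + cpu_peak


full_peak = total_peak(2)            # Full Sandwich: two C6145s
half_peak = total_peak(1)            # Half Sandwich: one C6145

full_eff = 2891.0 / full_peak        # measured HPL / peak
half_eff = 1697.0 / half_peak

full_gflops_per_watt = 2891.0 / 5030.0
half_gflops_per_watt = 1697.0 / 3802.0

# Derived, not measured: CPU-only HPL implied by 72.1% efficiency.
cpu_only_implied = 0.721 * 4 * NODE_CPU_PEAK
speedup = 2891.0 / cpu_only_implied

if __name__ == "__main__":
    print(f"Full Sandwich peak: {full_peak:.1f} GFLOPS, eff {full_eff:.1%}")
    print(f"Half Sandwich peak: {half_peak:.1f} GFLOPS, eff {half_eff:.1%}")
    print(f"GFLOPS/W: {full_gflops_per_watt:.3f} vs {half_gflops_per_watt:.3f}")
    print(f"Implied CPU-only: {cpu_only_implied:.0f} GFLOPS, {speedup:.1f}X")
```

Under these assumptions the results round to the 31% and 19% efficiencies and the 3.6X speedup quoted in the text, and the Full Sandwich configuration delivers more GFLOPS per watt than the Half Sandwich one.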

Article ID: SLN310992

Last Date Modified: 08/14/2018 04:30 AM
