Scalability of Dell EMC Ready Solutions for HPC BeeGFS Storage


Article written by Nirmala Sundararajan of Dell EMC HPC and AI Innovation Lab in November 2019

Table of Contents

  1. Introduction
  2. Base Configurations
  3. BeeGFS Usable Space Calculation
  4. Scalable Configurations
  5. Performance Characterization
  6. Conclusion and Future Work

Introduction

This blog discusses the scalability of the recently announced Dell EMC Ready Solutions for HPC BeeGFS Storage. The BeeGFS architecture consists of four main services: the management service, metadata service, storage service, and client service. Because the roles and the hardware are not tightly integrated in BeeGFS, any combination of these four services, including all of them, can run on the same server. In a "Hyper Converged Solution", all four services run on the same server. This configuration is not recommended for performance-critical environments because client applications usually consume resources, which may impact the performance of the storage services. The Dell EMC solution uses dedicated storage servers and a dual-purpose metadata and storage server to provide a high-performance, scalable storage solution. The system can be scaled by adding storage servers to an existing installation. This blog presents configurations with different numbers of storage servers and the performance that can be expected from each.

Base Configurations

The BeeGFS Storage Solution, which is designed to provide a high-performance scratch file system, uses the following hardware components:

  • Management Server
    • R640, Dual Intel Xeon Gold 5218 2.3GHz, 16 cores, 96GB (12x 8GB 2666 MT/s RDIMMs), 6 x 15k RPM 300GB SAS, H740P
  • Metadata and Storage Servers
    • R740xd, 2x Intel Xeon Platinum 8268 CPU @ 2.90GHz, 24 cores, 384GB (12x 32GB 2933 MT/s RDIMMs)
    • BOSS card with 2x 240GB M.2 SATA SSDs in RAID 1 for OS
    • 24x Intel 1.6TB NVMe Mixed Use Express Flash 2.5-inch SFF drives, Software RAID

The management server runs the BeeGFS monitoring service. The metadata server uses the 12 drives in NUMA zone 0 to host the Metadata Targets (MDTs), while the remaining 12 drives in NUMA zone 1 host the Storage Targets (STs). A dedicated metadata server is not used because the storage capacity requirements for BeeGFS metadata are very small. The metadata and storage targets and services are isolated on separate NUMA nodes so that the two workloads are kept largely separate. The dedicated storage servers run three storage services per NUMA zone, six in total per server. For more details, please refer to the announcement blog. Figure 1 shows the two base configurations that have been tested and validated at the Dell EMC HPC and AI Innovation Lab.

Figure 1: Base Configurations

The small configuration consists of three R740xd servers with a total of 15 storage targets. The medium configuration consists of six R740xd servers with a total of 33 storage targets. Users can start with either the "Small" or the "Medium" configuration and add storage servers to increase storage space and overall performance, or metadata servers to increase the number of files supported and metadata performance. Table 1 shows the capacity and performance data for the base configurations, which have been tested and validated extensively at the Dell EMC HPC and AI Innovation Lab.

Base Configuration                        Small                Medium
Total U (MDS+SS)                          6U                   12U
# of Dedicated Storage Servers            2                    5
# of NVMe Drives for Data Storage         60                   132
Estimated Usable Space (1.6 TB drives)    86 TiB               190 TiB
Estimated Usable Space (3.2 TB drives)    173 TiB              380 TiB
Estimated Usable Space (6.4 TB drives)    346 TiB              761 TiB
Peak Sequential Read                      60.1 GB/s            132.4 GB/s
Peak Sequential Write                     57.7 GB/s            120.7 GB/s
Random Read                               1.80 Million IOPS    3.54 Million IOPS
Random Write                              1.84 Million IOPS    3.59 Million IOPS

Table 1: Capacity and Performance Details of Base Configurations
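
The drive and target counts in Table 1 follow from the server layout described above. The short Python sketch below reproduces them. The constant and function names are illustrative, and the value of three storage targets on the metadata server is an inference from the stated totals (15 targets with two dedicated storage servers at six targets each), not something the text states explicitly.

```python
# Layout constants taken from the text; names are illustrative only.
DRIVES_PER_STORAGE_SERVER = 24   # NVMe drives in each dedicated R740xd
MDS_STORAGE_DRIVES = 12          # NUMA zone 1 drives on the metadata server
TARGETS_PER_STORAGE_SERVER = 6   # three storage services per NUMA zone
MDS_STORAGE_TARGETS = 3          # ASSUMPTION: inferred as 15 - 2 * 6
RACK_UNITS_PER_SERVER = 2        # consistent with 6U for three servers in Table 1


def data_drives(dedicated_servers: int) -> int:
    """NVMe drives used for data, including the 12 on the metadata server."""
    return MDS_STORAGE_DRIVES + DRIVES_PER_STORAGE_SERVER * dedicated_servers


def storage_targets(dedicated_servers: int) -> int:
    """Storage targets across the metadata server and dedicated servers."""
    return MDS_STORAGE_TARGETS + TARGETS_PER_STORAGE_SERVER * dedicated_servers


def rack_units(dedicated_servers: int) -> int:
    """Rack space for the metadata server plus the dedicated storage servers."""
    return RACK_UNITS_PER_SERVER * (1 + dedicated_servers)


for name, servers in (("Small", 2), ("Medium", 5)):
    print(name, rack_units(servers), data_drives(servers), storage_targets(servers))
```

For two and five dedicated storage servers, this gives the 6U/60 drives/15 targets and 12U/132 drives/33 targets quoted for the Small and Medium configurations.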


BeeGFS Usable Space Calculation

Estimated usable space is calculated in TiB (since most tools show usable space in binary units) using the following formula:


BeeGFS Usable Space in TiB = 0.99 * (# of Drives) * (drive size in TB) * (10^12 / 2^40)

In the above formula, 0.99 is a conservative factor that assumes a 1% overhead from the file system. The number of drives for storage includes 12 drives from the MDS because, in the MDS, the 12 drives in NUMA zone 0 are used for metadata and the 12 drives in NUMA zone 1 are used for storage. The last factor in the formula, 10^12/2^40, converts the usable space from TB to TiB.
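
As a quick check, the formula can be evaluated in a few lines of Python. This is a minimal sketch; the function and constant names are illustrative and are not part of any BeeGFS or Dell EMC tool.

```python
TB_TO_TIB = 10**12 / 2**40    # converts decimal terabytes to binary tebibytes
FS_OVERHEAD_FACTOR = 0.99     # assumes 1% file system overhead, per the text


def usable_space_tib(num_drives: int, drive_size_tb: float) -> float:
    """Estimated BeeGFS usable space in TiB for a given data-drive count."""
    return FS_OVERHEAD_FACTOR * num_drives * drive_size_tb * TB_TO_TIB


# Small (60 data drives) and Medium (132 data drives) with each drive size
for drives in (60, 132):
    for size_tb in (1.6, 3.2, 6.4):
        print(drives, size_tb, round(usable_space_tib(drives, size_tb)))
```

Rounded to the nearest TiB, the results match the Small and Medium columns of Table 1.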

Scalable Configurations

The BeeGFS High Performance Storage Solution has been designed to be flexible, and performance and/or capacity can be scaled seamlessly by adding servers, as shown in Figure 2.

Figure 2: Scaled Configuration Examples

The metadata portion of the stack remains the same for all the configurations described in this blog. This is because the storage capacity requirements for BeeGFS metadata are typically 0.5% to 1% of the total storage capacity, although the actual requirement depends on the number of directories and files in the file system. As a rule of thumb, the user can add an additional metadata server when the ratio of metadata capacity to storage capacity falls below 1%. Table 2 shows the capacity and performance data for the different flexible configurations of the BeeGFS Storage Solution.

Configuration                             Small        Small +1     Small +2     Medium       Medium +1
Total U (MDS+SS)                          6U           8U           10U          12U          14U
# of Dedicated Storage Servers            2            3            4            5            6
# of NVMe Drives for Data Storage         60           84           108          132          156
Estimated Usable Space (1.6 TB drives)    86 TiB       121 TiB      156 TiB      190 TiB      225 TiB
Estimated Usable Space (3.2 TB drives)    173 TiB      242 TiB      311 TiB      380 TiB      449 TiB
Estimated Usable Space (6.4 TB drives)    346 TiB      484 TiB      622 TiB      761 TiB      898 TiB
Peak Sequential Read                      60.1 GB/s    83.3 GB/s    105.2 GB/s   132.4 GB/s   152.9 GB/s
Peak Sequential Write                     57.7 GB/s    80.3 GB/s    99.8 GB/s    120.7 GB/s   139.9 GB/s

Table 2: Capacity and Performance Details of Scaled Configurations
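
One way to read the scaling behavior out of Table 2 is to normalize peak throughput by the number of data drives. The Python sketch below does this using the drive counts and throughput figures copied from the table; the variable and function names are illustrative.

```python
# Figures copied from Table 2 (Small, Small +1, Small +2, Medium, Medium +1)
configs = ["Small", "Small +1", "Small +2", "Medium", "Medium +1"]
data_drives = [60, 84, 108, 132, 156]
peak_read_gbps = [60.1, 83.3, 105.2, 132.4, 152.9]
peak_write_gbps = [57.7, 80.3, 99.8, 120.7, 139.9]


def per_drive(throughput_gbps: float, drives: int) -> float:
    """Peak throughput divided by the number of NVMe data drives."""
    return throughput_gbps / drives


read_per_drive = [per_drive(t, d) for t, d in zip(peak_read_gbps, data_drives)]
write_per_drive = [per_drive(t, d) for t, d in zip(peak_write_gbps, data_drives)]

for name, r, w in zip(configs, read_per_drive, write_per_drive):
    print(f"{name:10s} read {r:.2f} GB/s per drive, write {w:.2f} GB/s per drive")
```

With the Table 2 figures, peak read throughput works out to roughly 1 GB/s per data drive in all five configurations, while peak write throughput ranges from about 0.96 GB/s per drive (Small) down to about 0.90 GB/s per drive (Medium +1).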


Performance Characterization

The performance of the various configurations was tested by creating storage pools. The small configuration has 15 storage targets, and each additional storage server adds six more storage targets. For the purpose of testing the performance of the various configurations, storage pools were therefore created with 15 to 39 storage targets (in increments of six for small, small+1, small+2, medium, and medium+1). For each of those pools, three iterations of the iozone benchmark were run, each with one to 1024 threads (in powers-of-two increments). The testing methodology is the same as that described in the announcement blog. Figures 3 and 4 show the write and read performance of the scalable configurations respectively, with the peak performance of each configuration highlighted for ready reference:


Sequential Write Performance of Base and Scaled Configurations
Figure 3: Write Performance of Scalable Configurations

Sequential Read Performance of Base and Scaled Configurations
Figure 4: Read Performance of Scalable Configurations
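
The test matrix described above (pool sizes, thread counts, and iterations) can be enumerated with a short Python sketch. It only lists the combinations and does not invoke iozone; the variable names are illustrative, and the sketch does not distinguish between the write and read passes.

```python
# Pool sizes: 15 storage targets for Small, plus six per added storage server
config_names = ["small", "small+1", "small+2", "medium", "medium+1"]
pool_sizes = list(range(15, 40, 6))

# Thread counts: 1 to 1024 in powers of two
thread_counts = [2**exp for exp in range(11)]

ITERATIONS = 3  # three iozone iterations per pool and thread count

# One entry per (configuration, pool size, thread count, iteration)
runs = [
    (name, targets, threads, iteration)
    for name, targets in zip(config_names, pool_sizes)
    for threads in thread_counts
    for iteration in range(1, ITERATIONS + 1)
]

print(pool_sizes)
print(thread_counts)
print(len(runs))
```

This yields five pools (15, 21, 27, 33, and 39 targets), eleven thread counts, and 165 pool/thread/iteration combinations in total.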

Note:

The storage pools referred to above were created solely for the purpose of characterizing the performance of the different configurations. During the performance evaluation of the medium configuration detailed in the announcement blog, all 33 targets were in the "Default Pool" only. The output of the beegfs-ctl --liststoragepools command given below shows the assignment of the storage targets:

# beegfs-ctl --liststoragepools
Pool ID Pool Description Targets Buddy Groups
======= ================== ============================ ============================
1 Default 1,2,3,4,5,6,7,8,9,10,11,12,
13,14,15,16,17,18,19,20,21,
22,23,24,25,26,27,28,29,30,
31,32,33
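
To confirm that the listing above accounts for all 33 targets, the pasted text can be parsed with a small Python sketch. This is an illustration that operates on the text exactly as shown here; it assumes a single-word pool description and target IDs wrapped across lines, and the real column layout of the command output may differ.

```python
# Pool row copied from the listing above (header lines omitted)
listing = """\
1 Default 1,2,3,4,5,6,7,8,9,10,11,12,
13,14,15,16,17,18,19,20,21,
22,23,24,25,26,27,28,29,30,
31,32,33
"""


def parse_pool_row(text: str):
    """Return (pool_id, description, target_ids) from a wrapped pool row.

    ASSUMPTION: the description is a single word and the remaining
    text is a comma-separated list of target IDs split across lines.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    pool_id, description, first_chunk = lines[0].split(None, 2)
    target_text = first_chunk + "".join(lines[1:])
    targets = [int(tok) for tok in target_text.split(",") if tok.strip()]
    return int(pool_id), description, targets


pool_id, description, targets = parse_pool_row(listing)
print(pool_id, description, len(targets))
```

For the listing above, the parsed row is pool 1, "Default", with 33 target IDs numbered 1 through 33.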


Conclusion and Future Work

This blog discussed the scalability of Dell EMC Ready Solutions for HPC BeeGFS Storage and highlighted the sequential read and write throughput of various configurations. Stay tuned for Part 3 of this blog series, which will discuss additional features of BeeGFS and highlight the use of "StorageBench", the built-in storage targets benchmark of BeeGFS. As a next step, we will publish a white paper covering metadata performance, the IOR N-1 performance evaluation, and additional details about design considerations, tuning, and configuration.


References

[1] Dell EMC Ready Solutions for HPC BeeGFS Storage: https://www.dell.com/support/article/sln319381/
[2] BeeGFS Documentation: https://www.beegfs.io/wiki/
[3] How to connect two interfaces on the same subnet: https://access.redhat.com/solutions/30564
[4] PCI Express Direct Memory Access Reference Design using External Memory: https://www.intel.com/content/www/us/en/programmable/documentation/nik1412547570040.html#nik1412547565760




Article ID: SLN319382

Last Date Modified: 12/08/2019 08:36 PM

