This blog discusses the salient features of the Dell EMC Ready Solutions for HPC BeeGFS Storage, which was announced recently. It is the third blog in the series on the BeeGFS High Performance Storage Solution. The first blog announced the release of the solution. The second blog discussed the scalability of Dell EMC Ready Solutions for HPC BeeGFS Storage: it provided details on the base configurations, the flexible scalable configurations and the measured sequential read/write performance of the various configurations, demonstrating that scalability is linear with respect to the number of servers in the solution. This blog highlights the use of "StorageBench", the built-in storage targets benchmark of BeeGFS.
BeeGFS is an open-source file system which can be downloaded from www.beegfs.io. It is a parallel file system that distributes data across multiple storage targets. It is software-defined storage that decouples the logical file system from the underlying storage hardware, allowing the user to define how and where the data is stored. The file system software includes enterprise features such as high availability, quota enforcement and access control lists. The key features of BeeGFS are its ease of use, scalability and flexibility. Its ease of use stems from the fact that all the server-side components are user-space daemons, while the client is a kernel module that does not require any patches to the kernel itself. All BeeGFS components can be installed and updated without rebooting the server, so clients and servers can be added to an existing system without any downtime. By adding servers and drives, the performance and capacity of the file system can be scaled up, as shown in the blog linked here. BeeGFS supports multiple Linux distributions and is designed to work with any POSIX compliant local file system. BeeGFS also supports running multiple instances of a given service on the same server.
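Because all server-side components run as regular daemons and the client is a loadable kernel module, they can be inspected and restarted individually without a reboot. The sketch below assumes the standard systemd unit names shipped with the BeeGFS packages; adjust it to the services actually installed on each node.

# Check the BeeGFS server-side daemons (management, metadata, storage)
systemctl status beegfs-mgmtd beegfs-meta beegfs-storage
# Check the helper daemon and mount service on a client node
systemctl status beegfs-helperd beegfs-client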
The Dell EMC Ready Solutions for HPC BeeGFS Storage leverages all the key features of the BeeGFS file system and is engineered for high performance. The solution uses PowerEdge R740xd servers for storing and serving/processing metadata and data. Each PowerEdge R740xd server has 24x 1.6 TB Intel P4600 NVMe SSDs. NVMe drives are considered the second major leap in drive technology, SSDs being the first. In HPC environments, the scratch space can often be a limiting factor: it may be too small or too slow. The Dell EMC Ready Solutions for HPC BeeGFS Storage is designed to be used as a scratch solution and serves the scratch storage using the BeeGFS file system.
BeeGFS includes two built-in benchmarking tools, NetBench and StorageBench, which help evaluate the network and the storage respectively. When NetBench mode is enabled, the servers discard received write requests instead of writing the data; similarly, for read requests, only memory buffers are sent to the clients instead of reading from the underlying file system. NetBench mode is therefore intended for testing the network streaming throughput independent of the underlying disks. StorageBench, on the other hand, is intended to measure the streaming throughput of the underlying file system independent of the network performance. StorageBench is a storage targets benchmark that does not use the network: the storagebench command simply instructs the storage targets to start writing/reading data, which eliminates the impact of the network. The output of StorageBench is thus the best performance that the system can achieve if the network performance is ideal. This blog illustrates how StorageBench can be used to compare the performance of different storage targets and thus identify defective or misconfigured targets.
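As an aside, NetBench mode is typically toggled at runtime through the procfs interface of the BeeGFS client kernel module. The exact path below is an assumption and may differ between BeeGFS versions; <clientID> is a placeholder for the client's ID directory.

# Enable NetBench mode on a client (writes are discarded, reads served from memory buffers)
echo 1 > /proc/fs/beegfs/<clientID>/netbench_mode
# Disable NetBench mode again after the network test
echo 0 > /proc/fs/beegfs/<clientID>/netbench_mode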
StorageBench does not use the mounted file system, and each test file is written to a single target. StorageBench creates a directory on every storage target in the system, in which test files are created, one per testing thread. Data is streamed directly to these files to show the low-level throughput available from each storage target. Since there is no network communication, file striping cannot be simulated, so the StorageBench results are comparable to client IO with striping disabled. When actual benchmarks are run on the mounted file system, a file gets striped across 4 storage targets if the default striping pattern is adopted.
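For comparison with client IO, the stripe pattern that a file or directory would use can be inspected and changed with beegfs-ctl. The mount point, directory name and chunk size below are assumptions for illustration only.

# Show the stripe pattern (number of targets, chunk size) of an existing file or directory
beegfs-ctl --getentryinfo /mnt/beegfs/testdir
# Set a directory to stripe new files across 4 targets with a 2m chunk size
beegfs-ctl --setpattern --numtargets=4 --chunksize=2m /mnt/beegfs/testdir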
For the purpose of testing different storage targets, the small and medium configurations described in the blog on the scalability of the Dell EMC BeeGFS Storage Solution were used. Both configurations have the same number of metadata targets, configured in RAID 1. They differ in the RAID configuration of the storage targets: the small setup has storage targets configured in RAID 0 across 6 drives, while the medium configuration has storage targets configured in RAID 10 across 6 drives. The storage targets configured on the small and medium setups are tabulated below:
Table 1: Testbed Configuration

| Configuration | Medium - RAID 10 for Storage Targets | Small - RAID 0 for Storage Targets |
| --- | --- | --- |
| Number of metadata targets | 6 | 6 |
| Number of instances of metadata service | 6 | 6 |
| Number of storage servers | 5 | 2 |
| Number of storage targets | 22 | 10 |
| Number of storage services per server | 4 | 4 |
| Number of storage services per NUMA zone | 2 | 2 |
| Number of targets per instance of storage service | 2 | 2 |
Note: The above configuration of the medium setup is only for the purpose of testing the throughput of storage targets configured in different RAID configurations using the StorageBench tool.
The storage benchmark is started and monitored with the beegfs-ctl command line tool, which is provided by the beegfs-utils package. The following example starts a write benchmark on all targets of all BeeGFS storage servers of the small configuration with an IO block size of 512 KB, using 16 threads per target, each of which writes 200 GB of data to its own file.
[root@stor1 ~]# beegfs-ctl --storagebench --alltargets --write --blocksize=512K --size=200G --threads=16
Write storage benchmark was started.
You can query the status with the --status argument of beegfs-ctl.
Server benchmark status:
Running: 10
The "Running: 10" output indicates that there are in total 10 storage targets configured in the system.
To query the benchmark status/results of all targets, the following command can be executed:
[root@stor1 ~]# beegfs-ctl --storagebench --alltargets --status
Server benchmark status:
Finished: 10
Write benchmark results:
Min throughput: 4692435 KiB/s nodeID: stor1-numa0-2 [ID: 6], targetID: 50
Max throughput: 5368537 KiB/s nodeID: meta-stor-numa1-2 [ID: 2], targetID: 48
Avg throughput: 4907091 KiB/s
Aggregate throughput: 49070915 KiB/s
Adding the --verbose option to the above command shows the list of all targets and their respective throughput.
[root@meta-stor ~]# beegfs-ctl --storagebench --alltargets --status --verbose
Server benchmark status:
Finished: 10
Write benchmark results:
Min throughput: 4692435 KiB/s nodeID: stor1-numa0-2 [ID: 6], targetID: 6
Max throughput: 5368537 KiB/s nodeID: meta-stor-numa1-2 [ID: 2], targetID: 2
Avg throughput: 4907091 KiB/s
Aggregate throughput: 49070915 KiB/s
List of all targets:
1 5368477 KiB/s nodeID: meta-stor-numa1-1 [ID: 1]
2 5368537 KiB/s nodeID: meta-stor-numa1-2 [ID: 2]
3 4706368 KiB/s nodeID: stor1-numa0-1 [ID: 3]
4 4896077 KiB/s nodeID: stor1-numa1-1 [ID: 4]
5 4872876 KiB/s nodeID: stor1-numa1-2 [ID: 5]
6 4692435 KiB/s nodeID: stor1-numa0-2 [ID: 6]
7 4879054 KiB/s nodeID: stor2-numa1-2 [ID: 7]
8 4864737 KiB/s nodeID: stor2-numa1-1 [ID: 8]
9 4696152 KiB/s nodeID: stor2-numa0-1 [ID: 9]
10 4726202 KiB/s nodeID: stor2-numa0-2 [ID: 10]
The average throughput per storage target configured in RAID 0 is about 5.02 GB/s (4907091 KiB/s).
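For reference, the per-target average reported by beegfs-ctl in KiB/s can be converted to decimal GB/s on the command line; bc is assumed to be installed.

# 4907091 KiB/s x 1024 bytes/KiB / 10^9 bytes/GB ~= 5.02 GB/s
echo "scale=2; 4907091 * 1024 / 1000000000" | bc
5.02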
The following example starts the same write benchmark on all targets of all BeeGFS storage servers of the medium configuration, with an IO block size of 512 KB, using 16 threads per target, each of which writes 200 GB of data to its own file.
[root@node001 ~]# beegfs-ctl --storagebench --alltargets --write --blocksize=512K --size=200G --threads=16
Write storage benchmark was started.
You can query the status with the --status argument of beegfs-ctl.
Server benchmark status:
Running: 22
Adding the --verbose option to the above command shows the list of all targets and their respective throughput.
[root@node001 ~]# beegfs-ctl --storagebench --alltargets --status --verbose
Server benchmark status:
Finished: 22
Write benchmark results:
Min throughput: 2705987 KiB/s nodeID: node006-numa0-1 [ID: 19], targetID: 1
Max throughput: 3364311 KiB/s nodeID: node001-numa1-1 [ID: 1], targetID: 1
Avg throughput: 3212845 KiB/s
Aggregate throughput: 70682603 KiB/s
List of all targets:
1 3364311 KiB/s nodeID: node001-numa1-1 [ID: 1]
2 3361591 KiB/s nodeID: node001-numa1-2 [ID: 2]
3 3309530 KiB/s nodeID: node002-numa0-1 [ID: 3]
4 3312840 KiB/s nodeID: node002-numa0-2 [ID: 4]
5 3332095 KiB/s nodeID: node002-numa1-1 [ID: 5]
6 3323319 KiB/s nodeID: node002-numa1-2 [ID: 6]
7 3313000 KiB/s nodeID: node003-numa0-1 [ID: 7]
8 3321214 KiB/s nodeID: node003-numa0-2 [ID: 8]
9 3335072 KiB/s nodeID: node003-numa1-1 [ID: 9]
10 3339743 KiB/s nodeID: node003-numa1-2 [ID: 10]
11 3302175 KiB/s nodeID: node004-numa0-1 [ID: 11]
12 3309474 KiB/s nodeID: node004-numa0-2 [ID: 12]
13 3329879 KiB/s nodeID: node004-numa1-1 [ID: 13]
14 3328291 KiB/s nodeID: node004-numa1-2 [ID: 14]
15 3306132 KiB/s nodeID: node005-numa0-1 [ID: 15]
16 3307096 KiB/s nodeID: node005-numa0-2 [ID: 16]
17 3318436 KiB/s nodeID: node005-numa1-1 [ID: 17]
18 3329684 KiB/s nodeID: node005-numa1-2 [ID: 18]
19 2705987 KiB/s nodeID: node006-numa0-1 [ID: 19]
20 2716438 KiB/s nodeID: node006-numa0-2 [ID: 20]
21 2707970 KiB/s nodeID: node006-numa1-1 [ID: 21]
22 2708326 KiB/s nodeID: node006-numa1-2 [ID: 22]
The average throughput per storage target configured in RAID 10 is about 3.29 GB/s (3212845 KiB/s).
The StorageBench tests on the two setups, one with storage targets configured in RAID 0 and the other with storage targets configured in RAID 10, show that write performance is better with the storage targets configured in RAID 0 than in RAID 10. When the dd command was used to write a 10G file with a 1M block size and "oflag=direct", the average throughput was about 5.1 GB/s for the small system configured in RAID 0 and about 3.4 GB/s for the medium system configured in RAID 10, which is comparable to the results obtained using the StorageBench tool.
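A minimal sketch of such a dd test, assuming the BeeGFS file system is mounted at /mnt/beegfs (the path and file name are assumptions for illustration):

# Write a 10G file with a 1M block size, bypassing the page cache
dd if=/dev/zero of=/mnt/beegfs/ddtest.bin bs=1M count=10240 oflag=direct

StorageBench can also be used to verify that all the storage targets in a system perform uniformly and thus to spot defective or misconfigured targets. A read benchmark is started with the --read option; the sketch below assumes the same block size, file size and thread count as the write tests above, and it requires that a write benchmark of at least that size has already been run so that the test files exist. The per-target results that follow were collected on a configuration with 33 storage targets:

# Start a read benchmark on all storage targets of all storage servers
beegfs-ctl --storagebench --alltargets --read --blocksize=512K --size=200G --threads=16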
[root@node001 ~]# beegfs-ctl --storagebench --alltargets --status --verbose
Server benchmark status:
Finished: 33
Read benchmark results:
Min throughput: 2830479 KiB/s nodeID: node003-numa1-2 [ID: 14], targetID: 14
Max throughput: 3025500 KiB/s nodeID: node005-numa0-1 [ID: 22], targetID: 22
Avg throughput: 2917836 KiB/s
Aggregate throughput: 96288596 KiB/s
List of all targets:
1 2950039 KiB/s nodeID: node001-numa1-1 [ID: 1]
2 2956121 KiB/s nodeID: node001-numa1-2 [ID: 2]
3 2954473 KiB/s nodeID: node001-numa1-3 [ID: 3]
4 2957658 KiB/s nodeID: node002-numa0-1 [ID: 4]
5 2947109 KiB/s nodeID: node002-numa0-2 [ID: 5]
6 2969886 KiB/s nodeID: node002-numa0-3 [ID: 6]
7 2892578 KiB/s nodeID: node002-numa1-1 [ID: 7]
8 2886899 KiB/s nodeID: node002-numa1-2 [ID: 8]
9 2888972 KiB/s nodeID: node002-numa1-3 [ID: 9]
10 2861995 KiB/s nodeID: node003-numa0-1 [ID: 10]
11 2874314 KiB/s nodeID: node003-numa0-2 [ID: 11]
12 2879096 KiB/s nodeID: node003-numa0-3 [ID: 12]
13 2832635 KiB/s nodeID: node003-numa1-1 [ID: 13]
14 2830479 KiB/s nodeID: node003-numa1-2 [ID: 14]
15 2830971 KiB/s nodeID: node003-numa1-3 [ID: 15]
16 2986890 KiB/s nodeID: node004-numa0-1 [ID: 16]
17 2979197 KiB/s nodeID: node004-numa0-2 [ID: 17]
18 2983958 KiB/s nodeID: node004-numa0-3 [ID: 18]
19 2897974 KiB/s nodeID: node004-numa1-1 [ID: 19]
20 2900880 KiB/s nodeID: node004-numa1-2 [ID: 20]
21 2904036 KiB/s nodeID: node004-numa1-3 [ID: 21]
22 3025500 KiB/s nodeID: node005-numa0-1 [ID: 22]
23 3021558 KiB/s nodeID: node005-numa0-2 [ID: 23]
24 3017387 KiB/s nodeID: node005-numa0-3 [ID: 24]
25 2921480 KiB/s nodeID: node005-numa1-1 [ID: 25]
26 2930226 KiB/s nodeID: node005-numa1-2 [ID: 26]
27 2930548 KiB/s nodeID: node005-numa1-3 [ID: 27]
28 2900018 KiB/s nodeID: node006-numa0-1 [ID: 28]
29 2898520 KiB/s nodeID: node006-numa0-2 [ID: 29]
30 2907113 KiB/s nodeID: node006-numa0-3 [ID: 30]
31 2855079 KiB/s nodeID: node006-numa1-1 [ID: 31]
32 2853527 KiB/s nodeID: node006-numa1-2 [ID: 32]
33 2861480 KiB/s nodeID: node006-numa1-3 [ID: 33]
From the above output, it is evident that all storage targets perform uniformly and there are no defective targets in the system.
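Once benchmarking is complete, the test files that StorageBench created on the storage targets can be removed; the --cleanup mode of beegfs-ctl is assumed to be available for this purpose.

# Delete the StorageBench test files from all storage targets
beegfs-ctl --storagebench --alltargets --cleanup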