Article written by Jyothi Bhaskar of HPC and AI Innovation Lab in June 2019
With this blog we announce the availability of the Dell Ready Solution for Lustre with Cascade Lake processors. We present the updated technical specifications of the Lustre solution, initial performance results for the updated solution, and a comparison between the current and previous results. We configured the solution stack with the new updates listed in Table 1, using an EDR InfiniBand interconnect, verified that the installation worked as expected, and ran performance checks.
The architecture diagram for the Large base configuration is shown below in Figure 1.
Please note that the server and storage models remain the same as presented earlier. Only the new updates are shown in Table 1.
Figure 1: Dell Ready Solution for HPC Lustre Storage: Architecture diagram of the Large base configuration
Table 1 : Updated technical specifications of the Ready Solution for Lustre and a quick comparison with the previous release
Hardware/Software Component | Current | Previous
--- | --- | ---
Processors in Object Storage Server (OSS) and Metadata Server (MDS) | 2 x Intel Xeon Gold 6230, 20 cores @ 2.10 GHz | 2 x Intel Xeon Gold 6136, 12 cores @ 3.00 GHz
Processor in Integrated Manager for Lustre (IML) server | 2 x Intel Xeon Gold 5218, 16 cores @ 2.3 GHz | 2 x Intel Xeon Gold 5118, 12 cores @ 2.3 GHz
Memory DIMMs in OSS and MDS | 12 x 32 GiB 2933 MT/s DDR4 RDIMMs | 24 x 16 GiB 2666 MT/s DDR4 RDIMMs
Memory DIMMs in IML server | 12 x 8 GiB 2666 MT/s DDR4 RDIMMs | 12 x 8 GiB 2666 MT/s DDR4 RDIMMs
BIOS | 2.1.8 or later | 1.4.5 or later
OS Kernel | 3.10.0-957.1.3 | 3.10.0-862
Lustre Version | 2.10.7 | 2.10.4
IML Version | 4.0.10.0 | 4.0.7.0
Mellanox OFED Version | 4.5-1.0.1.0 | 4.4-1
Performance Results
We configured the updated Ready Solution as listed in Table 1 and ran performance checks with IOzone sequential, IOzone random, and MDtest benchmarks to verify the performance of the updated solution. The test methodology, including the benchmark commands for all tests, was identical to the method used and described previously.
For all tests we used the client test bed described in Table 2 below.
Table 2 : Client test bed
Component | Value
--- | ---
Number of client nodes | 8
Client node | C6420
Processors per client node | 2 x Intel Xeon Gold 6248, 20 cores @ 2.50 GHz
Memory per client node | 12 x 16 GiB 2933 MT/s RDIMMs
BIOS | 2.2.6
OS Kernel | 3.10.0-957.10.1
Lustre version | 2.10.7
Mellanox OFED | 4.5-1.0.1.0
Sequential IOzone Performance
We ran sequential IOzone version 3.487 using the clients listed in Table 2. We ran tests from a single thread up to 256 threads, with multiple threads per client beyond 8 threads. As per the test method, the aggregate data size for the test was 2 TB. For thread counts below 32, a Lustre stripe count of 32 was used; for thread counts of 32 and above, the stripe count was set to 1. Caching effects were minimized as described in the previous blog.
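The stripe-count rule above can be expressed as a small helper; this is a minimal sketch, and the function name and the commented `lfs setstripe` target path are illustrative assumptions, not the scripts actually used in our test runs.

```shell
#!/bin/sh
# Pick the Lustre stripe count used for the sequential IOzone runs:
# stripe count 32 for thread counts below 32, stripe count 1 otherwise.
stripe_count_for() {
  threads=$1
  if [ "$threads" -lt 32 ]; then
    echo 32
  else
    echo 1
  fi
}

# Example: apply the chosen stripe count to a test directory
# (requires a mounted Lustre client; the path below is hypothetical):
# lfs setstripe -c "$(stripe_count_for 16)" /mnt/lustre/iozone_run
```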
The Lustre client-side tuning parameters used for this test are listed below:
lctl set_param osc.*.checksums=0
lctl set_param timeout=600
lctl set_param at_min=250
lctl set_param at_max=600
lctl set_param ldlm.namespaces.*.lru_size=2000
lctl set_param osc.*OST*.max_rpcs_in_flight=16
lctl set_param osc.*OST*.max_dirty_mb=1024
lctl set_param osc.*.max_pages_per_rpc=1024
lctl set_param llite.*.max_read_ahead_mb=1024
lctl set_param llite.*.max_read_ahead_per_file_mb=1024
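The exact IOzone command lines were given in the previous blog. As a rough sketch of a sequential N-N write pass, the helper below builds a plausible invocation for a 2 TB aggregate with 1 MB records; the flags shown are standard IOzone throughput-mode options, but the client-list filename and the helper itself are illustrative assumptions, not the commands actually used.

```shell
#!/bin/sh
# Sketch of a sequential IOzone throughput run (assumed flags).
# The aggregate data size is fixed at 2 TB (2048 GB), so the
# per-thread file size shrinks as the thread count grows.
AGGREGATE_GB=2048

iozone_seq_cmd() {
  threads=$1
  per_thread_gb=$((AGGREGATE_GB / threads))
  # -i 0: sequential write   -c -e: include close/fsync in timing
  # -w: keep files           -r: record size
  # -s: per-thread file size -t: thread count
  # -+n: no retest           -+m: client list for distributed mode
  echo "iozone -i 0 -c -e -w -r 1024k -s ${per_thread_gb}g -t ${threads} -+n -+m ./clientlist"
}

iozone_seq_cmd 64
```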
Figure 2: Sequential N-N Writes. A comparison of previous results with current results using Cascade Lake Lustre servers and clients
Figure 3: Sequential N-N Reads. A comparison of previous results with current results using Cascade Lake Lustre servers and clients
Figures 2 and 3 present the IOzone sequential write and read performance of the latest Cascade Lake based solution and compare these results to the previous Skylake based solution. At thread counts below 32, we see a performance improvement of up to slightly more than 2x in both sequential writes and reads with Cascade Lake based clients and Lustre servers. We believe this performance delta can be attributed to the hardware mitigations for side-channel exploits included in Cascade Lake processors (ref link). However, other contributing factors could be the faster memory in the new solution and the updated software versions.
It can also be noted that the sequential performance at higher thread counts remains very similar to the previous solution. This is because the enhancements in Cascade Lake processors do not contribute to additional performance uplift once the solution is operating at the full potential of the backend storage controllers.
Random IOzone Performance
We ran random IOzone version 3.487 using the clients listed in Table 2, running performance checks with 16, 64, and 256 threads. As in the previous test method, the aggregate data size was 2 TB and the stripe size was set to 4 MB. Caching effects were minimized as described in the previous blog.
The Lustre client-side tuning parameters used for this test are listed below:
lctl set_param osc.*OST*.max_rpcs_in_flight=256
lctl set_param osc.*.max_pages_per_rpc=1024
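For reference, the 4 MB stripe size can be applied with `lfs setstripe` before the random pass. The fragment below is a hedged sketch only: it requires a mounted Lustre client, the target path and file sizes are hypothetical, and the IOzone flags shown are standard random-I/O options rather than the exact command from our runs.

```shell
# Illustrative only (assumed path and sizes; needs a Lustre mount).
# Set the 4 MB stripe size used for the random tests:
lfs setstripe -S 4M /mnt/lustre/iozone_random

# -i 2: random read/write  -O: report results in ops/sec
# -I: direct I/O           -r 4k: 4 KiB records
# -s: per-thread size      -t: thread count
iozone -i 2 -O -I -w -r 4k -s 32g -t 64 -+n -+m ./clientlist
```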
Figure 4: IOzone Random N-N Reads. A comparison of previous results with current results using Cascade Lake Lustre servers and clients
Figure 4 plots the results of the random I/O tests. Comparing previous and current results, we see that the trend remains the same and that the observed performance delta is not statistically significant given run-to-run variation.
Metadata MDtest Performance
MDtest version 1.9.3 was used to evaluate the metadata performance of the system. The MPI distribution used was Intel MPI. The tests were run using DNE (Distributed Namespace) with 2 MDTs and directory striping. The test methodology, the command used, and the number of files and directories created were identical to what was explained in the previous blog.
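A DNE striped test directory across both MDTs can be created with `lfs mkdir`, after which MDtest is launched under MPI. The fragment below is a hedged sketch: the mount point, process count, and per-process item count are illustrative assumptions, not the exact values from our runs.

```shell
# Illustrative only (assumed path and counts; needs a Lustre mount).
# Create a directory striped across both MDTs (DNE striped directory):
lfs mkdir -c 2 /mnt/lustre/mdtest_dir

# Launch MDtest under Intel MPI:
# -n: files/directories per process  -i: iterations  -d: target directory
mpirun -np 256 ./mdtest -n 4096 -i 1 -d /mnt/lustre/mdtest_dir
```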
Figure 5: Metadata operations with MDtest. A comparison of previous results with current results using Cascade Lake Lustre servers and clients
Figure 5 presents the results of the metadata tests. Comparing the current results with the previous ones, we see that the trend for all three metadata operations remains the same. We note a 75.4% improvement in peak file create operations, an 18% drop in peak file remove operations, and a negligible performance delta in file stat operations. These deltas can likely be attributed to the software and hardware updates to the solution stack listed in Table 1.
Conclusion
We have verified and validated the updates to the Lustre Ready Solution with regard to configuration, installation, and performance, and have included the gathered performance data in this blog.
Comparing previous results to current results with Cascade Lake based Lustre servers and clients:
1) Sequential IO: We see up to slightly more than a 2x performance improvement in sequential writes and reads at thread counts below 32. Peak performance remains similar to the previous Skylake based solution.
2) Random IO: Read and write performance follow a very similar trend, with a performance delta that is not statistically significant given run-to-run variation.
3) Metadata performance: File create operations improve by up to 75.4% at the peak. File stat operations remain very close to previously observed results, with a negligible delta. File remove operations drop by about 18% at the peak, while the general trend remains the same and the delta is negligible at other thread counts.
References
1) IOzone benchmark
2) MDtest benchmark