Authored by Mario Gallegos and Xin Chen, HPC and AI Innovation Lab, October 2018
The latest version of the Dell EMC Ready Solution for HPC NFS Storage (NSS) with High Availability (the NSS-HA solution) will be NSS7.3-HA, with release scheduled for later this month.
This release of NSS incorporates the new Dell EMC PowerVault ME4084 storage arrays and Red Hat Enterprise Linux 7.5, and continues to use the Intel Xeon Scalable Processor Family CPUs (architecture codenamed Skylake) to offer higher overall system performance than previous NSS-HA solutions. This blog presents the results of the I/O performance tests for this latest version of the NSS solution.
Figure 1 shows the design of the NSS7.3-HA configuration. The major differences between NSS7.3-HA and its immediate predecessor, NSS7.2-HA, are:
- Back-end storage array:
  - NSS7.2-HA: PowerVault MD3460 + optional MD3060e (60 or 120 HDDs)
  - NSS7.3-HA: PowerVault ME4084 (84 HDDs)
- Operating system:
  - NSS7.2-HA: RHEL 7.4
  - NSS7.3-HA: RHEL 7.5
Except for items such as the necessary software and firmware updates, NSS7.2-HA and NSS7.3-HA share the same HA cluster design and basic storage configuration. (Refer to the NSS7.0-HA white paper for more detailed information about the configuration.)
Another major improvement from NSS7.2-HA to NSS7.3-HA is the large increase in maximum capacity. NSS7.2-HA is limited by Red Hat's current XFS support limit of 500 TB. After extensive testing and validation in our labs, Dell EMC and Red Hat reached a cooperative agreement to support NSS7.3-HA configurations with up to 768 TB of usable space. That corresponds to a Dell EMC PowerVault ME4084 fully populated with 12 TB HDDs, or 1008 TB of raw storage space.
Figure 1 shows the NSS7.3-HA architecture inside the dotted rectangle, embedded in the typical test bed that includes clients and the public network switch.
Figure 1. NSS7.3-HA 1008 TB Raw Space (768 TB Usable) architecture and test bed
The next table summarizes the components of the new NSS7.3-HA solution alongside those of NSS7.2-HA.
Table 1. Components for NSS7.2-HA and NSS7.3-HA
- Release:
  - NSS7.2-HA Release (April 2018): "PowerEdge 14th generation servers and MD3460 + MD3060e"
  - NSS7.3-HA Release (October 2018): "PowerEdge 14th generation server and ME4084 based solution"
- Operating system and file system:
  - NSS7.2-HA: Red Hat Enterprise Linux 7.4, Red Hat Scalable File system (XFS) v4.5.0-12
  - NSS7.3-HA: Red Hat Enterprise Linux 7.5, Red Hat Scalable File system (XFS) v4.5.0-15
- Servers (both releases):
  - Two Dell PowerEdge R740 servers.
  - CPU: Dual Intel Xeon Gold 6136 @ 3.0 GHz, 12 cores per processor.
  - Memory: 12 x 16 GiB 2666 MT/s RDIMMs.
- External Network Connectivity (both releases):
  - EDR InfiniBand, 10 GbE or Intel Omni-Path.
  - For this blog, Mellanox ConnectX-4 IB EDR/100 GbE.
  - For orders, CX-5 IB EDR/100 GbE.
- Gigabit Ethernet, switch Dell Networking S3048-ON
- Mellanox OFED:
  - NSS7.2-HA: Mellanox OFED 4.3-188.8.131.52
  - NSS7.3-HA: Mellanox OFED 4.4-1.0.0
- Direct Storage connection (both releases):
  - 12 Gbps SAS connections.
- Storage:
  - NSS7.2-HA:
    - Dell EMC MD3460 + optional MD3060e.
    - 60 or 120 3.5" NL SAS 4 TB drives.
    - Two configurations, 240 or 480 TB (raw space).
    - 6 or 12 LUNs, 8+2 RAID 6, segment size 512 KiB.
  - NSS7.3-HA:
    - Dell EMC PowerVault ME4084.
    - 84 3.5" NL SAS drives, up to 12 TB.
    - One configuration: up to 1008 TB (raw space).
    - 8 LUNs, linear 8+2 RAID 6, chunk size 128 KiB.
    - 4 global HDD spares.
The new PowerVault ME4084 storage continues to use linear 8+2 RAID 6 as the basic building unit, with a new chunk size (segment size) of 128 KiB and a read-ahead value of "stripe size" selected for optimum performance. Also, since there are now 84 drives, the solution has 8 LUNs, each based on an 8+2 RAID 6 group, plus 4 global spare HDDs configured to immediately replace any failed disk. With 12 TB drives, this gives the solution up to 768 TB of usable space.
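The capacity figures quoted above follow from simple arithmetic on the drive layout. The short Python sketch below reproduces them; the constant and function names are illustrative only, and the "usable" figure counts only the data drives of each RAID 6 group, ignoring any file system overhead.

```python
# Drive layout of the ME4084 as described in the text.
TOTAL_DRIVES = 84
GLOBAL_SPARES = 4
LUNS = 8
DATA_DRIVES_PER_LUN = 8      # the "8" in 8+2 RAID 6
PARITY_DRIVES_PER_LUN = 2    # the "2" in 8+2 RAID 6
DRIVE_SIZE_TB = 12           # largest supported drive size
CHUNK_SIZE_KIB = 128


def drives_in_luns():
    """Drives consumed by all RAID 6 groups (data + parity)."""
    return LUNS * (DATA_DRIVES_PER_LUN + PARITY_DRIVES_PER_LUN)


def raw_capacity_tb():
    """Raw space: every drive in the enclosure, spares included."""
    return TOTAL_DRIVES * DRIVE_SIZE_TB


def usable_capacity_tb():
    """Usable space: data drives only, before file system overhead."""
    return LUNS * DATA_DRIVES_PER_LUN * DRIVE_SIZE_TB


def full_stripe_kib():
    """Data held by one full stripe of a single 8+2 RAID 6 group."""
    return DATA_DRIVES_PER_LUN * CHUNK_SIZE_KIB


if __name__ == "__main__":
    # 8 groups of 10 drives plus 4 spares account for all 84 slots.
    assert drives_in_luns() + GLOBAL_SPARES == TOTAL_DRIVES
    print("raw TB:", raw_capacity_tb())
    print("usable TB:", usable_capacity_tb())
    print("full stripe KiB:", full_stripe_kib())
```

With 12 TB drives, this yields 1008 TB raw and 768 TB usable, matching the figures in the text.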
NSS7.3-HA I/O performance
This section presents the results of the I/O performance tests for the current NSS-HA solution, NSS7.3-HA. All performance tests were conducted in an HA failure-free scenario to measure the maximum capability of the solution. The tests focused on three types of I/O patterns: large sequential reads and writes, small random reads and writes, and three metadata operations (file create, stat, and remove).
A 32-node compute cluster was used to generate workload for the benchmarking tests. The clients and the 1008 TB (raw storage size) NSS configuration were connected using InfiniBand EDR and the file system mounted via IPoIB. Each I/O benchmark test was run over a range of clients to test the scalability of the solution. Details about the clients used are listed in the next table.
Table 2. Clients configuration (performance testing)
|Number of servers|32 server cluster|
|CPU|Intel(R) Xeon(R) Gold 6148 CPU @ 2.40 GHz|
|Operating system|Red Hat Enterprise Linux Server release 7.4|
|Network adapter|Mellanox ConnectX-4 VPI IB EDR/100 GbE single port QSFP28|
The IOzone and MDtest benchmarks were used in this study. IOzone was used for the sequential and random tests. For sequential tests, a request size of 1024 KiB was used. The total amount of data transferred was 256 GiB to ensure that the NFS server cache was saturated. Random tests used a 4 KiB request size and each client read and wrote a 4 GiB file. Metadata tests were performed using the MDtest benchmark with OpenMPI and included file create, stat, and remove operations. (Refer to Appendix A of the NSS7.0-HA white paper for the complete commands used in the tests.)
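As a rough illustration of how the sequential test parameters above relate to each other, the Python sketch below splits the fixed 256 GiB total across the thread count and prints a matching IOzone command line. It is a sketch under stated assumptions, not the commands used for this study (those are in Appendix A of the NSS7.0-HA white paper): the even split per thread, the thread sweep, the `./clientlist` path, and the particular combination of IOzone options are all assumptions.

```python
TOTAL_DATA_GIB = 256                       # total data per sequential test (from the text)
REQUEST_SIZE = "1024k"                     # 1024 KiB request size (from the text)
THREAD_COUNTS = [1, 2, 4, 8, 16, 32, 64]   # assumed sweep, not taken from the text


def per_thread_file_size_mib(threads, total_gib=TOTAL_DATA_GIB):
    """File size each thread handles so the total stays at total_gib (assumed even split)."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    total_mib = total_gib * 1024
    if total_mib % threads:
        raise ValueError("total size does not divide evenly across threads")
    return total_mib // threads


def sequential_write_command(threads, client_list="./clientlist"):
    """Build an illustrative IOzone cluster-mode sequential write command.

    Options used: -i 0 (write test), -c and -e (include close and flush in
    timing), -w (keep files), -r/-s (record and file size), -t (threads),
    -+n (no retests), -+m (client list file). The client list path is a
    hypothetical placeholder.
    """
    size_mib = per_thread_file_size_mib(threads)
    return (
        f"iozone -i 0 -c -e -w -r {REQUEST_SIZE} -s {size_mib}m "
        f"-t {threads} -+n -+m {client_list}"
    )


if __name__ == "__main__":
    for count in THREAD_COUNTS:
        print(sequential_write_command(count))
```

Keeping the total fixed while the thread count varies means every data point moves the same amount of data, so the server cache is exceeded at every point of the sweep.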
IPoIB sequential writes and reads
Figures 2 and 3 show the sequential write and read performance. Since the test cluster had 32 nodes, the 64 thread data point was obtained using 32 clients running 2 threads each.
For NSS7.3-HA, the peak read performance is 7 GB/sec, and the peak write performance is almost 5 GB/sec. The two figures show that the current NSS7.3-HA solution delivers higher sequential performance than the previous version. Reads are up to 18.7% faster, and writes improve even more, reaching up to 2.65 times the performance of the previous solution (at 16 threads). Comparing peak performance values, writes on NSS7.3-HA are 2.13 times as fast, and reads are 12.5% faster.
This is partially due to the higher SAS internal speed of 12 Gbps for all PowerVault ME4084 internal components including HDDs (PowerVault MD3460 was 6 Gbps) allowing a higher throughput per LUN, but also due to the new storage controllers that can process information faster than the previous generation PowerVault MD3.
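The comparisons in this blog mix two conventions: "N times the performance" and "N% better". The small Python sketch below, with illustrative function names, converts between the two so that the figures can be compared on the same scale.

```python
def times_to_percent_gain(times):
    """Convert 'N times the performance' into a percentage improvement."""
    return (times - 1.0) * 100.0


def percent_gain_to_times(percent):
    """Convert an 'N% better' figure into a performance multiple."""
    return 1.0 + percent / 100.0


if __name__ == "__main__":
    # Sequential writes: 2.65 times the previous solution at 16 threads.
    print(round(times_to_percent_gain(2.65), 1), "% higher")
    # Sequential reads: up to 18.7% better than the previous solution.
    print(round(percent_gain_to_times(18.7), 3), "times")
```

For example, 2.65 times the write performance corresponds to a 165% improvement, while an 18.7% read improvement corresponds to 1.187 times the previous performance.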
Figure 2. IPoIB large sequential write performance
Figure 3. IPoIB large sequential read performance
IPoIB random writes and reads
Figure 4 and Figure 5 show the random write and read performance.
The random write performance of NSS7.3-HA peaks at 32 threads, while the previous version of the solution peaked at 64 threads. Random read performance increases steadily on NSS7.3-HA up to 32 clients, whereas the previous solution peaked at 16 clients. Again, the new storage outperforms its predecessor, with up to 3.44 times the write performance (at 2 threads) and up to 85% higher read performance (at 32 threads). Comparing peak values, the new solution is about 13% faster on random writes and 85% faster on random reads. These improvements are mainly due to the new PowerVault ME4084 controllers, which have faster processing capabilities than the PowerVault MD3460 controllers.
Figure 4. IPoIB random write performance
Figure 5. IPoIB random read performance
IPoIB metadata operations
Figure 6, Figure 7, and Figure 8 show the results of file create, stat, and remove operations, respectively. As the HPC compute cluster has only 32 compute nodes, in the graphs below, each client executed a maximum of one thread for client counts up to 32, and for thread counts of 64, 128, 256, and 512, each client executed 2, 4, 8, or 16 simultaneous operations (threads).
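The mapping from a total thread count to clients and threads per client, as described above for the 32-node test cluster, can be sketched in Python as follows. The function name is illustrative, and the rule for counts above 32 assumes that the total is spread evenly over all 32 clients.

```python
MAX_CLIENTS = 32  # size of the compute cluster used for the tests


def client_layout(total_threads, max_clients=MAX_CLIENTS):
    """Return (clients used, threads per client) for a given total thread count."""
    if total_threads <= 0:
        raise ValueError("total_threads must be positive")
    if total_threads <= max_clients:
        # Up to 32 threads: one thread on each of total_threads clients.
        return total_threads, 1
    if total_threads % max_clients:
        raise ValueError("thread count must be a multiple of the client count")
    # Beyond 32 threads: all clients are used, each running several threads.
    return max_clients, total_threads // max_clients


if __name__ == "__main__":
    for total in (1, 16, 32, 64, 128, 256, 512):
        clients, per_client = client_layout(total)
        print(f"{total} threads: {clients} clients x {per_client} threads")
```

The same rule covers the 64-thread data point of the sequential tests, which used 32 clients running 2 threads each.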
For file create operations, the new solution sustains about twice the performance of the previous solution, with the largest gap (208%, or about 2.1 times) at 32 clients. The advantage then decreases slightly, but even when comparing the peak performance of both solutions at 256 threads, the new solution is 30% faster.
Stat operations improve the most with the new storage: gains are as high as 7.7 times the predecessor's rate at 256 threads, and comparing peak performance, NSS7.3-HA delivers almost 6 times as many stat operations per second as the previous version of NSS.
Finally, remove operations show a comparatively modest improvement, with most data points 33% or more above the previous solution, except at 128 threads, where performance is 2.21 times that of the previous solution. At peak performance, the new storage is almost 55% faster than the previous NSS system.
All these improvements are due to the faster HDDs using SAS3 speeds (12 Gbps) throughout, as well as the new PowerVault ME4084 controllers, which are capable of higher IOPS and bandwidth.
Figure 6. IPoIB file create performance
Figure 7. IPoIB file stat performance
Figure 8. IPoIB file remove performance
Conclusions and Future Work
Over its different generations, the NSS-HA solution has undergone many hardware and software updates to continually offer high availability, higher performance, and larger storage capacity. In all of these versions, the core architectural design of the NSS-HA solution family has remained unchanged. To show the performance difference between NSS7.3-HA and the previous release (NSS7.2-HA), the performance numbers of both solutions were compared, showing the superior performance of the latest version of the solution based on PowerVault ME4084:
- Up to 2.65 times the sequential write and up to 18.7% faster read performance.
- Up to 3.44 times the random write and up to 85% faster random read performance.
- Up to 2.1 times the create rate, 7.7 times the stat rate and 2.2 times the remove rate.
The next phase will be characterizing the NSS7.3-HA solution connected with Intel Omni-Path adapters. For detailed information about NSS-HA solutions, please refer to our published white papers:
- Dell HPC NFS Storage Solution High Availability Configurations, release version NSS2-HA, published in April 2011.
- Dell HPC NFS Storage Solution High Availability Configurations with Large Capacities, release version NSS3-HA, published in February 2012.
- Dell HPC NFS Storage Solution High Availability (NSS-HA) Configurations with Dell PowerEdge 12th Generation Servers, release version NSS4-HA, published in July 2012.
- Dell HPC NFS Storage Solution - High Availability (NSS-HA) Configuration with Dell PowerVault MD3260/MD3060e Storage Arrays, release version NSS4.5-HA, published in October 2012.
- Dell HPC NFS Storage Solution - High Availability (NSS-HA) Configuration with Dell PowerVault MD3260/MD3060e Storage Arrays, release version NSS4.5-HA updated, published in May 2013.
- Dell HPC NFS Storage Solution - High Availability NSS5-HA configurations, release version NSS5.0-HA, published in September 2013.
- Dell HPC NFS Storage Solution - High Availability (NSS5.5-HA) Configuration with Dell PowerVault MD3460 and MD3060e Storage Arrays, release version NSS5.5-HA, published in September 2013.
- Dell HPC NFS Storage Solution - High Availability (NSS6.0-HA) Configuration with Dell PowerEdge 13th Generation Servers, release version NSS6.0-HA, published in November 2014.
- Dell HPC NFS Storage Solution - High Availability (NSS7.0-HA) Configuration, release version NSS7.0-HA, published in May 2016.
Note: for any customized configuration/deployment, please contact your Dell EMC representative for specific guidelines.