PowerFlex Write Performance Issues
Summary: After performing network maintenance, the write performance of certain SDSs is now poor.
Symptoms
Scenario
- This issue is seen after network maintenance on Top of Rack switches (TORs) was performed, typically a reboot of the switches.
- The SDS nodes are using an LACP bond for the data networks.
- Only the SDS nodes using the switches where maintenance was performed are affected.
- Write latency can reach hundreds of milliseconds for a given Storage Pool/Protection Domain.
- Read latency on the same set of SDSs remains normal.
- The "NET_LONG_RCV_GRP_PROCESS" counter in diag_counters.txt climbs quickly, while its "Last Counted(Ms)" time stays low.
Example:
Comp :: Counter :: Value :: ExtData :: Last Counted(Ms)
NET :: NET_LONG_RCV_GRP_PROCESS :: 3756453 :: 0xffffffff :: 3120
NET :: NET_LONG_RCV_GRP_PROCESS :: 3825395 :: 0xffffffff :: 960
NET :: NET_LONG_RCV_GRP_PROCESS :: 3705906 :: 0xffffffff :: 1320
NET :: NET_LONG_RCV_GRP_PROCESS :: 4094919 :: 0xffffffff :: 1230
NET :: NET_LONG_RCV_GRP_PROCESS :: 3954725 :: 0xffffffff :: 1390
NET :: NET_LONG_RCV_GRP_PROCESS :: 3594178 :: 0xffffffff :: 420
NET :: NET_LONG_RCV_GRP_PROCESS :: 3702403 :: 0xffffffff :: 680
NET :: NET_LONG_RCV_GRP_PROCESS :: 3830299 :: 0xffffffff :: 510
NET :: NET_LONG_RCV_GRP_PROCESS :: 3491713 :: 0xffffffff :: 330
NET :: NET_LONG_RCV_GRP_PROCESS :: 4155343 :: 0xffffffff :: 690
In this example, the value in the third column (Value) is high and, if watched live, keeps increasing. The fifth column (Last Counted(Ms)) shows how recently the counter last fired, which is under a second for a good portion of the SDSs.
In a healthy PowerFlex system, the third column does not count up, while the fifth column grows over time as the counter goes longer and longer without firing.
To watch the counters live, the following commands can be run:
# Log in first so the scli queries below work.
scli --login --username admin
# Set the variable for the affected Protection Domain. Enter the proper PD name here.
pd=<PD_NAME>
# Set the variable for the number of SDSs in the Protection Domain. Run this as is.
num=$(scli --query_protection_domain --protection_domain_name $pd | grep Protection | awk '{print $16}')
# Watch the relevant counter from "scli --query_diag_counters" for each SDS in the PD.
watch -d -n 1 "for x in \$(scli --query_all_sds | grep -A $num $pd | grep ID | awk '{print \$5}'); do echo \$x; scli --query_diag_counters | grep -A30 \$x | grep -Em1 '\$x|NET_LONG_RCV_GRP_PROCESS'; done"
In a healthy system, expect the fifth column to count up steadily as time passes and the third column to stay static. If the fifth column stays low while the third column counts up, the system is exhibiting this issue.
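When scripting against a collected diag_counters.txt instead of watching live, note that awk treats each "::" separator as a field of its own, so the Value column lands in field 5 and Last Counted(Ms) in field 9. A minimal sketch, using one line from the example above:

```shell
# Extract the counter value and last-counted time from one diag_counters.txt line.
# Layout: Comp :: Counter :: Value :: ExtData :: Last Counted(Ms)
# awk counts each "::" as its own field, so Value is $5 and Last Counted(Ms) is $9.
line='NET :: NET_LONG_RCV_GRP_PROCESS :: 3756453 :: 0xffffffff :: 3120'
value=$(echo "$line" | awk '{ print $5 }')
last_ms=$(echo "$line" | awk '{ print $9 }')
echo "value=$value last_ms=$last_ms"
```

A high or climbing value with a persistently low last_ms is the signature described above.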
Impact
Write performance is poor for clients.
Cause
The "NET_LONG_RCV_GRP_PROCESS" counter tracked above increments whenever sending TCP data to a remote SDS takes longer than 1 second to complete.
This delay can occur when, after network maintenance, connections resume with a small initial TCP congestion window and the Operating System's out-of-order (OOO) packet parameters are not set appropriately. The SDS-to-SDS sockets then cannot communicate effectively, leading to repeated TCP retransmits and a reduced segment size. Because only the SDS-to-SDS sockets are affected, the result is higher write latency.
Read latency remains unaffected because the SDCs (clients) communicate with a single SDS per read I/O request and do not rely on SDS-to-SDS TCP communication.
Resolution
For an immediate workaround, restart the SDS service on each affected node. Place the node in maintenance mode before restarting the SDS process; once the node is in maintenance mode, a "pkill sds" is sufficient.
To prevent the issue from happening in the future, do the following:
- Apply the sysctl settings discussed in this public KB article:
Storage Data Server Nodes May Not Contain Correct System Tuning Parameters Which May Result in Performance Issues
- If using RHEL/CentOS 7, update the OS kernel version on the SDS nodes to "3.10.0-1160.66.1" or later
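For reference, persistent sysctl tuning on these nodes is applied through a drop-in file under /etc/sysctl.d. The parameter names and values below are illustrative assumptions only (shown at their kernel defaults); the authoritative list of settings and values is in the KB article referenced above.

```shell
# Illustration only: confirm the exact parameters and values against the
# Dell KB referenced above before applying anything in production.
cat > /etc/sysctl.d/99-powerflex-tuning.conf <<'EOF'
# Example reordering-related settings (placeholders at kernel defaults,
# NOT the KB's recommended values)
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_max_reordering = 300
EOF
# Load the new settings without a reboot
sysctl -p /etc/sysctl.d/99-powerflex-tuning.conf
```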
Impacted Version
PowerFlex 3.x
Fixed In Version
RCM version 3.6.3.2 or later
IC version 38.363.02 or later