PowerFlex 14G SDS Decouples When Creating NVDIMM Namespace
Summary: SDS decouples when creating NVDIMM namespace on a live cluster.
Symptoms
Scenario
Adding a new NVDIMM and then creating the namespace while the SDS is live can cause the SDS to decouple.
Symptoms
- Creating namespaces on the NVDIMMs with these commands:
ndctl create-namespace --force --reconfig=namespace4.0 --mode=devdax --align=4K --verbose --size=16G
ndctl create-namespace --force --reconfig=namespace5.0 --mode=devdax --align=4K --verbose --size=16G
- The MDM events log shows SDSs decoupling.
2021-09-02 13:46:26.096 SDC_DISCONNECTED_FROM_SDS_IP WARNING SDC Name: sdc39; ID: 5473912c00000051 disconnected from the IP address 10.1.0.239 of SDS sds107; ID: 80bc40f30000002e
2021-09-02 13:46:26.096 SDC_DISCONNECTED_FROM_SDS_IP WARNING SDC Name: sdc39; ID: 5473912c00000051 disconnected from the IP address 10.1.1.240 of SDS sds107; ID: 80bc40f30000002e
2021-09-02 13:46:26.115 SDC_DISCONNECTED_FROM_SDS_IP WARNING SDC Name: sdc64; ID: 54736a1d0000000e disconnected from the IP address 10.1.0.239 of SDS sds107; ID: 80bc40f30000002e
2021-09-02 13:46:26.115 MULTIPLE_SDC_CONNECTIVITY_CHANGES INFO Multiple SDC connectivity changes occurred.
2021-09-02 13:46:29.134 SDS_DECOUPLED ERROR SDS: sds107 (id: 80bc40f30000002e) decoupled.
2021-09-02 13:46:30.293 MDM_DATA_FAILED CRITICAL The system is now in DATA FAILURE state. Some data is unavailable.
2021-09-02 13:46:30.585 SDS_RECONNECTED INFO SDS: sds107 (ID 80bc40f30000002e) reconnected.
2021-09-02 13:46:40.140 SDS_DECOUPLED ERROR SDS: sds107 (id: 80bc40f30000002e) decoupled.
2021-09-02 13:46:42.391 MDM_DATA_DEGRADED ERROR The system is now in DEGRADED state.
- The sar output shows a 5-second gap in CPU data collection on the SDS node:
01:46:24 PM 61 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:46:24 PM 62 3.73 0.00 1.49 0.00 0.00 0.75 0.75 0.00 0.00 93.28
01:46:24 PM 63 0.00 0.00 0.74 0.00 0.00 0.00 0.00 0.00 0.00 99.26
<==== 5 second gap
01:46:29 PM all 0.09 0.00 0.10 0.00 0.00 0.00 1.93 0.00 0.00 97.87
01:46:29 PM 0 0.23 0.00 0.00 0.00 0.00 0.00 0.23 0.00 0.00 99.54
01:46:29 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
01:46:29 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
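A gap like the one in the sar excerpt can be located without reading the output by eye. The following is a minimal sketch, not a supported tool. The function name find_sar_gaps and the default threshold of 1 second are assumptions for illustration. It also assumes 12-hour timestamps as in the excerpt and data from a single day. Set the threshold to the sar sampling interval that was actually used.

```shell
#!/bin/sh
# find_sar_gaps (hypothetical helper): read per-CPU sar text on stdin and
# report consecutive distinct timestamps more than MAX_GAP seconds apart.
# Assumes "HH:MM:SS AM|PM" timestamps and no midnight rollover in the input.
find_sar_gaps() {
    max_gap="${1:-1}"
    awk -v max_gap="$max_gap" '
        # Only look at lines that begin with a 12-hour timestamp.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && ($2 == "AM" || $2 == "PM") {
            split($1, t, ":")
            hour = t[1] % 12                 # 12 AM and 12 PM both map to 0 here
            if ($2 == "PM") hour += 12       # then PM is shifted by 12 hours
            now = hour * 3600 + t[2] * 60 + t[3]
            if (seen && now - prev > max_gap)
                printf "gap of %d s between %s and %s\n", now - prev, prev_label, $1 " " $2
            if (!seen || now != prev) { prev = now; prev_label = $1 " " $2 }
            seen = 1
        }
    '
}
```

For example, with the sar text saved to a file (the file name is hypothetical), run `find_sar_gaps 1 < sar_output.txt`. Annotation lines and headers that do not start with a timestamp are ignored.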
Impact
If namespaces are created on multiple SDSs and enough of those SDSs decouple, the cluster can enter the Data Failure state and some data becomes unavailable.
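To gauge how many SDSs were affected, the SDS_DECOUPLED events can be tallied per SDS. The sketch below assumes the event text has already been exported with one event per line, in the same field layout as the MDM events excerpt in the Symptoms section. The function name count_decouples is hypothetical.

```shell
#!/bin/sh
# count_decouples (hypothetical helper): read MDM event lines on stdin and
# print "<sds name> <number of SDS_DECOUPLED events>", sorted by name.
# Assumed layout: date, time, event name, severity, "SDS:", SDS name, ...
count_decouples() {
    awk '$3 == "SDS_DECOUPLED" { count[$6]++ }
         END { for (sds in count) print sds, count[sds] }' | sort
}
```

Run against the excerpt above, this reports two decouple events for sds107. More than one SDS name in the output means that several SDSs decoupled in the same period.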
Cause
Creating a namespace on an NVDIMM can briefly pause the CPU and other OS resources. These pauses stem from how the OS interacts with the NVDIMM driver and the bus the NVDIMM is attached to. While the OS is paused, the node stops responding relative to the other nodes in the cluster, which causes the SDS to decouple.
Resolution
Create NVDIMM namespaces before the cluster holds live data. If that is not possible, or if the NVDIMM is a FRU replacement, place the SDS in maintenance mode first. The OS on the SDS node can then create the namespaces without the SDS decoupling unexpectedly.
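The order of operations for the maintenance-mode path can be sketched as follows. This is an illustration under stated assumptions, not a validated procedure. The scli options shown (--enter_maintenance_mode, --exit_maintenance_mode, --sds_name) are assumed syntax and should be confirmed against the CLI reference for the installed PowerFlex version. scli normally runs on an MDM after login, while ndctl runs on the SDS node, so the steps may have to be run on different hosts. The function names and the DRY_RUN variable are hypothetical. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: create an NVDIMM namespace while the SDS is in maintenance mode.
# DRY_RUN=1 (the default) prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

create_namespace_in_maintenance() {
    sds_name="$1"     # for example sds107
    namespace="$2"    # for example namespace4.0
    size="$3"         # for example 16G

    # Assumed scli syntax; verify before use.
    run scli --enter_maintenance_mode --sds_name "$sds_name" || return 1

    # Same ndctl options as the commands in the Symptoms section.
    run ndctl create-namespace --force --reconfig="$namespace" \
        --mode=devdax --align=4K --verbose --size="$size"
    status=$?

    # Leave maintenance mode even if ndctl failed, then report its status.
    run scli --exit_maintenance_mode --sds_name "$sds_name" || return 1
    return "$status"
}
```

Calling `create_namespace_in_maintenance sds107 namespace4.0 16G` in dry-run mode prints the three commands in order, which can serve as a checklist. Leave DRY_RUN at its default unless the assumed commands have been verified and both tools are available on the same host.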
Impacted Versions
PowerFlex v3.x and beyond
Fixed In Version
This is an OS issue and will not be fixed in PowerFlex.