PowerFlex 3.X: SDSs Decouple After MDM Ownership Change

Summary: Multiple SDSs decouple after an MDM ownership change.


Symptoms

MDM ownership changed because of an admin-initiated activity.

MDM ownership changed due to a failure on one of the MDM servers.

 

MDM event logs show that a new Primary MDM node has taken ownership:

2024-07-04 07:45:28.088000:0114714:MDM_BECOMING_MASTER              WARNING  This MDM is switching to Master mode. MDM will start running.

Multiple SDSs reconnect to the new Primary MDM and disconnect again shortly afterward:

2024-07-04 07:45:41.218000:0115810:SDS_RECONNECTED                  INFO     SDS: sds1 (ID 13f4fe8800000001) reconnected
2024-07-04 07:45:41.377000:0115811:SDS_RECONNECTED                  INFO     SDS: sds2 (ID 13f4fe3b00000002) reconnected

2024-07-04 07:45:43.194000:0115990:SDS_DECOUPLED                    ERROR    SDS: sds1 (id: 13f4fe8800000001) decoupled.
2024-07-04 07:45:44.197000:0116051:SDS_DECOUPLED                    ERROR    SDS: sds2 (id: 13f4fe3b00000002) decoupled.

2024-07-04 07:45:45.192000:0115809:MDM_DATA_DEGRADED                ERROR    The system is now in DEGRADED state.
2024-07-04 07:45:45.786000:0116061:MDM_DATA_FAILED                  CRITICAL The system is now in DATA FAILURE state. Some data is unavailable.

In this case, the SDSs that decoupled eventually reconnected and stayed connected to the MDM.
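
To check an exported MDM event log for this pattern, the reconnect and decouple events can be paired per SDS ID. The following is a minimal Python sketch, not a Dell-provided tool. It assumes the event lines look exactly like the excerpts above; the function name `find_quick_decouples` and the 10-second window are illustrative choices, not values from PowerFlex.

```python
import re
from datetime import datetime

# Matches the SDS_RECONNECTED / SDS_DECOUPLED lines in the format shown above.
# The two events print the ID differently: "(ID <hex>)" versus "(id: <hex>)".
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}):\d+:"
    r"(?P<event>SDS_RECONNECTED|SDS_DECOUPLED)\s+\w+\s+"
    r"SDS: (?P<name>\S+) \((?:ID|id:) (?P<sds_id>[0-9a-f]+)\)"
)


def find_quick_decouples(lines, window_seconds=10.0):
    """Return (name, sds_id, seconds) for every SDS that decoupled within
    window_seconds of its most recent reconnect.

    window_seconds is an assumed threshold chosen for illustration.
    """
    last_reconnect = {}  # sds_id -> timestamp of the latest reconnect
    hits = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if not match:
            continue  # not an SDS reconnect/decouple event
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        sds_id = match.group("sds_id")
        if match.group("event") == "SDS_RECONNECTED":
            last_reconnect[sds_id] = ts
        elif sds_id in last_reconnect:
            delta = (ts - last_reconnect.pop(sds_id)).total_seconds()
            if 0 <= delta <= window_seconds:
                hits.append((match.group("name"), sds_id, delta))
    return hits
```

Passing the lines of an event log file to `find_quick_decouples` returns one tuple per SDS that dropped shortly after reconnecting. Several such tuples clustered right after an MDM_BECOMING_MASTER event match the symptom described in this article.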

 

SDS trace logs show that the SDS is blocking itself:

04/07 07:45:39.606135 0x7fc708919db8:kalive_IsBlocked:00570: Keep-Alive (KA) is blocked: TRUE
04/07 07:45:46.578166 0x7fc702567db8:kalive_ShouldSendKeepAlive:00345: KA aborted because SDS is blocked

The SDS process blocks itself if it believes it has a local issue. It does this to prevent I/O issues, and it then attempts to reconnect to the Primary MDM.

 

Impact

One or more storage pools experience degraded capacity.

One or more storage pools experience failed capacity.

 

Cause

When a new MDM takes ownership of the cluster, all SDSs connect to the new Primary MDM. During this transition, the SDSs receive reconfiguration commands from the MDM. In rare cases, an SDS completes the MDM's reconfiguration instructions but must then wait for further instructions. If the MDM does not provide additional instructions within 5 seconds, the SDS marks itself as blocked and attempts to reconnect to the MDM. This issue is more common in very large environments with 70 or more SDSs, where the MDM may not send the necessary instructions quickly enough, causing the SDSs to disconnect and try again.
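
The timeout behavior described above can be illustrated with a small model. This is a hedged Python sketch of the logic only, not PowerFlex code. The 5-second value comes from this article. The function name `first_blocking_time`, the use of arrival times measured from the moment the SDS connects, and the assumption that the last instruction in the list completes the reconfiguration are all hypothetical.

```python
INSTRUCTION_TIMEOUT_S = 5.0  # wait limit stated in this article


def first_blocking_time(instruction_times, timeout_s=INSTRUCTION_TIMEOUT_S):
    """Model when an SDS would mark itself as blocked.

    instruction_times: seconds (measured from the SDS connecting to the new
    Primary MDM) at which MDM instructions arrive. Assumption: the final
    entry completes the reconfiguration, so nothing is awaited after it.

    Returns the time at which the SDS gives up waiting, or None if every
    instruction arrives within timeout_s of the previous one.
    """
    last_activity = 0.0
    for arrival in sorted(instruction_times):
        if arrival - last_activity > timeout_s:
            # The gap exceeded the limit before this instruction arrived.
            return last_activity + timeout_s
        last_activity = arrival
    return None


# Instructions arriving at 1.0 s, 3.0 s, and 7.5 s never leave a gap above 5 s.
print(first_blocking_time([1.0, 3.0, 7.5]))
# A 7-second gap after the first instruction makes the SDS block itself.
print(first_blocking_time([1.0, 8.0]))
```

In this model, a busy MDM that spaces its instructions more than 5 seconds apart for a given SDS triggers the block, which is consistent with the issue appearing more often as the number of SDSs grows.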

 

Resolution

To prevent this issue from occurring, upgrade the PowerFlex software to a version that includes the fix.

 

Impacted Version

PowerFlex 3.6 and older

Fixed In Version

PowerFlex 3.6.1 and newer

 

Affected Products

PowerFlex Software, VxFlex Product Family, VxFlex Ready Node, Ready Node Series
Article Properties
Article Number: 000228229
Article Type: Solution
Last Modified: 02 January 2026
Version:  2