PowerFlex 3.X Data Unavailable From A Single SDS Decouple Event

Summary: A PowerFlex system enters a DATA FAILURE state after a single SDS decouple event.


Symptoms

- The PowerFlex system was in a normal, healthy state before any events occurred.

- The MDM event logs show a single SDS decoupling, and the system goes to a DATA FAILURE state almost immediately. The system stays in this state, even though the SDS reconnects to the system.

2023-12-18 14:39:48.489000:1047016:SDS_DECOUPLED                    ERROR    SDS: sds93 (id: f1f8bfde00000023) decoupled. 
2023-12-18 14:39:49.403000:1047017:MDM_DATA_DEGRADED                ERROR    The system is now in DEGRADED state. 
2023-12-18 14:39:50.406000:1047018:MDM_DATA_FAILED                  CRITICAL The system is now in DATA FAILURE state. Some data is unavailable.
2023-12-18 14:40:06.143000:1047036:SDS_RECONNECTED                  INFO     SDS: sds93 (ID f1f8bfde00000023) reconnected.

- After the system goes into a DATA FAILURE state, one or more disks on a different SDS show an error state in the Presentation Server UI and in scli output.

- The DATA FAILURE state can be exited by clearing the device errors.

- The device(s) that went to an error state had previously been set to a WARNING state, as recorded in the MDM event logs.

2023-12-14 11:58:07.680000:0955611:SDS_DEV_WARNING                  WARNING  A device warning threshold has been reached on SDS: sds93, Device: /dev/sdj. State: NORMAL upDownState: UP processState: DEV_ERR_INPROGRESS

- If an MDM switchover occurs, the device enters an ERROR state.
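
The sequence described in the symptoms above can be checked for in an exported MDM event log. The following is a minimal sketch in Python, not a Dell tool: the line layout (timestamp:sequence:EVENT, severity, message) is inferred only from the excerpts quoted in this article, and the function names and the 10-second window are illustrative assumptions.

```python
import re
from datetime import datetime, timedelta

# Layout inferred from the excerpts in this article (an assumption):
# "<timestamp>:<sequence>:<EVENT_NAME>   <SEVERITY>   <message>"
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+):(\d+):(\w+)\s+(\w+)\s+(.*)$"
)


def parse_event(line):
    """Parse one event log line into a dict, or return None if it does not match."""
    m = EVENT_RE.match(line.strip())
    if m is None:
        return None
    ts, seq, name, severity, message = m.groups()
    return {
        "time": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"),
        "seq": int(seq),
        "name": name,
        "severity": severity,
        "message": message.strip(),
    }


def find_decouple_failures(lines, window_seconds=10):
    """Return (decouple, failure) pairs where MDM_DATA_FAILED follows the most
    recent SDS_DECOUPLED within window_seconds (hypothetical threshold)."""
    window = timedelta(seconds=window_seconds)
    last_decouple = None
    hits = []
    for line in lines:
        ev = parse_event(line)
        if ev is None:
            continue
        if ev["name"] == "SDS_DECOUPLED":
            last_decouple = ev
        elif ev["name"] == "MDM_DATA_FAILED" and last_decouple is not None:
            if ev["time"] - last_decouple["time"] <= window:
                hits.append((last_decouple, ev))
    return hits


# The four event lines quoted in this article.
SAMPLE_LOG = [
    "2023-12-18 14:39:48.489000:1047016:SDS_DECOUPLED                    ERROR    SDS: sds93 (id: f1f8bfde00000023) decoupled. ",
    "2023-12-18 14:39:49.403000:1047017:MDM_DATA_DEGRADED                ERROR    The system is now in DEGRADED state. ",
    "2023-12-18 14:39:50.406000:1047018:MDM_DATA_FAILED                  CRITICAL The system is now in DATA FAILURE state. Some data is unavailable.",
    "2023-12-18 14:40:06.143000:1047036:SDS_RECONNECTED                  INFO     SDS: sds93 (ID f1f8bfde00000023) reconnected.",
]

hits = find_decouple_failures(SAMPLE_LOG)
for decoupled, failed in hits:
    elapsed = (failed["time"] - decoupled["time"]).total_seconds()
    print(decoupled["message"], "->", failed["name"], "after", elapsed, "seconds")
```

A match only indicates that the log resembles the pattern in this article; the device WARNING history still has to be confirmed separately before concluding that this issue applies.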

Impact

A portion of the data is unavailable until the device errors are cleared.

Clients can see I/O errors.

An MDM switchover causes the device to enter an ERROR state and may result in a Data Unavailable (DU) event if this occurs on multiple hosts simultaneously.

Cause

The MDM did not handle the WARNING state of the device correctly and effectively counted it as the first failure, even though the disk was still in use and physically healthy. When an SDS then failed, that counted as the second failure, which caused the DATA FAILURE state.

Resolution

If already impacted by this series of events:
- Clear the errors on the devices that now show an error state.

If not yet impacted by a DATA FAILURE state:
- In the MDM event logs, locate the SDS and device that were placed in a WARNING state.
- Preemptively clear the device error using scli.
- Clearing a device error on a device that shows no errors is not harmful; it only clears flags that are not needed.

scli --clear_sds_device_error --sds_name <SDS_NAME> --device_path <PATH>
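
To locate the affected devices, the SDS_DEV_WARNING entries in an exported MDM event log can be turned into the clear command shown above. The following Python sketch assumes the message wording quoted earlier in this article ("SDS: <name>, Device: <path>."); the function names are hypothetical. It only prints the commands and does not execute them, so each one can be reviewed and then run from an authenticated scli session.

```python
import re
import shlex

# Message wording assumed from the SDS_DEV_WARNING excerpt in this article.
WARNING_RE = re.compile(
    r"SDS_DEV_WARNING\s.*SDS: ([^,\s]+), Device: (\S+)\.(?:\s|$)"
)


def warned_devices(lines):
    """Return unique (sds_name, device_path) pairs, in first-seen order."""
    seen = []
    for line in lines:
        m = WARNING_RE.search(line)
        if m and m.groups() not in seen:
            seen.append(m.groups())
    return seen


def clear_commands(devices):
    """Build one scli clear command string per device, using the flags from this article."""
    return [
        shlex.join([
            "scli", "--clear_sds_device_error",
            "--sds_name", sds_name,
            "--device_path", device_path,
        ])
        for sds_name, device_path in devices
    ]


# The warning line quoted in this article, listed twice to show de-duplication.
SAMPLE_LOG = [
    "2023-12-14 11:58:07.680000:0955611:SDS_DEV_WARNING                  WARNING  A device warning threshold has been reached on SDS: sds93, Device: /dev/sdj. State: NORMAL upDownState: UP processState: DEV_ERR_INPROGRESS",
    "2023-12-14 11:58:07.680000:0955611:SDS_DEV_WARNING                  WARNING  A device warning threshold has been reached on SDS: sds93, Device: /dev/sdj. State: NORMAL upDownState: UP processState: DEV_ERR_INPROGRESS",
]

devices = warned_devices(SAMPLE_LOG)
commands = clear_commands(devices)
for command in commands:
    print(command)
```

Building the command as an argument list and joining it with shlex.join keeps any unusual characters in a device path quoted correctly when the line is pasted into a shell.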

Impacted Versions

PowerFlex 3.6.x

Fixed In Version

PowerFlex 3.6.4

Affected Products

PowerFlex rack, PowerFlex custom node, PowerFlex appliance connectivity, PowerFlex appliance R650, PowerFlex appliance R6525, PowerFlex appliance R660, PowerFlex appliance R6625, PowerFlex appliance R750, PowerFlex appliance R760, PowerFlex appliance R7625, Ready Node Series, PowerFlex appliance R640, PowerFlex appliance R740XD, PowerFlex appliance R7525, PowerFlex appliance R840 ...
Article Properties
Article Number: 000221806
Article Type: Solution
Last Modified: 04 Feb 2025
Version:  3