PowerFlex 3.x: Enabling Persistent Checksum for the First Time Might Cause Failed Devices
Summary: After Persistent Checksum (PC) was enabled on a Medium Granularity (MG) Storage Pool (SP), the system went into a Data Unavailability (DU) state.
Symptoms
PC was enabled for the first time on an MG SP after upgrading to PowerFlex 3.5.x.
One or more devices on one or more SDSs report errors. Device errors can, but do not necessarily, lead to failed capacity (DU).
MDM events
2021-08-16 13:33:47.858 MDM_CLI_CONF_COMMAND_RECEIVED INFO Command enable_persistent_checksum received, User: ': d604566'. [258417068] Storage Pool: sas_10k_1 Validate on read is yes Builder BW limit: 10240 KBps
2021-08-16 13:33:48.252 CLI_COMMAND_SUCCEEDED INFO Command enable_persistent_checksum succeeded. [258417068]
2021-08-16 13:33:48.325 MDM_CLI_CONF_COMMAND_RECEIVED INFO Command enable_persistent_checksum received, User: ': d604566'. [258417078] Storage Pool: sas_10k_2 Validate on read is yes Builder BW limit: 10240 KBps
2021-08-16 13:33:48.531 SP_PERSISTENT_CHECKSUM_STATE_CHANGE INFO Storage Pool ID b468c1b100000002 persistent checksum state changed to BUILDING_PROTECTION
2021-08-16 13:33:48.533 SP_PERSISTENT_CHECKSUM_STATE_CHANGE INFO Storage Pool ID b468c1b200000003 persistent checksum state changed to BUILDING_PROTECTION
2021-08-16 13:33:48.983 CLI_COMMAND_SUCCEEDED INFO Command enable_persistent_checksum succeeded. [258417078]
...
2021-08-16 13:34:20.711 MDM_DATA_DEGRADED ERROR The system is now in DEGRADED state.
...
2021-08-16 13:35:00.963 MDM_DATA_FAILED CRITICAL The system is now in DATA FAILURE state. Some data is unavailable.
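When reviewing a longer event history, it can help to reduce it to the events that matter for this issue. The following is a minimal sketch, assuming the MDM events have already been exported to text with one event per line. The function name filter_pc_events is hypothetical and is not part of PowerFlex.

```shell
# filter_pc_events (hypothetical helper): reads MDM events on stdin,
# one event per line, and keeps only the event types shown in this article.
filter_pc_events() {
    grep -E 'enable_persistent_checksum|SP_PERSISTENT_CHECKSUM_STATE_CHANGE|MDM_DATA_DEGRADED|MDM_DATA_FAILED'
}

# Usage (events.txt is an assumed file name for the exported events):
#   filter_pc_events < events.txt
```

A DEGRADED or DATA FAILURE event that follows shortly after enable_persistent_checksum matches the pattern described in this article.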
Presentation Server alerts
SDS trace log
To determine whether an SDS device is in a "good" or "bad" state, run the following command and compare its output with the examples below:
/opt/emc/scaleio/sds/bin/trace_decompress -dir /opt/emc/scaleio/sds/logs -s --time_filter '16/08 13:20:00.000000' | grep mgPhyDevPersChksm_HardenStateOnDevHeader |sed 's/[0-9a-fA-F]*[0-9][0-9a-fA-F]*//g' |sort |uniq -c
Example for a "good" state:
20 / ::. x:mgPhyDevPersChksm_HardenStateOnDevHeader:: Changing device state: BUILDING_PROTECTION --> READY_FOR_PROTECTION (new), device flags: NONE --> NONE, on device x (Pers. Checksum)
20 / ::. x:mgPhyDevPersChksm_HardenStateOnDevHeader:: Changing device state: READY_FOR_PROTECTION --> BUILDING_PROTECTION (new), device flags: NONE --> NONE, on device x (Pers. Checksum)
Example for a "bad" state:
1 / ::. x:mgPhyDevPersChksm_HardenStateOnDevHeader:: Changing device state: BUILDING_PROTECTION --> READY_FOR_PROTECTION (new), device flags: NONE --> NONE, on device x (Pers. Checksum)
20 / ::. x:mgPhyDevPersChksm_HardenStateOnDevHeader:: Changing device state: NOT_READY --> BUILDING_PROTECTION (new), device flags: NONE --> NONE, on device x (Pers. Checksum)
Notice that in the "bad" example the devices change to BUILDING_PROTECTION from NOT_READY instead of from READY_FOR_PROTECTION as expected.
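The comparison against the two examples can be sketched as a small filter. This is a minimal sketch that assumes the summarized output has the same wording as the examples above. The function name classify_pc_transitions is hypothetical and is not part of PowerFlex.

```shell
# classify_pc_transitions (hypothetical helper): reads the summarized trace
# output on stdin and prints "bad", "good", or "unknown".
classify_pc_transitions() {
    pc_input=$(cat)
    if printf '%s\n' "$pc_input" | grep -q 'NOT_READY --> BUILDING_PROTECTION'; then
        # At least one device started building protection without being prepared.
        echo "bad"
    elif printf '%s\n' "$pc_input" | grep -q 'READY_FOR_PROTECTION --> BUILDING_PROTECTION'; then
        # Only the expected transition was seen.
        echo "good"
    else
        # No matching transition lines were found in the input.
        echo "unknown"
    fi
}
```

To use it, pipe the output of the trace_decompress command above into classify_pc_transitions. A result of "unknown" only means that no matching transition lines were found in the selected time window.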
Impact
SDS device metadata is overwritten and an SDS device or devices might fail. There is a possibility that the system gets into a DU state.
System impact remains within the limits listed above if no SDS service restart or SDS node reboot was done.
It is important to avoid an SDS service restart or an SDS node reboot.
Cause
The Persistent Checksum feature was introduced in PowerFlex 3.5. For the feature to work, SDS devices must allocate space for the checksum in their header (metadata area).
New devices added after PowerFlex 3.5 have this space reserved by default, while old devices that were added before the upgrade to 3.5 must have this space prepared on them. The MDM triggers this preparation process once the user enables the PC feature.
The state of a Storage Pool, as can be queried using the SCLI command below, might be one of the following:
- NOT_READY - After upgrade to PowerFlex 3.5 if there are pre-PowerFlex 3.5 SDS devices.
- PREPARING_DEVICES - The allocation process takes place after the user has enabled PC.
- READY_FOR_PROTECTION - Either the allocation process is completed, or the device was added post-PowerFlex 3.5.
SCLI command:
scli --query_storage_pool --protection_domain_name pd1 --storage_pool_name sp1 | grep -Ei 'persistent checksum|State is'
Example output:
[root@svm54 ~]# scli --query_storage_pool --protection_domain_name pd1 --storage_pool_name sp1 | grep -Ei 'persistent checksum|State is'
Medium granularity persistent checksum:
Persistent checksum is disabled
State is READY_FOR_PROTECTION
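For scripted checks across several Storage Pools, the state can be extracted from the query output. This is a minimal sketch that assumes the output format shown in the example above. The function name pc_pool_state is hypothetical and is not part of PowerFlex.

```shell
# pc_pool_state (hypothetical helper): reads `scli --query_storage_pool`
# output on stdin and prints the first value that follows "State is".
pc_pool_state() {
    sed -n 's/^[[:space:]]*State is[[:space:]]*\([A-Z_]*\).*$/\1/p' | head -n 1
}

# Usage, with the pd1 and sp1 names taken from the example above:
#   scli --query_storage_pool --protection_domain_name pd1 \
#        --storage_pool_name sp1 | pc_pool_state
```

On affected versions, a reported state of READY_FOR_PROTECTION does not by itself prove that every device is prepared, because the issue described in this section can change the state wrongfully.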
A software issue was introduced in PowerFlex 3.5: if the last device of a certain SP on a single SDS is removed, that SP wrongfully changes its checksum state from NOT_READY to READY_FOR_PROTECTION.
If the removed device was the last one in the entire SP, the issue does not occur. However, as long as other SDSs still contain pre-PowerFlex 3.5 devices in the SP, the change to READY_FOR_PROTECTION makes the SP respond as if all of its devices had the reserved checksum space prepared, when in fact some (or all) of them do not.
As a result, as soon as the user enables PC, the MDM will instruct the SDSs to start using the (non-existing) reserved checksum space, and device headers will get overwritten with checksum data.
This issue is only relevant for MG zero-padded SPs that contain pre-PowerFlex 3.5 devices.
Resolution
Workaround
- Customers who have not yet enabled PC but have MG zero-padded Storage Pools with pre-PowerFlex 3.5 devices may already have an SP whose state changed wrongfully (if not, it is a matter of time). For these customers, the advice is to remove and re-add every pre-3.5 device in those SPs, one SDS at a time.
- Customers who have already enabled PC and reached the device error scenario should contact Customer Support.
The corrective procedure is only effective if the SDS process has not been restarted and the SDS node has not been rebooted. The original headers remain only in volatile memory (RAM), and once that copy is lost there is no original to copy from.
Impacted Versions
PowerFlex 3.5.x
PowerFlex 3.6.0.x
Fixed In Version
PowerFlex 3.5.1.4
PowerFlex 3.6.0.2