PowerFlex 3.x Silent Corruption Found after Rebuild
Summary: PowerFlex volumes or datastores are found corrupted after a disk failure or a node disconnection.
Symptoms
Scenario
Data corruption is detected after a rebuild occurs.
The affected pool is a Medium Granularity (MG) pool with the Background (BG) Scanner disabled.
Symptoms
The SDC-side OS detects a file system inconsistency:
2020-11-12T01:12:56.665Z cpu51:8618782)WARNING: Res3: 4324: Volume 5f8ddcc6-987f8604-c721-e4434b22e350 ("CORPFX-PRD-DS09") might be damaged on the disk. Resource cluster metadata corruption has been detected.
Or
2020-11-11T20:12:07.454Z cpu37:2100107)WARNING: VmfsSparse: 4987: Real sector 1838528773 exceeds free sector 241361047
A rebuild occurs, regardless of the cause:
2020-11-11 14:57:36.850 MDM_DATA_DEGRADED ERROR The system is now in DEGRADED state.
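The log signatures above can be searched for mechanically. The following is a minimal sketch, assuming the log lines are available as plain text; the label names and the `classify` helper are hypothetical, and the patterns are derived only from the excerpts in this article, so real logs may word these messages differently.

```python
import re

# Patterns derived from the log excerpts in this article.
# The label names are illustrative, not official event names.
SIGNATURES = {
    "vmfs_metadata_corruption": re.compile(
        r"Resource cluster metadata corruption has been detected"
    ),
    "vmfs_sparse_sector_overflow": re.compile(
        r"VmfsSparse: \d+: Real sector \d+ exceeds free sector \d+"
    ),
    "mdm_data_degraded": re.compile(r"MDM_DATA_DEGRADED"),
}


def classify(lines):
    """Return (label, line) pairs for every line matching a known signature."""
    hits = []
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((label, line.strip()))
    return hits
```

A host-side match together with an MDM_DATA_DEGRADED event from around the same time suggests the scenario described here, but it is not proof; the timestamps still have to be correlated manually.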
Impact
The volume's contents can no longer be trusted.
Cause
Without BG Scanner enabled, the array does not check for data inconsistency.
When a rebuild occurs, secondary copies are promoted to primary, and then replicated.
If the secondary copy does not match the primary (most likely because of silent corruption on disk), the array ends up serving corrupted data.
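The idea behind comparing the two copies can be illustrated with a toy model. This is a conceptual sketch only, not PowerFlex's implementation: the block size, the use of SHA-256, and the function names are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, kept tiny so the example is easy to follow


def block_digests(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and hash each block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def find_mismatched_blocks(primary: bytes, secondary: bytes,
                           block_size: int = BLOCK_SIZE):
    """Return indices of blocks whose primary and secondary copies differ.

    Assumes both copies have the same length; zip() stops at the shorter one.
    """
    pairs = zip(block_digests(primary, block_size),
                block_digests(secondary, block_size))
    return [i for i, (p, s) in enumerate(pairs) if p != s]


primary = b"AAAABBBBCCCC"
secondary = b"AAAABxBBCCCC"  # one byte silently flipped in block 1

print(find_mismatched_blocks(primary, secondary))  # [1]
```

Without a periodic comparison of this kind, the mismatch in block 1 stays invisible while the primary is healthy. It surfaces only once the secondary is promoted and its content is read by the host.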
Resolution
Remove all volumes in the impacted Storage Pool (SP), and then re-create the pool:
1. Remove the SP devices.
2. Remove the SP.
3. Re-create the SP with zero padding enabled (if it was disabled). The Background Scanner can then be enabled; Data_comparison mode is recommended.
Versions 3.5.x.x also include an MG checksum feature, which provides persistent checksumming as Fine Granularity (FG) pools already have.
Impacted Versions
Any version with the BG Scanner disabled.
Fixed In Version
N/A