PERC: Possible data integrity issue with the Rapid Rebuild feature under certain conditions
Summary: PERC 9 controllers (H330, H730, H730P, H830, FD33xS, and FD33xD) introduced a feature called Rapid Rebuild that shortens the time needed to rebuild failed drives in certain conditions. Using this feature can lead to data integrity issues under certain conditions. ...
Symptoms
Feature Operation:
Any drive that is capable of Rapid Rebuild registers this capability with the controller. This feature is supported with parity RAID virtual disks: RAID 5, RAID 6, RAID 50, and RAID 60.
The feature requires a server to have capable drives, parity-based RAID levels, and a configured hot spare (either global or dedicated to the exact VD). Each capable drive in the VD tracks its own failed blocks/sectors. A drive may then fail in such a way that it can still communicate with the PERC and tell the PERC which sectors are still "good." Instead of performing time-consuming RAID recovery XOR algorithms for the entire disk, the PERC copies the good sectors to the hot spare and only has to rebuild the known bad sectors. Without Rapid Rebuild, the PERC has to rebuild all sectors, which can be time-consuming for large-capacity drives.
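The difference between the two rebuild paths can be illustrated with a toy model. This is only a sketch under simplifying assumptions (a three-drive RAID 5 style stripe set, two-byte "sectors", and hypothetical helper names `xor_blocks` and `rapid_rebuild`); it is not the PERC firmware's implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rapid_rebuild(failing, bad_sectors, peers):
    """Build the hot spare's contents for a failing drive.

    failing     -- sectors as read from the failing drive (bad ones are ignored)
    bad_sectors -- set of sector indices the drive reported as bad
    peers       -- the other drives in the stripe set, parity included
    Returns (spare_sectors, number_of_sectors_recovered_by_xor).
    """
    spare = []
    xor_count = 0
    for i, sector in enumerate(failing):
        if i in bad_sectors:
            # Known bad sector: recover it from the surviving drives.
            spare.append(xor_blocks([peer[i] for peer in peers]))
            xor_count += 1
        else:
            # Good sector: a plain copy, no XOR recovery needed.
            spare.append(sector)
    return spare, xor_count

# Toy array: two data drives plus parity, three sectors each.
d0 = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
d1 = [b"\x10\x20", b"\x30\x40", b"\x50\x60"]
parity = [xor_blocks([a, b]) for a, b in zip(d0, d1)]

# d0 is failing: sector 1 is unreadable, the rest is still good.
failing = [d0[0], b"\x00\x00", d0[2]]
spare, xor_count = rapid_rebuild(failing, {1}, [d1, parity])
```

In this model only one of the three sectors goes through XOR recovery; a conventional rebuild would recover all three that way.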
Problem Statement
When the PERC is rebuilding the data for the "bad" sectors, it incorrectly writes data from the cache to the failed drive instead of the hot spare. As a result, data and its associated parity are not written to the hot spare. In write-through mode, this causes parity errors. In write-back mode, errors occur in both the data and the associated parity.
How can I tell if this has happened?
If the PERC controller log contains an entry like the one below, showing a state change to REBUILDASSIST, you have encountered the issue.
C0:EVT#395950-08/17/16 13:54:59: 114=State change on PD 0b(e0x20/s11) from OFFLINE(XX) to REBUILDASSIST(12)
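For an exported log, the search can be scripted. The following is a minimal sketch that assumes the log has already been saved as plain text and that entries follow the layout of the single sample above; the pattern and the helper name `find_rebuild_assist` are illustrative, and other firmware versions may format entries differently.

```python
import re

# Pattern modeled on the one sample entry shown above (an assumption).
REBUILD_ASSIST = re.compile(
    r"State change on PD (?P<pd>\S+) "
    r"from (?P<old>\w+)\(\w+\) to REBUILDASSIST\(\w+\)"
)

def find_rebuild_assist(lines):
    """Return (physical_disk, previous_state, raw_line) for each matching entry."""
    hits = []
    for line in lines:
        match = REBUILD_ASSIST.search(line)
        if match:
            hits.append((match.group("pd"), match.group("old"), line.strip()))
    return hits

sample = [
    "C0:EVT#395950-08/17/16 13:54:59: 114=State change on PD 0b(e0x20/s11) "
    "from OFFLINE(XX) to REBUILDASSIST(12)",
]
hits = find_rebuild_assist(sample)
```

Each hit identifies a physical disk that entered the REBUILDASSIST state, so the affected VD can be checked against the guidance in this article.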
Cause
Resolution
Solution
- If your VD was in Write-Through mode, only parity data is at risk, and running a CC (consistency check) restores your parity. This only works if this is the first occurrence of a rebuild requirement. If more than one rebuild for the same VD occurred, restore your data from a backup.
- If your VD was in Write-Back mode and you have encountered the issue, then you should restore your data from backup. Unfortunately, there is no way to recover the lost data.
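The two cases above amount to a small decision rule, which could be encoded when triaging several affected VDs. This is only a sketch of the guidance in this article; the function name `recommended_action` and its parameters are hypothetical, and `rebuild_count` is assumed to be the number of rebuilds that have occurred on the VD.

```python
def recommended_action(cache_mode, rebuild_count):
    """Map a VD's cache mode and rebuild count to the recovery step above."""
    mode = cache_mode.strip().lower().replace("_", "-").replace(" ", "-")
    if mode not in ("write-through", "write-back"):
        raise ValueError(f"unknown cache mode: {cache_mode!r}")
    if rebuild_count < 1:
        raise ValueError("rebuild_count must be at least 1 for an affected VD")

    if mode == "write-through" and rebuild_count == 1:
        # Only parity is at risk after a first rebuild in write-through mode.
        return "run consistency check"
    # Write-back mode, or repeated rebuilds: data cannot be recovered in place.
    return "restore from backup"
```

For example, `recommended_action("Write-Through", 1)` returns `"run consistency check"`, while any write-back case returns `"restore from backup"`.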
To download the latest firmware version, go to "Drivers & Downloads" for a 13G server and expand the "SAS RAID" menu.
The corrected firmware is installed at the factory, so new servers are not exposed to this issue.