
PERC 9: Under certain conditions, there is a possible data integrity issue with Rapid Rebuild.

Summary: PERC 9 controllers (H330, H730, H730P, H830, FD33xS, and FD33xD) introduced a feature called Rapid Rebuild that shortens the time needed to rebuild failed drives in certain conditions. There is a possibility for data integrity issues when this feature is used under certain conditions. ...


Article Content


Symptoms

Dell PERC 9 controllers (H330, H730, H730P, and H830) introduced a feature called Rapid Rebuild that shortens the time needed to rebuild failed drives in certain conditions. This feature is based on T10 Rebuild Assist. Dell has determined that there is a possibility for data integrity issues when this feature is used under certain conditions.

Table of contents

  1. Feature Operation
  2. Problem Statement
  3. How can I tell if this has happened
  4. Solution

     

Feature Operation:

Any drive that is capable of Rapid Rebuild registers this capability with the controller. The feature is supported on parity-based RAID virtual disks (VDs): RAID 5, RAID 6, RAID 50, and RAID 60. It requires capable drives, a parity-based RAID level, and a configured hot spare (either global or dedicated to that exact VD). Each capable drive in the VD keeps track of its own failed blocks/sectors. A drive may then fail in such a way that it can still communicate with the PERC and tell the PERC which sectors are still "good". Instead of running the time-consuming RAID recovery XOR algorithms across the entire disk, the PERC copies the good sectors to the hot spare and reconstructs only the known bad sectors. Without Rapid Rebuild, the PERC has to rebuild every sector, which can take a long time on large-capacity drives.
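The difference between copying good sectors and reconstructing bad ones can be illustrated with a small model. This is only a sketch of the general idea for a single-parity (RAID 5 style) layout, not the controller's actual firmware logic; the function names `xor_blocks` and `rapid_rebuild` and the in-memory "drives" are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (single-parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rapid_rebuild(failing_drive, bad_sectors, peer_drives):
    """Return (hot_spare_contents, number_of_xor_reconstructions).

    failing_drive: list of sector byte strings; entries at bad indices are untrusted
    bad_sectors:   set of sector indices the failing drive reported as bad
    peer_drives:   the remaining drives of the VD (data and parity), same layout
    """
    hot_spare = []
    xor_count = 0
    for idx, sector in enumerate(failing_drive):
        if idx in bad_sectors:
            # Bad sector: reconstruct it from the peers, as a full rebuild would.
            hot_spare.append(xor_blocks([drive[idx] for drive in peer_drives]))
            xor_count += 1
        else:
            # Good sector: a plain copy, no parity math required.
            hot_spare.append(sector)
    return hot_spare, xor_count
```

In this model a full rebuild corresponds to passing every sector index in `bad_sectors`, so the saving grows with the share of sectors the failing drive can still vouch for.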

 

Problem Statement

When the PERC rebuilds the data for the "bad" sectors, it incorrectly writes data from cache to the failed drive instead of the hot spare. As a result, the data and its associated parity are not written to the hot spare. In Write Through mode, parity errors occur. In Write Back mode, errors occur in both the data and the associated parity.

 

How can I tell if this has happened

 

 

Note: How to extract the PERC Controller log is explained in the article SLN295784.

 


If the PERC Controller log contains an entry like the one below, in which a physical disk (PD) changes state to REBUILDASSIST, you have encountered the issue.

C0:EVT#395950-08/17/16 13:54:59: 114=State change on PD 0b(e0x20/s11) from OFFLINE(XX) to REBUILDASSIST(12)
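To check a long log for such entries, a short script can help. The sketch below assumes every relevant entry follows the same layout as the single example line above; the log format is not otherwise documented here, so the regular expression and the function name `find_rebuild_assist_events` are illustrative assumptions and may need adjusting for a real log.

```python
import re

# Pattern derived only from the one example entry in this article (assumption).
REBUILD_ASSIST = re.compile(
    r"EVT#(?P<event>\d+)-(?P<timestamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}): "
    r"\d+=State change on PD (?P<pd>\w+)\((?P<location>[^)]*)\) "
    r"from (?P<old_state>\w+)\(\w+\) to REBUILDASSIST\(\w+\)"
)


def find_rebuild_assist_events(lines):
    """Return one dict per log line that records a change to REBUILDASSIST."""
    events = []
    for line in lines:
        match = REBUILD_ASSIST.search(line)
        if match:
            events.append(match.groupdict())
    return events
```

Calling `find_rebuild_assist_events(open(path))` on an extracted log (where `path` is wherever you saved it) returns an empty list when no such state change was recorded.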



 

Solution

  • If your VD was in Write Through mode, only parity data is at risk, and running a CC (consistency check) will restore your parity. This only works if rebuild assist occurred a single time on the VD. If rebuild assist occurred more than once on the same VD, you should restore your data from a previous backup.

  • If your VD was in Write Back mode and you have encountered the issue, there is no way to recover the lost data. Please restore your data from a previous backup.
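The two cases above reduce to a small decision rule. The following is a minimal sketch of that rule as stated in this article; the function name, the policy strings, and the returned labels are hypothetical and chosen only for illustration.

```python
def recommended_action(write_policy, rebuild_assist_count):
    """Map a VD's write policy and rebuild-assist count to the advice above.

    write_policy:         "write-through" or "write-back" (illustrative labels)
    rebuild_assist_count: rebuild assist occurrences seen for this VD
    """
    if rebuild_assist_count == 0:
        # The issue was not encountered on this VD.
        return "no recovery needed"
    if write_policy == "write-through" and rebuild_assist_count == 1:
        # Only parity is at risk; a consistency check restores it.
        return "run consistency check"
    # Write Back mode, or repeated rebuild assist on the same VD.
    return "restore from backup"
```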

If you have not encountered this issue, protect against it by updating the PERC H730, H730P, and H830 controller firmware to 25.5.0.0018 or later and the PERC H330 controller firmware to 25.5.0.0019 or later. These firmware versions disable the Rapid Rebuild feature.
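When checking several controllers, the installed firmware version can be compared against these minimums programmatically. The sketch below assumes versions are dotted numeric strings that can be compared component by component; the helper names are hypothetical, and only the four models named in the firmware guidance above are included.

```python
# Minimum firmware versions taken from this article.
MIN_FIRMWARE = {
    "H330": "25.5.0.0019",
    "H730": "25.5.0.0018",
    "H730P": "25.5.0.0018",
    "H830": "25.5.0.0018",
}


def parse_version(version):
    """Turn '25.5.0.0018' into (25, 5, 0, 18) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def needs_update(model, installed_version):
    """Return True if the installed firmware is older than the listed minimum."""
    minimum = MIN_FIRMWARE[model.upper()]
    return parse_version(installed_version) < parse_version(minimum)
```

Comparing integer tuples rather than raw strings avoids ordering mistakes if a version component ever changes width.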

To download the latest firmware version, go to the "Drivers and Downloads" section for a 13G server and expand the "SAS Raid" category.

The correct firmware has been implemented in the factory and new servers are not exposed to this issue.

 
Dell Note: As part of ongoing business process improvement across all key functions, Dell continually reviews key processes and implements improvements. Dell places a high focus on the development, test, and manufacturing processes for its server and storage systems. These process improvements help prevent future problems and allow Dell to react more rapidly and more aggressively to potential issues in the field.

 

Cause

-

Resolution

-

Article Properties


Affected Products

PowerEdge RAID Controller H330, PowerEdge RAID Controller H730, PowerEdge RAID Controller H730P, PowerEdge RAID Controller H830

Last Published Date

16 Sep 2021

Version

5

Article Type

Solution