Article Number: 000139277

How to Handle Puncturing (Bad Blocks) on Virtual Disks for PowerEdge servers



Article Content

Symptoms

Cause

Resolution

This article provides troubleshooting steps for punctured (bad) blocks on hard drives in PowerEdge servers with PERC controllers. The following information may help return an affected virtual disk to an optimal state, especially when no backup is available.

1. Fault Descriptions

Fault No.1:

The OpenManage Server Administrator (OMSA) shows a red cross in front of a Virtual Disk (Figure 1).

Figure 1: Virtual Disk with red cross in Status (example H800)

Note: Dell OpenManage Server Administrator (OMSA) provides a comprehensive, one-to-one systems management solution. OMSA offers two interfaces:
- Integrated web browser-based graphical user interface (GUI)
- Command line interface (CLI) through the operating system

Fault No.2:

The Windows System Log shows Bad Block errors (Figure 2).

Figure 2: Bad Block error shown in the Windows System Log

Fault No.3:

The RAID controller log (TTYLOG) shows errors like:

02/26/15 13:43:39: EVT#131878-02/26/15 13:43:39: 97=Puncturing bad block on PD XX(e0x20/s2) at 180ca4a1f

Warning: The controller log (TTYLOG) may not show any errors, even when bad blocks are present.

For more information about collecting these logs, see our article about gathering logs.
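A puncture is recorded in the controller log as a text event, so a script can scan an exported TTYLOG for it. The following is a minimal Python sketch that assumes the event format matches the single sample line quoted above. Other controller firmware versions may word or format the event differently. `find_punctures` and `PunctureEvent` are hypothetical helper names and are not part of any Dell tool.

```python
import re
from typing import NamedTuple

# Pattern modeled on the one sample TTYLOG line in this article (assumption:
# other firmware versions may format this event differently).
PUNCTURE_RE = re.compile(
    r"EVT#(?P<event>\d+)-(?P<timestamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}): "
    r"\d+=Puncturing bad block on PD (?P<pd>\w+)"
    r"\(e(?P<enclosure>0x[0-9a-fA-F]+)/s(?P<slot>\d+)\) at (?P<lba>[0-9a-fA-F]+)"
)


class PunctureEvent(NamedTuple):
    event: int
    timestamp: str
    pd: str
    enclosure: int
    slot: int
    lba: int


def find_punctures(log_text: str) -> list[PunctureEvent]:
    """Return every 'Puncturing bad block' event found in the log text."""
    events = []
    for match in PUNCTURE_RE.finditer(log_text):
        events.append(
            PunctureEvent(
                event=int(match["event"]),
                timestamp=match["timestamp"],
                pd=match["pd"],
                enclosure=int(match["enclosure"], 16),  # e.g. "0x20"
                slot=int(match["slot"]),
                lba=int(match["lba"], 16),  # the LBA is logged in hexadecimal
            )
        )
    return events


if __name__ == "__main__":
    sample = ("02/26/15 13:43:39: EVT#131878-02/26/15 13:43:39: "
              "97=Puncturing bad block on PD XX(e0x20/s2) at 180ca4a1f")
    for ev in find_punctures(sample):
        print(f"PD {ev.pd} enclosure {ev.enclosure:#x} slot {ev.slot} LBA {ev.lba:#x}")
```

An empty result does not prove that the virtual disk is healthy, because the TTYLOG may show no errors at all.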

2. Cause

RAID arrays are not immune to data errors. RAID controller and hard drive firmware include functions that detect and correct many types of data errors before the data is written to an array or drive. Outdated firmware lacks the error-handling and error-correction features of the latest versions, so it can allow incorrect data to be written to an array or drive.

Data errors can also be caused by physical bad blocks. For example, a physical bad block can occur when the read/write head strikes the spinning platter (a "head crash"). Blocks can also become bad over time as the platter gradually loses its ability to store bits magnetically at a specific location. Bad blocks caused by platter degradation can often still be read successfully. Such a bad block may therefore be detected only intermittently or only by extended diagnostics on the drive.

A bad block, also known as a bad Logical Block Address (LBA), can also be caused by logical data errors. In this case, data is written to a drive incorrectly even though the write is reported as successful. Good data already stored on a drive can also be changed inadvertently. One example is a "bit flip", which can occur when the read/write head passes over or writes to a nearby location and changes stored bits from 0 to 1 or from 1 to 0. This corrupts the consistency of the data: the value stored in the block differs from the original data and may no longer match the data's checksum. The physical LBA is still good and can be written to successfully. However, it currently holds incorrect data and may be reported as a bad block.

For more information read our article about Double Faults and Punctures in RAID Arrays.
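The bit-flip case described above can be illustrated with single-parity (RAID 5 style) XOR parity, where the parity block plays the role of the checksum. This simplified Python model is for illustration only and does not show how PERC firmware stores or checks data. It shows that a single flipped bit leaves every block readable, yet the stripe no longer matches its parity. A Check Consistency is designed to find this kind of mismatch. The helper names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks, as used for single-parity RAID."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks: list[bytes], parity: bytes) -> bool:
    """True when the stored parity still matches parity recomputed from the data."""
    return xor_blocks(data_blocks) == parity


def flip_bit(block: bytes, bit_index: int) -> bytes:
    """Return a copy of the block with one bit inverted (a simulated bit flip)."""
    buf = bytearray(block)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


if __name__ == "__main__":
    # One stripe: three data blocks plus the parity computed when it was written.
    data = [b"\x0f\x00", b"\xf0\x01", b"\x00\xff"]
    parity = xor_blocks(data)
    print(stripe_is_consistent(data, parity))  # True

    # A bit flip changes the data, but every block can still be read.
    data[1] = flip_bit(data[1], 3)
    print(stripe_is_consistent(data, parity))  # False
```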

3. Steps to Resolve the Problem

Note: The current data on the virtual disk is corrupted and must be deleted.

Step 1: Create a validated file-level backup of the data.
- A block-based backup would copy the corruption along with the data.
- A file-level backup identifies corrupted files, because these files should fail to back up.
- If a punctured stripe already exists, there is no guarantee that all data can be preserved.
Step 2: Replace all failed drives and all drives that report a predictive failure.
Step 3: Delete and recreate the virtual disk (VD).
- This step deletes all data on the VD.
- Delete the array.
- Recreate the array as required.
Step 4: Perform a Full Initialization of the VD.
- Do not select Fast Initialization.
- Only a Full (Slow) Initialization resolves the issue.
Step 5: Run a Check Consistency on the newly created VD.
- If the Check Consistency completes without errors, the array is healthy and the puncture has been removed.
Step 6: Restore the data to the healthy VD.

Recommendation: Update the firmware of all hard drives to the latest version.
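The first resolution step above, a validated file-level backup, can be sketched in Python. This is a minimal illustration and not a replacement for a real backup product. The hypothetical `file_level_backup` helper copies files one at a time and compares SHA-256 hashes of each source file and its copy. It assumes that a read that hits an unreadable block is reported to the application as an `OSError`. Files listed in the result are the candidates for corrupted data.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_level_backup(source: Path, target: Path) -> list[tuple[Path, str]]:
    """Copy every file under source to target, then verify each copy.

    Returns (relative_path, reason) for each file that could not be read,
    copied, or verified. These files are the candidates for corrupted data.
    """
    failures = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = target / rel
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            if sha256_of(src) != sha256_of(dst):
                failures.append((rel, "verification mismatch"))
        except OSError as exc:  # assumption: unreadable blocks surface as I/O errors
            failures.append((rel, f"read/copy error: {exc}"))
    return failures
```

The hash comparison only confirms that each copy matches what was read from the source. It cannot detect data that was already silently corrupted before the backup. This is one reason there is no 100% guarantee once a punctured stripe exists.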

4. Additional Information

OMSA can also clear bad block warnings. The following procedure is recommended for clearing bad blocks:

Step 1: Back up the virtual disk with the Verify option selected. One of two scenarios can occur:
- The backup fails on one or more files. In this case, restore those files from a previous backup, and then proceed to the next step.
- The backup completes without errors. This indicates that there are no bad blocks on the written portion of the virtual disk.
Note: If you still receive bad block warnings, the bad blocks are in a non-data area.
Step 2: Run a Patrol Read (under Virtual Disk Tasks in OMSA) and check the system event log to ensure that no new bad blocks are found. If bad blocks still exist, proceed to the next step. Otherwise, the condition is cleared.

Note: The automatic Patrol Read must be deactivated before the option to run a Patrol Read manually appears in OMSA.
Step 3: To clear the remaining bad blocks, run the Clear Virtual Disk Bad Blocks task, either in the OMSA GUI or with the CLI command:
omconfig storage vdisk action=clearvdbadblocks controller=id vdisk=id

Note: To obtain the controller ID and the virtual disk ID, run omreport storage controller to display the controller IDs. Then run omreport storage vdisk controller=ID to display the IDs of the virtual disks on that controller.
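The ID lookup and the clear command above can be wrapped in a small Python helper, for example to record exactly which commands were run. This is a hypothetical sketch. It only assembles the `omreport` and `omconfig` invocations quoted in this article, and by default it does a dry run that just prints each command. Reading the controller and virtual disk IDs from the `omreport` output is left to the operator, because that output format is not described here. The function names are illustrative only.

```python
import subprocess


def list_controllers_command() -> list[str]:
    """omreport call that lists the controller IDs."""
    return ["omreport", "storage", "controller"]


def list_vdisks_command(controller_id: int) -> list[str]:
    """omreport call that lists the virtual disk IDs on one controller."""
    return ["omreport", "storage", "vdisk", f"controller={controller_id}"]


def clear_bad_blocks_command(controller_id: int, vdisk_id: int) -> list[str]:
    """omconfig call for the Clear Virtual Disk Bad Blocks task."""
    return [
        "omconfig", "storage", "vdisk",
        "action=clearvdbadblocks",
        f"controller={controller_id}",
        f"vdisk={vdisk_id}",
    ]


def run(command: list[str], dry_run: bool = True) -> str:
    """Print the command; execute it and return its output only if dry_run is False."""
    print(" ".join(command))
    if dry_run:
        return ""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Dry run with example IDs; replace 0 and 1 with the IDs reported by omreport.
    run(list_controllers_command())
    run(list_vdisks_command(0))
    run(clear_bad_blocks_command(0, 1))
```

Passing the command as an argument list rather than a single shell string avoids shell quoting issues. Set `dry_run=False` only after checking the printed command against the IDs shown by `omreport`.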

Article Properties

Affected Product

Servers

Last Published Date

01 Oct 2021

Version

Article Type

Solution
