How to Handle Puncturing (Bad Blocks) on Virtual Disks for PowerEdge servers





This article provides troubleshooting steps for punctured (bad) blocks on hard drives in PowerEdge servers with PERC controllers. The following information may help to bring an affected Virtual Disk back to an optimal state, especially when no backup is possible.



Table of Contents:

  1. Fault Descriptions

  2. What is the cause

  3. Steps to solve the problem

  4. Additional Information




1. Fault Descriptions

Fault No.1:


The OpenManage Server Administrator (OMSA) shows a red cross in front of a Virtual Disk (Figure 1).


Figure 1: Virtual Disk with red cross in Status (example H800)

Note: Dell OpenManage Server Administrator (OMSA) provides a complete, one-to-one systems management solution. OMSA can be categorized into two applications:
- Integrated - Web browser-based Graphical User Interface (GUI)
- Command Line Interface (CLI) - Through the operating system



Fault No.2:


The Windows System Log shows Bad Block errors (Figure 2).


Figure 2: Bad Block error shown in the Windows System Log



Fault No.3:


The RAID controller log (TTYLOG) shows errors like:

02/26/15 13:43:39: EVT#131878-02/26/15 13:43:39: 97=Puncturing bad block on PD XX(e0x20/s2) at 180ca4a1f

Warning: The Controller Log (TTYLOG) may show no errors.
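To locate puncture events in an exported controller log, a small filter along the following lines may help. It is only a sketch: it assumes the log has been saved as a text file and that puncture entries follow the format of the sample line above. The function name list_punctures and the output field names are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Print the physical disk, its location and the block address for each
# puncture event found in an exported controller log (text file).
# ASSUMPTION: entries look like the sample line in this article:
#   ...97=Puncturing bad block on PD XX(e0x20/s2) at 180ca4a1f
list_punctures() {
    grep 'Puncturing bad block on PD' "$1" |
        sed -E 's/.*on PD ([^(]*)\(([^)]*)\) at ([0-9a-fA-F]+).*/pd=\1 location=\2 block=\3/'
}

# Example call (the file name is hypothetical):
#   list_punctures ttylog.txt
```

Lines that do not mention a puncture are ignored, so the same call can be used on a full log export.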

Find more information about retrieving these logs in our article about gathering logs.



2. What is the cause:


RAID arrays are not immune to data errors. RAID controller and hard drive firmware contain functionality to detect and correct many types of data errors before they are written to an array/drive. Using outdated firmware can result in incorrect data being written to an array/drive, because older versions lack the error handling and error correction features available in the latest firmware versions.
Data errors can also be caused by physical bad blocks. For example, this can occur when the read/write head impacts the spinning platter (known as a "Head Crash"). Blocks can also become bad over time due to the degradation of the platter's ability to magnetically store bits in a specific location. Bad blocks caused by platter degradation often can be successfully read. Such a bad block may only be detected intermittently or with extended diagnostics on the drives.

A bad block, also known as a bad Logical Block Address (LBA), can also be caused by logical data errors. This occurs when data is written incorrectly to a drive even though it is reported as a successful write. Additionally, good data stored on a drive can be changed inadvertently. One example is a "bit flip", which can occur when the read/write head passes over or writes to a nearby location and causes data, in the form of zeros and ones, to change to a different value. Such a condition causes the "consistency" of the data to become corrupted. The value of the data on a specific block is different than the original data and may no longer match the checksum of the data. The physical LBA is good and can be written to successfully, but it currently contains incorrect data and may be interpreted as a bad block.
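The effect of a single bit flip on a checksum can be illustrated with a short sketch. It uses the CRC computed by the POSIX cksum utility purely as a stand-in; the integrity checks actually used by the controller and drive firmware are not described in this article. The function name checksum and the sample strings are hypothetical.

```shell
#!/bin/sh
# Return the CRC that POSIX cksum computes for a string.
# ASSUMPTION: cksum only stands in for the storage stack's own checks.
checksum() {
    printf '%s' "$1" | cksum | cut -d' ' -f1
}

good='RAID'   # 'A' is 0x41 (binary 01000001)
bad='RCID'    # 'C' is 0x43 (binary 01000011): exactly one bit differs from 'A'

# The block is still writable and readable, but its content no longer
# matches the checksum recorded for the original data.
if [ "$(checksum "$good")" != "$(checksum "$bad")" ]; then
    echo "checksum mismatch: block content is inconsistent"
fi
```

The block itself is physically fine in this scenario; only the comparison against the expected checksum reveals that the stored data has changed.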

For more information, read our article about Double Faults and Punctures in RAID Arrays.



3. Steps to solve the problem:

Note: The data currently on the virtual disk is corrupted and will have to be deleted.
  1. Create a validated file-level backup of the data

    • A block-based backup would copy the problem along with the data
    • A file-level backup reveals corrupted files (these files should fail to back up)
    • If a punctured stripe is already in place, there is never a 100% guarantee that all data can be kept

  2. Ensure that all failed drives and all drives showing predictive failures are replaced

  3. Delete and recreate the Virtual Disk

    • This step will delete all data from the VD
    • Delete the array
    • Recreate the array as desired

  4. Perform a Full Initialization of the VD

    • Ensure that Fast Initialization is not selected
    • Only a Full (= Slow) Initialization fixes the issue

  5. Perform a Check Consistency on the newly created VD

    • If the check consistency completes without errors, the array is now healthy and the puncture is removed

  6. The data can now be restored to the healthy VD

  7. Recommendation: Update the firmware of all hard disks to the latest version
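Step 1 relies on corrupted files failing during a file-level backup. Before starting the backup, the same idea can be checked with a simple read test, sketched below. It assumes a Unix-like host and that a read touching a punctured block returns an I/O error to the operating system; the function name list_unreadable_files and the example path are hypothetical.

```shell
#!/bin/sh
# List regular files under a directory that cannot be read from start to end.
# ASSUMPTION: a read that hits a punctured block fails with an I/O error,
# which is also why such files fail during a file-level backup.
list_unreadable_files() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            cat "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}

# Example call (the mount point is hypothetical):
#   list_unreadable_files /mnt/data
```

An empty result means every file could be read completely. Any file that is listed should be treated as a candidate for restoring from an earlier backup.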



4. Additional Information

OMSA provides the ability to clear the bad block warnings. To clear bad blocks, the following procedure is recommended:

  • Perform a backup of the virtual disk with the Verify option selected. One of two scenarios can occur:

    • The backup operation fails on one or more files. In this case, restore the file from a previous backup. After restoring the file, proceed to the next step.
    • The backup operation completes without errors. This indicates that there are no bad blocks on the written portion of your virtual disk.
    Note: If you still receive bad block warnings, the bad blocks are in a non-data area.

  • Run Patrol Read (under Virtual Disk Tasks in OMSA) and check the system event log to ensure that no new bad blocks are found. If bad blocks still exist, proceed to the next step. If not, the condition is cleared.

    Note: The automated Patrol Read must be deactivated before the option of manually running this action appears in OMSA.

  • To clear these bad blocks, execute the Clear Virtual Disk Bad Blocks task. This can be done in the OMSA GUI or with the CLI command:
    omconfig storage vdisk action=clearvdbadblocks controller=id vdisk=id

    Note: To obtain the values for the controller ID and the virtual disk ID, type omreport storage controller to display the controller IDs, and then type omreport storage vdisk controller=ID to display the IDs of the virtual disks.
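The command quoted above can be wrapped in a small helper that validates the IDs and prints the command before anything is executed. This is only a sketch: the function name clear_vd_bad_blocks and the APPLY variable are hypothetical, and the helper assumes that both IDs are plain non-negative integers as displayed by omreport.

```shell
#!/bin/sh
# Dry-run wrapper around the clear-bad-blocks command from this article.
# By default the command is only printed. Set APPLY=1 to execute it
# (requires OMSA's omconfig to be installed on the host).
clear_vd_bad_blocks() {
    controller_id=$1
    vdisk_id=$2
    # ASSUMPTION: IDs are plain non-negative integers; reject anything else.
    for id in "$controller_id" "$vdisk_id"; do
        case "$id" in
            ''|*[!0-9]*)
                echo "usage: clear_vd_bad_blocks <controller-id> <vdisk-id>" >&2
                return 2
                ;;
        esac
    done
    set -- omconfig storage vdisk action=clearvdbadblocks \
        "controller=$controller_id" "vdisk=$vdisk_id"
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Look up the IDs first, as described in the note above:
#   omreport storage controller
#   omreport storage vdisk controller=ID
# Then review the printed command before running it with APPLY=1:
#   clear_vd_bad_blocks 0 1
```

Printing the command first gives a chance to confirm that the controller and virtual disk IDs are the intended ones.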




Article ID: SLN111146

Last modified: 06/11/2018 06:21 AM

