MD Series: MD3xxx How to Replace a Disk in Predictive or Impending Failure
Summary: This article covers how to safely replace a predictive failure drive in an MD3xxx storage array using Modular Disk Storage Manager (MDSM).
Instructions
Introduction
This tutorial explains how to replace a disk in predictive or impending failure. Predictive failure is a feature of modern hard disk drives (hard drives) designed to improve RAID reliability. A predictive failure status indicates that a hard drive should be replaced before it fails.
Cause
During normal read/write operations, an error may occasionally occur on a hard drive. The controller identifies the error and repairs it. These errors are also known as "Bad Blocks". This is why the storage space on a hard drive is usually slightly larger than specified: the extra space is used to relocate or repair any bad blocks that occur during normal operations. A predetermined threshold of bad blocks is assigned to each hard drive. When this threshold is reached, the controller changes the status of the hard drive to "Predictive Fail". The hard drive remains operational; however, the probability that it will fail soon is high.
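The threshold behavior described above can be illustrated with a minimal Python sketch. The class name, field names, and threshold value are hypothetical and chosen for illustration only; real thresholds are defined by the drive or controller firmware and are not stated in this article.

```python
from dataclasses import dataclass

OPTIMAL = "Optimal"
PREDICTIVE_FAIL = "Predictive Fail"


@dataclass
class DriveHealth:
    """Hypothetical model of a hard drive's bad-block bookkeeping."""
    bad_block_threshold: int   # assumed value; real thresholds are firmware-defined
    spare_blocks: int          # reserve space used to relocate bad blocks
    relocated_blocks: int = 0

    def record_bad_block(self) -> None:
        """Relocate one bad block into the spare area, if any spare space remains."""
        if self.relocated_blocks < self.spare_blocks:
            self.relocated_blocks += 1

    @property
    def status(self) -> str:
        """Report Predictive Fail once the relocation count reaches the threshold."""
        if self.relocated_blocks >= self.bad_block_threshold:
            return PREDICTIVE_FAIL
        return OPTIMAL


drive = DriveHealth(bad_block_threshold=3, spare_blocks=10)
for _ in range(2):
    drive.record_bad_block()
print(drive.status)  # still below the threshold

drive.record_bad_block()
print(drive.status)  # threshold reached: the drive still works but should be replaced
```

In this sketch the drive keeps operating after the threshold is reached; only its reported status changes, which mirrors the "Predictive Fail" state described above.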
A hard drive in "Predictive Fail" status should be replaced promptly to maintain the integrity of the RAID volume. The hard drive can be removed safely from the RAID volume before it is physically replaced. Follow the process outlined below to change the hard drive status to offline and safely remove it from the RAID volume.
Solution
Note: Before proceeding, Modular Disk Storage Manager (MDSM) must be installed. MDSM can be downloaded from the support site. The system must have access to the storage array.
Follow the process below to take the hard drive offline and safely remove it from the RAID volume.
- Launch MDSM and select the corresponding PowerVault array.
- Verify the correct array using the status of the enclosure
- If there are no issues with an array, it shows as "Optimal", as seen in Figure 1 below

- Figure 1: MDSM Devices view showing Optimal state
- If the array has a hard drive in predictive failure, the status changes to "Needs Attention"
- Double-click the array to open the array manager
- Verify that there are no other missing or failed drives in the same RAID set
- Click Hardware, and then select the hard drive in predictive failure. Its status shows as "Needs Attention"

- Figure 2: Hardware section of MDSM
- Right-click the hard drive and select Advanced, then Fail

- Figure 3: Right-click menu showing the Fail option
- Acknowledge the drive failure operation by typing "Yes"
- If there is a spare disk in the array, also known as a "Hot Spare", leave the "Copy contents of physical disk before failing" box checked
- The data on the predictive failure disk is copied to the Hot Spare, which avoids degrading the RAID volume
- This is shown below in Figure 4
- If there is no Hot Spare, clear the "Copy contents of physical disk before failing" box
- Do not attempt to copy contents unless there is a Hot Spare available in the array
- Attempting this may cause data loss or corruption

- Figure 4: Confirm Fail Physical Disk dialogue
- If the option to copy the contents is used, it may take some time before the drive fails and the state changes to "Failed"
- The status of the hard drive changes to "Failed" and a red "X" appears next to it
- It is now safe to physically replace the hard drive
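
The checks in the procedure above can be summarized as a small decision sketch in Python. This is not an MDSM interface: the function, class, and field names are hypothetical, and the drive statuses are assumed to be read manually from the Hardware view. The sketch only encodes the two rules stated in this article: confirm that no other drive in the RAID set is missing or failed, and copy the drive contents only when a Hot Spare is available.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Drive:
    """Hypothetical record of one hard drive as read from the Hardware view."""
    slot: int
    status: str  # for example "Optimal", "Predictive Fail", "Failed", "Missing"


def plan_fail_operation(raid_set: List[Drive], target_slot: int,
                        hot_spare_available: bool) -> dict:
    """Summarize the pre-checks before failing the drive in target_slot."""
    # Other members of the RAID set must not already be failed or missing.
    others = [d for d in raid_set if d.slot != target_slot]
    blocked_by = [d.slot for d in others if d.status in ("Failed", "Missing")]
    return {
        "proceed": not blocked_by,
        "blocked_by_slots": blocked_by,
        # Copy contents only when a Hot Spare exists; otherwise clear the box.
        "copy_contents_before_failing": hot_spare_available,
    }


raid_set = [Drive(0, "Optimal"), Drive(1, "Predictive Fail"), Drive(2, "Optimal")]
plan = plan_fail_operation(raid_set, target_slot=1, hot_spare_available=False)
print(plan)
```

With no Hot Spare in this example, the sketch recommends clearing the copy option, which matches the warning above about attempting a copy without a Hot Spare.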