Data Domain: Failed, Absent, Unknown, or Powered-off Disk
Summary: This article describes how to resolve a disk drive that is in a failed, absent, unknown, or powered-off state.
Symptoms
Purpose:
This article describes how to resolve general, single disk failures on Data Domain (DD) systems.
Symptoms:
Autosupport alerts indicating a drive failure, drive removed (absent) or drive powered-off:
- Event Codes: STORAGE-00001 | STORAGE-00002 | PHYM-00001 | EVT-STORAGE-00002
- Enclosure disk has failed and should be replaced
- Disk has a hardware fault and may need to be replaced
- Disk is absent and should be replaced
- Unable to access the disk and the disk state is powered off
- A disk has failed and the enclosure slot has been disabled
Affected Products:
- All Data Domain systems
- All Data Domain Operating System (DDOS) software releases
Cause
DDOS continuously monitors the health of all disk drives in the system.
DDOS automatically fails a drive when certain read/write I/O errors occur.
If a disk is removed (absent), dead, or powered-off, then the associated storage alert is generated.
Disk slots can be powered-off manually (for example, during tech support troubleshooting) or automatically by DDOS (for example, if a disk is causing backend storage issues).
Resolution
Verify and confirm the failure:
- Verify the alerts on the system using DDMC, Data Domain System Manager (DDSM | Web-UI), or DD-CLI.
  - DD-CLI # alerts show current
  - DDSM > Alerts
- Identify and confirm the current state of all drives in the system.
  - DD-CLI # disk show state AND # disk show hardware
  - DDSM > Hardware > Storage > Disks
- Single failed disks should be replaced immediately.
- Contact your support provider and raise a service request (SR) if one has not been raised automatically.
- If multiple disk drive alerts exist, contact your contracted support provider for assistance with the next steps.
- Example: DD-CLI:
  sysadmin@DD# disk show state
  Enclosure   Disk
              1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
  ---------   ----------------------------------------------
  1           .  .  .
  2           .  .  .  .  .  .  .  .  .  .  .  .  .  .  s
  3           .  .  .  .  .  .  .  .  .  .  .  .  .  .  s
  4           .  .  .  .  .  .  .  .  .  .  .  .  .  .  s
  5           .  .  .  .  .  F  .  .  .  .  .  .  .  .  R
  6           .  .  .  .  .  .  .  .  .  .  .  .  .  .  s
  ---------   ----------------------------------------------
  > Here we see disk 5.6 (F)ailed and (R)econstructing to spare disk 5.15
- Absent disks are marked as (A)
- Powered-off disks are marked as (P)
- For DD3300 disk issues, consider this KB:
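As an illustration of the verification step above, the following is a minimal Python sketch that reads captured `disk show state` text and lists the slots flagged as failed, absent, powered-off, or reconstructing. It is not a Data Domain tool. It assumes the grid layout shown in the example, and it assumes that states are listed left to right starting at disk 1 with no gaps. The names `parse_disk_states`, `notable_disks`, and `SAMPLE` are hypothetical.

```python
# State letters as described in this article; other letters are ignored here.
NOTABLE = {"F": "failed", "A": "absent", "P": "powered-off", "R": "reconstructing"}

# Shortened copy of the example output (assumed layout).
SAMPLE = """\
Enclosure   Disk
            1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
---------   ----------------------------------------------
1           .  .  .
2           .  .  .  .  .  .  .  .  .  .  .  .  .  .  s
5           .  .  .  .  .  F  .  .  .  .  .  .  .  .  R
---------   ----------------------------------------------
"""


def parse_disk_states(output):
    """Return {"encl.disk": state_letter} for each slot in the grid."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows begin with an enclosure number.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        cells = fields[1:]
        # Skip the row of disk numbers: real cells are single non-digit characters.
        if not all(len(c) == 1 and not c.isdigit() for c in cells):
            continue
        # Assumption: cells map to disk 1, 2, 3, ... in order.
        for slot, cell in enumerate(cells, start=1):
            states[f"{fields[0]}.{slot}"] = cell
    return states


def notable_disks(states):
    """Keep only the slots whose state letter appears in NOTABLE."""
    return {disk: NOTABLE[s] for disk, s in states.items() if s in NOTABLE}


if __name__ == "__main__":
    for disk, label in notable_disks(parse_disk_states(SAMPLE)).items():
        print(disk, label)
```

If more than one slot is reported as failed, absent, or powered-off, the guidance above still applies: contact your support provider rather than acting on the script's output.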
Replace the failed disk:
- See the relevant documented disk replacement procedures (Data Domain Documentation).
- Tip: Best practice for replacing disk drives should include a one minute waiting period from the time a drive is removed until the new drive is inserted.
- For example, remove the failed disk, wait one minute and then insert the new disk.
- This allows the controller firmware to properly remove the old drive before the new drive is inserted, greatly reducing the possibility of a hotplug issue where the new drive is not properly recognized.
- Verify that the drive is in SPARE state after replacement.
  - DD-CLI # disk show state
  - DDSM > Hardware > Storage > Disks
- Example: DD-CLI:
  sysadmin@DD# disk show state
  Enclosure   Disk
              1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
  ---------   ----------------------------------------------
  1           .  .  .
  2           .  .  .  .  .  .  .  .  .  .  .  .  .  .  s
  3           .  .  .  .  .  .  .  .  .  .  .  .  .  .  s
  4           .  .  .  .  .  .  .  .  .  .  .  .  .  .  s
  5           .  .  .  .  .  s  .  .  .  .  .  .  .  .  .
  6           .  .  .  .  .  .  .  .  .  .  .  .  .  .  s
  ---------   ----------------------------------------------
  > The newly installed disk becomes (s)pare and the reconstructed disk is now 'in-use'.
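The before-and-after comparison can be sketched in Python as a simple diff of slot states. This is an illustration only. It assumes the states have already been collected into dictionaries keyed by "enclosure.disk", which is a hypothetical representation and not a DDOS format. The function names are also hypothetical.

```python
def changed_slots(before, after):
    """Return {slot: (old_state, new_state)} for every slot whose state differs."""
    return {
        slot: (before.get(slot), after.get(slot))
        for slot in set(before) | set(after)
        if before.get(slot) != after.get(slot)
    }


def replacement_looks_complete(after, replaced_slot):
    """True when the replaced slot reports (s)pare, as in the example above."""
    return after.get(replaced_slot) == "s"


# Values taken from the two example outputs in this article (enclosure 5 only).
BEFORE = {"5.6": "F", "5.14": ".", "5.15": "R"}
AFTER = {"5.6": "s", "5.14": ".", "5.15": "."}

if __name__ == "__main__":
    for slot, (old, new) in sorted(changed_slots(BEFORE, AFTER).items()):
        print(f"{slot}: {old} -> {new}")
    print("replacement complete:", replacement_looks_complete(AFTER, "5.6"))
```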
If the new drive shows up as Unknown:
- Rescan the drives. Run command # disk rescan
- Verify status of drive from DD-CLI # disk show state
- Clear the alert for unknown drive from DD-CLI # alerts clear alert-id <alert-id>
If the new drive shows up as failed after a drive replacement:
- Rescan the drives. Run command # disk rescan
- Verify status of drive from DD-CLI # disk show state
- If the drive is still shown as failed (F), run this command from DD-CLI: # disk unfail <encl.disk>
  - For example: # disk unfail 5.6
Clearing the drive alert after drive replacement:
- Clear the alert for the unknown drive from DD-CLI # alerts clear alert-id <alert-id>
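The post-replacement steps above can be summarized as a small lookup, sketched here in Python. It only returns the command strings quoted in this article for a given drive state and does not run anything on the system. The function name `next_commands` and the state words it accepts are assumptions made for illustration.

```python
def next_commands(state, slot, alert_id="<alert-id>"):
    """Return the DD-CLI commands this article lists for a newly replaced drive.

    state: "spare", "unknown", or "failed" (hypothetical labels for this sketch).
    slot:  the drive as "enclosure.disk", for example "5.6".
    """
    state = state.lower()
    if state == "spare":
        # Replacement succeeded; only the alert remains to be cleared.
        return [f"alerts clear alert-id {alert_id}"]
    if state == "unknown":
        return [
            "disk rescan",
            "disk show state",
            f"alerts clear alert-id {alert_id}",
        ]
    if state == "failed":
        # The unfail step applies only if the drive is still (F) after the rescan.
        return ["disk rescan", "disk show state", f"disk unfail {slot}"]
    raise ValueError(f"no documented steps for state {state!r}")


if __name__ == "__main__":
    for command in next_commands("failed", "5.6"):
        print(command)
```

The alert ID is left as the `<alert-id>` placeholder by default because it has to be read from the current alerts on the system.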
Videos:
- PowerProtect DD - Replacing Cache Tier Disk on DD6900/DD9400 (duration 00:02:09)
- PowerProtect DD - Replacing Cache Tier Disk on DD9900 (duration 00:02:13)
Reference:
Data Domain - Disk State Description
Data Domain - How to Check Alerts on a Data Domain System
Replacement Procedures: Data Domain Documentation