WoodMc

1 Message

2159

December 6th, 2019 15:00

MD3660 Failed Disk Group

We had an incident with a shelf from our MD3660i when replacing a disk drive. During the replacement, several drives on the shelf seemingly went into a missing state. I did not see more than two disks missing from any of our four disk groups (RAID 6). We powered down the enclosure, reseated the connectors for the problematic shelf, and brought the system back up.

One of the disk groups is showing as Failed, but the member disks are optimal. The Recovery Guru in the GUI refers us to Technical Support, but it seems unlikely that we will get any for a seven-year-old system. I noticed the command "revive diskGroup" in the CLI documentation and was wondering if this is the only/next step in trying to recover the disk group and its associated virtual disk.

Thanks,

Responses(2)

Daver19

4 Posts

0

December 15th, 2019 21:00

I had a very similar issue with an MD array of similar vintage that somehow 'lost' a couple of drives during a power failure. Getting support from Dell was like pulling teeth once you tell them it's out of warranty, so I feel your pain there. Mine went on for several weeks before I resolved it myself. This is what I found that may assist you:

Step 1 - Revive diskGroup command

Details

Generally the safest and quickest action to take is to allow the controllers to attempt automatic recovery
The controllers have an internal list of operations that need to occur to recover access to the data
Below is the structure of the command to be run
- revive diskGroup [diskgroup ID];
- Not preferred, as the "diskgroup ID" is not as easily found as the disk group name
- or
- revive diskGroup ["name of disk group"];
- Ex: revive diskGroup ["DG_0"];
Note: If your array name or Disk Group name contains a dash (-)
- You may need to rename it as dashes cause issues in recovery
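If you end up typing this command for more than one disk group, a small helper can keep the brackets, quotes and trailing semicolon consistent. This is only a minimal sketch: the function names `revive_script` and `needs_rename` are made up for illustration, and the dash check simply mirrors the note above rather than any documented rule.

```python
def revive_script(disk_group_name: str) -> str:
    """Return the script command that revives a disk group by name.

    The format follows the structure shown above:
    revive diskGroup ["name of disk group"];
    """
    if '"' in disk_group_name:
        # A double quote inside the name would break the quoting of the command.
        raise ValueError("disk group name must not contain a double quote")
    return f'revive diskGroup ["{disk_group_name}"];'


def needs_rename(name: str) -> bool:
    """True if the name contains a dash, which the note above flags as a problem."""
    return "-" in name


if __name__ == "__main__":
    for name in ("DG_0", "DG-0"):
        if needs_rename(name):
            print(f"{name}: contains a dash, consider renaming before recovery")
        else:
            print(revive_script(name))
```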

Where to run command

The command can be run in two locations, depending on the version of Modular Disk Storage Manager (MDSM)

MDSM themed Blue/Grey (only used for MD3000(i))

You will need to open a command prompt to the MD install folder with SMCLI

You then need to formulate the smcli command for your enclosure, as seen below
1. smcli -n NameOfEnc -c "revive diskGroup ["name of disk group"];"
   - Where NameOfEnc is the name of your enclosure
2. If there are numbers or spaces in your enclosure name,
   - smcli -n "NameOfEnc1" -c "revive diskGroup [\"name of disk group\"];"
3. If your FW is old enough, you need to change the command slightly
   - smcli -n NameOfEnc -c "revive diskGroup [\"name of disk group\"];"
4. If there is a password (repeat as necessary),
   - smcli -n NameOfEnc -p "Password" -c "revive diskGroup [\"name of disk group\"];"
Type the command into the command prompt and then press enter to execute
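Most of the trouble with the smcli variants above is getting the nested quotes past the command prompt. One way around that, sketched below in Python, is to pass the arguments as a list so no shell escaping is needed. Assumptions here: `smcli` is reachable from the current folder or PATH, only the `-n`, `-p` and `-c` options from the examples above are used, and `build_smcli_argv` and `run` are hypothetical helper names. It defaults to a dry run that only prints the command.

```python
import shlex
import subprocess
from typing import List, Optional


def build_smcli_argv(enclosure: str, disk_group: str,
                     password: Optional[str] = None) -> List[str]:
    """Build the smcli argument list for reviving a disk group by name."""
    argv = ["smcli", "-n", enclosure]
    if password is not None:
        argv += ["-p", password]
    # The whole script command is one argument, so the inner quotes
    # do not need backslash escapes.
    argv += ["-c", f'revive diskGroup ["{disk_group}"];']
    return argv


def run(argv: List[str], dry_run: bool = True):
    """Print the command (dry run) or execute it without a shell."""
    if dry_run:
        print(shlex.join(argv))  # requires Python 3.8+
        return None
    return subprocess.run(argv, check=True)


if __name__ == "__main__":
    # NameOfEnc and DG_0 are the placeholder names used in this post.
    run(build_smcli_argv("NameOfEnc", "DG_0"))
```

Leave `dry_run=True` until the printed command looks right for your enclosure, then call `run(..., dry_run=False)`.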

MDSM themed Black/White (released when MD32/36 came out, but works on MD3000(i) as well)

You will need to go to the main window of MDSM with tabs “Devices/Setup”
Locate the name of your enclosure and right click on it
Choose “Execute Script…”

Insert the command into the top box
1. Ex: revive diskGroup ["DG_0"];
On the file bar, click Tools -> Verify and Execute

What’s next?

Depending on FW level, this can resolve the issue
- If the issue is resolved,
  - Allow the system to reconstruct
  - If possible, plan for updates immediately to alleviate reoccurrences and other issues
- If the issue is not resolved,
  - Continue to Step 2

Step 2 - Reboot head unit

Pre-Requisites

This option requires you to have a minimum FW level of 08.xx.xx.xx
If you are not at this FW level
- Please skip to Step 3
- Consider engaging an analyst

Details

Starting with 08.xx.xx.xx, the FW has advanced recovery options built in
The system will attempt to recover all RAID volumes automatically on reboot
To properly attempt operation:

Make sure that all expansion enclosures are up and all HDDs are optimal
Reboot the head unit manually
1. Power off the unit with both power switches at the same time
2. Wait 30 seconds to 1 minute (or until the lights on the controller go off)
3. Turn on both power supplies at the same time
4. Wait for 2-5 minutes until the system comes back up
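The firmware prerequisite for this step can be checked mechanically if you have the version string in front of you. The sketch below assumes the version is written as four dot-separated fields, as in 08.xx.xx.xx, and compares only the first field. The function names and the sample version strings are made up for illustration.

```python
def meets_minimum_fw(version: str, minimum_major: int = 8) -> bool:
    """True if the first field of the firmware version is at least minimum_major."""
    major = int(version.split(".")[0])
    return major >= minimum_major


def next_step(version: str) -> str:
    """Pick the step suggested by the prerequisite above."""
    return "Step 2" if meets_minimum_fw(version) else "Step 3"


if __name__ == "__main__":
    # Hypothetical version strings, used only to exercise the check.
    for fw in ("08.20.24.60", "07.84.47.60"):
        print(fw, "->", next_step(fw))
```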

I ended up having to actually purchase the special (of course it was) Dell console cable (they sent me the wrong one, despite me finding the right one the first time) and had to SSH in and run a couple of other commands that managed to revive the disks (thank god!). Nightmare of an experience, to be honest. Sad part is, it's the 4th time it's happened after a power outage (in 7 years) and the first time AFTER the extended warranty ran out, and Dell won't allow any more extended warranty. Same damned issue every time!

dharmie

22 Posts

0

July 7th, 2022 12:00

Hey bro,

See this screenshot I have from the command you advised above. It is the verified feedback from the Execute Script page of the management app, for an MD3200.

The disk group stated is my failed database, and it is critical.

Must I restart the storage in the order you have advised even if the VD is fixed?
