The program that I began working for a few months ago has purchased a new HP storage solution, but they have to award another contract for some more parts. Once that work is done, we will be decommissioning three EMC AX4 arrays. For now, I have to keep the arrays running because the drive space is supporting the VMware environment.
Here's the current situation:
EMC #1 has active connections to 14 VMware Hosts.
It is listing Disks 5, 6, 8, 9 and 10 as FAULTED, REMOVED or MISPLACED.
Standby Power Supplies A and B are FAULTED.
EMC #3 has active connections to the same 14 VMware Hosts as EMC #1.
EMC #2 has an active connection to only one server but the volume is not in use on the server.
It is listing Disk 7 as REMOVED.
Power/Cooling Module B is FAULTED.
Standby Power Supply A is FAULTED.
Back to EMC #1. Here is the FAULTED Virtual Disk:
Hot Spare Replacing: Enclosure 0 Disk 8 in Disk Pool 2 - data has been reconstructed to the hot spare
So here's my proposal: Shut down EMC #2. It is not providing services, and it has good parts. Install the good SPS from #2 into #1. Then I can use 5 of #2's disks to replace the FAULTED/REMOVED drives. If I understand RAID 5, there's no chance that any of the data on Virtual Disk EMC_C is intact, since only 2 of the disks are in NORMAL state. Do I still need to insert the "new" disks one at a time and let the RAID rebuild?
Also, with EMC #1's Disk 10 being listed as MISPLACED, might there be some benefit to swapping it into the REMOVED/FAULTED slots to see if it is recognized and made active again?
What you want to do is pretty complicated. In all the EMC arrays, the disks used to create a RAID group are then locked into that physical location, so if you used disks 5-10 to create a RAID group, each of those disks is formatted with information about its slot number and position within the RAID group.
To reuse a disk, it must first be zeroed out. This normally happens when you destroy a RAID group: doing so removes the RAID information from all the drives and unlocks them from their current positions.
The first question you need to ask is what data can be deleted and what can't. You need to make a backup of the data that can't be lost.
Then you can look at destroying any disk groups you don't care about and removing those groups. That should free up those disks for use.
I think "misplaced" means a disk that was taken from somewhere and then inserted into a different slot. If that's the case, disk 10 is probably still holding the information from the RAID group it used to be in.
What you need to do is more complex than this forum is designed to help with. You'll either need to contact EMC support or the 3rd party that originally sold the AX.
I agree with your assessment, but there's no way we can get support. If Contracting could take the steps to get us back under Dell maintenance, then they could also award the contract for our new equipment and I wouldn't be concerning myself with the EMCs at all. However, in your response you answered most of my questions.
EMC #2 holds no valuable information. It has 2 virtual disks in RAID5 configurations. If I destroy both of them, then the data will be zeroed on those 11 good physical disks. At that point, I can pull some of the zeroed disks, use them to replace the REMOVED/FAULTED disks in EMC #1's FAULTED Virtual Disk (EMC_C) and make a new Virtual Disk that isn't FAULTED. After all, a 6-disk RAID5 array with 4 failed members can't regenerate its data anyway, right?
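To make the RAID 5 reasoning above concrete, here is a minimal sketch of single-parity XOR in Python. It is a toy model only: one stripe, fixed parity position, and hypothetical helper names (`xor_blocks`, `make_stripe`, `rebuild`). It does not reflect how FLARE actually lays out or rebuilds data on an AX4; it only shows why one missing member can be regenerated from the survivors and four cannot.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (the XOR of the data)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(stripe):
    """Fill in a missing member (marked None) from the surviving members.

    Single parity can regenerate at most one missing member per stripe;
    with two or more missing, a ValueError is raised.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} members missing; single parity covers only 1")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired

# Hypothetical 6-member stripe: 5 data blocks + 1 parity block.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE"])

# One member lost: the stripe can be regenerated.
one_lost = list(stripe)
one_lost[2] = None
print(rebuild(one_lost) == stripe)  # True

# Four members lost, as with EMC_C: nothing to regenerate from.
four_lost = [None, None, stripe[2], None, None, stripe[5]]
try:
    rebuild(four_lost)
except ValueError as err:
    print("cannot rebuild:", err)
```

In this model, a hot spare that has already taken over for one failed disk simply counts as a surviving member; it does not raise the number of simultaneous failures the pool can absorb.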
The MISPLACED disk #10 would have originated in the same chassis where it is currently located. Since that chassis has only one faulted disk pool, it stands to reason that the disk's original slot would have been 5, 6, 8 or 9. If so, then the array should recognize its correct location once the disk is put back there, correct?
I just discovered today that on the "Attention Required" page, the system actually told me the expected Serial Number for Disk 6. When I pulled the MISPLACED disk from Slot 10, it was the serial that belonged in the empty Slot 6, so I put it there, restoring Disk 6 to NORMAL.
What puzzles me is why I can't find any screen that lists the expected serial numbers for the other slots. I found the disks that were pulled out of the array just sitting on a table, but I don't know which slots they came from. They're probably bad anyway, but I'd like to make sure. Am I missing a menu item in Navisphere Express that will show me the expected serial numbers?
Also, I have destroyed all the disk pools and virtual disks on EMC 2, but when I slipped one of the disks into a slot on EMC 1, it came back with a status of MISPLACED. Did I miss a step that would have zeroed the disks in EMC 2 before I shut it down?
I know no one has replied to my last message, but since my posts are moderated, I thought I'd go ahead and post the latest.
I found another thread in the community where it was discussed that zeroing occurs when creating a new disk pool, so I tried that:
At that point, the screen blanks and takes me back to the Navisphere Express Login page. I've tried restarting the AX4 and that does not change the result.
I'd really like to make this work, but there's virtually no information outside this community, so I really appreciate Glen's input.
What is the FLARE version running on the array? Sometimes an older OE (Operating Environment) version on the array will not recognize certain disks. The latest version is 02.23.050.5.712. You can check the part numbers of the disks against the supported OE versions in the document below - look for the AX4 section.
All of our arrays are running FLARE 02.23.050.5.711.
I have compared all of our Part Numbers with the drive and FLARE matrices you linked, and all are listed as fine with our FLARE version and Array Models.
Have you tried to insert one of the disks from #2 in slot 0-0-8 to see what happens? It shows in your first post that a hot spare is replacing 0-0-8 - if you put a disk in that slot it might start to copy the data back from the hot spare to the new disk.
I think the problem is that with that many faulted disks, you may have to re-initialize the array. Since all of the data is already gone, this might be the best action.
I'll see if I can find any documentation about re-initializing the array.