2 Bronze

Need help keeping our AX4-5 units up and running just a bit longer...

The program that I began working for a few months ago has purchased a new HP storage solution, but they have to award another contract for some more parts.  Once that work is done, we will be decommissioning three EMC AX4 arrays.  For now, I have to keep the arrays running because the drive space is supporting the VMware environment.

Here's the current situation:

EMC #1 has active connections to 14 VMware Hosts.

It is listing Disks 5, 6, 8, 9 and 10 as FAULTED, REMOVED or MISPLACED.

Standby Power Supplies A and B are FAULTED.

EMC #3 has active connections to the same 14 VMware Hosts as EMC #1.


EMC #2 has an active connection to only one server but the volume is not in use on the server.

It is listing Disk 7 as REMOVED.
Power/Cooling Module B is FAULTED.

Standby Power Supply A is FAULTED.

Back to EMC #1.  Here is the FAULTED Virtual Disk:

Virtual Disks:
Name     Capacity    State
EMC_C    2 TB        Faulted, Offline

Disks:
Disk                  Capacity    State
Enclosure 0 Disk 4    917 GB      Normal
Enclosure 0 Disk 5    0.000 GB    Removed
Enclosure 0 Disk 6    0.000 GB    Removed
Enclosure 0 Disk 7    917 GB      Normal
Enclosure 0 Disk 8    0.000 GB    Faulted
Enclosure 0 Disk 9    0.000 GB    Removed

Hot Spare Replacing: Enclosure 0 Disk 8 in Disk Pool 2 - data has been reconstructed to the hot spare

So here's my proposal: Shut down EMC #2. It is not providing services, and it has good parts. Install the good SPS from #2 into #1. Then I can use 5 of #2's disks to replace the FAULTED/REMOVED drives.  If I understand RAID 5, there's no chance that any of the data on Virtual Disk EMC_C is intact, since only 2 of the disks are in NORMAL state. Do I still need to insert the "new" disks one at a time and let the RAID rebuild?
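To illustrate the RAID 5 reasoning above, here is a minimal Python sketch of single-parity recovery on one stripe. It is only a toy model: the names `xor_blocks` and `recover_stripe` are made up for illustration, parity sits in a fixed position instead of rotating across members, and it says nothing about how FLARE actually lays out data. It shows that one missing member can be rebuilt from the survivors, while two or more missing members cannot.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_stripe(members):
    """Return the full stripe, rebuilding at most one missing member.

    `members` is a list of byte blocks (data blocks plus one parity block);
    a failed or removed member is represented by None.
    """
    lost = [i for i, block in enumerate(members) if block is None]
    if len(lost) > 1:
        # Single parity gives one equation per byte, so only one unknown
        # member can be solved for.
        raise ValueError(f"{len(lost)} members missing; single parity can rebuild only 1")
    rebuilt = list(members)
    if lost:
        survivors = [block for block in members if block is not None]
        rebuilt[lost[0]] = xor_blocks(survivors)
    return rebuilt


# Hypothetical 6-member stripe: 5 data blocks plus 1 parity block.
data = [bytes([n] * 4) for n in range(1, 6)]
stripe = data + [xor_blocks(data)]
```

With one member set to None, `recover_stripe` returns the original stripe. With four members set to None, as in a 6-disk pool with only two NORMAL disks, it raises `ValueError`.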

Also, with EMC #1's Disk 10 being listed as MISPLACED, might there be some benefit to swapping it into the REMOVED/FAULTED slots to see if it is recognized and made active again?

Thanks,
Steve

16 Replies
5 Rhenium

Re: Need help keeping our AX4-5 units up and running just a bit longer...

What you want to do is pretty complicated. In all the EMC arrays, the disks that are used to create a RAID group are then locked into that physical location - so if you used disks 5-10 to create a RAID group, then each of those disks is formatted with information about its slot number and position within the RAID group.

To reuse a disk, it must first be zeroed out. This normally happens when you destroy a RAID group - it removes the RAID information from all the drives and unlocks them from their current position.

The first question you need to ask is what data can be deleted and what can't. You need to make a backup of the data that can't be lost.

Then you can look at destroying any disk groups you don't care about and removing those groups. That should free up those disks for use.

I think "misplaced" means a disk that was taken from one slot and then inserted into a different one. If that's the case, disk 10 is probably still holding the information from the RAID group it used to be in.

What you need to do is more complex than this forum is designed to help with. You'll either need to contact EMC support or the 3rd party that originally sold the AX.

glen

2 Bronze

Re: Need help keeping our AX4-5 units up and running just a bit longer...

I agree with your assessment, but there's no way we can get support.  If Contracting could take the steps to get us back under Dell maintenance, then they could also award the contract for our new equipment and I wouldn't be concerning myself with the EMCs at all.  However, in your response you answered most of my questions.

EMC #2 holds no valuable information. It has 2 virtual disks in RAID5 configurations. If I destroy both of them, then the data will be zeroed on those 11 good physical disks.  At that point, I can pull some of the zeroed disks, use them to replace the REMOVED/FAULTED disks in EMC #1's FAULTED Virtual Disk (EMC_C) and make a new Virtual Disk that isn't FAULTED.  After all, a 6-disk RAID5 array with 4 failed members can't regenerate its data anyway, right?

The MISPLACED disk #10 would have originated in the same chassis where it is currently located. Since that chassis only has one faulted disk pool, it stands to reason that the disk's original slot would have been 5, 6, 8 or 9. If so, then it should recognize its correct location once replaced, correct?

Thanks,

Steve

5 Rhenium

Re: Need help keeping our AX4-5 units up and running just a bit longer...

Correct - the drive, when inserted in the correct slot, will indicate that.

glen

2 Bronze

Re: Need help keeping our AX4-5 units up and running just a bit longer...

I just discovered today that on the "Attention Required" page, the system actually told me the expected Serial Number for Disk 6.  When I pulled the MISPLACED disk from Slot 10, it was the serial that belonged in the empty Slot 6, so I put it there, restoring Disk 6 to NORMAL.

What puzzles me is why I can't find any screen that lists the expected serial numbers for the other slots.  I found the disks that were pulled out of the array just sitting on a table, but I don't know which slots they came from. They're probably bad anyway, but I'd like to make sure.  Am I missing a menu item in Navisphere Express that will show me the expected serial numbers?

Also, I have destroyed all the disk pools and virtual disks on EMC 2, but when I slipped one of the disks into a slot on EMC 1, it came back with a status of MISPLACED.  Did I miss a step that would have zeroed the disks in EMC 2 before I shut it down?

Thanks,

Steve

2 Bronze

Re: Need help keeping our AX4-5 units up and running just a bit longer...

I know no one has replied to my last message, but since my posts are moderated, I thought I'd go ahead and post the latest.

I found another thread in the community where it was discussed that zeroing occurs when creating a new disk pool, so I tried that:

  • Click "Disk Pools"
  • Click "Create Disk Pools"
  • Select "RAID 1/0"
  • Select an even number of disks (I've tried several combinations, but say Disk 5 and 6 for example)
  • Click "Apply"

At that point, the screen blanks and takes me back to the Navisphere Express Login page.  I've tried restarting the AX4 and that does not change the result.

I'd really like to make this work, but there's virtually no information outside this community, so I really appreciate Glen's input.

Steve

5 Rhenium

Re: Need help keeping our AX4-5 units up and running just a bit longer...

What is the FLARE version running on the array? Sometimes an older OE (Operating Environment) version on the array will not recognize certain disks. The latest version is 02.23.050.5.712. You can check the part numbers of the disks against the supported OE versions in the document below - look for the AX4 section.

https://support.emc.com/docu42949_All_VNX_CLARiiON_Celerra_Storage_Systems_Drive_and_FLARE_OE_Matric...

glen


2 Bronze

Re: Need help keeping our AX4-5 units up and running just a bit longer...

All of our arrays are running FLARE 02.23.050.5.711.

I have compared all of our Part Numbers with the drive and FLARE matrices you linked, and all are listed as fine with our FLARE version and Array Models.

EMC #1
Slot    Serial Number    Part Number    Vendor Name
0       Z1N4R08Y         5050063        ATA-ST
1       Z1N388GT         5050063        ATA-ST
2       Z1N4RL9E         5050063        ATA-ST
3       P6GH6BVP         5050669        ATA-HTCH
4       Z1N4RDMG         5050063        ATA-ST
5       Removed          Removed        Removed
6       9QJ7XPYG         5048831        ATA-ST
7       Z1N4KXXT         5050063        ATA-ST
8       Removed          Removed        Removed
9       Removed          Removed        Removed
10      Removed          Removed        Removed
11      N/A              5050063        ATA-ST

EMC #2
Slot    Serial Number    Part Number    Vendor Name
0       Z1N26P0H         5050063        ATA-ST
1       PBJ51XPE         5048805        ATA-HTCH
2       9QJ7YQRC         5048831        ATA-ST
3       Z1N350ZD         5050063        ATA-ST
4       PAKST4BE         5048805        ATA-HTCH
5       9QJ7XZLM         5048831        ATA-ST
6       9QJ7YQG6         5048831        ATA-ST
7       Z1N37XZW         5050063        ATA-ST
8       9QJ7ZK8L         5048831        ATA-ST
9       9QJ7Z71Y         5048831        ATA-ST
10      Z1N3XE4J         5050063        ATA-ST
11      Removed          Removed        Removed
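As a rough cross-check of the inventory above, the following Python sketch lists the empty slots in EMC #1 and the EMC #2 slots whose part number already appears somewhere in EMC #1. The part numbers are copied from the listing. The helper names (`empty_slots`, `matching_donors`) and the idea of choosing donors by matching part number are my own assumptions for illustration, not an EMC requirement.

```python
# Part number per slot, copied from the inventory; None means "Removed".
EMC1_PARTS = {
    0: "5050063", 1: "5050063", 2: "5050063", 3: "5050669",
    4: "5050063", 5: None, 6: "5048831", 7: "5050063",
    8: None, 9: None, 10: None, 11: "5050063",
}
EMC2_PARTS = {
    0: "5050063", 1: "5048805", 2: "5048831", 3: "5050063",
    4: "5048805", 5: "5048831", 6: "5048831", 7: "5050063",
    8: "5048831", 9: "5048831", 10: "5050063", 11: None,
}


def empty_slots(parts_by_slot):
    """Return the slot numbers that have no disk installed."""
    return sorted(slot for slot, part in parts_by_slot.items() if part is None)


def matching_donors(donor_parts, target_parts):
    """Return donor slots whose part number is already used in the target array."""
    in_use = {part for part in target_parts.values() if part is not None}
    return sorted(slot for slot, part in donor_parts.items() if part in in_use)


print("EMC #1 empty slots:", empty_slots(EMC1_PARTS))
print("EMC #2 donor slots:", matching_donors(EMC2_PARTS, EMC1_PARTS))
```

By this (assumed) criterion, the two disks with part number 5048805 in EMC #2 would be left out, since EMC #1 has no disk with that part number installed.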
5 Rhenium

Re: Need help keeping our AX4-5 units up and running just a bit longer...

Have you tried to insert one of the disks from #2 in slot 0-0-8 to see what happens? It shows in your first post that a hot spare is replacing 0-0-8 - if you put a disk in that slot it might start to copy the data back from the hot spare to the new disk.

I think the problem is that with that many faulted disks, you may have to re-initialize the array. Since all of the data is already gone, this might be the best action.

I'll see if I can find any documentation about re-initializing the array.

glen

5 Rhenium

Re: Need help keeping our AX4-5 units up and running just a bit longer...

Also, you might want to look at this site - go to the Legacy EMC Product Documentation and then to the AX4-5 section.

https://mydocuments.emc.com/#

glen
