I'd really appreciate some feedback on an out of warranty Equallogic Array we are trying to recover data from.
I have a little experience with Equallogic arrays, but at present any clue would be helpful. Yes, I know the system is EOL and out of warranty and should not contain production data, but that is the current situation, and all we're trying to accomplish is to move the data that is still somehow available off the group.
To start off, a bit of background. The group had a complete failure last week. On inspection it became clear that there were multiple drive failures on member 2, which caused it to go offline. The CLI showed it was no longer initializing because the RAID was in a failed and unrecoverable state, which prompted the remaining member to also take the group offline due to the missing member.
In the meantime we attempted to fit a drive in the array to replace disk 7, which had previously failed completely and no longer spins up.
The disk(s) in question were sourced from another EQL array we set back to factory defaults to clear all info and re-initialize the disks.
After replacing disk 7 the RAID now shows it is in a degraded state with dirty RAID cache and that the newly fitted disk is available as a hot spare, but no rebuild has started. We did try cold reboots, restarts from the CLI and a controller failover, but there has been no change.
I may simply be looking at the wrong end of this, so if anyone has an idea on how to proceed to get the rebuild started, that would be great. I'm also looking to find out if there's any chance of recovering partial data if the other, healthy member can somehow be made to bring the LUNs, or even better their snapshots, online again. (Yes, I am well aware that the missing blocks will likely make it a corrupted mess, but we have an RDM with flat-file content that could possibly be all we urgently want back, as we have at least partial backups.)
Sorry for the wall of text but I thought I should try and be descriptive.
Below are the more technical details.
Current status of Equallogic Group
Member 1: PS4000 RAID 5 Healthy
Member 2: PS6000 RAID 6 degraded due to 2 Failed drives (1/13), Disk 7 assigned as hotspare but not rebuilding. Disks with SMART trip alert: 1/4/5/13
Driver Status: Ok
RAID LUN 0 Degraded.
raid status dirty.
15 Drives (0,14,4,6,8,10,12,f,3,5,2,9,15,f,11)
RAID 6 (64KB sectPerSU)
Capacity 7,489,541,636,096 bytes
Available Drives List: 7
1 (history of failure)
13 (history of failure)
CLI(support)> raidtool -Z
Active RAID LUNs: 0
Driver Status = driver running.
Malloc Bytes = 0KB
Outstanding Active I/O's = 0
Pending I/O's = 0
Pending Resource Reqs = 0
Outstanding StripeLocks = 0
Allocated Sectors = 0
Device = 000
status = 002
outio = 00000000
drives = 13
disk luns: 11 15 9 2 5 3 12 10 8 6 4 14 0
disk lun= 0 status=0x00000400 drive active device=0
disk lun= 1 status=0x00020000 history of failure no-device
disk lun= 2 status=0x00000400 drive active device=0
disk lun= 3 status=0x00000400 drive active device=0
disk lun= 4 status=0x00000400 drive active device=0
disk lun= 5 status=0x00000400 drive active device=0
disk lun= 6 status=0x00000400 drive active device=0
disk lun= 7 status=0x00001000 hot spare no-device
disk lun= 8 status=0x00000400 drive active device=0
disk lun= 9 status=0x00000400 drive active device=0
disk lun=10 status=0x00000400 drive active device=0
disk lun=11 status=0x00000400 drive active device=0
disk lun=12 status=0x00000400 drive active device=0
disk lun=13 status=0x00020000 history of failure no-device
disk lun=14 status=0x00000400 drive active device=0
disk lun=15 status=0x00000400 drive active device=0
CLI(support)> exec "raidtool -w 0"
opendisk failed 19 Operation not supported by device
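For anyone reading along, the `raidtool -Z` listing above can be summarised offline with a few lines of Python. This is only a sketch that parses text already pasted in this thread; it does not run anything on the array. The function names (`parse_disk_luns`, `raid_members`, `not_active`) are made up for illustration, and the 16-slot count is assumed from the `disk lun= 0` to `disk lun=15` lines shown.

```python
import re

# Matches per-slot lines such as:
#   disk lun= 7 status=0x00001000 hot spare no-device
LUN_RE = re.compile(r"disk lun=\s*(\d+)\s+status=(0x[0-9a-fA-F]+)\s+(.*)")
# Matches the RAID membership line: "disk luns: 11 15 9 ..."
MEMBERS_RE = re.compile(r"disk luns:\s*([\d ]+)")


def parse_disk_luns(text):
    """Return (slot, status_word, description) for each 'disk lun=' line."""
    rows = []
    for line in text.splitlines():
        m = LUN_RE.search(line)
        if m:
            rows.append((int(m.group(1)), int(m.group(2), 16), m.group(3).strip()))
    return rows


def raid_members(text):
    """Return the set of slots listed on the 'disk luns:' line."""
    m = MEMBERS_RE.search(text)
    return {int(tok) for tok in m.group(1).split()} if m else set()


def not_active(rows):
    """Slots whose description does not start with 'drive active'."""
    return [(slot, desc) for slot, _status, desc in rows
            if not desc.startswith("drive active")]


# Shortened copy of the output posted above.
SAMPLE = """\
disk luns: 11 15 9 2 5 3 12 10 8 6 4 14 0
disk lun= 0 status=0x00000400 drive active device=0
disk lun= 1 status=0x00020000 history of failure no-device
disk lun= 7 status=0x00001000 hot spare no-device
disk lun=13 status=0x00020000 history of failure no-device
"""

if __name__ == "__main__":
    for slot, desc in not_active(parse_disk_luns(SAMPLE)):
        print(f"slot {slot}: {desc}")
    # Assumes 16 slots (0-15), as in the listing above.
    missing = sorted(set(range(16)) - raid_members(SAMPLE))
    print("slots not in the RAID set:", missing)
```

Run against the output above, this flags the same three slots the status summary mentions (1, 7 and 13), which is a quick way to confirm that the membership line and the per-slot lines agree before and after any change.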
Even though you are OOW you can open a one time support call for a fee. I strongly suggest that you do so. They can determine the best course of action. The lost cache condition will keep the array from starting up.
Running support commands without understanding when and how to use them could result in data loss.
That is correct and actually that was one of our first attempts to get this resolved.
Unfortunately the hardware is older than 7 years and Dell Support will not sell us an out-of-warranty support ticket.
As stated this exercise is to ensure we recover any data off of the array if possible and we are sure there is a degree of loss or corruption.
Can you please try again? Dell now offers extended H/W only support for up to 12 years from date of sale.
And you only need a senior support tech's time to see if the array can be brought back online for recovery.
Dell won't sell you a standard HW and SW support contract after 7 years, that is true.
I tried twice (different regions) and both times the answer was "No", with a little caveat: the techs didn't really have EQL experience.
Is there a way to call the US directly and purchase support with a senior tech there?
I also really need to give the data owners an update on the likelihood of recovery.
Whether I have to tell them the data is gone or that there is a minimal chance of recovery, either answer is a decision factor for the way forward, as we cannot wait until we "find" someone willing to assist (charges aside).
Where are you located?
I think that's the issue: it's not being routed correctly.
Did you try the US support #? 1 (800) 624-9897
I can't give you odds of success. But recovery services have a pretty good track record of getting the data off failed drives and then cloning it to a new drive. That gets the RAIDset back online and a rebuild started.
Another reason to get a case open is they can review the health of the other drives. Odds are there are other drives nearing their service life.
Just to keep you in the loop: I managed to call the number you provided and, based on what you said, I pushed and did get an answer. Yes, there is a way to provide a quote, but since we aren't in North America the only way to go is to call the applicable EMEA number. For everyone's reference here's a link: https://www.dell.com/support/contents/us/en/04/article/product-support/self-support-knowledgebase/de...
I will go ahead and give that a try again and will be a bit more persistent. It seems Dell is really shy about taking money from customers 😉
So after contacting Dell EMEA support they advised that support cannot provide quotes for out of warranty EOL systems and asked us to get in touch with an account manager.
The agent then put us through to local support, which again insisted that Dell cannot sell us any support because the system is older than 7 years.
Seeing that this will go around in circles a little longer, I'd like to ask if any of the forum's users have a suggestion on who to talk to in the Emerging Markets section to purchase this option, which Donald was aware of and which I was also told is available in the North American market.
Alternatively, I would also be very happy if you could recommend third-party services or give me more info on how to determine whether all the effort is futile and I should tell the data owners this is beyond repair and be done with it.
Today we are a week past the outage and still have not found anyone who will let us purchase the remote diags call or help diagnose this further, so we need to decide whether to drop the idea of asking for assistance or spend more time chasing this up.
Any feedback, as always, is much appreciated.
Sorry for the confusion. The mistake was mine. Currently the fee based support for EQL is US only.
The extended HW support isn't available everywhere at this time.
I did send you an e-mail as well.
Hello Don, no harm done. In a sense this was my part of due diligence, and if there is no support offered for our region, at least we know we won't have to follow up on that.
Just to be sure, though: we were offered 3rd party support by Dell partners, but we have been in touch with them and have determined they possess neither the knowledge nor the expertise Dell support can offer, so this is also a dead end at the moment. I also replied to the e-mail you sent to clarify.
Kind regards
D