July 29th, 2016 10:00

PS5000e member offline - status indicator light?

We have an issue with our PS5000e SAN.  It's not a mission-critical device and we are in the process of moving off of it, but we need to get it back online if possible.

This morning the member went completely offline.  I can connect via web UI, but it just says member offline.  Connecting to CLI indicates the same.  All commands in CLI are executing extremely slowly, and occasionally I get warnings like:   "Logger daemon is losing messages because offline disks are generating more events than the daemon can handle."

Unit is in a datacenter 500 miles away, so I had remote hands send me pictures of it.  One drive shows amber, remaining 15 drives all have green lights, so the failed drive shouldn't have caused it to go down.

show recentevents returns nothing.

On the back of the unit, everything looks fine except for one amber light next to a label with a wavy line (possible power issue?).  Picture below:

Before I get too much further with taking action I'd like to know what this light means if possible.

Thanks for any help!

5 Practitioner

274.2K Posts

August 31st, 2016 07:00

Hello Teja, 

 In the future it is preferable to create a new topic vs. change an existing one. 

 From your description, it sounds like the two arrays are in different pools.  Is that correct?   Are any volumes offline for example? 

 If not, then it sounds like each is alone in its own pool.  In that case you can delete the failed member without impacting member2.   Member1 will be reset back to factory defaults. 
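The pool check described above can be sketched in a few lines of Python. This is only an illustration under assumed data: the dictionaries, the member, pool, and volume names, and the `deletion_impact` helper are all hypothetical and are not part of the EqualLogic CLI or any Dell tool. It simply lists which members and volumes share a pool with the failed member.

```python
def deletion_impact(failed, member_pool, volume_pool):
    """Summarize what shares a pool with the failed member.

    member_pool maps member name -> pool name (hypothetical inventory).
    volume_pool maps volume name -> pool name (hypothetical inventory).
    """
    pool = member_pool[failed]
    # Other members in the same pool would be affected by the failure.
    peers = sorted(m for m, p in member_pool.items() if p == pool and m != failed)
    # Volumes in that pool are the ones at risk when the member is deleted.
    volumes = sorted(v for v, p in volume_pool.items() if p == pool)
    return {"pool": pool, "peers_in_pool": peers, "volumes_in_pool": volumes}


# Hypothetical layout matching the thread: each member alone in its own pool.
members = {"member1": "pool1", "member2": "pool2"}
volumes = {"empty-vol": "pool1", "data-vol": "pool2"}
print(deletion_impact("member1", members, volumes))
```

With no peers in the pool and only a non-critical volume listed, deleting the failed member should leave the other member untouched, which is the situation described in this reply.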

 You are running very old firmware, and I suspect possibly old drive firmware as well. 

 It's very important to keep up on both.  Some of the drive firmware updates prevented premature failure.  

 Regards, 

Don 

8 Posts

July 29th, 2016 12:00

Thank you Don for the reply.  I've generated diag files that I can send to Dell if they'll open a ticket for me since this is obviously beyond support.

Based on what I've read, I tend to agree with you. It's my understanding that if a member goes offline due to a drive failure, it's toast.

And then confirmation...

raidtool confirms RAID has faulted beyond recovery

thanks for the raidtool suggestion - I actually had to run:

GrpName> su exec 'raidtool'   (for the benefit of future readers of this post) 

5 Practitioner

274.2K Posts

July 29th, 2016 12:00

Hello, 

 Based on the description, the RAID set is faulted beyond recovery.  The message about the logger daemon means there's no place to store new events. 

 I would suggest you attach to the serial port of that member, and open a support case with Dell.  It will be a no warranty ticket, but Dell will look into it, and see what can be done.   You might end up having to send out one or more drives to a recovery vendor. 

 At the Group CLI, run raidtool:

GrpName> raidtool

  Do you know what version of Firmware it is running? 

Regards, 

Don 

5 Practitioner

274.2K Posts

July 29th, 2016 13:00

Hello, 

 You are most welcome. 

 Re: diag files.  You won't be too successful with that as the RAID is down.  It stores the results (or tries to) on a small slice of the RAID set.  It will likely appear to hang at different points. 

 When you open the case they'll run a few commands manually, or if the firmware is new enough, the "mini" diags.  

Re: RAIDtool.  Sorry, I should have sent that along as well. 

 I've been with Dell for 10+ years, they're pretty generous in trying to help customers with down systems when out of warranty.  No parts obviously.  :) 

  I won't say it can be recovered, but we've had some pretty good luck over the years.  There are some Dell partners in the drive recovery field that have made some steady improvements over the years.  I recall one quoting me that over 75% of the EQL cases they handled were recovered.   They have custom drive firmware to prevent a POST fail from shutting down the drive, for example.  All kinds of amazing tech and tricks to get data back and cloned onto a new drive. 

 It just depends on how important that data is.  You don't have to send the entire array to the recovery site either.  Dell support should identify the critical, must-recover drive, and a review of the health of the remaining ones might suggest cloning others as a precaution.  

 Also, please do not power down or reboot the member unless instructed by support.   We want to preserve the stats that are in memory right now.  A restart will clear all that out. 

 Regards, 

Don 

 

 

5 Practitioner

274.2K Posts

July 29th, 2016 13:00

You are so very welcome!  Glad I could help out.  

Be curious to know how you end up. 

 Don 

8 Posts

July 29th, 2016 13:00

Excellent information Don, THANK YOU!   I had considered doing a restart earlier but was of the same mindset as you - preserving anything we can in memory.  Plus who knows what kind of havoc shutting the thing down might wreak.

The data isn't nearly as important as the time and level of effort to reconfigure.

Going to give Dell a call now, you've been super helpful and I really appreciate your time.

8 Posts

July 29th, 2016 20:00

Don,

I worked with Dell for a while and we were close but no cigar.  Actually got it to come back online briefly but it soon shut down again.  The drives are just not healthy enough.  Looked into data recovery and I talked to the fine folks at DriveSavers.  Sounds like they have a good service there but very expensive...but it would be well worth it to recover some seriously important data.  Probably not going to fly in our case b/c it's just going to save us some time and effort...no real data loss.

I will say this though, I was absolutely shocked at how helpful Dell support was with this.  I didn't think they would even speak with me given the age of the unit and obviously no support, but at no point was there a single hesitation in giving me help.  We even did two calls as the SAN is 500 miles from here and I had to coordinate remote hands.  Just outstanding service.  Try getting that level of support from Cisco without a contract, ha!

Thanks again, 

Eric

5 Practitioner

274.2K Posts

August 1st, 2016 10:00

Hello Eric, 

 You are very welcome. 

 I'm sorry to hear they couldn't get it back online long enough to recover the data.  Yeah, DriveSavers are one of Dell's partners.  I've worked with them in the past. 

  Re: Dell.  Well, I'm so happy you were pleased with your support experience.  I've been with EQL since 2004.  As the new kid on the block, EQL would do the same thing: help customers in need.  When Dell purchased EQL I was very happy to learn Dell believed the same thing.  Yours is not an isolated case. Obviously, we hope such goodwill will get people to stay Dell customers or come back.

 I'm a support not sales guy, I like it when we can get customers back online or get their data recovered.  It's a great feeling. 

  Thank you for the reply, great way for me to start the week. 

  Regards,

Don 

14 Posts

August 31st, 2016 04:00

Hello All,

I am new to Dell EqualLogic SANs. I am somehow managing a PS6500 SAN remotely via the group IP. We have two members: member1 with 20 TB and member2 with 104 TB of capacity.

For the past two months we have been seeing issues on member1, with disks failing one by one. We noticed this and moved all data to member2 last week. Yesterday member1 went down. Its current status shows offline, and the RAID is faulted according to the event logs.

So if we decommission member1 alone, will it affect anything on member2? Can someone please explain?

Storage Array firmware on both members - V5.2.4 (R255063)(H1)

 Thanks,

Teja

14 Posts

August 31st, 2016 07:00

Unfortunately no, Don. Support ended long ago: 09-07-2014 for member1 and 05-31-2015 for member2.

I am looking for firmware to download and test on member1 before proceeding to member2. Can you help me with where I can download 5.2.x->6.0.11?

5 Practitioner

274.2K Posts

August 31st, 2016 07:00

No unfortunately I can't.   It's the service contract that maintains the license for all the software/firmware and access to the download site. 

 You can contact your reseller to see if you can get a new contract.  

 Regards, 

Don

5 Practitioner

274.2K Posts

August 31st, 2016 07:00

No problem.  It also prevents topics from expanding to many pages. 

 Yes, in the GUI or at the CLI, you can delete the member.   That offline volume will be gone, of course, so only do this if it isn't critical. 

 re: Firmware. Yes.  5.2.x->6.0.11->7.1.x->8.1.x->9.0 if you want to be at the latest.  When you get to at least 6.0.x you can try the drive firmware update utility to see if any drives are due for an update.  Then continue on with the array firmware update. 
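The upgrade chain quoted above can be sketched as a simple ordered lookup. This is only an illustration: the version labels are copied from the post, while the `remaining_hops` function and the rule of matching on the major.minor family are assumptions and not part of any Dell utility. Always check the release notes for the supported path before updating.

```python
# Upgrade chain as quoted in the post: (major.minor family, label from the post).
UPGRADE_CHAIN = [
    ("5.2", "5.2.x"),
    ("6.0", "6.0.11"),
    ("7.1", "7.1.x"),
    ("8.1", "8.1.x"),
    ("9.0", "9.0"),
]


def remaining_hops(current: str) -> list[str]:
    """Return the firmware steps still to install, in order."""
    # "V5.2.4" -> "5.2": strip a leading V and keep major.minor only.
    family = ".".join(current.lstrip("Vv").split(".")[:2])
    families = [fam for fam, _ in UPGRADE_CHAIN]
    if family not in families:
        raise ValueError(f"{current!r} is not on the quoted upgrade chain")
    start = families.index(family) + 1
    return [label for _, label in UPGRADE_CHAIN[start:]]


# The thread reports both members on V5.2.4.
print(remaining_hops("V5.2.4"))
```

Each step is installed in turn; per the post, the drive firmware update utility can be tried once the array reaches at least 6.0.x.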

 Do you still have a support contract on these arrays?   You'll need that in order to download the firmware. 

 Regards,

Don

 

14 Posts

August 31st, 2016 07:00

I apologize for posting in an ongoing conversation; I thought it was somewhat related to my issue.

Yes, both are in different pools. Only one empty volume is offline, and it comes from member1. So, per your update, I will delete member1 and upgrade the firmware on the remaining member.

Regards,

Teja
