November 1st, 2012 17:00

PSXV4000 Array Prolonged Power Loss

All -

My data center in lower Manhattan has been without power since Monday night (8:15pm EST). As you can imagine, most of my data sits on this SAN, and I want to know from the collective: what should I expect when the array boots up for the first time since Monday night (potentially 4-5 days after shutting down)?

5 Practitioner

 • 

274.2K Posts

November 1st, 2012 18:00

Did you have a UPS? If the array was up long enough to flush its cache after the servers were down, then you may be just fine. Have the serial cable connected when you first power it on, and capture the output in case you need to open a support case.
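For the capture step, here is a minimal sketch of logging console output with timestamps. It is not a Dell tool. It works on any byte stream that has a `readline()` method, and the function name `capture_console` is made up for illustration. With real hardware you would pass in an open serial port object from whatever serial library you use; that part is an assumption and is not shown here.

```python
import io
from datetime import datetime


def capture_console(stream, log, clock=datetime.now):
    """Copy lines from a byte stream into a text log, one timestamp per line.

    `stream` is anything with readline() that returns bytes (hypothetically,
    an open serial port). Reading stops when readline() returns b"", which
    for a port opened with a read timeout also happens when the console
    goes quiet, so call this in a loop if you want to keep listening.
    """
    count = 0
    for raw in iter(stream.readline, b""):
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        log.write(f"{clock().isoformat(timespec='seconds')} {text}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in for a real serial port so the sketch runs without hardware.
    fake_port = io.BytesIO(b"boot line 1\r\nboot line 2\r\n")
    out = io.StringIO()
    capture_console(fake_port, out)
    print(out.getvalue(), end="")
```

Writing to a file opened in append mode instead of `io.StringIO` gives you a log you can attach to a support case.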

November 1st, 2012 18:00

I did have it connected to a UPS, but there was no connection between the EQL and the UPS to enable a graceful shutdown. Once the UPS drained, I imagine everything just shut down abruptly.

November 1st, 2012 18:00

Also, since the entire lower half of Manhattan is without power and the utility company hasn't given us an exact time for when it will be restored, being on-site when power does come back probably won't be an option.

5 Practitioner

 • 

274.2K Posts

November 1st, 2012 18:00

Re: Shutdown. I understand. If the servers get shut down before the array, there's a nice window before the UPS dies in which the array can flush its cache, and with no new pending writes, it comes right back up.

Well, when you do get there, have your serial cable ready and connect up.  

I'll keep all you folks in my thoughts.  Storm of a lifetime.  

Stay safe.

203 Posts

November 1st, 2012 19:00

A few years ago I went through an extended power loss scenario as well, in which the array went down in an ungraceful way.  I did not have any issues as the result of that, but did realize I had some procedural gaps in my full power-up procedures.

November 1st, 2012 21:00

What have you since implemented for your power-up scenario? This being new to me, any anecdotal information is truly welcome. As an FYI, I currently have an 8 kVA APC Symmetra LX UPS. Thanks for the words and help, dwilliams.

203 Posts

November 1st, 2012 22:00

Well, since each environment is a little bit different, it's hard to give specifics, but I'd first suggest using this opportunity to capture the items that you know need to be included in the process; you can refine the details later. Do a braindump of everything you were uncertain of during the power-up. The problem with power-up and power-down situations is that you really can't practice them. So capture what went well and what didn't, so that you can build up a good runbook from there. I use MS OneNote for all of my documentation needs like that.

Assuming you have a nearly fully virtualized environment, you'll want to note the order in which to power on the earliest items (SAN arrays and SAN switchgear), then move on to the LAN switchgear. Then make sure that you have a separate standalone DNS server that all of your infrastructure equipment (arrays, switchgear, ESXi hosts, etc.) can reference as a secondary DNS server. Then move on to how you will power on the host with your AD servers, and the host that has your vCenter VM (assuming it's a VM). That is typically the biggest chicken-and-egg situation people have. Your runbook should include how long to wait for things to spin up before proceeding. You can avoid a lot of problems by not getting too anxious and powering things up before they are ready.
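As a rough illustration of that ordering, here is a small sketch that turns a runbook into a timeline. The step names follow the order described above; the wait times, the final "remaining hosts" step, and the names `Step`, `POWER_UP_ORDER`, and `schedule` are all placeholders I made up, so substitute values you have actually measured in your own environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    settle_minutes: int  # placeholder wait before starting the next step


# Order taken from the post above; every wait time is an assumed placeholder.
POWER_UP_ORDER = [
    Step("SAN switchgear and SAN arrays", 10),
    Step("LAN switchgear", 5),
    Step("Standalone DNS server", 5),
    Step("Host running the AD servers", 10),
    Step("Host running the vCenter VM", 10),
    Step("Remaining hosts", 0),  # assumed final step, not from the post
]


def schedule(steps):
    """Return (start_minute, step_name) pairs, each step starting only
    after the previous step's settle time has elapsed."""
    offset = 0
    plan = []
    for step in steps:
        plan.append((offset, step.name))
        offset += step.settle_minutes
    return plan


if __name__ == "__main__":
    for minute, name in schedule(POWER_UP_ORDER):
        print(f"T+{minute:>3} min  {name}")
```

Keeping the order and the waits in one list makes it easy to revise the runbook after each real power-up.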

It’s a bit off topic I know, so if you’d really like to continue the discussion offline, you may DM me.  Good luck to you.  

4 Operator

 • 

9.3K Posts

November 2nd, 2012 07:00

This really only applies to the newer EqualLogic units: the 4100 and 6100 models no longer depend on a battery (typically with about a 3-day charge) to preserve the write cache through a power outage.

The new units give the controller enough time to copy the write cache to non-volatile memory, where it can reside indefinitely until power is restored. I've been told the power source for this isn't a regular battery but a capacitor (much longer lifespan, and therefore typically no need to replace it within the economic life of the system).

Obviously this only applies to the new units and it'd be hard to justify buying new equipment for just this feature.

Source: www.dell.com/.../product-compare (under "memory size").

5 Practitioner

 • 

274.2K Posts

November 2nd, 2012 07:00

You are most welcome. Sketchy is correct in how he's laying it out. The order of shutdown and power-up is most important. So if your servers have an on-UPS run time of, let's say, 30 minutes, shut them down at 15-20. Give your storage (from any vendor) time to shut down. Setting it to power on first, before the servers, is critical too. The APC power strips (and others, I'm sure) can delay power-on selectively, giving the SAN (switches and arrays) time to boot before the servers.
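To make the timing concrete, here is a minimal sketch of the decision a UPS monitoring script might make. The 15-minute server cutoff comes from the example above; the 25-minute storage cutoff and the function name `action_for` are assumptions for illustration, and nothing here talks to a real UPS or array.

```python
def action_for(minutes_on_battery, server_cutoff=15, storage_cutoff=25):
    """Decide what to do after a given time on battery.

    Servers go down first so the storage sees no new writes and has
    time to flush its cache before it is shut down in turn.
    """
    if server_cutoff >= storage_cutoff:
        raise ValueError("servers must be shut down before storage")
    if minutes_on_battery >= storage_cutoff:
        return "shut down storage"
    if minutes_on_battery >= server_cutoff:
        return "shut down servers"
    return "keep running"


if __name__ == "__main__":
    # Walk through a 30-minute run time in 5-minute steps.
    for minute in range(0, 31, 5):
        print(f"{minute:>2} min on battery: {action_for(minute)}")
```

The gap between the two cutoffs is the window in which the array can flush its cache with no servers writing to it.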

When the 5.x firmware was released, Dell included support for Python scripting. With a Python script it's possible to send the shutdown command to EQL arrays. While most EQL arrays won't actually shut off the power supply, the command will flush the cache and shut down the array OS cleanly.

There's a sample Python script that does a member restart. That one could be modified to do a shutdown instead, and then it could be called from the APC control software. You can find the scripts under the Firmware Downloads, labeled "Host Scripting Tools". There's one for Linux and one for Windows.

Regards,

November 2nd, 2012 13:00

Good thing I have a 6100XVS hybrid array NIB in the Data Center waiting to get setup! 

The power loss happened at a time when the network wasn't too busy, and I'm hoping the UPS kept the SAN up a bit longer than the servers that would have been writing to it. So I'm hopeful that all data will be available and the SAN will boot up normally. I'm certainly crossing my fingers.
