
November 16th, 2015 19:00

Please help: drop faulted pool

I have a pool that I have been trying to drop for almost 12 hours.

The pool has reported that it is dropping the whole time, but shows no progress.

Can I somehow force the drop to complete?

195 Posts

November 17th, 2015 11:00

First off:  You have my sympathy for still using 1TB SATA disks.

Other than that here are a few additional thoughts:

>  You mentioned pulling and reseating both of the drives; I don't know exactly what you did there, but I would have suggested doing only one at a time and waiting a while in between.  But honestly, even if the two were from the same 6+2 R6 group, that shouldn't have been a huge problem.

> Some SATA failures involve flooding the bus with errors.  In cases where performance seems to drop off to a trickle, I look at the events for both SPs to ensure that one of them isn't getting hammered with error messages (see the command sketch after this list).  If that is the case, pull the offending drive to quiet the bus back down.

> In your pool, while you may have deleted all the user LUNs, have you looked for private LUNs that may still exist there?  If a LUN was failed, you may find that a private or unowned LUN still exists.

> You could definitely try rebooting one SP at a time, during a quiet off-peak time.  I don't know if it would help...but it *might* shake something loose.

>  As a last resort, I might consider unseating *all* the disks from that pool, then rebooting the SPs (again one at a time).  I do something similar to this when I want to remove a DAE from a running system; at first it thinks everything is broken, but after the reboots it accepts the loss and moves on with its life.  I'm sure this is not a recommended or approved procedure, but desperate times call for desperate measures...  If that cleaned things up I would start putting the disks back in, and see how that goes.
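If it helps, here is roughly how I'd do the event-log check and the private-LUN check from the CLI.  Treat this as a sketch only: the SP addresses and pool ID are placeholders, and exact option support varies by FLARE/OE release, so confirm against naviseccli help on your system first.

    # Check each SP's event log for a drive flooding the bus with errors
    naviseccli -h <SPA_IP> getlog
    naviseccli -h <SPB_IP> getlog

    # List the pool and every LUN the array knows about; a leftover
    # private or unowned LUN should show up in one of these
    naviseccli -h <SPA_IP> storagepool -list -id 2
    naviseccli -h <SPA_IP> getlun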

Best of luck.

195 Posts

November 17th, 2015 07:00

Reading your other post, it sounds like the array is not healthy and that the pool is in a failed state.

There may be no safe and non-disruptive way forward.  How much disruption can you tolerate?

Also, you mentioned that the unit is completely off maintenance.  I would not mention this otherwise, but you know...you can get parts for CX and even VNX units relatively inexpensively.  If you value your data and wish to continue using this unit, you should look into getting parts and even spares from somewhere.  A few hundred, or even a few thousand, dollars might be cheap compared to the value of your data.

4.5K Posts

November 17th, 2015 08:00

A caution about parts: the drives used in the CLARiiON/VNX have a special format/firmware that makes it difficult to find replacement disks that will work in a CLARiiON. You would also need to verify that the disks have a firmware version supported by the FLARE release running on the array. The CX4 Disk and FLARE OE Matrix on support.emc.com lists the latest disk firmware for each FLARE release - check it before purchasing any drives.

https://support.emc.com/docu32251_CX4_Storage_Systems_Disk_and_FLARE_OE_Matrix.pdf?language=en_US
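To see what is already in the array before you buy, running getdisk against a specific slot should show the vendor, product ID, and firmware revision to compare against the matrix (the bus_enclosure_slot address below is a placeholder):

    # Show state, product ID and firmware revision for the disk in slot 1_0_4
    naviseccli -h <SPA_IP> getdisk 1_0_4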

glen

2 Intern • 222 Posts

November 17th, 2015 10:00

The pool was defined as RAID 6, consisting of 40x 1TB SATA drives.

2 drives faulted at almost the same time; the drives were physically close together.

I pulled and re-seated the drives, and the fault went away, but I was not able to delete the pool.

The pool has been in a "deleting" state for over 24 hours now and doesn't have any LUNs defined. All LUNs have been moved to other pools.

I'm wondering if a reboot of the SPs would clear the problem.

I can reboot the SPs sequentially, separated by about half an hour to give the array time to fail over and fail back.
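The rough plan, with a sanity check in between (I'd do the reboot itself from Unisphere; getagent here is just an "is the SP back?" probe, and faults -list may not exist on older FLARE releases):

    # Snapshot the current fault state before touching anything
    naviseccli -h <SPA_IP> faults -list

    # Reboot SPA, then poll until it answers again before moving on
    naviseccli -h <SPA_IP> getagent

    # Wait ~30 minutes for LUNs to fail back, then repeat for SPB
    naviseccli -h <SPB_IP> getagent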

2 Intern • 222 Posts

November 17th, 2015 10:00

Yes, thanks for that. We have been buying disks from third-party vendors and have not had any issues.

This is the first time I have had 2 faulted drives at the same time, which I think contributed to the problem.

I emptied out the pool, so it's 100% free at this time.

I initiated the drop, and it seemed to be progressing for a bit, but then it appears to have hung.

Currently, there are no amber lights.

However, naviseccli does report:

Pool 2 is faulted.

I'm looking for some way to get rid of the pool so I can re-create the storage as smaller RAID groups.
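A sketch of the CLI equivalent of what I've been trying (pool ID 2; the -destroy/-o flags are as I recall them from the VNX block CLI, so double-check against your FLARE release):

    # Confirm the pool's state and that nothing is left in it
    naviseccli -h <SPA_IP> storagepool -list -id 2

    # Retry the destroy; -o suppresses the confirmation prompt
    naviseccli -h <SPA_IP> storagepool -destroy -id 2 -o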

2 Intern • 222 Posts

November 17th, 2015 19:00

No private luns, 100% empty.

Tried rebooting the SPs but didn't pull any drives. That didn't work, so I may try pulling the drives and then rebooting tomorrow.

4.5K Posts

November 18th, 2015 11:00

FYI - a Pool is built from disks that are arranged as private raid groups, and those private raid groups all contain a number of private LUNs. When you create a user LUN, it takes slices from the private LUNs to build the user LUN. In a RAID 6 raid group using 6+2, there will be 16 private LUNs. To destroy the Pool, all the private LUNs must be unbound (deleted). If you had two disks fail in the array, it's possible that the two were in the same private raid group. R6 should protect you in the case of two disks failing, but you must be careful about pulling out two disks in a RAID 6 - you should allow the bad disks to rebuild to hot spares first. Did you have two hot spares in the array that the private LUNs could rebuild to?
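To put rough numbers on that for this pool (assuming it was laid out as 6+2 R6 private raid groups, the typical arrangement):

    # 40 disks / 8 disks per 6+2 R6 private raid group = 5 private raid groups
    # 5 private raid groups x 16 private LUNs each = ~80 private LUNs,
    # all of which must unbind before the pool destroy can finish

    # Check whether the failed disks actually rebuilt to hot spares
    naviseccli -h <SPA_IP> getdisk -state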

I would not recommend rebooting SPs until you determine what the actual error is first. See my comment in your other thread.

glen

2 Intern • 222 Posts

November 18th, 2015 19:00

Hmm, all I know is that the pool didn't show anything under "private" LUNs, and the free size = total size, which tells me it's 100% free. Anyhow, the reboot did the trick.

I bounced SPA, then SPB about half an hour later.

Still took a while, but the pool disappeared.

I've since carved the same disks into two pools of half the size, still using RAID 6 because I'm paranoid about a second disk failing during the rebuild from the first failure...
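For anyone finding this later, the create step was along these lines (disk list abbreviated; -rtype r_6 is the RAID 6 option as I remember the CLI, so verify it on your release):

    # Create one of the two smaller RAID 6 pools from half of the disks
    naviseccli -h <SPA_IP> storagepool -create -name Pool2a -rtype r_6 -disks 1_0_0 1_0_1 1_0_2 ...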
