Nick_London
1 Copper

Clariion CX4 - Failed Drive, how can i tell if Hotspare has kicked in

Hi

I am new to the storage world, so please excuse me if I'm asking simple questions.

I have a failed drive on a CX4 and received an email so a call has been raised with EMC to swap it out.

How can I tell whether a hot spare has now taken its place, so that my RAID Group is still protected?

Also, I see there is a Replace Disk option in Unisphere. Do I need to use it when the disk arrives, or can I just pull the failed drive and swap in the new one? The slot with the failed drive currently shows as Removed, so the drive is dead.

Lastly, do I need to do anything in Unisphere to activate the new drive, or is it all automatic?

Thanks

Nick

13 Replies
carric4
1 Copper

Re: Clariion CX4 - Failed Drive, how can i tell if Hotspare has kicked in

Hello. In Unisphere, the failed drive will first show as "transitioning" or "rebuilding" (I'm not sure of the exact wording in the newest code). Wait until it shows only "Removed" with no other indicator. At that point the hot spare has taken over and joined the RAID group. When the new drive arrives, just pull the old one (the one with the amber light) and push the new one in. Unisphere will then automatically show that it is rebuilding back onto the new drive. Once that completes, the drive will no longer show a "T" or any other indicator, and the new drive will be back in the RAID group as it was before the failure. My terminology may be slightly off because of the new Unisphere wording; the key point is that the drive should say ONLY "Removed", with no other activity or indication that it is still copying to the hot spare, before you pull it and push the new one in.

kelleg
4 Ruthenium

Re: Clariion CX4 - Failed Drive, how can i tell if Hotspare has kicked in

Was your question answered correctly? If so, please remember to mark your question Answered when you get the correct answer and award points to the person providing the answer. This helps others searching for a similar issue.

glen

ThatsLUNny
1 Nickel

Re: Clariion CX4 - Failed Drive, how can i tell if Hotspare has kicked in

You can run navicli -h spa getdisk 0_0_5 (if 0_0_5 is your hot spare), and the output should tell you whether it is still rebuilding or is already active in place of the failed disk.
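If you want to script that check, here is a minimal sketch. It assumes navicli is installed and on the PATH and that "spa" resolves to storage processor A. The function names are hypothetical helpers made up for illustration, not part of any EMC tooling.

```python
import subprocess


def getdisk_command(sp_host, disk_id, navicli="navicli"):
    """Build the argv for: navicli -h <sp_host> getdisk <disk_id>."""
    parts = disk_id.split("_")  # disk IDs look like bus_enclosure_slot, e.g. 0_0_5
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"disk ID must look like 0_0_5, got {disk_id!r}")
    return [navicli, "-h", sp_host, "getdisk", disk_id]


def run_getdisk(sp_host, disk_id, navicli="navicli"):
    """Run the command and return its text output (needs navicli installed)."""
    result = subprocess.run(
        getdisk_command(sp_host, disk_id, navicli),
        capture_output=True,
        text=True,
        check=True,  # raise if navicli exits with an error
    )
    return result.stdout
```

For example, run_getdisk("spa", "0_0_5") would return the same text you see when typing the command by hand. If navicli lives somewhere else on your system, pass its full path as the navicli argument.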

zhouzengchao
3 Cadmium

Re: Clariion CX4 - Failed Drive, how can i tell if Hotspare has kicked in

There is a KB article for reference: emc250611. I have copied the main steps here:

To check from Navisphere Manager whether a hot spare is actively replacing a failed disk:

  1. Navigate to the LUN folders.
  2. Go to the Unowned LUNS folder and expand it by clicking on the plus symbol.
  3. Select a hot spare and right-click it.
  4. Go to the properties of the hot spare.
  5. Go to the Disk tab and check the status of the hot spare.

If the hot spare is replacing the failed disk, the status will be displayed as Active.

Alternatively, right-click the disk under the hot spare and select Properties. If the hot spare has been invoked, Current State will show Engaged and Hot Spare Replacing will show Active.

For a command-line check, you can issue getdisk -hs. For example, my disk 1_0_8 is down; to check whether a hot spare was invoked:

getdisk -hs

Bus 1 Enclosure 0  Disk 6

Hot Spare:               24567: YES

Hot Spare Replacing:     1_0_8

Bus 1 Enclosure 0  Disk 7

Hot Spare:               NO

Bus 1 Enclosure 0  Disk 8

State:                   Removed


As you can see, the removed drive 1_0_8 has now been replaced by hot spare 1_0_6.
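If you have many disks, you can pick the engaged hot spares out of that output with a short script. This is only a sketch: it assumes the output keeps the "Bus / Enclosure / Disk" header and "Field: value" layout shown above, and that an idle hot spare reports "Inactive" under Hot Spare Replacing (an assumption, since no idle spare appears in the listing). The function names are hypothetical.

```python
import re

# Sample text copied from the getdisk -hs listing above.
SAMPLE = """\
Bus 1 Enclosure 0  Disk 6
Hot Spare:               24567: YES
Hot Spare Replacing:     1_0_8
Bus 1 Enclosure 0  Disk 7
Hot Spare:               NO
Bus 1 Enclosure 0  Disk 8
State:                   Removed
"""

DISK_HEADER = re.compile(r"^Bus (\d+) Enclosure (\d+)\s+Disk (\d+)\s*$")


def parse_getdisk(text):
    """Return {disk_id: {field: value}} from getdisk-style output."""
    disks, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between fields are ignored
        header = DISK_HEADER.match(line)
        if header:
            # "Bus 1 Enclosure 0  Disk 6" becomes the key "1_0_6"
            current = disks.setdefault("_".join(header.groups()), {})
        elif current is not None and ":" in line:
            field, value = line.split(":", 1)
            current[field.strip()] = value.strip()
    return disks


def engaged_hot_spares(disks):
    """Map each engaged hot spare to the disk it is standing in for."""
    engaged = {}
    for disk_id, fields in disks.items():
        replacing = fields.get("Hot Spare Replacing")
        # Assumption: an idle spare reports "Inactive" here.
        if replacing and replacing.lower() != "inactive":
            engaged[disk_id] = replacing
    return engaged
```

Calling engaged_hot_spares(parse_getdisk(SAMPLE)) on the listing above gives a single entry mapping 1_0_6 to 1_0_8.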

bayya1
1 Nickel

Re: Clariion CX4 - Failed Drive, how can i tell if Hotspare has kicked in

Hi Nick,

As with the other features, the new VNXe is now on par with the VNX for RAID and resiliency. The first enhancement is Permanent Sparing. Traditionally, when a drive in a RAID set failed, the array would grab a designated hot spare and use that to rebuild the RAID set. When you replaced the failed drive, the array would copy the data from the hot spare to the new drive and then make the hot spare a hot spare once again. That’s no longer the case. Now the array keeps using the hot spare drive. Big deal? I don’t think so. Just be aware.

How hot spares are specified has also changed, and by changed I mean gone away. You no longer specify drives as hot spares. Any unbound drive is capable of being a hot spare. The array is smart in how it chooses which drive to use (capacity, rotational speed, bus, etc.) so that it doesn’t pick an odd drive on a different bus unless it has to.

MCx also has a timeout for RAID rebuilds. If a drive goes offline, fails, or is pulled out for some reason, the array now waits 5 minutes before activating a spare and rebuilding the set. It does this to make sure you didn’t do something accidentally and that you’re not just moving drives around.

You can now pull drives from a slot and put them in another slot, and the array will detect it and bring them back online without activating a rebuild, as long as you do it within 5 minutes. You can also shut down the array and re-cable the backend buses if you want, and it will still know which drives belong to what. Let’s be clear here: don’t just do this without planning. You’re still moving drives and changing things. Do it for a purpose. Also, you can’t move drives, or whole RAID groups, between arrays…even between MCx arrays. It’s only within the same array. Use caution.

MCx does parallel rebuilds on RAID 6 if you lose two drives. FLARE would rebuild the set for one drive, then rebuild it again for the second drive. MCx is more intelligent: if two drives fail, it rebuilds both parity sets at once.

Thanks,

Reddy......

YCAH
1 Nickel

Re: Clariion CX4 - Failed Drive, how can i tell if Hotspare has kicked in

Steve, I see it differently on an NS960. We had a drive failure on 3_0_12. The drive has been replaced and its status is Enabled, but I don't see the hot spare going back to the inactive state. It has been in that state for a few hours.

Bus 3 Enclosure 0  Disk 12

Hot Spare:               47: NO

Bus 3 Enclosure 1  Disk 13

Hot Spare:               24571: YES

Hot Spare Replacing:     3_0_12

/nas/sbin/navicli -h spa getdisk 3_0_12 -state -rb

Bus 3 Enclosure 0  Disk 12

State:                   Enabled

Prct Rebuilt:            47: 100

kelleg
4 Ruthenium

Re: Clariion CX4 - Failed Drive, how can i tell if Hotspare has kicked in

First, the last message from Steve was in 2013, so I'm not sure he is still watching this thread.

Second, the rebuild for the replacement disk is finished. When a disk fails and is replaced by the hot spare, the data that was on the failed disk is rebuilt onto the hot spare from the remaining disks in the RAID group that owned the failed disk (that is, rebuilt from the RAID parity). Once that's complete, and once the failed disk has been removed and the replacement inserted, the process that moves the data from the hot spare to the new disk is called "equalize"; it is basically a copy of the data from the hot spare to the new disk. Both of these processes take time. Depending on the type of disk that failed, it could take hours to days. The slower the disk being replaced, the longer the rebuild/equalize will take. The slowest are the high-capacity ATA disks; the fastest are the SSDs.

/nas/sbin/navicli -h spa getdisk 3_0_12 -state -rb

Bus 3 Enclosure 0  Disk 12

State:                   Enabled

Prct Rebuilt:            47: 100 <----------

I'd check it again in a couple of hours to see whether the equalize has started or finished.
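If you would rather script that re-check than eyeball it, the Prct Rebuilt line can be parsed as below. This is a rough sketch: it assumes the number before the inner colon is a LUN number and the number after it is the percentage, and that a disk holding several LUNs lists several such pairs on one line. The function names are hypothetical.

```python
import re


def parse_prct_rebuilt(line):
    """Turn 'Prct Rebuilt:   47: 100' into {47: 100}."""
    # Drop the field name, keep everything after the first colon.
    value = line.split(":", 1)[1]
    # Each "<number>: <number>" pair is read as LUN and percent rebuilt.
    return {int(lun): int(pct) for lun, pct in re.findall(r"(\d+):\s*(\d+)", value)}


def rebuild_complete(line):
    """True only if at least one LUN is listed and all are at 100 percent."""
    progress = parse_prct_rebuilt(line)
    return bool(progress) and all(pct == 100 for pct in progress.values())
```

Note that a result of 100 only says the rebuild onto that disk is done; it does not by itself tell you whether the equalize from the hot spare has run, so check the hot spare's own status as well.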

glen

YCAH
1 Nickel

Re: Clariion CX4 - Failed Drive, how can i tell if Hotspare has kicked in

Thanks, Glen. I agree that I should have created a new discussion.

The failed disk is a 600GB FC drive and was replaced with another disk of the same specification. I just checked its status and nothing has changed. Is there a way to kick hot spare 3_1_13 back to inactive?

/nas/sbin/navicli -h spa getdisk 3_0_12 -state -rb -rds -wrts -write

Bus 3 Enclosure 0  Disk 12

State:                   Enabled

Prct Rebuilt:            47: 100

Read Requests:           326972

Write Requests:          1134749

Number of Writes:        1134749

/nas/sbin/navicli -h spa getdisk 3_1_13 -type -hr -hs -state -rb

Bus 3 Enclosure 1  Disk 13

Type:                    24571: Hot Spare

Hard Read Errors:        0

Hot Spare:               24571: YES

Hot Spare Replacing:     3_0_12

State:                   Enabled

Prct Rebuilt:            24571: 100
