

November 6th, 2012 10:00

Questions about proactive hot spare copy and swapping drives in CX320

We have a locally owned spare SATA drive for a CX320. We use it as a replacement drive when there is a time gap between a drive going bad and the maintenance vendor providing a replacement. We recently used this spare drive. When we got the replacement, we did a proactive hot spare copy of this locally owned spare in preparation for swapping in the replacement. We got the following messages about the locally owned spare and the hot spare. We don't know whether they were generated when the proactive hot spare copy ended:

     Enclosure 1 Disk 13 Description Unit Shutdown

     Array Enclosure (Bus 0 Enclosure) is faulted. Servers may have lost access to disk drives in this enclosure.

     Device Enclosure 6 Disk 0 Description Disk (Bus 0 Enclosure 6 Disk 0) is faulted.

     Storage Array Faulted Bus 0 Enclosure 6 : Faulted Bus 0 Enclosure 6 Disk 0 : Removed.

     Storage Array Faulted Bus 0 Enclosure 6 : Faulted Bus 0 Enclosure 6 Disk 0 : Removed.

     Enclosure 6 Disk 0 Description CRU Powered Down

     Description Disk (Bus 0 Enclosure 6 Disk 0) failed or was physically removed.

     SPA Device Enclosure 1 Disk 13 Description Unit Shutdown.

Enclosure 1 Disk 13 was where the proactive hot spare copy was being written.

Enclosure 6 Disk 0 was where the locally owned spare SATA drive was.

Are these the messages I would get when the proactive hot spare copy completed?

Now the locally owned spare has been replaced with the vendor-provided disk.

My other questions are: Is there any way to know that the locally owned spare is still good? Is there any way to initialize this disk so that, while it sits waiting for its next use, we know there is no data on it?

I'd appreciate any experience, advice, etc.


November 6th, 2012 13:00

Are these the messages I would get when the proactive hot spare copy completed?

Yes, those are some of the messages you will get during a proactive copy to the hot spare.

Now the locally owned spare has been replaced with the vendor-provided disk.

My other questions are: Is there any way to know that the locally owned spare is still good? Is there any way to initialize this disk so that, while it sits waiting for its next use, we know there is no data on it?

Once the original failed disk is replaced with a new disk, the array copies the data from the hot spare to the new disk. This operation is copy, paste, delete. So the locally owned drive that served as the hot spare would now be blank and as good as a new drive. It would show as Unbound (if no hot spare is configured for that location) or else as H/S Ready. If you keep that drive in that slot, the CLARiiON will automatically perform a Copy to Hotspare operation for a failing drive (Release 24 and above).

November 7th, 2012 05:00

Here's a clarification of my question:

There was not a failed disk in the Enclosure 6 Disk 0 slot. It was a good one. We did a proactive hot spare copy of that one. We then took that disk out and replaced it with another good one. Hence, the disk I'm asking about is the first one in the Enclosure 6 Disk 0 slot. It was good when it was removed, so the data it had from the original RAID set and LUNs is still on it (obviously only bits and pieces). Is there any way to initialize this drive so it's ready to be used again if needed?

Does this make my question more understandable?

Thanks!


November 7th, 2012 07:00

Be careful with this approach. In your second step, when you proactively copied the drive, the array will have dialed out to EMC again to signal a drive failure once the copy completed. So there's a good chance another case is open right now to ship another drive.

To answer some of your questions:

- Yes, those are normal proactive copy messages. For EMC this is a "scheduled" drive failure, so check with support to see whether an additional SR is open, and close it if you don't want another drive.

- A drive is zeroed once you insert it into a DAE. The drive is NOT zeroed when you remove it, so in theory there is still data on that drive. Whether you can read it depends on how much money you want to invest in reading it and, for example, on your RAID protection (RAID10 has a mirror of all data on that drive; parity RAID has only parts of the data, so the chance of recovery is slim).

- To zero it, you could insert the drive into the CX, build a single-disk RAID group on it, and fill the RG with one large LUN. Wait for background zeroing/binding to complete, then eject the drive again. (But if you have this empty drive slot available, why not leave the drive in and configure it as a hot spare so that everything happens automatically?)

- If the local spare drive was good when you proactively copied it, it's still good now.

On a completely different matter: why would you do this kind of disk swapping? From your situation I conclude that you have hot spares configured. If you adhere to a hot spare ratio of 1 per 30 drives, the chances of running out of hot spares are very, very, very slim. And even if you run out of hot spares, you still have RAID group protection. Assuming RAID 5, you can lose a drive and rebuild to a hot spare for as long as you have hot spares (which can be 1, 2, or x drives), and then still lose one more drive in that group before you're unprotected.

Are your vendor lead times so long that you need to do this magic with spare drives? Or are your drive failure rates that high?
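The spare-count arithmetic and the RAID 5 reasoning above can be sketched in a few lines of Python. This is a minimal sketch that assumes three things: the 1:30 ratio is applied by rounding up, each rebuild to a hot spare finishes before the next drive fails, and no other RAID group draws on the same spares. The function names are hypothetical, and nothing here queries the array.

```python
import math

# Rule of thumb quoted in this thread: one hot spare per 30 drives.
# A planning heuristic only, not a setting the array enforces.
DRIVES_PER_SPARE = 30


def recommended_spares(total_drives: int, drives_per_spare: int = DRIVES_PER_SPARE) -> int:
    """Hot spares for a given drive count, rounding up so a partial group of 30 still gets one."""
    if total_drives <= 0:
        return 0
    return math.ceil(total_drives / drives_per_spare)


def raid5_failure_timeline(spares: int, failures: int) -> str:
    """State of one RAID 5 group after `failures` sequential drive failures.

    Assumes (hypothetically) that every rebuild to a hot spare completes
    before the next failure and that the spares serve only this group.
    """
    if failures <= spares:
        return "protected"   # each failed drive was rebuilt onto a spare
    if failures == spares + 1:
        return "degraded"    # spares used up: still online, but no redundancy left
    return "data loss"       # a second failure that could not be rebuilt


print(recommended_spares(120))  # 4
for f in range(5):
    print(f, raid5_failure_timeline(2, f))
```

With two spares, the loop reports failures 0 through 2 as protected, the third as degraded, and the fourth as data loss. That matches the point above: the group can exhaust its spares and still survive one more failure.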


November 7th, 2012 08:00

The drive should be usable as it is. The way FLARE works, if a drive drops ready, it is automatically rebuilt when it is brought ready again or when a new one is inserted. There is a command to format a drive back to how it originally shipped, but you have to know the engineering password. If done wrong, you will wipe data, because it writes zeros over the drive. The only thing it buys you is a faster LUN bind, so it's not really worth doing.
