perccli commands to replace a failing disk

Question

I have a disk in a RAID 10 virtual disk that is failing. The physical disk caddy has an amber light on it, and in the PERC display it shows an orange yield sign with an exclamation mark in it. I assume that means it is failing even though it is still running and not yet degrading the virtual disk.

I have a new spare now to replace it, but I don't have a spare slot in the chassis. If I did, I would insert the new disk, make it a hot spare for the virtual disk, then pull the failing disk. Since I don't have a spare slot, what is the best way to approach this? In other words, which commands do I run, and in which order? Do I just take the failing disk offline, pull it, replace it with the new one, and mark the new one as a hot spare? I'd like assurance that I'm doing it in the right order before going ahead.

The disk is a member of Controller 0, Virtual Disk 1. It is disk #6 on enclosure #32. Querying the disk gives me the same serial number as the one that shows in iDRAC with an amber exclamation mark.

[root@CCV-VMware-Lam:/opt/lsi/perccli] ./perccli /c0/v1 show all
CLI Version = 007.0529.0000.0000 Sep 18, 2018
Operating system = VMkernel 6.7.0
Controller = 0
Status = Success
Description = None

/c0/v1 :
======

----------------------------------------------------------------
DG/VD TYPE   State Access Consist Cache Cac sCC     Size Name
----------------------------------------------------------------
1/1   RAID10 Optl  RW     Yes     RWBD  -   OFF 1.635 TB SAS15K
----------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
FWB=Force WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

PDs for VD 1 :
============

------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model       Sp Type
------------------------------------------------------------------------------
32:10    10 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
32:11    11 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
32:6      6 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
32:7      7 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
32:8      8 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
32:9      9 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
------------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encrypting Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded

VD1 Properties :
==============
Strip Size = 64 KB
Number of Blocks = 3512991744
VD has Emulated PD = No
Span Depth = N/A
Number of Drives Per Span = N/A
Write Cache(initial setting) = WriteBack
Disk Cache Policy = Disk's Default
Encryption = None
Data Protection = None
Active Operations = None
Exposed to OS = Yes
Creation Date = 07-03-2016
Creation Time = 01:18:54 PM
Emulation type = default
Cachebypass Mode = Cachebypass Disable
Is LD Ready for OS Requests = Yes
SCSI NAA Id = 61418770601b16001e703c3e0950c5f9
SCSI Unmap = No

[root@CCV-VMware-Lam:/opt/lsi/perccli] ./perccli /c0/e32/s6 show all
CLI Version = 007.0529.0000.0000 Sep 18, 2018
Operating system = VMkernel 6.7.0
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.

Drive /c0/e32/s6 :
================

------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model       Sp Type
------------------------------------------------------------------------------
32:6      6 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
------------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encrypting Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded

Drive /c0/e32/s6 - Detailed Information :
=======================================

Drive /c0/e32/s6 State :
======================
Shield Counter = 0
Media Error Count = 0
Other Error Count = 0
Drive Temperature = 37C (98.60 F)
Predictive Failure Count = 1
S.M.A.R.T alert flagged by drive = Yes

Drive /c0/e32/s6 Device attributes :
==================================
SN = S7M0QBXX
Manufacturer Id = SEAGATE
Model Number = ST600MP0005
NAND Vendor = NA
WWN = 5000C5008F5CE770
Firmware Revision = VT31
Firmware Release Number = N/A
Raw size = 558.911 GB [0x45dd2fb0 Sectors]
Coerced size = 558.375 GB [0x45cc0000 Sectors]
Non Coerced size = 558.411 GB [0x45cd2fb0 Sectors]
Device Speed = 12.0Gb/s
Link Speed = 12.0Gb/s
Write Cache = Disabled
Logical Sector Size = 512B
Physical Sector Size = 512B
Connector Name = 00

Drive /c0/e32/s6 Policies/Settings :
==================================
Drive position = DriveGroup:1
Enclosure position = 1
Connected Port Number = 0(path0)
Sequence Number = 2
Commissioned Spare = No
Emergency Spare = No
Last Predictive Failure Event Sequence Number = 6513
Successful diagnostics completion on = N/A
SED Capable = No
SED Enabled = No
Secured = No
Cryptographic Erase Capable = No
Locked = No
Needs EKM Attention = No
PI Eligible = No
Certified = Yes
Wide Port Capable = No

Port Information :
================

-----------------------------------------
Port Status Linkspeed SAS address
-----------------------------------------
   0 Active 12.0Gb/s  0x5000c5008f5ce771
   1 Active 12.0Gb/s  0x0
-----------------------------------------

Inquiry Data =
00 00 06 12 8b 01 10 02 53 45 41 47 41 54 45 20
53 54 36 30 30 4d 50 30 30 30 35 20 20 20 20 20
56 54 33 31 53 37 4d 30 51 42 58 58 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 a2 0c 60 20 e0
04 60 04 c0 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 43 6f 70 79 72 69 67 68 74 20 28 63 29 20 32
30 31 35 20 53 65 61 67 61 74 65 20 41 6c 6c 20
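
In the drive output above, the two fields that mark slot 6 as a predicted failure are "Predictive Failure Count" and "S.M.A.R.T alert flagged by drive". A minimal sketch for checking them from a script follows. It assumes the `show all` output is piped in with one "Name = Value" field per line, as perccli prints it on a terminal; the function name `has_predictive_failure` is made up for illustration and is not part of perccli.

```shell
# Hypothetical helper (not a perccli feature): reads the text of
# `perccli /cX/eY/sZ show all` on stdin and returns 0 when the drive
# reports a predictive failure, 1 otherwise.
has_predictive_failure() {
    awk -F' = ' '
        # A non-zero counter means the controller logged a predictive failure.
        /^[ \t]*Predictive Failure Count/ { if ($2 + 0 > 0) bad = 1 }
        # The drive itself raised a S.M.A.R.T alert.
        /^[ \t]*S\.M\.A\.R\.T alert flagged by drive/ { if ($2 ~ /^Yes/) bad = 1 }
        END { exit (bad ? 0 : 1) }
    '
}
```

Possible use on the host from the question: `./perccli /c0/e32/s6 show all | has_predictive_failure && echo "slot 6 is flagged"`.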

DELL-Chris H · Accepted Answer

Billeuze,

With the drive showing as an online Predicted Failure, you will need to force that drive offline (page 25 here) prior to removal. After that you should be able to remove the problem drive, wait about 20 seconds, and insert the replacement. The rebuild should start automatically; if it does not, you can assign the new drive as a hot spare with the commands found on page 34 here.

Let me know how it goes.
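
The sequence in this answer can be sketched as a small dry-run script. It is only a sketch: the perccli command forms (`set offline`, `show rebuild`, `add hotsparedrive`) are the ones quoted elsewhere in this thread, while the `PERCCLI` and `DRY_RUN` variables and the `run` and `replace_steps` functions are made-up names for illustration. By default it only prints what it would run.

```shell
#!/bin/sh
# Illustrative wrapper, not a Dell-supplied tool.
PERCCLI="${PERCCLI:-./perccli}"   # assumed path to the perccli binary
DRY_RUN="${DRY_RUN:-1}"           # 1 = print commands only, 0 = execute them

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

replace_steps() {
    drive="$1"                              # e.g. /c0/e32/s6
    run "$PERCCLI" "$drive" show all        # confirm the serial number first
    run "$PERCCLI" "$drive" set offline     # force the predicted-failure drive offline
    echo "# now pull the drive, wait about 20 seconds, insert the replacement"
    run "$PERCCLI" "$drive" show rebuild    # check whether the rebuild started
    echo "# only if no rebuild is running:"
    echo "#   $PERCCLI $drive add hotsparedrive"
}
```

Calling `replace_steps /c0/e32/s6` prints the steps for the drive in the question. The hot-spare command is printed as a comment rather than run, since it is only needed when the rebuild does not start on its own.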

billeuze · Answer

yup, it started rebuilding immediately, thanks.

brosysadm · Answer

Hello there,

I have a very similar question.

Suppose the failed drive was pulled while the server was running and is now reported as missing, and a new drive has been inserted.

How do I force a rebuild onto the new drive?

Do I add it manually to the RAID where the drive failed and rebuild it?

Do I assign it as a hot spare so that it takes over for the missing drive automatically?

Or do I put the old drive back into the same slot and do the replacement correctly through perccli?

Thank you

Dell-DylanJ · Answer

Hello, I would recommend marking the replacement drive as a hot spare and allowing the controller to manage the rebuild, as you mentioned in your post.

Dell-Martin S · Answer

Hi, please check https://dell.to/3S94xPK which should answer your questions. Did you have any other questions?

Regards,
Martin

plegrand1 · Answer

Hello,

I am aware that this post is particularly old, but I have exactly the same problem on an H840 card with 15 disks in RAID 5. One disk has problems:

Drive has flagged a S.M.A.R.T alert : yes
Adapter 1
Enclosure Device ID: 251
Slot Number: 5

Is it possible to do the same thing with 'megacli', which I usually use?

megacli -PDOffline -PhysDrv[251:5] -a1

That is: put the disk Offline, then remove it physically, then put the new one in its place. Does the rebuild start by itself?

I see on the net that you have to put the disk in the 'missing' state and then the 'removable' state before removing it. Is that really useful in the case of a replacement?

Thanks for your help

DELL-Chris H · Answer

Plegrand1,

In regards to a Predicted Failure drive, which is in an ONLINE state, you indeed need to OFFLINE that drive prior to replacing it. This is to ensure that the bad blocks on the Predicted Failure drive don't get moved to the replacement drive. So you would offline it, replace it, then if the rebuild doesn't automatically start you can set it as a Hotspare to start it.

Let me know if this answers your question.

plegrand1 · Answer

Hello and thank you for your answer,

My question was more about whether, in this case (the same as the one at the beginning of this post: replacing a defective disk), it is enough to only set the disk Offline* before removing it and inserting the new one.

* With this megacli command : megacli -PDOffline -PhysDrv[251:5] -a1

Thanks again for your help.

DELL-Chris H · Answer

Pascal,

That looks correct to me. Also, just on a side note, if you ever need to offline a drive, when you don't have access to it via cli or gui, you can power off the server, then remove the drive, then power up the server, then add a replacement drive when back in the OS.

plegrand1 · Answer

Thanks again for your answer. Just to be sure, can you confirm that megacli does the same job as perccli to set a disk offline on a PERC H840?

megacli -PDOffline -PhysDrv[251:5] -a1
perccli /c1/e251/s5 set offline

Thanks again,
Pascal

plegrand1 · Answer

I have to replace a disk tomorrow. I will let you know afterwards whether everything went OK.

plegrand1 · Answer

I would like to take advantage of this thread to ask you two small questions:

While researching disk replacement on a RAID 5 (Dell PERC), I found that almost every site says that, in addition to setting the disk "Offline", you should set it to "Missing" and then to "Removable".
Can you explain what this means and in which cases we should proceed this way?

Since I cannot install OMSA on my Debian server, I use megacli.
Can you tell me which tool to use with recent cards (H840, H730...)?
megacli, perccli or storcli?

In any case thank you for your help.

Pascal

plegrand1 · Answer

Here is the procedure I will follow for the replacement of my disk.
Does it look good to you?

Should I choose megacli or perccli or even storcli which I don't know?

----------------------

megacli -PDInfo -PhysDrv[251:5] -a1
# perccli64 /c1/e251/s5 show all

megacli -PDLocate -start -PhysDrv[251:5] -a1
# perccli64 /c1/e251/s5 start locate

megacli -PDLocate -stop -PhysDrv[251:5] -a1
# perccli64 /c1/e251/s5 stop locate

megacli -PDOffline -PhysDrv[251:5] -a1
# perccli64 /c1/e251/s5 set offline

Replacement of the disk

megacli -PDRbld -ShowProg -PhysDrv[251:5] -a1
# perccli64 /c1/e251/s5 show rebuild

If the rebuild doesn't start:
megacli -PDHSP -Set -PhysDrv[251:5] -a1
# perccli64 /c1/e251/s5 add hotsparedrive
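
The paired commands above all follow the same addressing pattern: megacli's `-PhysDrv[E:S] -aN` corresponds to the perccli path `/cN/eE/sS`. A small sketch of that translation is below. It assumes the megacli adapter number and the perccli controller index are the same, as they are in the pairs above, which may not hold on every system; `mega_to_perc` is a made-up helper name.

```shell
# Hypothetical helper: turn megacli-style addressing into a perccli path.
#   $1 = "enclosure:slot" as used in -PhysDrv[...], e.g. 251:5
#   $2 = adapter number as used in -aN, e.g. 1
mega_to_perc() {
    physdrv="$1"
    adapter="$2"
    enc="${physdrv%%:*}"     # text before the colon -> enclosure ID
    slot="${physdrv##*:}"    # text after the colon  -> slot number
    echo "/c${adapter}/e${enc}/s${slot}"
}
```

For the disk in this post, `mega_to_perc 251:5 1` gives `/c1/e251/s5`, which matches the perccli64 paths used above.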

plegrand1 · Answer

My original configuration was as follows
Raid 5 on a PERC H840 card
15 disks
One spare disk: Disk [251:0]
A bad disk : Disk [251:5]

So I did the following to replace the bad disk
megacli -PDOffline -PhysDrv[251:5] -a1
Once this disk is Offline, the disk [251:0] (spare) is automatically switched to "Rebuild".

Replacement of the defective disk with a new identical disk
This disk has been switched to "Hotspare, Spun Up" mode
The rebuild on this disk did not start

Now the disks are in the following state:

Old defective disk replaced by a new one
megacli -PDHSP -Set -PhysDrv[251:5] -a1
Firmware state: Hotspare, Spun Up

Old spare disk
megacli -PDInfo -PhysDrv[251:0] -a1
Firmware state: Rebuild

Is this normal? The new spare disk is now disk [251:5], or should I do something?

plegrand1 · Answer

So in your view, it is not a problem that the old hot spare became 'online' and the new disk became the hot spare?
