December 3rd, 2019 17:00

perccli commands to replace a failing disk

I have a disk in a RAID 10 virtual disk that is failing. The physical disk caddy has an amber light on it, and the PERC display shows it with an orange yield sign containing an exclamation mark. I assume that means it is failing, even though it is still running and not yet degrading the virtual disk. I now have a new spare to replace it, but I don't have a spare slot in the chassis. If I did, I would insert the new disk, make it a hot spare for the virtual disk, then pull the failing disk. Since I don't have a spare slot, what is the best way to approach this? In other words, which commands should I run, and in which order? Do I just take the failing disk offline, pull it, replace it with the new one, and mark that as a hot spare? I'd like assurance that I'm doing it in the right order before going ahead.

The disk is a member of controller 0, virtual disk 1. It is disk 6 in enclosure 32.

And indeed, querying the disk gives me the same serial number as the one that shows in iDRAC with an amber exclamation mark:

[root@CCV-VMware-Lam:/opt/lsi/perccli] ./perccli /c0/v1 show all
CLI Version = 007.0529.0000.0000 Sep 18, 2018
Operating system = VMkernel 6.7.0
Controller = 0
Status = Success
Description = None


/c0/v1 :
======

----------------------------------------------------------------
DG/VD TYPE State Access Consist Cache Cac sCC Size Name
----------------------------------------------------------------
1/1 RAID10 Optl RW Yes RWBD - OFF 1.635 TB SAS15K
----------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
FWB=Force WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency


PDs for VD 1 :
============

------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp Type
------------------------------------------------------------------------------
32:10 10 Onln 1 558.375 GB SAS HDD N N 512B ST600MP0005 U -
32:11 11 Onln 1 558.375 GB SAS HDD N N 512B ST600MP0005 U -
32:6 6 Onln 1 558.375 GB SAS HDD N N 512B ST600MP0005 U -
32:7 7 Onln 1 558.375 GB SAS HDD N N 512B ST600MP0005 U -
32:8 8 Onln 1 558.375 GB SAS HDD N N 512B ST600MP0005 U -
32:9 9 Onln 1 558.375 GB SAS HDD N N 512B ST600MP0005 U -
------------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encrypting Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded


VD1 Properties :
==============
Strip Size = 64 KB
Number of Blocks = 3512991744
VD has Emulated PD = No
Span Depth = N/A
Number of Drives Per Span = N/A
Write Cache(initial setting) = WriteBack
Disk Cache Policy = Disk's Default
Encryption = None
Data Protection = None
Active Operations = None
Exposed to OS = Yes
Creation Date = 07-03-2016
Creation Time = 01:18:54 PM
Emulation type = default
Cachebypass Mode = Cachebypass Disable
Is LD Ready for OS Requests = Yes
SCSI NAA Id = 61418770601b16001e703c3e0950c5f9
SCSI Unmap = No

 

[root@CCV-VMware-Lam:/opt/lsi/perccli] ./perccli /c0/e32/s6 show all
CLI Version = 007.0529.0000.0000 Sep 18, 2018
Operating system = VMkernel 6.7.0
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.


Drive /c0/e32/s6 :
================

------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp Type
------------------------------------------------------------------------------
32:6 6 Onln 1 558.375 GB SAS HDD N N 512B ST600MP0005 U -
------------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encrypting Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded


Drive /c0/e32/s6 - Detailed Information :
=======================================

Drive /c0/e32/s6 State :
======================
Shield Counter = 0
Media Error Count = 0
Other Error Count = 0
Drive Temperature = 37C (98.60 F)
Predictive Failure Count = 1
S.M.A.R.T alert flagged by drive = Yes


Drive /c0/e32/s6 Device attributes :
==================================
SN = S7M0QBXX
Manufacturer Id = SEAGATE
Model Number = ST600MP0005
NAND Vendor = NA
WWN = 5000C5008F5CE770
Firmware Revision = VT31
Firmware Release Number = N/A
Raw size = 558.911 GB [0x45dd2fb0 Sectors]
Coerced size = 558.375 GB [0x45cc0000 Sectors]
Non Coerced size = 558.411 GB [0x45cd2fb0 Sectors]
Device Speed = 12.0Gb/s
Link Speed = 12.0Gb/s
Write Cache = Disabled
Logical Sector Size = 512B
Physical Sector Size = 512B
Connector Name = 00


Drive /c0/e32/s6 Policies/Settings :
==================================
Drive position = DriveGroup:1
Enclosure position = 1
Connected Port Number = 0(path0)
Sequence Number = 2
Commissioned Spare = No
Emergency Spare = No
Last Predictive Failure Event Sequence Number = 6513
Successful diagnostics completion on = N/A
SED Capable = No
SED Enabled = No
Secured = No
Cryptographic Erase Capable = No
Locked = No
Needs EKM Attention = No
PI Eligible = No
Certified = Yes
Wide Port Capable = No

Port Information :
================

-----------------------------------------
Port Status Linkspeed SAS address
-----------------------------------------
0 Active 12.0Gb/s 0x5000c5008f5ce771
1 Active 12.0Gb/s 0x0
-----------------------------------------


Inquiry Data =
00 00 06 12 8b 01 10 02 53 45 41 47 41 54 45 20
53 54 36 30 30 4d 50 30 30 30 35 20 20 20 20 20
56 54 33 31 53 37 4d 30 51 42 58 58 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 a2 0c 60 20 e0
04 60 04 c0 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 43 6f 70 79 72 69 67 68 74 20 28 63 29 20 32
30 31 35 20 53 65 61 67 61 74 65 20 41 6c 6c 20

Moderator

 • 

8.4K Posts

December 4th, 2019 06:00

Billeuze,

 

With the drive showing as an online Predicted Failure, you will need to force that drive offline (page 25 here) prior to removal. After that, you should be able to remove the problem drive, wait about 20 seconds, and insert the replacement. The rebuild should start automatically; if it does not, you can assign the new drive as a hot spare with the commands found on page 34 here.

 

Let me know how it goes.
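
For readers following along, the sequence described in this reply can be sketched as a dry run that only prints the commands for the drive in question (controller 0, enclosure 32, slot 6). The `replace_plan` function name is hypothetical, and the perccli subcommands are the ones quoted elsewhere in this thread; check them against the PERC CLI reference guide for your controller before running anything.

```shell
#!/bin/sh
# Dry-run sketch: prints the replacement steps, executes nothing.
# replace_plan is a hypothetical helper name.
# Arguments: controller number, enclosure ID, slot number.
replace_plan() {
    ctrl=$1 encl=$2 slot=$3
    pd="/c${ctrl}/e${encl}/s${slot}"
    cat <<EOF
perccli ${pd} show all
perccli ${pd} set offline
# pull the failing drive, wait about 20 seconds, insert the replacement
perccli ${pd} show rebuild
# only if the rebuild did not start on its own:
perccli ${pd} add hotsparedrive
EOF
}

# Drive from the original question: controller 0, enclosure 32, slot 6.
replace_plan 0 32 6
```

Printing the plan first makes it easy to confirm the controller, enclosure, and slot numbers before a drive is forced offline, which matters because offlining the wrong member of a redundant array can take the virtual disk down.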

 

14 Posts

December 4th, 2019 17:00

yup, it started rebuilding immediately, thanks.

36 Posts

June 24th, 2021 10:00

Hello there,

I have a very similar question.

Suppose the failed drive was pulled hot, is reported as missing, and a new drive has been inserted.

How do I force a rebuild onto the new drive?

Do I add it manually to the RAID that the failed drive belonged to and rebuild it?

Do I assign it as a hot spare, so that it takes over for the missing drive automatically?

Or do I put the old drive back into the same slot and do it properly through perccli?

 

Thank you

2.9K Posts

June 24th, 2021 13:00

Hello,

 

I would recommend marking the replacement drive as a hotspare and allowing the controller to manage the rebuild, as you had mentioned in your post.

Moderator

 • 

3.2K Posts

September 21st, 2022 06:00

Hi,

Please check https://dell.to/3S94xPK; this should answer your questions.

 

 

Did you have any other questions?

 

Regards Martin

19 Posts

September 21st, 2022 06:00

Hello,
I am aware that this post is quite old, but I have exactly the same problem on an H840 card with 15 disks in RAID 5.
One disk has problems:
Drive has flagged a S.M.A.R.T alert : yes
Adapter 1
Enclosure Device ID: 251
Slot Number: 5

Is it possible to do the same thing with "megacli", which I usually use?
megacli -PDOffline -PhysDrv[251:5] -a1
That is, set the disk offline, physically remove it, then put the new one in its place.
Does the rebuild start by itself?
I have read online that you have to put the disk in the "missing" state and then the "removable" state before removing it. Is that really necessary for a replacement?
Thanks for your help

Moderator

 • 

8.4K Posts

September 21st, 2022 07:00

Plegrand1,

 

For a Predicted Failure drive that is still in an ONLINE state, you do need to OFFLINE that drive before replacing it. This ensures that the bad blocks on the Predicted Failure drive don't get copied to the replacement drive. So you would offline it, replace it, and then, if the rebuild doesn't start automatically, set the new drive as a hot spare to start it.

 

Let me know if this answers your question.

 

 

19 Posts

September 21st, 2022 07:00

Hello and thank you for your answer,

My question was more about whether, in this case (the same as the one at the beginning of this thread: replacing a defective disk), it is enough to set the disk to Offline* before removing it and inserting the new one.

* With this megacli command : megacli -PDOffline -PhysDrv[251:5] -a1


Thanks again for your help.

Moderator

 • 

8.4K Posts

September 21st, 2022 08:00

Pascal,

 

That looks correct to me. As a side note, if you ever need to offline a drive when you don't have access to it via CLI or GUI, you can power off the server, remove the drive, power the server back up, and then add the replacement drive once you are back in the OS.

 

 

19 Posts

September 21st, 2022 08:00

Thanks again for your answer.
Just to be sure, can you confirm that megacli does the same job as perccli for setting a disk offline on a PERC H840?

megacli -PDOffline -PhysDrv[251:5] -a1
perccli /c1/e251/s5 set offline

 

Thanks again

Pascal

19 Posts

September 21st, 2022 09:00

I have to replace a disk tomorrow, so I will let you know afterwards whether everything went OK.

19 Posts

September 21st, 2022 23:00

I'll take advantage of this thread to ask two small questions:

While researching disk replacement on a RAID 5 (Dell PERC), I found that almost every site says that, in addition to setting the disk "Offline", you should mark it "Missing" and then "Removable".
Can you explain what this means and in which cases we should proceed this way?

Since I cannot install OMSA on my Debian server, I use megacli.
Can you tell me which tool to use with recent cards (H840, H730, ...)?
megacli, perccli, or storcli?

In any case thank you for your help.

Pascal

19 Posts

September 21st, 2022 23:00

Here is the procedure I will follow for the replacement of my disk.
Does it look good to you?

Should I choose megacli or perccli or even storcli which I don't know?

----------------------

megacli -PDInfo -PhysDrv[251:5] -a1
# perccli64 /c1/e251/s5 show all

megacli -PDLocate -start -PhysDrv[251:5] -a1
# perccli64 /c1/e251/s5 start locate

megacli -PDLocate -stop -PhysDrv[251:5] -a1
# perccli64 /c1/e251/s5 stop locate

megacli -PDOffline -PhysDrv[251:5] -a1
# perccli64 /c1/e251/s5 set offline

Physically replace the disk

megacli -PDRbld -ShowProg -PhysDrv[251:5] -a1
# perccli64 /c1/e251/s5 show rebuild

If the rebuild doesn't start
megacli -PDHSP -Set -PhysDrv[251:5] -a1
# perccli64 /c1/e251/s5 add hotsparedrive
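
The pairs above all follow the same addressing pattern: megacli's `-PhysDrv[E:S] -aN` becomes perccli's `/cN/eE/sS`. A small sketch of that translation follows, assuming (as the list above does) that the megacli adapter index and the perccli controller index refer to the same card. `megacli_to_percpath` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: translate the drive address in a megacli command line
# ("-PhysDrv[E:S] ... -aN") into the perccli object path "/cN/eE/sS".
# megacli_to_percpath is a hypothetical helper name. It assumes the
# adapter index (-aN) matches the perccli controller index (/cN).
megacli_to_percpath() {
    printf '%s\n' "$1" |
        sed -n 's|.*-PhysDrv\[\([0-9]*\):\([0-9]*\)].*-a\([0-9]*\).*|/c\3/e\1/s\2|p'
}

megacli_to_percpath 'megacli -PDOffline -PhysDrv[251:5] -a1'
# prints /c1/e251/s5
```

If the input contains no recognizable drive address, the function prints nothing, so an empty result can be treated as "do not proceed".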

19 Posts

September 22nd, 2022 05:00

My original configuration was as follows
Raid 5 on a PERC H840 card
15 disks
One spare disk: Disk [251:0]
A bad disk : Disk [251:5]

So I did the following to replace the bad disk
megacli -PDOffline -PhysDrv[251:5] -a1
Once this disk was offline, disk [251:0] (the spare) automatically switched to "Rebuild".

I replaced the defective disk with a new, identical disk.
This disk has been switched to "Hotspare, Spun Up" mode.
The rebuild on this disk did not start.

Now the disks are in the following state:

Old defective disk replaced by a new one
megacli -PDHSP -Set -PhysDrv[251:5] -a1
Firmware state: Hotspare, Spun Up

Old spare disk
megacli -PDInfo -PhysDrv[251:0] -a1
Firmware state: Rebuild

Is this normal? Is disk [251:5] now the new spare, or should I do something?

19 Posts

September 22nd, 2022 06:00

So in your view, it's not a problem that the old hot spare became "online" and the new disk became the hot spare?
