Systems Management General

Last reply by 09-23-2022 Solved

perccli commands to replace a failing disk

I have a disk in a RAID 10 virtual disk that is failing. The physical disk caddy shows an amber light, and the PERC display shows it with an orange yield sign with an exclamation mark in it. I assume that means it is failing, even though it is still running and not yet degrading the virtual disk. I now have a new spare to replace it, but I don't have a spare slot in the chassis. If I did, I would insert the new disk, make it a hot spare for the virtual disk, then pull the failing disk. Since I don't have a spare slot, what is the best way to approach this? In other words, which commands should I run, and in which order? Do I just take the failing disk offline, pull it, replace it with the new one, and mark that as a hot spare? I'd like assurance that I'm doing it in the right order before going ahead.

The disk is a member of controller 0, virtual disk 1. It is disk 6 on enclosure 32.

And indeed, querying the disk gives me the same serial number as the one that shows in iDRAC with an amber exclamation mark.

[root@CCV-VMware-Lam:/opt/lsi/perccli] ./perccli /c0/v1 show all
CLI Version = 007.0529.0000.0000 Sep 18, 2018
Operating system = VMkernel 6.7.0
Controller = 0
Status = Success
Description = None


/c0/v1 :
======

----------------------------------------------------------------
DG/VD TYPE   State Access Consist Cache Cac sCC     Size Name
----------------------------------------------------------------
1/1   RAID10 Optl  RW     Yes     RWBD  -   OFF 1.635 TB SAS15K
----------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
FWB=Force WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency


PDs for VD 1 :
============

------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model       Sp Type
------------------------------------------------------------------------------
32:10    10 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
32:11    11 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
32:6      6 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
32:7      7 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
32:8      8 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
32:9      9 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
------------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encrypting Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded


VD1 Properties :
==============
Strip Size = 64 KB
Number of Blocks = 3512991744
VD has Emulated PD = No
Span Depth = N/A
Number of Drives Per Span = N/A
Write Cache(initial setting) = WriteBack
Disk Cache Policy = Disk's Default
Encryption = None
Data Protection = None
Active Operations = None
Exposed to OS = Yes
Creation Date = 07-03-2016
Creation Time = 01:18:54 PM
Emulation type = default
Cachebypass Mode = Cachebypass Disable
Is LD Ready for OS Requests = Yes
SCSI NAA Id = 61418770601b16001e703c3e0950c5f9
SCSI Unmap = No

 

[root@CCV-VMware-Lam:/opt/lsi/perccli] ./perccli /c0/e32/s6 show all
CLI Version = 007.0529.0000.0000 Sep 18, 2018
Operating system = VMkernel 6.7.0
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.


Drive /c0/e32/s6 :
================

------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model       Sp Type
------------------------------------------------------------------------------
32:6      6 Onln   1 558.375 GB SAS  HDD N   N  512B ST600MP0005 U  -
------------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encrypting Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded


Drive /c0/e32/s6 - Detailed Information :
=======================================

Drive /c0/e32/s6 State :
======================
Shield Counter = 0
Media Error Count = 0
Other Error Count = 0
Drive Temperature = 37C (98.60 F)
Predictive Failure Count = 1
S.M.A.R.T alert flagged by drive = Yes


Drive /c0/e32/s6 Device attributes :
==================================
SN = S7M0QBXX
Manufacturer Id = SEAGATE
Model Number = ST600MP0005
NAND Vendor = NA
WWN = 5000C5008F5CE770
Firmware Revision = VT31
Firmware Release Number = N/A
Raw size = 558.911 GB [0x45dd2fb0 Sectors]
Coerced size = 558.375 GB [0x45cc0000 Sectors]
Non Coerced size = 558.411 GB [0x45cd2fb0 Sectors]
Device Speed = 12.0Gb/s
Link Speed = 12.0Gb/s
Write Cache = Disabled
Logical Sector Size = 512B
Physical Sector Size = 512B
Connector Name = 00


Drive /c0/e32/s6 Policies/Settings :
==================================
Drive position = DriveGroup:1
Enclosure position = 1
Connected Port Number = 0(path0)
Sequence Number = 2
Commissioned Spare = No
Emergency Spare = No
Last Predictive Failure Event Sequence Number = 6513
Successful diagnostics completion on = N/A
SED Capable = No
SED Enabled = No
Secured = No
Cryptographic Erase Capable = No
Locked = No
Needs EKM Attention = No
PI Eligible = No
Certified = Yes
Wide Port Capable = No

Port Information :
================

-----------------------------------------
Port Status Linkspeed SAS address
-----------------------------------------
0 Active 12.0Gb/s 0x5000c5008f5ce771
1 Active 12.0Gb/s 0x0
-----------------------------------------


Inquiry Data =
00 00 06 12 8b 01 10 02 53 45 41 47 41 54 45 20
53 54 36 30 30 4d 50 30 30 30 35 20 20 20 20 20
56 54 33 31 53 37 4d 30 51 42 58 58 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 a2 0c 60 20 e0
04 60 04 c0 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 43 6f 70 79 72 69 67 68 74 20 28 63 29 20 32
30 31 35 20 53 65 61 67 61 74 65 20 41 6c 6c 20

Solution (1)

Accepted Solutions

Billeuze,

 

With the drive showing as an online Predicted Failure, you will need to force that drive offline (page 25 here) prior to removal. After that, you should be able to remove the problem drive, wait about 20 seconds, and insert the replacement. The rebuild should start automatically; if it does not, you can assign the replacement as a hot spare with the commands found on page 34 here.
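
As a rough sketch of that order for the drive in the original question (controller 0, enclosure 32, slot 6): the `set offline` drive command appears later in this thread, while `show rebuild` is assumed here to be available in perccli as it is in the equivalent StorCLI syntax. The `PERCCLI` and `DRY_RUN` variables and the function names are illustrative only. By default the script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of the replacement order for a predicted-failure drive.
# PERCCLI and DRY_RUN are illustrative variables, not perccli features.
PERCCLI=${PERCCLI:-./perccli}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: force the predicted-failure drive offline before pulling it.
offline_drive() {   # usage: offline_drive <controller> <enclosure> <slot>
    run "$PERCCLI" "/c$1/e$2/s$3" set offline
}

# Step 2 is physical: remove the drive, wait about 20 seconds,
# then insert the replacement in the same slot.

# Step 3: check whether a rebuild is running on the replacement
# (assumes `show rebuild` is supported by the installed perccli version).
check_rebuild() {   # usage: check_rebuild <controller> <enclosure> <slot>
    run "$PERCCLI" "/c$1/e$2/s$3" show rebuild
}
```

With the defaults, `offline_drive 0 32 6` prints `./perccli /c0/e32/s6 set offline` without touching the controller. Setting `DRY_RUN=0` runs the command for real, so confirm the slot against the serial number first.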

 

Let me know how it goes.

 


Chris Hawk
Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#Iwork4Dell


Replies (19)

yup, it started rebuilding immediately, thanks.


Hello there,

I have a very similar question.

If the failed drive was pulled hot and is now reported as missing, and a new drive has been inserted, how do I force a rebuild onto the new drive?

Do I add it manually to the RAID where the drive failed and rebuild it?

Do I assign it as a hot spare and let it take over for the missing drive automatically?

Or do I put the old drive back in the same slot and do it correctly through perccli?

 

Thank you


Hello,

 

I would recommend marking the replacement drive as a hotspare and allowing the controller to manage the rebuild, as you had mentioned in your post.
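
A minimal sketch of that hot-spare assignment follows. It assumes perccli accepts the `add hotsparedrive dgs=<n>` drive command used by the equivalent StorCLI syntax, and that the replacement drive is reported as UGood (Unconfigured Good) first. The function name is hypothetical, and it only prints the command so it can be checked before being run by hand.

```shell
#!/bin/sh
# Print the command that would make the drive in the given slot a dedicated
# hot spare for one drive group.
# add_hotspare_cmd is an illustrative helper name, not a perccli command.
add_hotspare_cmd() {   # usage: add_hotspare_cmd <controller> <enclosure> <slot> <drive-group>
    echo "./perccli /c$1/e$2/s$3 add hotsparedrive dgs=$4"
}

# Example for the drive group from the first post (DG 1 on controller 0):
#   add_hotspare_cmd 0 32 6 1
```

Once the spare is assigned, the controller should pick it up for the degraded drive group and start the rebuild on its own.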

#Iwork4Dell

Hello,
I am aware that this thread is old, but I have exactly the same problem on an H840 card with 15 disks in RAID 5.
One disk has problems:
Drive has flagged a S.M.A.R.T alert : yes
Adapter 1
Enclosure Device ID: 251
Slot Number: 5

Is it possible to do the same thing with "megacli", which I usually use?
megacli -PDOffline -PhysDrv[251:5] -a1
That is: set the disk offline, physically remove it, then put the new one in its place.
Does the rebuild then start by itself?
I have read online that you have to put the disk in the "missing" state and then the "removable" state before removing it. Is that really necessary for a replacement?
Thanks for your help


Hi,

Please check https://dell.to/3S94xPK, which should answer your questions.

Do you have any other questions?

 

Regards Martin


Martin Schiemenz
Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#Iwork4Dell


Hello and thank you for your answer,

My question was rather whether, in this case (the same as the one at the start of this thread: replacing a defective disk), it is enough to set the disk offline* before removing it and inserting the new one.

* With this megacli command: megacli -PDOffline -PhysDrv[251:5] -a1


Thanks again for your help.


Plegrand1,

 

With regard to a Predicted Failure drive that is in an ONLINE state, you do indeed need to OFFLINE that drive before replacing it. This ensures that the bad blocks on the Predicted Failure drive don't get moved to the replacement drive. So you would offline it, replace it, and then, if the rebuild doesn't start automatically, set the replacement as a hot spare to start it.

 

Let me know if this answers your question.

 

 


Chris Hawk
Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#Iwork4Dell


Thanks again for your answer.
Just to be sure: can you confirm that megacli does the same job as perccli for setting a disk offline on a PERC H840?

megacli -PDOffline -PhysDrv[251:5] -a1
perccli /c1/e251/s5 set offline

 

Thanks again

Pascal
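
The two commands quoted above point at the same drive: MegaCli takes an adapter number and an "enclosure:slot" pair, while perccli uses a /cX/eY/sZ path. A small sketch of that address mapping follows. The function name is hypothetical, and this shows only how the addresses correspond; it does not confirm that MegaCli is supported on an H840.

```shell
#!/bin/sh
# Convert a MegaCli drive address (adapter number plus "enclosure:slot")
# into the /cX/eY/sZ path form that perccli uses.
# megacli_to_perccli_path is an illustrative helper name.
megacli_to_perccli_path() {   # usage: megacli_to_perccli_path <adapter> <enclosure:slot>
    adapter=$1
    enclosure=${2%%:*}   # text before the colon
    slot=${2##*:}        # text after the colon
    echo "/c$adapter/e$enclosure/s$slot"
}

# Example: -PhysDrv[251:5] -a1 corresponds to
#   megacli_to_perccli_path 1 251:5
```

For the drive in this post, the result is `/c1/e251/s5`, which is the path used in the perccli command above.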
