June 25th, 2021 07:00

Hello

I have had a predictive disk failure warning for a few days on a PowerEdge R610. I replaced the disk (73 GB, in a RAID-1 array) and then saw these messages in /var/log/messages:

Jun 23 11:26:09 daphne-d kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 2 id=0
Jun 23 11:26:09 daphne-d kernel: mptbase: ioc0: PhysDisk is now online, out of sync
Jun 23 11:26:09 daphne-d kernel: mptsas: ioc0: WARNING - firmware bug: unable to add hidden disk - target_id matchs volume_id
Jun 23 11:26:10 daphne-d kernel: mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
Jun 23 11:26:10 daphne-d kernel: mptbase: ioc0: volume is now degraded, enabled, resync in progress

and later:

Jun 23 12:27:11 daphne-d kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 2 id=10
Jun 23 12:27:11 daphne-d kernel: mptbase: ioc0: PhysDisk is now online
Jun 23 12:27:11 daphne-d kernel: mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 10, phy 0, sas_addr 0x500000e118ff5022
Jun 23 12:27:11 daphne-d kernel: mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
Jun 23 12:27:11 daphne-d kernel: mptbase: ioc0: volume is now optimal, enabled, resync in progress
Jun 23 12:27:11 daphne-d kernel: mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
Jun 23 12:27:11 daphne-d kernel: mptbase: ioc0: volume is now optimal, enabled
Jun 23 12:27:11 daphne-d kernel: scsi 2:0:7:0: Direct-Access FUJITSU MBE2073RC D906 PQ: 0 ANSI: 5
Jun 23 12:27:11 daphne-d kernel: scsi 2:0:7:0: Attached scsi generic sg1 type 0

From that I understand that the resync completed. However, I never saw the usual Rebuilding state in omreport, and the disk still reports Predictive Failure:

Controller SAS 6/iR Integrated (Embedded)
ID : 0:0:0
Status : Non-Critical
Name : Physical Disk 0:0:0
State : Online
Power Status : Not Applicable
Bus Protocol : SAS
Media : HDD
Part of Cache Pool : Not Applicable
Remaining Rated Write Endurance : Not Applicable
Failure Predicted : Yes
Revision : SM04
Driver Version : Not Applicable
Model Number : Not Applicable
T10 PI Capable : No
Certified : Not Applicable
Encryption Capable : No
Encrypted : Not Applicable
Progress : Not Applicable
Mirror Set ID : Not Applicable
Capacity : 67.75 GB (72746008576 bytes)
Used RAID Disk Space : 67.75 GB (72746008576 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare : No
Vendor ID : DELL
Product ID : ST973451SS
Serial No. : 3PD22N7N
Part Number : SG0XT7641253189L01ACA00
Negotiated Speed : 3.00 Gbps
Capable Speed : 3.00 Gbps
PCIe Maximum Link Width : Not Applicable
PCIe Negotiated Link Width : Not Applicable
Sector Size : 512B
Device Write Cache : Not Applicable
Manufacture Day : 08
Manufacture Week : 39
Manufacture Year : 2005
SAS Address : 5000C5000EC1251D
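
Output like this can also be checked programmatically. The following is a minimal sketch, assuming the omreport physical-disk listing has been saved as plain text in the "Key : Value" layout shown above; the function names `parse_pdisks` and `predicted_failures` are hypothetical, and the sample is a shortened copy of the listing in this post.

```python
def parse_pdisks(text):
    """Split omreport-style 'Key : Value' output into one dict per disk."""
    disks, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if not sep:
            continue  # skip header lines such as the controller name
        key, value = key.strip(), value.strip()
        if key == "ID":
            # Assumption: each disk record starts with its "ID" line.
            current = {}
            disks.append(current)
        if current is not None:
            current[key] = value
    return disks


def predicted_failures(disks):
    """Return only the disks that report 'Failure Predicted : Yes'."""
    return [d for d in disks if d.get("Failure Predicted") == "Yes"]


# Shortened copy of the listing above, used as sample input.
sample = """Controller SAS 6/iR Integrated (Embedded)
ID : 0:0:0
Status : Non-Critical
State : Online
Failure Predicted : Yes
Product ID : ST973451SS
Serial No. : 3PD22N7N
"""

for disk in predicted_failures(parse_pdisks(sample)):
    print(disk["ID"], disk["Product ID"], disk["Serial No."])
```

Printing the product ID and serial number alongside the disk ID makes it possible to compare the flagged entry against the label of the drive that was physically installed.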

Also, every night I now get these messages in /var/log/messages:

 

Jun 21 03:23:57 daphne-d Server_Administrator: 3641 2094 - Storage Service Predictive Failure reported: Physical Disk 0:0:0 Controller 0, Connector 0
Jun 22 03:34:28 daphne-d Server_Administrator: 3641 2094 - Storage Service Predictive Failure reported: Physical Disk 0:0:0 Controller 0, Connector 0
Jun 23 03:45:02 daphne-d Server_Administrator: 3641 2094 - Storage Service Predictive Failure reported: Physical Disk 0:0:0 Controller 0, Connector 0
Jun 24 03:55:40 daphne-d Server_Administrator: 3641 2094 - Storage Service Predictive Failure reported: Physical Disk 0:0:0 Controller 0, Connector 0
Jun 25 04:06:10 daphne-d Server_Administrator: 3641 2094 - Storage Service Predictive Failure reported: Physical Disk 0:0:0 Controller 0, Connector 0

 

I replaced the disk on June 23rd.
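
To confirm which alerts were logged after the swap, the syslog lines can be filtered by timestamp. This is a minimal sketch, assuming the Server_Administrator line format shown above. The function name `alerts_after` is hypothetical, the year 2021 is an assumption (syslog lines carry no year), and the cutoff is taken from the "volume is now optimal" line in the kernel log.

```python
import re
from datetime import datetime

# Matches the OMSA alert lines quoted above and captures timestamp and disk ID.
ALERT = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ Server_Administrator: .*"
    r"Predictive Failure reported: +Physical Disk (\S+)"
)


def alerts_after(lines, cutoff, year):
    """Return (timestamp, disk_id) for each alert logged after `cutoff`."""
    hits = []
    for line in lines:
        m = ALERT.match(line)
        if not m:
            continue
        # The year is supplied by the caller because syslog omits it.
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        if stamp > cutoff:
            hits.append((stamp, m.group(2)))
    return hits


# Three of the alert lines quoted above, used as sample input.
sample_lines = [
    "Jun 23 03:45:02 daphne-d Server_Administrator: 3641 2094 - Storage Service Predictive Failure reported: Physical Disk 0:0:0 Controller 0, Connector 0",
    "Jun 24 03:55:40 daphne-d Server_Administrator: 3641 2094 - Storage Service Predictive Failure reported: Physical Disk 0:0:0 Controller 0, Connector 0",
    "Jun 25 04:06:10 daphne-d Server_Administrator: 3641 2094 - Storage Service Predictive Failure reported: Physical Disk 0:0:0 Controller 0, Connector 0",
]

# Assumed cutoff: the time the volume was reported optimal again.
cutoff = datetime(2021, 6, 23, 12, 27, 11)

for stamp, disk in alerts_after(sample_lines, cutoff, 2021):
    print(stamp.isoformat(), disk)
```

With the sample lines, only the June 24 and June 25 alerts fall after the cutoff, and both name Physical Disk 0:0:0.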

I am not sure the new disk was really accepted by the system. Why do I still have a predictive disk failure? Would a reboot help? Any suggestions on how to get rid of the predictive failure?

Thank you in advance

Gilbert

 

 

June 25th, 2021 09:00

Hello,

 

A reboot may help; I've seen reporting issues like this solved by a power cycle. This controller doesn't maintain a hardware-level RAID log, though, so if a reboot doesn't correct the reporting, troubleshooting options will be quite limited.

 

 

Based on the log output you have, it does look like the system thinks the drive is still bad. Restarting would be a good idea. You might also consider reseating the storage cabling during that same maintenance window.
