Unsolved

January 2nd, 2018 07:00

HDD in RAID-10 array stuck in rebuild status for weeks

Hi all,

I'm reaching out here because I can't figure this one out with the web searches that usually work.  We had several HDD failures and predictive failures, and after clearing up all the others, I have this one problem left.

There is one HDD in predictive failure in the RAID-10 array, and it's been stuck in "rebuild" state since I discovered the problem weeks ago.  I could really use some advice on how to proceed, since every OMSA intervention command I've run from the Linux client has returned "Operation could not be performed".

I have not yet tried powering off the server and doing a flea power purge, mainly because I'd have to engage another group and walk them through everything I need them to do.  I'm not opposed to this, but I'd rather keep it as the last option, after exhausting what I can do on my own.

Here is the likely most pertinent info from OMSA:

Controller
ID                                            : 0
Status                                        : Ok
Name                                          : PERC H710P Mini
Slot ID                                       : Embedded
State                                         : Ready
Firmware Version                              : 21.3.4-0001
Minimum Required Firmware Version             : Not Applicable
Driver Version                                : 07.700.00.00-rc1

ID                                : 1
Status                            : Critical
Name                              : Virtual Disk 1
State                             : Degraded
Hot Spare Policy violated         : Not Assigned
Virtual Disk Bad Blocks           : Yes
Encrypted                         : No
Layout                            : RAID-10
Size                              : 22,353.00 GB (24001350991872 bytes)
T10 Protection Information Status : No
Associated Fluid Cache State      : Not Applicable
Device Name                       : /dev/sdb
Bus Protocol                      : SATA
Media                             : HDD
Read Policy                       : Adaptive Read Ahead
Write Policy                      : Write Back
Cache Policy                      : Not Applicable
Stripe Element Size               : 64 KB
Disk Cache Policy                 : Enabled


ID                              : 0:1:4
Status                          : Non-Critical
Name                            : Physical Disk 0:1:4
State                           : Rebuilding
Power Status                    : Spun Up
Bus Protocol                    : SATA
Media                           : HDD
Part of Cache Pool              : Not Applicable
Remaining Rated Write Endurance : Not Applicable
Failure Predicted               : Yes
Revision                        : 00.0D1K2
Driver Version                  : Not Applicable
Model Number                    : Not Applicable
T10 PI Capable                  : No
Certified                       : Yes
Encryption Capable              : No
Encrypted                       : Not Applicable
Progress                        : 63% complete
Mirror Set ID                   : 0
Capacity                        : 3,725.50 GB (4000225165312 bytes)
Used RAID Disk Space            : 3,725.50 GB (4000225165312 bytes)
Available RAID Disk Space       : 0.00 GB (0 bytes)
Hot Spare                       : No
Vendor ID                       : DELL(tm)
Product ID                      : WD4000FYYX
Serial No.                      : WCC130156170
Part Number                     : TH0N36YX125522CBC0MMA0
Negotiated Speed                : 3.00 Gbps
Capable Speed                   : 3.00 Gbps
PCIe Negotiated Link Width      : Not Applicable
PCIe Maximum Link Width         : Not Applicable
Sector Size                     : 512B
Device Write Cache              : Not Applicable
Manufacture Day                 : Not Available
Manufacture Week                : Not Available
Manufacture Year                : Not Available
SAS Address                     : 500056B36789ABED
Non-RAID HDD Disk Cache Policy  : Not Applicable
Disk Cache Policy               : Not Applicable
Form Factor                     : Not Available
Sub Vendor                      : Not Available
ISE Capable                     : No

Thanks in advance for any potential help that can be given!
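For readers following along, output like the above can be reduced to the few fields that matter for a stuck rebuild (State, Failure Predicted, Progress). The following is a minimal sketch, assuming the text was produced by something like `omreport storage pdisk controller=0 pdisk=0:1:4` and keeps the `Key : Value` layout shown in this post. `pdisk_field` is a hypothetical helper name, not part of OMSA.

```shell
#!/bin/sh
# pdisk_field KEY
# Reads OMSA-style "Key : Value" text on stdin and prints the value of the
# first line whose key equals KEY exactly.
# Hypothetical helper; assumes the layout shown in the post above and that
# values do not themselves contain ": " (colon followed by a space).
pdisk_field() {
    awk -F' *: ' -v key="$1" '
        $1 == key {
            sub(/[ \t\r]+$/, "", $2)   # drop trailing whitespace
            print $2
            exit
        }'
}

# Example use (assumes the report for one disk was saved to /tmp/pdisk.txt):
#   pdisk_field "State"             < /tmp/pdisk.txt
#   pdisk_field "Failure Predicted" < /tmp/pdisk.txt
#   pdisk_field "Progress"          < /tmp/pdisk.txt
```

Logging these three values on a schedule would show whether the progress figure ever moves or whether the rebuild restarts, which is hard to tell from a single snapshot.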

Moderator

8.5K Posts

January 2nd, 2018 09:00

Hi,

Can you run SupportAssist and see whether it provides more information? http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=YJTKK

7 Posts

January 2nd, 2018 10:00

Unfortunately, it appears that the SupportAssist package wants to "phone home" to Dell, and that's not possible here: the server is segregated from the internet behind a pretty strict firewall policy, even if I tried to proxy it in some way.

Is there any other way to get some insight without having this utility phone home over the internet?

Thanks again.

Moderator

8.5K Posts

January 2nd, 2018 10:00

You can try using a proxy. You may also be able to get the data with MegaCLI: https://www.broadcom.com/support/download-search?dk=megacli

7 Posts

January 2nd, 2018 11:00

MegaCLI has been installed from the latest bundle provided by Broadcom.

What would you like me to pull via it?

Thanks.

Moderator

8.5K Posts

January 2nd, 2018 11:00

 ./MegaCli -FwTermLog -Dsply -aALL > /tmp/ttylog.txt
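The resulting log can run to thousands of lines, so a first pass with `grep` helps narrow it down before attaching or reading it. This is only a rough sketch: the keyword list is a guess at what rebuild and failure messages might contain, not a documented firmware log format, and `ttylog_hits` is a hypothetical helper name.

```shell
#!/bin/sh
# ttylog_hits FILE
# Prints, with line numbers, every line of FILE that mentions a rebuild or a
# failure (case-insensitive). The keywords are assumptions and will likely
# need adjusting to the wording your controller firmware actually uses.
ttylog_hits() {
    grep -inE 'rebuild|rbld|fail|medium error' "$1"
}

# Example use against the file written by the command above:
#   ttylog_hits /tmp/ttylog.txt | less
```

If the same drive shows up repeatedly in rebuild-start and failure messages, that pattern is worth reporting back along with the surrounding lines.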

7 Posts

January 2nd, 2018 12:00

Great, thanks.  I've got the output now, but the file has 9084 lines in it.  Is there some way to attach a file here?  Else, can I look for something in the output that I can report back on for you here?

Thanks again.

Moderator

8.5K Posts

January 2nd, 2018 12:00

If you click the options tab on the reply screen you can attach a file.

7 Posts

January 2nd, 2018 12:00

Thank you.   File attached.

1 Attachment

Moderator

8.5K Posts

January 2nd, 2018 13:00

It looks like drive 4 keeps failing, rebuilding, and failing again. It needs to be replaced.

7 Posts

January 2nd, 2018 13:00

Thanks, I'd figured that would end up being the case.

However, may I ask what the correct method is for removing the HDD from the system?  Putting the HDD into a failed state first, so that it can be removed properly without the risk of punctures, is not possible right now.  OMSA keeps kicking back a message that the "operation is not permitted".

Should the HDD simply be pulled from the system without any other preparation, or should we do something more drastic, such as powering off and removing the cables, and then replace the HDD?

Thanks again.

Moderator

8.5K Posts

January 2nd, 2018 14:00

Just remove the drive with the system up, make sure that OMSA shows the drive as removed, and then put the replacement in.
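The "make sure OMSA shows the drive as removed" step can be checked from the command line rather than by eye. Below is a minimal sketch, assuming the physical disk listing comes from `omreport storage pdisk controller=0` and uses the same `ID : connector:enclosure:slot` lines as the output earlier in this thread. `pdisk_present` is a hypothetical helper name, not an OMSA command.

```shell
#!/bin/sh
# pdisk_present DISK_ID
# Reads an OMSA physical disk listing on stdin and exits 0 if a line of the
# form "ID : DISK_ID" is present, 1 otherwise.
# Hypothetical helper; assumes the "Key : Value" layout shown in this thread.
pdisk_present() {
    awk -F' *: ' -v id="$1" '
        $1 == "ID" {
            v = $2
            sub(/[ \t\r]+$/, "", v)    # drop trailing whitespace
            if (v == id) found = 1
        }
        END { exit found ? 0 : 1 }'
}

# Example use after pulling the drive (disk ID taken from the first post):
#   if omreport storage pdisk controller=0 | pdisk_present "0:1:4"; then
#       echo "0:1:4 is still listed; wait before inserting the replacement"
#   else
#       echo "0:1:4 is no longer listed"
#   fi
```

The same check, inverted, can confirm that the replacement drive has appeared before looking at its rebuild progress.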

7 Posts

January 2nd, 2018 14:00

Thanks.  Will do, and I'll report the result.

8 Posts

March 6th, 2018 11:00

What was the result here?
