I've got a Dell T300 domain controller with 4x 146GB SAS drives: 2 in a RAID 1 array and 2 as standard drives. On Monday I came in and the LCD display was saying 'Error 1810 HDD 1 fault', so I did some research, and reseating the drive hot cleared the LCD error. I've also run a check using online diagnostics 2.16.0, which says all drives are OK. Despite this, the status light on the drive in question is flashing green, amber, off. From what I've read on forums, this indicates a predicted failure.
Do I need to change the drive, or is it OK to continue using it? I've installed OpenManage Server Administrator (OMSA) but it's not picking up the storage controller (SAS 6/iR). Storage was selected as an option when installing OMSA, and OMSA is asking for a reboot, but I'm currently in core hours.
Any help gratefully received.
A Predicted Failure is raised when the bad blocks on a drive exceed a predetermined threshold. You can continue running with a Predicted Failure, but you run the risk that if another drive in that array fails, those bad blocks may cause problems when the array is rebuilt. I would suggest replacing the drive, but there is a procedure to follow when doing so. To replace a Predicted Failure drive, go to the controller or OpenManage and force the predicted-failure drive offline. Once it is offline, you can remove and replace it. If you were to simply remove the predicted-failure drive before rebuilding, there is a chance of carrying the predicted failure over to the new replacement drive.
Let me know if this helps.
Thanks Chris, I'll have a look at OpenManage once I've rebooted the server. Once I've removed the drive, can I replace it with one of the two non-RAID drives in the server? Our DC is no longer a file store, so drives 2 and 3 are pretty much empty. Is there a special process for doing this?
You can do this, but you would need to enter the controller utility and press CTRL-N to go to the next page. From there, highlight the non-RAID drive, press F2, and select Assign as Global Hotspare. Once that is done, the drive should rebuild into the degraded virtual disk that the Predicted Failure drive was removed from.
Thanks for the info, I did a reboot and I can now see the controller and drives in OMSA. Sure enough Physical Disk 0:0:1 is showing predicted failure 😞
I've read through some of the documentation and, as far as I understand, the SAS 6/iR doesn't support forcing the drive offline through OMSA; the only tasks I have for this drive are 'Blink' and 'Unblink'. The other 2 non-RAID disks do give me the option to assign them as Global Hotspares. If I assign one as a global hotspare, will it then start doing a rebuild to cover the failing disk? Would I need to move the physical drive to the slot in the server where the existing (failing) drive resides? Is there a way of accessing the controller without using the boot-time application? (Our second DC is also having issues, so uptime is important.) Sorry for so many questions, but this is the first time I've had a failed HDD in 12 years as a technician.
What you can do is power down the server and remove the Predicted Failure drive once the system is off. Once you have removed the drive, boot the server. As you said, at that point you can assign the spare drive as a Hot Spare, and the rebuild should then start. You could move the drive over to the other slot if you prefer, but it isn't a requirement. With OpenManage installed on the server, you can monitor and manage the server from the OS, including the RAID controller.
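For anyone who would rather script this from the OS than click through the OMSA web page, here is a rough Python sketch that builds the OMSA command lines for listing the physical disks and for assigning a global hot spare. The controller ID (0) and the disk ID (0:0:2) are assumptions for illustration only; check the IDs that OMSA reports on your own server first. The helper names are made up for this sketch, and it only prints the commands unless you explicitly turn off dry_run.

```python
import shutil
import subprocess


def pdisk_report_cmd(controller):
    """Build the argv that lists the physical disks on one controller."""
    return ["omreport", "storage", "pdisk", "controller={}".format(controller)]


def assign_hotspare_cmd(controller, pdisk):
    """Build the argv that assigns a physical disk as a global hot spare."""
    return [
        "omconfig", "storage", "pdisk",
        "action=assignglobalhotspare",
        "controller={}".format(controller),
        "pdisk={}".format(pdisk),
        "assign=yes",
    ]


def run(cmd, dry_run=True):
    """Return the command line as text; execute it only when dry_run is
    False and the OMSA tool is actually found on PATH."""
    line = " ".join(cmd)
    if dry_run or shutil.which(cmd[0]) is None:
        return line
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Controller 0 and disk 0:0:2 are placeholders, not values from this server.
    print(run(pdisk_report_cmd(0)))
    print(run(assign_hotspare_cmd(0, "0:0:2")))
```

Run as-is, it just echoes the two command lines so you can review them before typing them into an elevated prompt yourself. Whether the SAS 6/iR accepts the hot spare task from the command line should match what the OMSA page offers for that disk, so treat this as a convenience wrapper rather than a way around the controller's limits.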