August 7th, 2013 14:00

Question about clearing HDD Cache. HDD in predictive failure, replaced and rebuilt array, but predictive failures persisted.

Hello all!

I have a PE29590 III with 6 HDDs. Two disks are in RAID-1 and contain the OS. The other four form a second array that is used mostly for storing large, static data.

Recently, I had one disk fall into a predictive failure (PF) state, which I immediately RMA'ed and replaced. About 36 hours after the replacement, the NEW drive that I swapped in fell into predictive failure as well. I was already fairly annoyed at that point, and then a second drive on the same controller fell into predictive failure too. I repeated the process and RMA'ed the second PF drive; the replacement rebuilt and then fell back into PF as well.

I know that when an HD in a RAID-5 array goes into predictive failure, or shows any signs of wonkiness, there is a good possibility of it causing issues on other disks in the same array. My question is: is there any way to clear the cache on the HDD itself of any flags that have been piling up and marking that drive as PF? I've read in a few places (sorry, I do not have a citation handy) that some HDs, if not all, have a cache on the disk itself that can 'fill up' with errors or flags, and that it can be flushed.

Am I completely off base or is there some truth to this?

Thanks all for any insight! 

Best Regards,

Ian

August 7th, 2013 14:00

EDIT: The 4 HDDs are in a RAID-5 configuration.

August 7th, 2013 14:00

The controller usually flags the disk by serial number/signature, and the flag does not reside on the disk itself.  The only "cache" I'm aware of on disks is the write cache, which is disabled or not used when connected to a PERC.

Have you tested any of these disks with thorough diagnostics?  Your array could be corrupt, causing the predicted failure to manifest itself on a particular disk where the data was incomplete.  Usually, this will fail the rebuild before it completes, but I've seen it go beyond that.

Your controller log may have some indication of the nature of the failure.  I would also run extensive diagnostics on the disk to confirm or refute that the disk is physically faulty.
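
To illustrate why bad data elsewhere in the array can show up on a freshly replaced disk, here is a minimal sketch of RAID-5 style XOR parity in Python. It is only a toy model: the block contents and the `xor_blocks` helper are made up for illustration, and it says nothing about how the PERC firmware actually lays out stripes or decides to flag a disk.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks (made-up contents) plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Disk 1 is replaced: its block is rebuilt from the surviving blocks plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])

# If a surviving disk returns bad data during the rebuild, the block written
# to the new disk is wrong too, even though the new disk itself is healthy.
corrupt = b"AAAX"
bad_rebuild = xor_blocks([corrupt, data[2], parity])
print(bad_rebuild == data[1])
```

The rebuilt block depends on every other disk in the stripe, so a read problem on any of them carries over to the replacement.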

August 7th, 2013 14:00

"They came back and said basically the same thing"

Said ... that the disk was bad?

32-bit Diagnostics will be more thorough than the Online diagnostics, but if your disk failed Online, there isn't much point to testing with offline diags.

What model disks are you using?  Are they from Dell?  Are they Dell-certified for this server/controller?  Or are they generic enterprise disks?  Or are they consumer-grade/desktop disks?

August 7th, 2013 14:00

Thanks for the quick response, flash!

I did run the latest online diags from Dell earlier today. They came back and said basically the same thing. (Although I may have run the wrong diagnostic suite.) I am pretty sure that as soon as I offline one of the PF drives, the array will go into a degraded state, and when I pop the disk back in it will go through the same thing, throwing Sense and URE errors. At that point it's 90% likely that I'm going to lose the array no matter what. I've got solid backups of all the data on the array already, so that's not an issue. I just wanted to know if there was something that could have filled up (a cache) that I could clear and that might let the new drive operate in a non-PF state.

 

I'll check the controller logs, reboot the server, and see what everything looks like inside the RAID manager at start. Another interesting note: I did try rebooting the server earlier, and it came up and completed a chkdsk session. It moved /a lot/ of "index" and "orphan" files. I'm thinking that whatever it moved or deleted on the disks did not get replicated correctly, and that this caused the second disk to fall into the PF state. Just a theory, though.

Will check back with logs and results from more diagnostics.

 

Thanks again, flash!

Best,

Ian
