April 11th, 2012 10:00

Uncertified SSDs producing a Predictive Failure Count

Hello all,

I have 2 servers here, each with 6 Kingston SH100S3 240G drives hooked up to a Dell PERC H700. They both run CentOS 6.2. In each server the controller has marked one drive with a single 'Predictive Failure Count'. The drives in question appear to be operating normally, which I'd imagine is often the case with a predictive failure, but I can't find the source of its complaint to determine if there's a genuine issue. I've checked these:

PERC event log: https://gist.github.com/6303dc7edbfc402f96a6

SMART data: https://gist.github.com/d9920e1aad6751ef7d7c

Has anyone seen anything like this before?

Any suggestions for further research?
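
For anyone checking the SMART side of this: drives behind a PERC/MegaRAID controller can usually be queried with `smartctl -A -d megaraid,N /dev/sdX`, where N is the controller's device ID for the drive. Below is a minimal sketch, assuming the standard ATA attribute table layout, that flags any attribute whose normalized value has reached its threshold or that has a WHEN_FAILED entry. The function name `failing_attributes` is made up for illustration, and whether the H700 derives its predictive failure count from these attributes is not confirmed in this thread.

```python
def failing_attributes(smartctl_output):
    """Return (name, value, thresh, when_failed) for suspicious attributes.

    Assumes the usual `smartctl -A` column order:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    flagged = []
    in_table = False
    for line in smartctl_output.splitlines():
        if line.strip().startswith("ID#"):
            in_table = True  # attribute rows follow the header line
            continue
        if not in_table:
            continue
        parts = line.split()
        # Skip anything that does not look like a full attribute row.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        if not (parts[3].isdigit() and parts[5].isdigit()):
            continue
        name = parts[1]
        value, thresh = int(parts[3]), int(parts[5])
        when_failed = parts[8]
        # A threshold of 0 means the attribute can never trip.
        if when_failed != "-" or (thresh > 0 and value <= thresh):
            flagged.append((name, value, thresh, when_failed))
    return flagged
```

An empty result only means no attribute has crossed its vendor threshold; it does not rule out the controller counting something else.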

41 Posts

April 11th, 2012 11:00

I have seen this with a customer running HP servers and OCZ Vertex 3 SSDs (SandForce), which IIRC use the same controller as those Kingston ones.

They were testing performance when it started marking 2 out of 4 drives as predicted failures. Of course, the controller log and SMART counters show absolutely NOTHING wrong with them other than the "predicted failure" flag being on.

4 Operator

 • 

1.8K Posts

April 11th, 2012 18:00

Call Kingston, they want happy customers. They replaced one of my SSDs which occasionally blue screened. They may replace all of your "predictives". Also, are you up to the new firmware level?

8 Posts

April 12th, 2012 01:00

Firmware is definitely behind, 332 versus the 501 latest, but I don't see anything related in the release notes. Either way this would be tough to do as they're in a production RAID.

I'll have to wait for a more concrete failure or way to reproduce the issue. If any actually fail I'll update this thread.

Thanks guys!

41 Posts

April 12th, 2012 13:00

the H700 uses the LSISAS 2108 ROC

that's the same as the ones used in the LSI MegaRAID 9250/60/80

enterprise firmware updates for branded HDDs (i.e. Dell drives, HP drives) are pretty safe, as they're validated internally and have some pretty robust update processes

4 Operator

 • 

1.8K Posts

April 12th, 2012 13:00

"Either way this would be tough to do as they're in a production RAID." Agree. The only RAID 5 I ever lost was due to a drive firmware update provided by Seagate back in 2003, so I am leery of any drive firmware update.

Are you up to date on the H700 firmware? Updating the array controller's firmware is fairly safe.

Perhaps you might get more info with the LSI Logic array software. I use both Dell's Array Manager and the LSI management software on most of my servers. Sorry, I have not used the H700 yet; you would need to figure out the H700's equivalent LSI model, perhaps someone else will chime in. I was surprised how easily Kingston replaced my drive: great customer service, great company.
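
If the LSI command-line tool is available, its per-drive listing (for example from `MegaCli -PDList -aALL`) can be scanned for the counter in question. Here is a minimal sketch that parses such output from text, assuming each drive's section starts with an `Enclosure Device ID` line and includes `Slot Number`, `Predictive Failure Count`, and `Inquiry Data` fields. Those field names are from typical MegaCli output and should be checked against what your controller actually prints; the function names are made up for illustration.

```python
import re

# Matches "Key: value" lines; the key is everything before the first colon.
FIELD_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$")


def parse_pdlist(text):
    """Split PDList-style text into one dict of fields per physical drive."""
    drives = []
    current = None
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue
        key, value = match.groups()
        # Assumption: this field opens each drive's section.
        if key == "Enclosure Device ID":
            current = {}
            drives.append(current)
        if current is not None:
            current[key] = value
    return drives


def drives_with_predictive_failures(text):
    """Return (slot, count, inquiry_data) for drives with a non-zero count."""
    flagged = []
    for drive in parse_pdlist(text):
        raw = drive.get("Predictive Failure Count", "0")
        count = int(raw) if raw.isdigit() else 0
        if count > 0:
            flagged.append(
                (drive.get("Slot Number"), count, drive.get("Inquiry Data"))
            )
    return flagged
```

Feeding it the saved output of the listing command gives a quick way to see which slots are flagged across several servers without reading each dump by hand.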

4 Operator

 • 

1.8K Posts

April 12th, 2012 16:00

Glovato...

Agree, updates are "pretty safe", but so was the update released by Seagate in my incident; the updated firmware was rated critical for the specific drive model I had. I checked with Seagate's top tier support by phone as to whether anyone was having issues with the firmware flash. I even started (and ended) with the firmware update to the hot spare drive. The crash resulted in me working 36 hours straight, 4 hours of sleep, then another straight 36 hour day (on a Thanksgiving weekend) to get the database/file server back up and running (that was with a backup)... so anyone who does a drive firmware update has the right to be leery.

Now, unless the firmware update is rated critical, and I have a newly created clone, I will not update.

4 Operator

 • 

1.8K Posts

April 13th, 2012 07:00

"With Dell servers i still haven't managed to do such big updates and the tools for updating are IMHO a whole 4 generations behind what HP has"

Sure beats having to put them individually on a SCSI hba

In the last few years firmware updates have been taken more seriously by the drive manufacturers; I have not had any issues in years. To date myself, in the 80s and 90s it was all too common to get serious bugs in firmware updates, or for tech support to point you to the incorrect BIOS to flash your mobo. Liability has put an end to that.
