2 Intern • 207 Posts

January 29th, 2014 08:00

CX-SA07-010 disks failing repeatedly in CX4-240

Hi all,

A recently installed CX4-240 with 9 DAEs full of 1TB CX-SA07-010 disks is having a lot of drive failures. The drives configured so far have been put into RAID-5 groups of 8 disks each, and we've seen 7 individual disk failures in total. About 30% of the drives have been configured; 5 of the failures have been in those configured disks, and 2 others have been in the yet-unused DAEs. The failures appear to be randomly dispersed across different RAID groups, DAEs, buses, etc.

The question is: are there any known systematic causes or issues with certain firmware revisions and/or configurations? Or did we simply get a crappy batch of drives?

2 Intern • 4K Posts

January 29th, 2014 18:00

CX-SA07-010 covers many different drive models. What's the part number (P/N), "00504xxxx"?

1tb_CX-SA07-010.jpg

2 Intern • 207 Posts

January 30th, 2014 05:00

I've pulled up the SP collect, and it looks like most of them are 005048829, but there are also several others:

005049058PWR

005049070PWR

005049238PWR

005049258PWR

005049412PWR

005049542PWR

005048797

005048955

005048730 & 005048701 (flare disks)

(Any idea what the PWR suffix on those part numbers means? Power, I'm guessing?)

Thanks

January 30th, 2014 06:00

2 Intern • 207 Posts

January 31st, 2014 08:00

I've perused that document, but haven't quite read every word... what exactly am I looking for that pertains to this discussion?

2 Intern • 4K Posts

January 31st, 2014 20:00

Have all these disks actually faulted? I could not find any related ETAs (EMC Technical Advisories) or KB articles for these part numbers on support.emc.com.

Or are the "failures" you mentioned just errors, such as Soft Media Error or Soft SCSI Bus Error?

You can refer to the KB article emc71072 for the details about these "Soft" errors:

emc71072.jpg

4.5K Posts

February 11th, 2014 13:00

He may have been referring to the RAID type that is recommended for SATA II disks: RAID 6 is the recommended RAID type (page 57). Also, SATA II disks are usually used for low-workload applications and not production (page 75). You might want to look at how you're using the disks, as they are not as robust as FC disks.

Also, are these all new disks?

glen

2 Intern • 207 Posts

February 11th, 2014 14:00

Glen, Thanks for your input!

All of the disks are refurbished, and we don't seem to have a way to tell how many hours they've run so far. We're in the process of gathering more information and will post again if we get anything meaningful to add to the discussion.

4.5K Posts

February 11th, 2014 14:00

An observation: I've seen older arrays (5+ years) that seem to have a higher incidence of disk failures. It's the same as the light bulbs in your home: they all burn out at around the same time because they are all about the same age. You may be experiencing this.

glen

2 Intern • 207 Posts

February 20th, 2014 07:00

Found this article today; it's very interesting and applicable to this situation: https://emc--c.na5.visual.force.com/apex/KB_ETA?id=kA37000000000NQ

Firmware on the head unit has been updated. I'm not certain about the drives' firmware, but we'll look into that shortly.

4.5K Posts

February 20th, 2014 14:00

That article is:

"ETA emc273801: Celerra, CLARiiON, Data Domain, DLm, Symmetrix DMX-4 and VMAX: Customer Advisory regarding certain 1 TB SATA disk drives"

https://support.emc.com/kb/3187

The drives that are affected by this ETA are:

005048823

005048829

glen
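
For anyone wanting to check their own inventory against that list, here is a minimal Python sketch. It uses the part numbers reported from the SP collect earlier in this thread and the two part numbers listed above. The helper names are hypothetical, and treating a "PWR"-suffixed part number as equivalent to its base number is an assumption, since nobody in this thread has confirmed what the suffix means.

```python
# Part numbers reported from the SP collect earlier in this thread.
reported = [
    "005048829", "005049058PWR", "005049070PWR", "005049238PWR",
    "005049258PWR", "005049412PWR", "005049542PWR",
    "005048797", "005048955", "005048730", "005048701",
]

# Part numbers listed in this thread as affected by ETA emc273801.
eta_affected = {"005048823", "005048829"}


def base_part_number(pn):
    """Return the part number without a trailing 'PWR' suffix.

    ASSUMPTION: the suffix does not change which drive model the
    part number refers to. This is not confirmed in the thread.
    """
    pn = pn.strip()
    return pn[:-3] if pn.endswith("PWR") else pn


def affected(part_numbers, eta_list):
    """Return the sorted, de-duplicated part numbers found in eta_list."""
    return sorted({pn for pn in part_numbers if base_part_number(pn) in eta_list})


print(affected(reported, eta_affected))  # → ['005048829']
```

Of the part numbers posted so far, only 005048829 matches, which is also the one reported as making up most of the drives.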

1 Message

March 27th, 2014 18:00

Glen, I clicked on the article link but I am getting an Access Denied message. I am also having issues with the CX-SA07-010, 005048829 drives. Would you be able to post the contents so I could see if that article applies to my situation?

2 Intern • 207 Posts

May 27th, 2015 09:00

Here's an interesting thing I discovered regarding a specific Seagate model of 1TB drive. Look out for Seagate Barracuda 7200.11 1TB drives: apparently a large batch of these were codenamed "MOOSE" or "MOOS" and have an exceedingly high failure rate. I ran into a big batch of these in a handful of NetApp systems a few weeks ago, and we saw failure rates on the order of 15-20%. From a sample of 500+ disks (which included both WD and Seagate drives), around 50 had failed, and every single one that failed was a Seagate Barracuda 7200.11. We've been avoiding these like the plague ever since. I also just noticed this in an AX4-5 system with the same Seagate Barracuda drives; they account for most if not all of the 1TB failures we see.
