September 7th, 2005 21:00

PV220 drives dropping out

Just wondering if anybody has been seeing an abnormally high rate of drive failures on PV220S units paired with PERC4/Di cards? We purchased six PV220S units and put them into production less than 2 months ago. 4 of them have had drives dropping out and spontaneous array rebuilds, and on more than one occasion several drives dropped out simultaneously. So far that's 12 or so drives out of about 100, and counting, in 2 months of production. Sometimes the result is corrupted databases, with syslogs indicating hardware failures while Array Manager says everything is OK. All environmental conditions and power are fine. Firmware is the latest all around. The servers are Dell 2850s. Dell support is unable to help.
 
Anybody with similar experiences out there?
 
~Dave

September 8th, 2005 11:00

This is a little off topic, but if you have a need for that much storage space (6 PV220S units), maybe you should consider looking into buying a SAN (Dell|EMC CX500 or so).

September 8th, 2005 12:00

I agree, but the units are each owned by different customers we are managing so it's not an option.

 

September 13th, 2005 22:00

We had problems like this, and it was traced to a bad backplane on the SCSI controller. Is it normally the same drive slots that die? Are you using Seagate drives?

September 14th, 2005 10:00

The organisation I work for has a number of 2850s connected to 220s using both the PERC4e/DC (PCIe) and PERC4/DC (PCI-X) cards. We have had a number of serious issues with RAID5 + 1 hot spare volumes being corrupted due to HDD failures. We primarily use split-backplane 220s with 146GB HDDs. We have used both 15K and 10K drives (but never mixed together).
On 1 system we have now had more than 4 drive-failure incidents causing loss of RAID5 arrays. We have had Dell techs on site 3 times trying to fix this issue. On the last visit, the tech updated our server to the latest and greatest firmware and driver combinations. Dell have also replaced the entire 220, including HDDs, backplanes and cables, as well as everything in the 2850 apart from the physical chassis, power supplies and memory modules.
So in short, yes, I have noticed a high number of drive failures combined with the 2850's/PERC4 cards.
 
We have many 2650's with the older PERC3 cards and have not had these issues, although we have experienced the odd drive failure.
 

September 14th, 2005 19:00

Hey Guys,

Having the same problem with our 2850 & PV220S. It worked fine on the PERC4e/DC when the PV220S was in joined-bus mode with RAID 5 across the whole cabinet and one hot spare, but in the new mode it has dropped two different drives. In the current setup, the PERC4e/Di is mirrored, and the PV220S is in split mode with half of the cabinet on each channel of the PERC4e/DC, running RAID 50 with drives 0 & 15 as hot spares. Drive 5 is the latest one to fail. I checked in the card BIOS and it reports no media errors. The drives are 10K Seagate ST3146807LC, revision DS09. I have the latest firmware on the card, drives & enclosure, and I'm also running the latest drivers & latest Dell OpenManage.

Later,

Sean

Message Edited by seanwales on 09-14-2005 03:46 PM

September 14th, 2005 21:00

I sent two of my drives to Ultratec Limited in the UK, and they said both drives had a SMART trip and some problems over the bus. However, testing just the drives, they could not pinpoint why they had failed. It was not the drives but the rest of the hardware that caused the fault. Download Seagate Enterprise and run a full DST test on the failures ;)

September 29th, 2005 00:00

Hi.
I have the same problem with our 2850 & PV220S.

Two or more disks fail at once in our PV220S + PE2850 + PERC4e/DC setup.
The drives are Fujitsu MAT3300NC, revision 5704.
The PE2850 has a PERC4e/DC and a PERC4/DC connected to two PV220S units.
Dell support replaced the BKPRN, ZENN, SBUS, and cable.

But multiple disk failures still occurred three times in a month.

According to Dell, the PERC4e/DC seems to be the cause.
They suggest replacing the PERC4e/DC with a PERC4/DC.
 
 
 

 
 

Message Edited by hamigaki on 09-28-2005 08:48 PM

Message Edited by hamigaki on 09-30-2005 02:42 AM

September 29th, 2005 16:00

"Download Seagate Enterprise and run a full DST test on the failures."
I have had multiple SCSI drives fail in arrays and then tested them with the Enterprise utility... they pass when run multiple times, then fail once placed back into an array. I replaced those particular drives with new retail drives, and the arrays are running without problems. Pretty useless test.
As a note, I lost an FSMO RAID 5 array by flashing drive firmware with the utility. It works on RAID 1 drives, though.
 

"Paul,

Normally a firmware upgrade does not cause any issues. Unfortunately, in your case it seems to have caused more problems than it fixed. You can contact our warranty department for information on getting the drives replaced. Their phone number is 1-800-468-3472, or you can set up a replacement through our website at http://support.seagate.com/customer/warranty_validation.jsp

Robert H.

Seagate Technical Support"

 

"According to Dell, the PERC4e/DC seems to be the cause.
They suggest replacing the PERC4e/DC with a PERC4/DC."

Strange. I have multiple clients with this adapter and no problems, and on multiple forums there are no complaints... with the adapters connected to standard backplanes. LSI Logic must be purposely adding code to the adapter BIOS to mess with the 220s. They must have some pretty nasty people working for them :)

Message Edited by pcmeiners on 09-29-2005 02:02 PM

September 30th, 2005 06:00

Hi,

I agree it's possible that the PV220S's quality is bad.
However, I still need to keep our PV220S running stably. :(

Does anyone have a good idea?
:(
 

 

October 1st, 2005 12:00

We have now had 4 of 6 PowerEdge 2850 + 220S configurations fail, and are up to 8 failures in all on these 4 systems. Most of the configurations were RAID5, but the last one was a RAID10 on 10 disks with 2 hot spares.

Pretty frustrating for me and Dell! >:(

Dell have suggested that I might like to try rebuilding the entire system from scratch: initialize all disks, rebuild the OS and restore the data. I told them that this would be nice but is not practical. The 6 systems include 111 disks in total (almost 14.5TB).

They have also suggested that it may be a problem only with the PERC4e/DCs (i.e. the PCIe cards, not the PCI-X cards). I have not tested this.
 
We have lost data 3 times in all.

October 2nd, 2005 15:00

Cap72, how often is the consistency check run on the arrays? Most of my clients are small businesses, so I am lucky enough to be able to schedule them to run weekly in off hours, which makes a big difference.
 
 

October 2nd, 2005 20:00

We attempt to run the consistency checks weekly, although I think in practice they happen more like fortnightly.
They have found a couple of issues, but they did not help with the last 3 failures that we have experienced. In the last instance, the consistency check had completed on that volume (RAID50) less than 24 hours before the failure.

I'm leaning towards changing our backplane from split to single, which makes some sense. I think Dell are guessing by swapping the PCIe PERC cards for PCI-X PERC cards. From what Dell have told me, the PCIe and PCI-X cards use the same chipsets, but the integrated PERC4i is very different.

October 2nd, 2005 22:00

The PERC4e and the embedded PERC use the same RAID chip, the IOP332. The difference is that the PERC4e/DC uses LSI Logic SCSI chips on board the RAID card for the SCSI interface, whereas the embedded PERC uses the motherboard's Adaptec SCSI interfaces. I surmise the problem is that the Dell RAID backplanes are interfaced with a much too complicated systems management system. I have Supermicro backplanes on my larger array: very simple backplanes, not much to go wrong.

Bi-monthly consistency checks (CCs) are sufficient; some users go with Dell's monthly recommendation, or run them only when problems occur.
 

Message Edited by pcmeiners on 10-02-2005 06:15 PM

October 3rd, 2005 04:00


Dell has released a new BIOS: Dell Server System BIOS, English, PowerEdge 2850, A04.
Fixes/Enhancements:
 Added workaround for lockup resulting from the systems with 8GB RAM or more and RAID storage controller potentially claiming inappropriate addresses.
Could this fix be related?

October 4th, 2005 08:00

Can't speak about the A04 firmware update for the PowerVault 220S, as I am not sure whether any of our systems were running it. However, I can say that the A05 & A06 firmware releases did not help, as both arrays had problems.