
29 Posts


October 7th, 2011 01:00

PE SC1435 SAS 5iR adaptor failure?

Hi, folks

I am pretty certain that the SAS 5ir Adaptor in our PE server has failed. I have been quoted £250 to replace it. Because of this I would like to make absolutely sure that this is the component that has failed. Here is what happened:

Last Sunday night, our PowerEdge SC1435 (<ADMIN NOTE:Service tag removed per privacy policy>) shut down because of a hardware error. The message on the screen was:

*** Hardware Malfunction
Call your hardware vendor for support
*** The system has halted ***

Restarting the system allows Windows 2003 R2 to boot, but the system soon shuts down again. The longest it stayed up was about 1.5 hours, which was enough time for me to run DSET and copy some data off the server.
 
One of the last reboots I tried failed with the following on the BIOS screen:

PCIe Fatal Error interrupt at 9B82:41F6

Pressing 'R' to reboot the system resulted in the system rebooting to Windows but it only lasted a few minutes before the hardware failure kicked in. I have not restarted it since.

The DSET report shows, under Storage > SAS 5_iR Adaptor Embedded, that the State is Degraded:

ID                                  0
Name                                SAS 5/iR Adapter
State                               Degraded
Firmware Version                    00.10.49.00.06.12.02.00
Minimum Required Firmware Version   00.10.51.00.06.12.05.00
Driver Version                      1.25.05.00
Minimum Required Driver Version     1.28.03.01
Storport Driver Version             5.2.3790.3959
Number of Connectors                1
Rebuild Rate                        Not Applicable
BGI Rate                            Unknown
Check Consistency Rate              Unknown
Reconstruct Rate                    Unknown
Security Capable                    Not Applicable
Security Key Present                Not Applicable
SCSI Initiator ID                   Not Applicable
Cache Memory Size                   MB
Patrol Read Mode                    Disabled
Patrol Read State                   Unknown
Patrol Read Rate                    %
Patrol Read Iterations              Unknown
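For anyone reading a similar DSET report, here is a rough Python sketch of how the installed and minimum required versions above can be compared. It assumes the dotted strings can be compared field by field as integers, which is only a guess at how the report decides these things; the helper names are made up for illustration.

```python
def parse_version(text):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def is_below_minimum(installed, minimum):
    """Return True when the installed version sorts before the minimum.

    Assumption: a field-by-field numeric comparison is meaningful here.
    """
    return parse_version(installed) < parse_version(minimum)


# (installed, minimum required) pairs copied from the report above.
report = {
    "firmware": ("00.10.49.00.06.12.02.00", "00.10.51.00.06.12.05.00"),
    "driver": ("1.25.05.00", "1.28.03.01"),
}

outdated = {
    name: is_below_minimum(installed, minimum)
    for name, (installed, minimum) in report.items()
}

for name, flag in outdated.items():
    print(f"{name}: {'below minimum' if flag else 'ok'}")
```

On the values in this report, both the firmware and the driver come out below the stated minimum.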


I tried installing the latest firmware update available from the Dell Support site, but when I ran Flash.bat from a command prompt, several messages appeared stating that it was unable to find the required files.

By the time I tried installing the latest driver the system was not staying up long enough for me to even run the driver installation package.

According to the DSET report, everything else seems OK.
 
There are no errors reported in the event logs before the failure occurs. Device Manager shows everything is fine.

I am a little confused about the terminology used in the DSET report, particularly the 'embedded' description. As far as I understand it, the SAS 5iR is an adaptor card and is shown as 'adaptor' in Device Manager. I understand that the SAS 5iR comes in two forms: an adaptor card and embedded in the system. I assume our SC1435 contains the card and it certainly looks that way - the drives are connected by two leads that terminate in one connector that is attached to the card which sits on a PCI riser.

 
One other thing I am uncertain about is the PCIe error message. I don't know if this relates to the SAS card or to the PCI Riser card which the SAS adaptor is connected to or perhaps another PCI connection on the motherboard.
 
Because there are no errors in the Windows Event Logs, Device Manager shows everything as OK, and the DSET report shows the adaptor's status as degraded, I assume that the SAS adaptor has failed.
 
What do you people think? Anything I could try to make certain the card is the point of failure? £250 is a lot to spend if the problem lies elsewhere.
 
Thanks!
 
Mark

October 10th, 2011 08:00

The Perc5 ir should not cost that much money! My company sells those with a 1-year warranty for under $100. (Aventis Systems)

It is kind of rare to see these controllers fail, but when they do, they usually give a PCI-E training error. You will usually also see drives randomly dropping/picking back up. This could explain the rebooting.

29 Posts

October 10th, 2011 07:00

Hello, Daniel

Many thanks for replying.

I reseated everything on the motherboard.

I started the server (the first time since last week), and ran the 32bit diags from a bootable CD and it displayed the following failure:

Test Results : Fail

Device : IPMI

Test : IPMI_System_Event_Log_Check

Error Code : 2900:0221

Msg : IPMI - Oct 02 22:53:22 2011 : System Firmware :: Critical interrupt sensor (PCIE Fatal Err) Bus Fatal Error
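If it helps anyone sifting through several of these log entries, here is a small Python sketch that splits a message like the one above into its parts. The pattern is inferred from this single line only, so treat the layout and the field names (stamp, source, sensor, detail, event) as assumptions rather than a documented format.

```python
import re
from datetime import datetime

# Layout guessed from the one message quoted above; not a documented format.
SEL_PATTERN = re.compile(
    r"^IPMI - (?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}) : "
    r"(?P<source>.+?) :: (?P<sensor>.+?) \((?P<detail>[^)]*)\) (?P<event>.+)$"
)


def parse_sel_message(msg):
    """Return the fields of a diagnostics SEL message, or None if no match."""
    match = SEL_PATTERN.match(msg.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Month abbreviations are parsed with the default (English) locale.
    fields["stamp"] = datetime.strptime(fields["stamp"], "%b %d %H:%M:%S %Y")
    return fields


entry = parse_sel_message(
    "IPMI - Oct 02 22:53:22 2011 : System Firmware :: "
    "Critical interrupt sensor (PCIE Fatal Err) Bus Fatal Error"
)
print(entry["stamp"].strftime("%A"), "|", entry["detail"], "|", entry["event"])
```

The logged timestamp falls on a Sunday, which lines up with the original shutdown described in the first post.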

The hard drives and SAS controller were not listed in the available tests.

A restart resulted in another PCIe fatal error - F000:E891. After pressing (r) to reboot the system it booted normally.

Ran the PowerEdge Diagnostics and opted to test everything (SAS 5ir and drives are listed). All tests passed. All entries under the Configuration tab have a green tick mark beside them. Test took 2hrs 22mins - the majority of that was the hard drive tests.

The system has been up for nearly three hours now. Presumably, reseating the components did not help as the PCIe fatal error occurred again.

October 10th, 2011 09:00

No UK office, but we have pretty decent shipping rates. If you have your own account we can use that, too. Just need to call in for a sales rep.

11 Legend

16.3K Posts

October 10th, 2011 09:00

Agreed ... you can get a PERC 5 (much better controller) for half the quoted cost of the SAS 5.

29 Posts

October 10th, 2011 09:00

Thanks for the feedback. The server is still running, so I'll leave it and see if it lasts a few days.

@IcanBENCHurCAT:

Do you have a UK office? I was gob-smacked when I was quoted £250. Even buying it from you guys and getting it shipped across the big pond would be cheaper.

29 Posts

November 3rd, 2011 06:00

OK...

Saw the CTRL+C prompt (Whoops)

Now I have a different problem. I used the menu to do the following:

CTRL+C displays a screen on which SAS6IR is listed. Is this correct? The invoice states it is a PERC 5i/R.

Anyway from the menu I drilled down:

SAS6IR > RAID Properties > Manage Array > Activate Array

I chose to activate the array and exited the utility.

The server reboots and, after "Initializing...", I see: Vol (00:000) is currently in state RESYNCHING

The drive is then identified and listed (two drives in RAID 1 configuration).

After the server BIOS finishes loading, a white progress meter rapidly completes across the bottom of a black screen, the Windows Server 2003 splash screen appears, and then the server reboots. I get the same result when trying to start the OS in safe mode.

Presumably I need to delete the array and start from scratch because the wrong driver is loaded. I am quite happy to do that, but would prefer to avoid it if possible. Is there a way to install the driver without re-installation?

Cheers!

29 Posts

November 3rd, 2011 06:00

I have received a replacement PERC 5 i/R card and have placed it in the system and connected the drives. Problem is, I cannot see how to configure it. During boot the following is displayed:

SAS 6 Host Bus Adaptor BIOS

MPT-6.22.03.00

Copyright 2000-2008 LSI

Initialising...

Vol 00:130 is currently in state inactive/optimal

Enter SAS configuration utility to investigate

The previous card's BIOS displayed a CTRL+key combination for launching the configuration utility, but this one displays no such prompt.

Can anyone help with this, please?

Thanks

November 4th, 2011 05:00

Yeah, your driver must be updated. The only way I know how is to take a backup with software like Acronis. Then, you can specify drivers to add when you put the backup back on to the array. Takes about the same amount of time as reinstalling.

Maybe you could boot a Windows CD, run a repair, and add the driver. Or maybe boot a live CD and add the driver?

29 Posts

November 4th, 2011 05:00

Thanks. I'll reinstall from scratch.

29 Posts

November 8th, 2011 00:00

Right, a little more help, if I may please :)

I can't find a driver for the adaptor. When I search Dell's site for 'SAS 6/ir driver' all I see are drivers for integrated controllers, not for a SAS 6/iR adaptor, which I assume is different.

Anyone know where I can get one for Windows 2003, please?

Thanks!

29 Posts

November 8th, 2011 01:00

Google is your friend:

 

However, looking at the page you have no idea what this is for. Looking at the text file tells you.

29 Posts

November 11th, 2011 07:00

Thanks to everyone who contributed.

I saved £100 by buying from the US. This saving was reduced by a £30 customs charge when it arrived at our office, but it still meant a saving of £70!

The new card is in - had to use the Dell USB F6 Utility to format my USB flash drive before Windows Setup would recognise it. Everything seems to be OK :)

I appreciated everyone's help :)

10 Posts

July 12th, 2012 11:00

This is an additional note on this thread, for everyone that may read it.

The quickest way to identify whether a card on the expansion bus (usually a riser in a 1U chassis) has failed is simply to remove the suspect card from the PCI slot. You don't need to detach any of its other cables; just lay it carefully aside so it won't short against anything, end up in a fan, or block something important. Then see whether the system boots past the point of failure without problems.

If it gets this far, you've found the card that is the problem.

Some history: once you pull a card, look at the little barrel-shaped components with XXXXuF written on the side. These are called capacitors. Dell was a victim of a capacitor scam (the great 'Capacitor Plague' of the early to mid 2000s, which continued to cause failures until 2010 and beyond), and that may be why the card has failed.

You may also see a black smudge across the top, where it should normally be silver. This is electrolyte that has vented onto the top of the capacitor, which is another sign of failure. There are many other signs, but this is likely the cause of the card failure: the capacitor either overheated due to poor venting or failed due to poor quality. Suspect the former first, then ask Dell about the latter; they may be able to tell you whether you are due a replacement because of an OEM defect.
