29 Posts
1
108365
October 7th, 2011 01:00
PE SC1435 SAS 5iR adaptor failure?
Hi, folks
I am pretty certain that the SAS 5ir Adaptor in our PE server has failed. I have been quoted £250 to replace it. Because of this I would like to make absolutely sure that this is the component that has failed. Here is what happened:
Last Sunday night, our PowerEdge SC1435 (<ADMIN NOTE:Service tag removed per privacy policy>) shut down because of a hardware error. The message on the screen was:
*** Hardware Malfunction
Call your hardware vendor for support
*** The system has halted ***
Restarting the system allows Windows 2003 R2 to boot but the system soon shuts down again.
The longest it stayed up for was about 1.5hrs which was enough time for me to run DSET and copy some data off the server.
One of the last reboots I tried failed with the following on the BIOS screen:
PCIe Fatal Error interrupt at 9B82:41F6
Pressing 'R' to reboot the system resulted in the system rebooting to Windows but it only lasted a few minutes before the hardware failure kicked in. I have not restarted it since.
The DSET report shows, under Storage > SAS 5_iR Adaptor Embedded, that the State is Degraded:
ID 0
Name SAS 5/iR Adapter
State Degraded
Firmware Version 00.10.49.00.06.12.02.00
Minimum Required Firmware Version 00.10.51.00.06.12.05.00
Driver Version 1.25.05.00
Minimum Required Driver Version 1.28.03.01
Storport Driver Version 5.2.3790.3959
Number of Connectors 1
Rebuild Rate Not Applicable
BGI Rate Unknown
Check Consistency Rate Unknown
Reconstruct Rate Unknown
Security Capable Not Applicable
Security Key Present Not Applicable
SCSI Initiator ID Not Applicable
Cache Memory Size MB
Patrol Read Mode Disabled
Patrol Read State Unknown
Patrol Read Rate %
Patrol Read Iterations Unknown
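One thing the listing above does show is that both the installed firmware and the installed driver are older than the minimum versions DSET lists. A minimal Python sketch of that comparison follows, assuming the dotted version strings can be compared field by field as integers. The helper names `parse_version` and `is_below_minimum` are hypothetical and are not part of DSET or any Dell tool.

```python
def parse_version(text):
    """Split a dotted version string such as '1.25.05.00' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_below_minimum(installed, minimum):
    """Return True if the installed version is older than the minimum required."""
    # Tuples compare element by element, left to right.
    return parse_version(installed) < parse_version(minimum)


# Values copied from the DSET listing above.
firmware_outdated = is_below_minimum(
    "00.10.49.00.06.12.02.00", "00.10.51.00.06.12.05.00"
)
driver_outdated = is_below_minimum("1.25.05.00", "1.28.03.01")

print("Firmware below minimum:", firmware_outdated)
print("Driver below minimum:", driver_outdated)
```

Comparing the fields as integers rather than as raw strings avoids surprises with leading zeros such as `05` versus `5`.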
I tried installing the latest firmware update available from the Dell Support site but when I ran Flash.bat from a command prompt several messages appeared stating that it was unable to find the required files.
By the time I tried installing the latest driver the system was not staying up long enough for me to even run the driver installation package.
According to the DSET report, everything else seems OK.
There are no errors reported in the event logs before the failure occurs. Device Manager shows everything is fine.
I am a little confused about the terminology used in the DSET report, particularly the 'embedded' description. As far as I understand it, the SAS 5iR is an adaptor card and is shown as 'adaptor' in Device Manager. I understand that the SAS 5iR comes in two forms: an adaptor card and embedded in the system. I assume our SC1435 contains the card and it certainly looks that way - the drives are connected by two leads that terminate in one connector that is attached to the card which sits on a PCI riser.
One other thing I am uncertain about is the PCIe error message. I don't know if this relates to the SAS card or to the PCI Riser card which the SAS adaptor is connected to or perhaps another PCI connection on the motherboard.
Because there are no errors in the Windows Event Logs, Device Manager shows everything as OK, and the DSET report shows the adaptor's status as degraded, I assume that the SAS adaptor has failed.
What do you people think? Anything I could try to make certain the card is the point of failure? £250 is a lot to spend if the problem lies elsewhere.
Thanks!
Mark
IcanBENCHurCAT
31 Posts
1
October 10th, 2011 08:00
The Perc5 ir should not cost that much money! My company sells those with a 1-year warranty for under $100. (Aventis Systems)
It is kind of rare to see these controllers fail, but when they do, they usually give a PCI-E training error. You will usually also see drives randomly dropping/picking back up. This could explain the rebooting.
Mark Dymond
29 Posts
0
October 10th, 2011 07:00
Hello, Daniel
Many thanks for replying.
I reseated everything on the motherboard.
I started the server (the first time since last week), and ran the 32bit diags from a bootable CD and it displayed the following failure:
Test results : Fail
Device : IPMI
Test : IPMI_System_Event_Log_Check
Error Code : 2900:0221
Msg : IPMI - Oct 02 22:53:22 2011 : System Firmware :: Critical interrupt sensor (PCIE Fatal Err) Bus Fatal Error
The hard drives and SAS controller were not listed in the available tests.
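For anyone collecting several of these diagnostic results, the "Key : Value" layout above is easy to pick apart. The sketch below is a rough Python illustration, assuming each field sits on its own line and that only the first colon separates key from value. The function name `parse_diag_record` is hypothetical and the approach is not taken from the Dell diagnostics themselves. Note that the logged timestamp (Oct 02) matches the night of the original shutdown, so this entry may be a historical record in the System Event Log rather than a fresh fault.

```python
def parse_diag_record(lines):
    """Turn 'Key : Value' lines into a dict, splitting on the first colon only."""
    record = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon at all
            record[key.strip()] = value.strip()
    return record


# Lines copied from the diagnostic output above.
diag_output = [
    "Device : IPMI",
    "Test : IPMI_System_Event_Log_Check",
    "Error Code : 2900:0221",
    "Msg : IPMI - Oct 02 22:53:22 2011 : System Firmware :: "
    "Critical interrupt sensor (PCIE Fatal Err) Bus Fatal Error",
]

record = parse_diag_record(diag_output)
is_pcie_fatal = "PCIE Fatal Err" in record["Msg"]

print(record["Error Code"])
print("PCIe fatal error logged:", is_pcie_fatal)
```

Splitting on the first colon only keeps values such as the error code and the timestamp, which contain colons themselves, in one piece.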
A restart resulted in another PCIe fatal error - F000:E891. After pressing (r) to reboot the system it booted normally.
Ran the PowerEdge Diagnostics and opted to test everything (SAS 5ir and drives are listed). All tests passed. All entries under the Configuration tab have a green tick mark beside them. Test took 2hrs 22mins - the majority of that was the hard drive tests.
The system has been up for nearly three hours now. Presumably, reseating the components did not help as the PCIe fatal error occurred again.
IcanBENCHurCAT
31 Posts
0
October 10th, 2011 09:00
No UK office, but we have pretty decent shipping rates. If you have your own account we can use that, too. Just need to call in for a sales rep.
theflash1932
11 Legend
16.3K Posts
0
October 10th, 2011 09:00
Agreed ... you can get a PERC 5 (much better controller) for half the quoted cost of the SAS 5.
Mark Dymond
29 Posts
0
October 10th, 2011 09:00
Thanks for the feedback. The server is still running, so I'll leave it and see if it lasts a few days.
@IcanBENCHurCAT:
Do you have a UK office? I was gob-smacked when I was quoted £250. Even buying it from you guys and getting it shipped across the big pond would be cheaper.
Mark Dymond
29 Posts
0
November 3rd, 2011 06:00
OK...
Saw the CTRL+C prompt (Whoops)
Now I have a different problem. I used the menu to do the following:
CTRL+C displays a screen on which SAS6IR is listed. Is this correct? The invoice states it is a PERC 5i/R.
Anyway from the menu I drilled down:
SAS6IR > RAID Properties > Manage Array > Activate Array
I chose to activate the array and exited the utility.
The server reboots and after Initializing.. I see Vol (00:000) is currently in state RESYNCHING
The drive is then identified and listed (two drives in RAID 1 configuration).
After the server BIOS finishes loading, a white progress meter rapidly completes across the bottom of a black screen, the Windows Server 2003 splash screen appears and then the server reboots. I get the same when trying to start the OS in safe mode.
Presumably I need to delete the array and start from scratch because the wrong driver is loaded. I am quite happy to do that, but would prefer to avoid it if possible. Is there a way to install the driver without re-installation?
Cheers!
Mark Dymond
29 Posts
0
November 3rd, 2011 06:00
I have received a replacement PERC 5 i/R card and have placed it in the system and connected the drives. Problem is, I cannot see how to configure it. During boot the following is displayed:
SAS 6 Host Bus Adaptor BIOS
MPT-6.22.03.00
Copyright 2000-2008 LSI
Initialising...
Vol 00:130 is currently in state inactive/optimal
Enter SAS configuration utility to investigate
The previous card's BIOS displayed a CTRL+key combination for launching the configuration utility, but this time nothing is displayed.
Can anyone help with this, please?
Thanks
IcanBENCHurCAT
31 Posts
0
November 4th, 2011 05:00
Yeah, your driver must be updated. The only way I know how is to take a backup with software like Acronis. Then, you can specify drivers to add when you put the backup back on to the array. Takes about the same amount of time as reinstalling.
Maybe you could boot from a Windows CD, run a repair and add the driver. Or maybe boot a live CD and add the driver?
Mark Dymond
29 Posts
0
November 4th, 2011 05:00
Thanks. I'll reinstall from scratch.
Mark Dymond
29 Posts
0
November 8th, 2011 00:00
Right, a little more help, if I may please :)
I can't find a driver for the adaptor. When I search Dell's site for 'SAS 6/ir driver' all I see are drivers for integrated controllers, not for a SAS 6/iR adaptor, which I assume is different.
Anyone know where I can get one for Windows 2003, please?
Thanks!
Mark Dymond
29 Posts
0
November 8th, 2011 01:00
Google is your friend:
However, looking at the page you have no idea what this is for. Looking at the text file tells you.
Mark Dymond
29 Posts
0
November 11th, 2011 07:00
Thanks to everyone who contributed.
I saved £100 by buying from the US. This saving was reduced by a £30 customs charge when it arrived at our office, but it still meant a saving of £70!
The new card is in - had to use the Dell USB F6 Utility to format my USB flash drive before Windows Setup would recognise it. Everything seems to be OK :)
I appreciated everyone's help :)
jester
10 Posts
0
July 12th, 2012 11:00
This is an additional note on this thread, for everyone that may read it.
The quickest way to identify whether a card on the expansion bus (usually a riser in a 1U chassis) has failed is simply to remove the suspect card from the PCI slot. You don't need to detach any of its other cables; just lay it carefully aside so that it can't short against anything, end up in a fan or block something important. Then see whether the system boots past the point of failure without problems.
If it gets this far, you've found the card that is the problem.
Some history: once you pull a card, look at the little barrel-shaped components with XXXXuF written on the side. These are capacitors. Dell was a victim of a capacitor scam (the great 'Capacitor Plague' of the early to mid 2000s, which continued to cause failures up to 2010 and beyond), and that may be why the card has failed.
You may also see a black smudge across the top of a capacitor, where it should normally be silver. This is electrolyte that has vented onto the top of the capacitor, which is another sign of failure. There are many other signs, but this is likely the cause of the card failure: the capacitor overheated because of poor venting, or failed because of poor quality. Suspect the former first, then ask Dell about the latter; they may be able to tell you whether you are due a replacement because of an OEM defect.