March 8th, 2013 14:00

E1810 hdd 0 fault on good ssd after updating perc6i and backplane firmware and storport

First off, hello, this is my first post in the forums. I'll post my whole story; you may want to skip to the 4th paragraph.

I got tired, disappointed, and fed up with the really poor storage options out there for the home user. First I ran out of space on a NAS and found out there is no reasonable path to migrate to something bigger, and then I found out the Intel RAID on my desktop motherboard doesn't support things like online expansion or volumes over 2 TB. I also have Small Business Server 2003 running on an old PowerEdge 400SC, which I must give props to Dell for since it's still humming along reliably, but it will have to give out eventually, right? I figure it's time to have something else to start replicating the services I use on the Small Business Server over to, for when it does give out.

So now I've gotten myself a Gen3 PowerEdge 2950 with a PERC 6i, a SAS backplane, and 6 caddies. I installed a 60 GB SSD into one of the caddies using an Alienware SSD to 3.5" adapter, put it in position 0:0, and configured it as a single-disk RAID 0. I also initially installed three 2 TB SATA drives, on which I configured a 500 GB RAID 5 volume with 64k elements and a 3.2 TB RAID 5 volume with 1024 elements. I partitioned the RAID volumes with GPT using GParted. I slipstreamed the PERC 6 drivers into a Windows 2003 Server Enterprise Edition installation CD and installed onto the SSD. I got up and running, turned off the paging file, and everything worked great. The lights on the drive 0 caddy didn't light, but everything worked great.

After I got my data off the various NAS devices and workstation drives and onto the RAID volumes on the 2950, it was time to install the freed-up 2 TB drive into the 2950 and expand the RAID volumes. To expand the RAID volumes online I found I needed to install OpenManage Server Administrator Node to get the storage manager, which I did. After doing so it told me I should update my PERC 6i firmware, the backplane firmware, and the storport driver.

I updated the PERC 6i firmware to 6.3.1-0003 using the Dell update package. This required a restart, which I performed remotely, and the server came right back up. Then I updated the backplane firmware to 2.50.00 using the Dell update package. This did not require a restart. Then I updated the storport driver to 5.2.3790.4173 using the MS hotfix. This required a restart, which I performed remotely, but the server didn't come right back up, so I went down to the garage where I keep the noisy beast and the front display was indicating an hdd0 fault. I didn't pay attention to the exact code at the time; I just held the power button until it shut down and then turned it back on. It booted up fine and is working fine, but it still indicates an E1810 hdd0 fault and the lights on hdd0 are flashing amber.

In OpenManage Server Administrator no fault is indicated. The server is still functioning fine. In fact, I've gone on to delete the 500 GB VD to allow me to expand the main virtual disk, reconstruct the 3.2 TB VD to 4.8 TB, and add a 732 GB VD to replace the deleted 500 GB one. Everything seems great, but I figure I'd better ask for help to clear up the reported error. Anybody have any ideas?

Moderator

 • 

6.2K Posts

March 8th, 2013 15:00

Hello usfwalden

The errors reported on the front panel are pulled from the hardware log. You will need to clear the hardware log to get rid of the errors. You can boot to the BMC via Ctrl+E or use OMSA to clear the hardware log. There is a log category in OMSA. Under that category is an option for system or controller logs. You should have an option to save or clear the log in that menu. The hardware log may also be referred to as the system log or system event log, but it is not related to the Windows system event log.

Thanks

15 Posts

March 8th, 2013 15:00

Thank you,

I cleared the log using OMSA and the front panel display has returned to normal. Out of curiosity, what does the flashing amber on the hdd 0 caddy indicate?

Moderator

 • 

6.2K Posts

March 8th, 2013 15:00

If the light continued flashing amber after clearing the log, then there is some form of hard disk drive fault on the drive in slot 0. Typically a flashing amber light indicates that the drive is in a predictive failure state. A drive is marked predictive failure by the SMART on the drive itself. Once the drive hits a certain threshold of bad blocks it will go predictive failure. There are ways to clear the SMART to delete the error, but typically a predictive failure state indicates the drive could fail at any time. Any time an HDD goes predictive failure we replace it if it is under warranty.

You can verify whether the drive is in predictive failure by checking within OMSA. On the drive information page there is a column listed as Predictive Failure. It will say Yes or No under this category for each drive. If it lists No, then the controller log will need to be viewed to find out what the problem is. You can save the controller log from within OMSA to view it. Under the system information tab there should be a drop-down option for the controller. One of the options should be export log.

Thanks

15 Posts

March 8th, 2013 16:00

A failure isn't predicted. The information on the disk from OMSA is below:

ID: 0:0
Status: OK
Name: Physical Disk 0:0
State: Online
Bus Protocol: SATA
Media: SSD
Failure Predicted: No
Revision: 2.25
Capacity: 55.38GB
Used RAID Disk Space: 55.38GB
Available RAID Disk Space: 0.00GB
Hot Spare: No
Vendor ID: ATA
Product ID: OCZ-VERTEX3
Serial No.: OCZ-L3IXCN0B6113N7C3
SAS Address: 1221000000000000
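
For anyone who wants to run the same check on several drives, here is a minimal Python sketch. It assumes the disk properties have been saved as plain `Key: value` lines; that layout and the function names `parse_properties` and `disk_looks_healthy` are made up for illustration and are not part of OMSA.

```python
def parse_properties(text):
    """Parse 'Key: value' lines into a dict, splitting on the first colon only."""
    props = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        props[key.strip()] = value.strip()
    return props


def disk_looks_healthy(props):
    """True when status is OK, the disk is online, and no failure is predicted."""
    return (
        props.get("Status", "").upper() == "OK"
        and props.get("State") == "Online"
        and props.get("Failure Predicted") == "No"
    )


# Sample values copied from the post above (hypothetical file layout).
sample = """ID: 0:0
Status: OK
Name: Physical Disk 0:0
State: Online
Failure Predicted: No"""

props = parse_properties(sample)
print(props["Name"], "healthy:", disk_looks_healthy(props))
```

A clean result here only says the drive's own SMART data is not complaining; the amber light can still come from the controller, so the controller log remains the next place to look.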

Moderator

 • 

6.2K Posts

March 8th, 2013 16:00

There are no issues reported in the above information. You will need to look through the controller log to find out why it is reporting a fault condition. The controller log can be difficult to review if you don't know what all of the information means. If you want me to look over it then upload it somewhere and provide a link.

15 Posts

March 8th, 2013 16:00

Thanks, I was about to try to look through it, but it is extensive.

I changed the extension to .txt and posted it to:  www.ecutune.com/.../lsi_0308.txt

Moderator

 • 

6.2K Posts

March 8th, 2013 17:00

Well, the good news is that there is nothing wrong with the drive. The bad news is that you cannot get rid of the error. The slot is being marked as a bad element by the controller because of a communication issue, and the drive is marked as a non-certified drive on the controller; that is why the amber light is flashing. This is the section of the log that indicates this:

T49: SES_BackplaneMapping: Undetected device on enclPd 20 StsCode = 5 elmtType 17 elmtIndex 0 slotPd =0 SasAddr =1221000000000000
T49: SES_MarkBadElement: enclPd 20 timeDiff 0 slot 0 badElmt 8 retryCnt 1 oldTime:31 currentTime:31 
T49: EVT#03386-T49: 236=PD 00(e0xff/s0) is not a certified drive
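
Since the exported controller log is extensive, a short script can pull out lines like these. The following is a rough Python sketch; the regular expressions are written only against the three lines quoted above, so other firmware versions may format these messages differently, and the function name `scan_controller_log` is hypothetical.

```python
import re

# Patterns derived only from the log lines quoted in this thread (assumption:
# the real log keeps this exact wording and field order).
BAD_ELEMENT = re.compile(
    r"SES_MarkBadElement: enclPd (\d+) .*?slot (\d+) badElmt (\d+)"
)
NOT_CERTIFIED = re.compile(
    r"EVT#(\d+)-T\d+: \d+=PD (\w+)\((\S+?)\) is not a certified drive"
)


def scan_controller_log(lines):
    """Return (kind, details) tuples for bad-element and non-certified entries."""
    findings = []
    for line in lines:
        m = BAD_ELEMENT.search(line)
        if m:
            findings.append(("bad_element", {
                "enclosure": int(m.group(1)),
                "slot": int(m.group(2)),
                "bad_element": int(m.group(3)),
            }))
            continue
        m = NOT_CERTIFIED.search(line)
        if m:
            findings.append(("not_certified", {
                "event": m.group(1),
                "pd": m.group(2),
                "location": m.group(3),
            }))
    return findings


# Usage with the lines quoted above; for a saved log file, pass the open
# file object instead of a list.
sample = [
    "T49: SES_BackplaneMapping: Undetected device on enclPd 20 StsCode = 5 elmtType 17 elmtIndex 0 slotPd =0 SasAddr =1221000000000000",
    "T49: SES_MarkBadElement: enclPd 20 timeDiff 0 slot 0 badElmt 8 retryCnt 1 oldTime:31 currentTime:31 ",
    "T49: EVT#03386-T49: 236=PD 00(e0xff/s0) is not a certified drive",
]
for kind, details in scan_controller_log(sample):
    print(kind, details)
```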

Thanks

15 Posts

March 8th, 2013 17:00

Ok, thanks.
