
1 Rookie

 • 

3 Posts


February 17th, 2014 11:00

Uncorrectable ECC memory error

Hi


We have a T605 server (2009 vintage) running Windows Server 2003 that started crashing today. The server has two quad-core Opterons with 1GB ECC DIMMs in slots A1, A2, B1 and B2 (giving 4GB total, with 2GB "local" to each processor socket).


Loading the diagnostic utility showed an error message saying there was an uncorrectable ECC error affecting DIMM slots A1 & A2.

I first tried removing and then re-seating the memory sticks in DIMM slots A1 & A2. This didn't help, and the server crashed again when starting Windows.

I then ran the memory diagnostic from the BIOS utility menu (express version). The diagnostic completed without any errors, but the server again crashed when trying to boot into Windows.

To see whether the memory sticks themselves were the problem, I removed both DIMMs from slots A1 & A2 and moved the DIMM from B2 into A1. The server again crashed on startup, and this time the logged error message reported an ECC error affecting slot A1 only.

Finally, I moved that DIMM from A1 back to B2 where it came from, and left all of socket A's memory slots unpopulated. The server then booted into Windows normally and has been up for several hours since.

So, it looks like the problem isn't the memory sticks themselves.  Maybe it's a motherboard issue, or even a memory controller problem on the processor.
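
For what it's worth, the swap-test reasoning behind that conclusion can be written down as a rough Python sketch. Everything in it is hypothetical and only for illustration: the function name, the DIMM labels, and the simplification that the original A1 stick counts as a single failing observation (the actual log entry named A1 & A2 together).

```python
from collections import defaultdict

def classify_swap_test(observations):
    """Classify a DIMM swap test.

    observations: list of (dimm_label, slot, failed) tuples, one per boot attempt.
    All labels are made up for illustration.
    """
    by_dimm = defaultdict(set)
    by_slot = defaultdict(set)
    for dimm, slot, failed in observations:
        by_dimm[dimm].add(failed)
        by_slot[slot].add(failed)

    # A DIMM that fails in one slot but works in another points away from the DIMM.
    dimm_cleared = any(results == {True, False} for results in by_dimm.values())
    # A slot that fails with one DIMM but works with another points away from the slot.
    slot_cleared = any(results == {True, False} for results in by_slot.values())

    if dimm_cleared and not slot_cleared:
        return "slot side (board or memory controller)"
    if slot_cleared and not dimm_cleared:
        return "DIMM"
    return "inconclusive"

# The tests described above, simplified to one observation per boot attempt.
observations = [
    ("dimm_a1", "A1", True),   # original A1 stick in A1: crash
    ("dimm_b2", "A1", True),   # B2 stick moved to A1: crash, error on A1
    ("dimm_b2", "B2", False),  # same stick back in B2: boots normally
]
print(classify_swap_test(observations))
```

The sketch cannot tell the motherboard apart from the memory controller on the processor; it only separates "fault follows the stick" from "fault stays with the slot".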

Can anyone suggest what else might cause this problem, and what else I can do to troubleshoot?

Thanks.

990 Posts

February 17th, 2014 12:00

NICK.BAIRD,

It looks like you have gone through the necessary diagnostic procedures and narrowed the fault down to the A slots. With that in mind, the board is most likely the problem and would need to be replaced to correct the errors on the A bank.

Regards,

 

990 Posts

February 17th, 2014 12:00

The problem could stay with the A slots and never migrate. But in server terms, the expected system life is 5 years, so you may be on the right track to upgrade and migrate to a newer, faster server. I would make sure your data backup stays current until you do make the swap. For my money, I would invest in the newer technology.

Regards,

1 Rookie

 • 

3 Posts

February 17th, 2014 12:00

Thanks again for your quick response.

1 Rookie

 • 

3 Posts

February 17th, 2014 12:00

Thanks Geoff.


In your opinion, does this problem indicate that a more catastrophic failure of the motherboard is imminent?

The server is around 5 years old, which is hardly ancient but also far from new. My inclination is to buy a new server and migrate everything over. I'm just wondering how long I've got...
