Unsolved

February 10th, 2015 03:00

PowerEdge 1950 - Single bit warning error rate exceeded

Hi,

Over the past few days, three of our PE1950s in two separate data centers have reported:

Possible memory module event cause:Single bit warning error rate exceeded,Single bit failure error rate exceeded

We do have hundreds of these servers, but it seems a strange coincidence for three of them to develop the same issue within days of each other.

In each case, the module reported as having the issue is a 4GB module. In one of the affected servers, DIMM 8 was reported as having the problem, so I swapped DIMMs 7 and 8 for a pair of known working 2GB DIMMs (not from the same server), as I didn't have any spare 4GB DIMMs to hand. The issue now appears to have moved to DIMM 4.

All of the servers are on BIOS 2.7.0. Two of the affected servers are G2 and the other is G3.

Does anyone have any suggestions as to what could be causing this strange coincidence and the behavior I've noticed while troubleshooting?

Thanks

James 

February 10th, 2015 07:00

JamesM85,

I would first check that the server is up to date, specifically the BIOS and DRAC. Both have a hand in maintaining system stability, and both handle error reporting for the server. Since the BIOS is current, I would look at the iDRAC/BMC. What version is it at currently?

In the future I would suggest swapping the DIMM with another DIMM in the same server. That lets you see whether the error follows the DIMM, indicating an issue with the DIMM, or stays at the slot, indicating an issue with the board.

Lastly, make sure the processor is firmly seated in its socket.
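
The swap test above can be written down as a small decision rule. This is only a sketch of the reasoning, not a Dell tool: the function name `interpret_swap` and its three outcome labels are made up for illustration, and the slot numbers are whatever the system event log reports.

```python
def interpret_swap(slot_before: int, swapped_with: int, slot_after: int) -> str:
    """Classify the result of swapping two DIMMs within the same server.

    slot_before:  slot that logged the error before the swap
    swapped_with: slot whose DIMM was exchanged with the one in slot_before
    slot_after:   slot that logs the error after the swap
    """
    if slot_before == swapped_with:
        raise ValueError("a DIMM cannot be swapped with itself")
    if slot_after == swapped_with:
        return "dimm"          # error followed the module to its new slot
    if slot_after == slot_before:
        return "slot"          # error stayed put: suspect the slot or board
    return "inconclusive"      # error showed up elsewhere: look for another cause


# Hypothetical example: error first logged on DIMM 8, module swapped with DIMM 7.
print(interpret_swap(8, 7, 7))
print(interpret_swap(8, 7, 8))
```

An error that turns up on a slot that was not part of the swap falls into the "inconclusive" branch, which is a hint to look beyond a single module or slot.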

Let me know what you see.

February 10th, 2015 07:00

Sorry, I should have been clearer. I replaced DIMMs 7 and 8 with "new" ones after swapping two of the modules around seemed to make the issue follow the DIMM.

The BIOS and BMC are both up to date. All of these servers have recently been updated using the latest Dell SUU.

I have tried removing and re-seating all of the memory modules in the affected servers. I have not tried the processors yet, as I didn't want to go disturbing too much.

Could it be that the newer BIOS / BMC firmware are causing false / more sensitive reports?

February 13th, 2015 07:00

Does anyone else have any suggestions?

We now have 6 PowerEdge 1950 servers with this issue, within the space of 4 days - to me that is more than a coincidence.

Has anyone else started having these issues? Are there any known firmware / hardware bugs?
