This post is more than 5 years old

1082150

April 7th, 2014 08:00

PowerEdge R720 memory error limit reached

I have a PowerEdge R720 running XenServer 6.2. We have received this error for the second time in the last 3 weeks: "MEM0005 Persistent correctable memory error limit reached for DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8 reseat memory".

Two weeks ago, when we got this error the first time, we powered off the server and made sure all the memory was properly seated (it was). We powered the server back on and the error was gone. Since this is a production box requiring HA, we did not take a bunch of time to run extra diagnostic tests.

The error came back this morning. Does anyone have an idea of how we can resolve this issue? Thanks
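For anyone collecting these alerts from logs, here is a minimal sketch of pulling the DIMM labels out of a message like the one quoted above. The function name and the regular expression are assumptions made for illustration, not part of any Dell tool, and the pattern only covers the two label styles that appear in this thread (DIMM1 and DIMM_B8).

```python
import re
from typing import List

# Sample text copied from the alert quoted in the original post.
SAMPLE_ALERT = (
    "MEM0005 Persistent correctable memory error limit reached for "
    "DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8 reseat memory"
)

def parse_dimm_labels(message: str) -> List[str]:
    """Return the DIMM labels named in an alert message, in order.

    Hypothetical helper: matches labels such as DIMM1 or DIMM_B8.
    """
    return re.findall(r"DIMM_?[A-Z]?\d+", message)

labels = parse_dimm_labels(SAMPLE_ALERT)
print(len(labels), labels)
```

Every one of the eight labels listed in the alert is returned, which makes it easy to compare one occurrence of the error against the next.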

Moderator

8.4K Posts

April 7th, 2014 11:00

WeadonJ,

The error you are receiving means that the memory is still operational, but the number of correctable errors is an early indicator of a possible future uncorrectable error. My suggestion would be to reseat the DIMMs, then run diagnostics on them to see if anything shows up.

I would suggest booting to this diagnostic for the memory test: http://www.dell.com/support/home/us/en/555/Drivers/DriversDetails?driverId=TRWYD&fileId=3332974416&osCode=CXS03&productCode=poweredge-r720&languageCode=EN&categoryId=DI

Also, what revisions are the BIOS and ESM/DRAC at? Is either up to date?

Let me know what you find out.

8 Posts

April 7th, 2014 14:00

Thanks for the quick reply, Chris. Once we get a maintenance window, I'll give that diagnostic a shot.

8 Posts

April 7th, 2014 15:00

Sorry, I forgot to include the requested information:

BIOS version: 1.6.0
iDRAC Firmware Version: 1.40.40 (Build 17) (Express)

Moderator

8.4K Posts

April 8th, 2014 05:00

Both the BIOS and the DRAC are a few updates out of date. You can find the updates needed on this page: http://downloads.dell.com/published/Pages/poweredge-r720.html

I would suggest stepping through the intermediate updates in order until you are current, rather than jumping straight to the latest update.

8 Posts

April 9th, 2014 16:00

I had some time for maintenance today. I updated the BIOS on the server as well as all requested firmware packages. After an install and reboot, I got the following error:

Memory device status is critical
Memory device location: DIMM_B8
Possible memory module event cause: Single bit warning error rate exceeded, Single bit failure error rate exceeded

Since that time, the error has cleared. Is there some additional action I need to take? The diagnostic test passed, but I don't like that this error showed up at one point.

Moderator

8.4K Posts

April 10th, 2014 06:00

WeadonJ,

Swap DIMM B8 with another matching DIMM in the server. Then boot and see whether the error follows the DIMM or remains at the slot. If the error follows the DIMM, I would replace the DIMM. If the error remains at the same location after the swap, then the issue is in the slot on the motherboard.
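The swap test above boils down to a small decision rule, sketched here in Python. This is only an illustration of the reasoning: the function name, the returned strings, and the slot name "B4" in the usage line are hypothetical, and only B8 comes from this thread.

```python
from typing import Optional

def diagnose_after_swap(original_slot: str,
                        new_slot: str,
                        error_slot: Optional[str]) -> str:
    """Interpret where the memory error shows up after a DIMM swap.

    original_slot: slot that reported the error before the swap.
    new_slot:      slot the suspect DIMM was moved into.
    error_slot:    slot reporting the error after the swap, or None
                   if no error has come back yet.
    """
    if error_slot is None:
        return "no error yet: keep monitoring"
    if error_slot == new_slot:
        # The error followed the module to its new location.
        return "replace the DIMM"
    if error_slot == original_slot:
        # The error stayed put even though the module changed.
        return "suspect the slot on the motherboard"
    return "inconclusive: error is in an unrelated slot"

# Hypothetical example: the suspect DIMM moved from B8 to B4.
print(diagnose_after_swap("B8", "B4", "B4"))
```

Recording which slot the suspect module was moved into is what makes the result readable later, since the error may take days to come back.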

8 Posts

April 22nd, 2014 08:00

Just to follow up on this ticket with the latest details: I had the memory error again last week (Monday). As you suggested, I swapped the RAM that was reported as faulty with a module that had been reported as healthy in another slot on the same board. Since that time, there has not been another error. Granted, we are utilizing less memory now than we were then, but even so, we usually got the memory error every two days or so. If we do get an error again, I have recorded which slot the module was swapped into, so we'll know whether it's an issue with the RAM or the slot. Thanks for all of your help.

5 Posts

September 7th, 2015 04:00

The issue we are facing is that the server doesn't show the full amount of installed RAM, yet it passes through the boot process without showing any error. For example, with 16x20=320 GB of RAM installed, it sometimes shows 288 GB and sometimes 272 GB. The board has been replaced with a new one, and the BIOS has been upgraded to the latest version. We have sent it to various vendors, but no one has been able to diagnose it. Please help me in this regard.
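One way to read those numbers is to check whether the missing capacity corresponds to a whole number of modules. The sketch below is only arithmetic on the figures quoted above. The helper name is hypothetical, and the 16 GB module size in the second and third calls is an assumption tried for comparison, not something stated in the post.

```python
from typing import Optional, Tuple

def missing_modules(installed_gb: int,
                    reported_gb: int,
                    module_gb: int) -> Tuple[int, Optional[int]]:
    """Return (shortfall in GB, whole modules missing or None).

    The second value is None when the shortfall is not an exact
    multiple of the assumed module size.
    """
    shortfall = installed_gb - reported_gb
    whole, remainder = divmod(shortfall, module_gb)
    return shortfall, (whole if remainder == 0 else None)

print(missing_modules(320, 288, 20))  # module size as quoted in the post
print(missing_modules(320, 288, 16))  # assumed 16 GB modules
print(missing_modules(320, 272, 16))  # assumed 16 GB modules
```

If the shortfall maps cleanly onto a number of modules, comparing which slots are populated against which modules the system reports at each boot may help narrow down the ones that drop out.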
