weadonj
1 Nickel

PowerEdge R720 memory error limit reached


I have a PowerEdge R720 running XenServer 6.2. We have received this error for the second time in the last 3 weeks: "MEM0005 Persistent correctable memory error limit reached for DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8 reseat memory".

Two weeks ago, when we got this error the first time, we powered off the server and made sure all the memory was properly seated (it was). We powered the server back on and the error was gone. Since this is a production box requiring HA, we did not take the time to run extra diagnostic tests.

The error came back this morning. Does anyone have an idea of how we can resolve this issue? Thanks


Accepted Solutions
Moderator

RE: PowerEdge R720 memory error limit reached


WeadonJ,

The error you are receiving indicates that the memory is still operational, but the persistent correctable errors are an early indicator of a possible future uncorrectable error. My suggestion would be to reseat the DIMMs, then run diagnostics on them to see if anything shows up.

I would suggest booting to this diagnostic for the memory test. - http://www.dell.com/support/home/us/en/555/Drivers/DriversDetails?driverId=TRWYD&fileId=3332974416&o...

Also, what revisions are the BIOS and ESM/DRAC at? Is either up to date?

Let me know what you find out.

Chris Hawk

Dell | Social Outreach Services - Enterprise
Get Support on Twitter @DellCaresPro 
Download the Dell Quick Resource Locator app today to access PowerEdge support content on your mobile device! (iOS, Android, Windows)

8 Replies
weadonj
1 Nickel

RE: PowerEdge R720 memory error limit reached

Thanks for the quick reply, Chris. Once we get a maintenance window, I'll give that diagnostic a shot.
weadonj
1 Nickel

RE: PowerEdge R720 memory error limit reached

Sorry, I forgot to include the requested information:

BIOS version: 1.6.0
iDRAC Firmware Version: 1.40.40 (Build 17) (Express)
Moderator

RE: PowerEdge R720 memory error limit reached


Both the BIOS and the iDRAC firmware are a few updates out of date. You can find the updates you need on this page: http://downloads.dell.com/published/Pages/poweredge-r720.html

I would suggest stepping through the updates one version at a time until you reach the current release, rather than jumping straight to the latest update.

Chris Hawk

weadonj
1 Nickel

RE: PowerEdge R720 memory error limit reached

I had some time for maintenance today. I updated the BIOS on the server as well as all requested firmware packages. After an install and reboot, I got the following error:

Memory device status is critical
Memory device location: DIMM_B8
Possible memory module event cause: Single bit warning error rate exceeded, Single bit failure error rate exceeded

Since that time, the error has cleared. Is there some additional action I need to take? The diagnostic test passed, but I don't like that this error showed up at one point.
Moderator

RE: PowerEdge R720 memory error limit reached


WeadonJ,

Swap DIMM B8 with another matching DIMM in the server. Then boot and see whether the error follows the DIMM or stays with the slot. If the error follows the DIMM, I would replace the DIMM. If the error stays at the same location after the swap, the issue is in the slot on the motherboard.
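
The swap test above can be written down as a small decision rule. This is only a sketch in Python to make the logic explicit: the `Observation` type, the `diagnose` function, and the module labels such as "dimm-x" are hypothetical names made up for illustration, not part of any Dell tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Where a memory error was reported at one point in time."""
    slot: str    # slot named in the error, e.g. "B8"
    module: str  # hypothetical label for the physical DIMM sitting in that slot


def diagnose(before: Observation, after: Observation) -> str:
    """Compare the error location before and after swapping two DIMMs."""
    # Same physical module, different slot: the error followed the DIMM.
    if after.module == before.module and after.slot != before.slot:
        return "replace DIMM"
    # Same slot, different module: the error stayed with the slot.
    if after.slot == before.slot and after.module != before.module:
        return "suspect slot"
    # Anything else (error elsewhere, or no swap happened) proves nothing.
    return "inconclusive"


# Error was on the module in B8; after moving that module to B4 it shows up in B4.
print(diagnose(Observation("B8", "dimm-x"), Observation("B4", "dimm-x")))
# Error was in B8; after the swap a different module in B8 reports the error.
print(diagnose(Observation("B8", "dimm-x"), Observation("B8", "dimm-y")))
```

Recording which module went into which slot before the swap is what makes the second observation meaningful.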

Chris Hawk

weadonj
1 Nickel

RE: PowerEdge R720 memory error limit reached

Just to follow up on this ticket with the latest details: I had the memory error again last week (Monday). As you suggested, I swapped the module that was reported as faulty with one reported as healthy in another slot on the same board. Since then, there has not been another error. Granted, we are using less memory now than we were then, but even so, we usually got the memory error every two days or so. If we do get an error again, I have recorded which slot the module was swapped into, so we'll know whether it's an issue with the RAM or the slot. Thanks for all of your help.
adeel7454
1 Copper

RE: PowerEdge R720 memory error limit reached


We are facing an issue where the server does not report the full amount of installed RAM, even though it gets through the boot process without showing any error. For example, with 16x20=320 GB of RAM installed, it sometimes shows 288 GB and sometimes 272 GB. The system board has been replaced with a new one and the BIOS has been upgraded to the latest version. We have sent the server to various vendors, but no one has been able to diagnose the problem. Please help.
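
One possible way to read those numbers is to work out how many modules' worth of capacity is missing. The sketch below assumes that "16x20" means twenty 16 GB modules; that reading is a guess, not something stated in the post, and `missing_modules` is just an illustrative helper.

```python
MODULE_GB = 16          # assumption: each module is 16 GB
INSTALLED_MODULES = 20  # assumption: twenty modules are installed


def missing_modules(reported_gb: int) -> float:
    """Return how many modules' worth of capacity is not being reported."""
    installed_gb = MODULE_GB * INSTALLED_MODULES  # 320 GB
    return (installed_gb - reported_gb) / MODULE_GB


for reported in (288, 272):
    print(reported, missing_modules(reported))
# prints:
# 288 2.0
# 272 3.0
```

Under that assumption, the shortfall comes out to a whole number of modules (two or three), which would be consistent with individual DIMMs intermittently not being counted rather than with partial capacity loss on one module.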
