PowerEdge Hardware General

Last reply by 09-07-2015 Solved

PowerEdge R720 memory error limit reached

I have a PowerEdge R720 running XenServer 6.2. We have received this error for the second time in the last 3 weeks: "MEM0005 Persistent correctable memory error limit reached for DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8 reseat memory".

Two weeks ago, when we got this error the first time, we powered off the server and made sure all the memory was properly seated (it was). We powered the server back on and the error was gone. Since this is a production box requiring HA, we did not take the time to run extra diagnostic tests.

The error came back this morning. Does anyone have an idea of how we can resolve this issue? Thanks.
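
For anyone scanning logs for this message, here is a minimal Python sketch that pulls the DIMM labels out of a log line like the one quoted above. The function name and the regular expression are illustrative assumptions, not part of any Dell tool; the pattern is only guessed from the two label styles that appear in this thread (DIMM1 and DIMM_B8).

```python
import re

# Message text copied from the post above.
MESSAGE = (
    "MEM0005 Persistent correctable memory error limit reached for "
    "DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8 reseat memory"
)


def dimms_in_message(message):
    """Return the DIMM labels named in a log line, in order of appearance."""
    # Assumed label shapes: DIMM1 or DIMM_B8 (optional underscore and bank letter).
    return re.findall(r"DIMM_?[A-Z]?\d+", message)


print(dimms_in_message(MESSAGE))
```

Keeping the extracted labels from each occurrence makes it easier to compare which modules are named from one event to the next.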

Accepted Solutions

WeadonJ,

The error indicates that the memory is still operational, but that the correctable errors are an early indicator of a possible future uncorrectable error. My suggestion would be to reseat the DIMMs, then run diagnostics on them to see if anything shows up.

I would suggest booting to this diagnostic for the memory test: http://www.dell.com/support/home/us/en/555/Drivers/DriversDetails?driverId=TRWYD&fileId=3332974416&o...

Also, what revisions are the BIOS and ESM/DRAC at? Is either up to date?

Let me know what you find out.


Chris Hawk
Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#Iwork4Dell

Did I answer your query? Please click on ‘Accept as Solution’
‘Kudo’ the posts you like!


Replies (8)

Thanks for the quick reply, Chris. Once we get a maintenance window, I'll give that diagnostic a shot.

Sorry, I forgot to include the requested information:

BIOS version: 1.6.0
iDRAC Firmware Version: 1.40.40 (Build 17) (Express)

Both the BIOS and the DRAC are a few updates out of date. You can find the updates needed on this page: http://downloads.dell.com/published/Pages/poweredge-r720.html

I would suggest walking the updates up to current in order rather than jumping straight to the latest one.
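
As a rough illustration of stepping through updates in order, the Python sketch below sorts version strings numerically and lists everything newer than the installed version, oldest first. The version list is hypothetical and does not reflect real R720 BIOS releases; only the installed version 1.6.0 comes from this thread. Which intermediate releases are actually required should be checked against the release notes for each update.

```python
def parse_version(text):
    """Turn '1.6.0' into (1, 6, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def upgrade_path(installed, available):
    """List every available version newer than the installed one, oldest first."""
    newer = [v for v in available if parse_version(v) > parse_version(installed)]
    return sorted(newer, key=parse_version)


# HYPOTHETICAL release list for illustration only; not real Dell version numbers.
available = ["2.1.0", "1.6.0", "1.10.0", "2.0.0", "1.9.0"]

print(upgrade_path("1.6.0", available))
```

Comparing tuples of integers rather than raw strings matters here: a plain string sort would place 1.10.0 before 1.9.0.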


Chris Hawk

I had some time for maintenance today. I updated the BIOS on the server as well as all requested firmware packages. After an install and reboot, I got the following error:

Memory device status is critical
Memory device location: DIMM_B8
Possible memory module event cause: Single bit warning error rate exceeded, Single bit failure error rate exceeded

Since that time, the error has cleared. Is there some additional action I need to take? The diagnostic test passed, but I don't like that this error showed up at one point.

WeadonJ,

Swap DIMM B8 with another matching DIMM in the server. Then boot and see whether the error follows the DIMM or remains at the slot. If the error follows the DIMM, I would replace the DIMM. If the error remains at the same location after the swap, the issue is in the slot on the motherboard.
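
The swap test above comes down to a small decision rule, sketched here in Python. The function name, the return strings, and the slot label "A3" are assumptions made for illustration; only slot B8 comes from this thread.

```python
def diagnose_swap(original_slot, new_slot, error_slot_after_swap):
    """Interpret the result of moving a suspect DIMM to a different slot.

    original_slot:         slot that reported the error before the swap
    new_slot:              slot the suspect DIMM was moved into
    error_slot_after_swap: slot reported after the swap, or None if no
                           error has appeared yet
    """
    if error_slot_after_swap is None:
        return "inconclusive: no error yet, keep monitoring"
    if error_slot_after_swap == new_slot:
        return "error followed the DIMM: replace the DIMM"
    if error_slot_after_swap == original_slot:
        return "error stayed at the slot: suspect the slot on the motherboard"
    return "error at an unrelated slot: investigate separately"


# "A3" is a hypothetical destination slot chosen for the example.
print(diagnose_swap("B8", "A3", "A3"))
print(diagnose_swap("B8", "A3", "B8"))
print(diagnose_swap("B8", "A3", None))
```

Recording both slot labels at the time of the swap is what makes the later comparison possible.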


Chris Hawk

Just to follow up with the latest details: I had the memory error again last week (Monday). As you suggested, I swapped the RAM that was reported as faulty with a module that has been reported as healthy in another slot on the same board. Since that time, there has not been another error. Granted, we are utilizing less memory now than we were then, but even so we usually got the memory error every two days or so. If we do get an error again, I have recorded which slot the module was swapped into, so we'll know if it's an issue with the RAM or the slot. Thanks for all of your help.

We are facing an issue where the server does not report the full amount of installed RAM, yet it completes the boot process without showing any error. For example, with 16x20=320 GB of RAM installed, it sometimes shows 288 GB and sometimes 272 GB. The board has been replaced with a new one and the BIOS has been upgraded to the latest version. We have sent the server to various vendors, but no one has been able to diagnose the problem. Please help me in this regard.
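
A quick arithmetic check of the figures in the post above, as a small Python sketch. It uses only the numbers stated there (16 modules, 20 GB each, 288 GB and 272 GB reported); the function name is an illustrative assumption.

```python
MODULES = 16
MODULE_GB = 20  # per-module size as stated in the post
INSTALLED_GB = MODULES * MODULE_GB  # 320


def shortfall(reported_gb):
    """Return (missing GB, whole modules that fit, leftover GB)."""
    missing = INSTALLED_GB - reported_gb
    whole_modules, leftover = divmod(missing, MODULE_GB)
    return missing, whole_modules, leftover


for reported in (288, 272):
    missing, whole, leftover = shortfall(reported)
    print(f"{reported} GB reported: {missing} GB missing "
          f"= {whole} module(s) + {leftover} GB")
```

On these figures the shortfall is 32 GB or 48 GB, and neither is a whole number of 20 GB modules, so it may be worth confirming the per-module size before trying to trace the gap to specific modules.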
