August 7th, 2011 01:00

PowerEdge T710 sometimes displays MEMTEST lane failure on boot

At work we have a Dell PowerEdge T710 server.  On reboot it randomly displays a MEMTEST lane failure on DIMM B1, which shuts off the entire channel (B1, B4, B7).  This reduces the RAM available to the system by 24GB.

I attached a picture below which shows the error.  After four reboots the error went away.  The failure doesn't happen on every boot, but when it does it takes a random number of reboots to clear.  There seems to be no pattern!

We tried new RAM and a new motherboard but this is still occurring.   Any ideas at all as to the cause would be appreciated.  Thanks!

9.3K Posts

August 7th, 2011 17:00

On the Xeon 5500 and 5600 series, the memory controller is built into the processor. Your T710 uses this type of processor.

When you get that type of error, the first thought is indeed to try swapping the memory and then maybe the motherboard, but once those two are ruled out the processor is a definite suspect.

The screenshot shows DIMM B1 as the culprit. If it's always the same 'letter' (B), the failing DIMMs are the ones attached to the 2nd processor, and I'd suggest replacing that processor. If the letter changed after replacing the motherboard (e.g. from A to B), it's likely that you switched the processors around (as the motherboard couldn't/shouldn't care which processor is in which socket).

If both "A" and "B" DIMMs show errors, you might be looking at both processors having an issue.
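To make the reasoning above easier to apply, here is a minimal Python sketch that turns a DIMM label from the error message into the processor to suspect and the slots that drop out with it. The function name `describe_dimm` is hypothetical. The letter-to-processor mapping comes from this reply, and the slot grouping is an assumption inferred from the original post (B1, B4, B7 sharing a channel), as is the 1 to 9 slot range per processor.

```python
def describe_dimm(label: str) -> dict:
    """Map a DIMM label such as 'B1' to its processor and channel-mates.

    Assumptions (not confirmed by Dell documentation in this thread):
    - letter A belongs to processor 1, letter B to processor 2
    - each processor has slots 1..9
    - slots n, n+3, n+6 share one memory channel (B1, B4, B7 in the post)
    """
    if len(label) < 2 or not label[1:].isdigit():
        raise ValueError(f"unrecognised DIMM label: {label!r}")
    bank, slot = label[0].upper(), int(label[1:])
    if bank not in ("A", "B") or not 1 <= slot <= 9:
        raise ValueError(f"unrecognised DIMM label: {label!r}")

    channel = (slot - 1) % 3  # 0-based channel index under the assumed layout
    channel_slots = [f"{bank}{n}" for n in range(1, 10) if (n - 1) % 3 == channel]
    return {
        "processor": 1 if bank == "A" else 2,
        "channel_index": channel,
        "channel_slots": channel_slots,
    }


info = describe_dimm("B1")
print(f"Suspect processor {info['processor']}; "
      f"slots disabled with this channel: {', '.join(info['channel_slots'])}")
```

If the reported letter follows a processor when the two are swapped between sockets, that points at the processor and not the board or the DIMMs.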

274.2K Posts

August 8th, 2011 08:00

If you are still having the issue, then I suggest swapping processors A and B with each other to help narrow down any possible hardware issues.

Ensure the firmware on the server is up to date.

support.dell.com/.../driverslist.aspx

In the F2 BIOS setup you can also disable C-States and ensure the power settings are set to performance/max.

Then test and see if there are any changes.

3 Posts

August 8th, 2011 10:00

Thanks for the ideas. I was starting to suspect the processor and just wanted to hear what others thought.  When we replaced the motherboard we kept the processors in the same location, and the problem has always been B1.   I believe you are both correct that it's the processor; it wouldn't make much sense for it to be anything else.  It's funny, though, that it reports the problem only sometimes.

72 Posts

August 9th, 2011 23:00

You

1 Message

March 21st, 2012 11:00

Perhaps the issue could be related to what this article describes:

http://i.dell.com/sites/content/business/solutions/whitepapers/en/Documents/server-pedge-installing-upgrading-memory-11g.pdf

I'd say it's an important read regarding setting up your RAM. I think it indicates that if you have an unbalanced configuration, you can get BIOS errors like the one you had. I ran into the same problem, only it was for slot A1. Here is what I had done. I had 4 x 16 GB DIMMs. I put one in slot A1, one in A2, one in B1, and one in B2. But correct me if I'm wrong, it seems one has to have at least one DIMM in each of the three channels? Please read that pdf and let me know what you think...
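As a rough way to check a layout against that question, the sketch below lists which channels are left empty on each processor for a given set of populated slots. It is only an illustration: the function name `empty_channels` is hypothetical, the three-channels-per-processor figure comes from the post above, and the slot-to-channel grouping (slots 1, 4, 7 on one channel, and so on) is an assumption inferred from the B1/B4/B7 grouping in the original post. It does not encode any rule from the whitepaper.

```python
from collections import defaultdict

CHANNELS_PER_CPU = 3  # assumption: three memory channels per processor


def empty_channels(populated):
    """Return {bank letter: [0-based indices of channels with no DIMM]}.

    Assumes slots n, n+3, n+6 share a channel, as B1/B4/B7 do in the
    original post. Only banks that have at least one DIMM are reported.
    """
    used = defaultdict(set)
    for label in populated:
        bank, slot = label[0].upper(), int(label[1:])
        used[bank].add((slot - 1) % CHANNELS_PER_CPU)
    all_channels = set(range(CHANNELS_PER_CPU))
    return {bank: sorted(all_channels - chans)
            for bank, chans in sorted(used.items())}


# The layout described above: 4 x 16 GB in A1, A2, B1, B2.
print(empty_channels(["A1", "A2", "B1", "B2"]))
```

For the A1/A2/B1/B2 layout this reports channel index 2 (slots 3, 6, 9 under the assumed grouping) as empty on both processors. Whether an empty channel is what actually triggers the BIOS error is the open question here, so treat the output as something to compare against the PDF, not as a diagnosis.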
