February 3rd, 2003 15:00

Should server pass memtest?

I just received my new PowerEdge 1650 and, as with all new machines, I ran MemTest86 on it. To my surprise it failed on an address close to the end of my memory (somewhere around 255.8MB out of 256MB total). Is this a known issue or should I call Dell to have my memory replaced?

1 Message

February 4th, 2003 19:00

Have them replace the bad module.

February 4th, 2003 20:00

The weird thing is that when we ran memtest on another machine using the SDRAMs from our server, they passed without any problems whatsoever. Can someone verify whether or not memtest86 reports problems with the memory on a PowerEdge 1650 server?

2 Posts

April 3rd, 2003 15:00

I am having the same problem on a Dell PowerEdge 600SC. When I run Memtest86 w/ the original 128MB memory module that Dell provided, I get continuous errors in the 127.8MB area. If I put in a 512MB module that I purchased through crucial.com, I get errors at 511.8MB. Something is definitely either a) wrong w/ the motherboard or b) wrong w/ memtest86. Now, I've used memtest86 on other PowerEdge systems without trouble, so I'm leaning towards a broken MB.

Have you been able to resolve your problem? Did Dell replace the motherboard?

Thanks,

Louis

1 Message

April 22nd, 2003 08:00

Yeah, I am getting this too with my 600SC. The 128MB stick from Dell reports multiple errors at the same address, 07ffdc80. When I use two 512MB sticks from Crucial, it reports multiple errors at 3fffdc80. That address must be reserved for something?
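To make the comparison between those two addresses concrete, here is a rough Python sketch of the arithmetic. It assumes the two 512MB sticks add up to 1024MB of installed RAM and that the top of RAM is simply the installed size in bytes; the function name is made up for illustration.

```python
# Failing addresses as reported in this thread, paired with installed RAM in MB.
# Assumption: two 512MB sticks means 1024MB installed in total.
MIB = 1024 * 1024

def offset_below_top(address_hex: str, installed_mb: int) -> int:
    """Return how many bytes the address sits below the top of installed RAM."""
    top = installed_mb * MIB
    return top - int(address_hex, 16)

reports = [("07ffdc80", 128), ("3fffdc80", 1024)]
for addr, mb in reports:
    gap = offset_below_top(addr, mb)
    position_mb = int(addr, 16) / MIB
    print(f"{addr}: {position_mb:.4f} MB, {gap} bytes (0x{gap:x}) below {mb} MB")
```

Under those assumptions both addresses come out at the same distance, 9088 bytes (0x2380), below the top of installed RAM, which fits the guess that a fixed spot near the top of memory is being used by something rather than a particular stick being bad.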

I happened to order two 600SCs, so I'm going to try memtest86 on the other one tomorrow, but based on what's being reported in this thread, I can probably expect the same. Perhaps this is a motherboard design issue? I wonder if this affects anything.

Are you getting memory failures at the same address?

 

2 Posts

April 22nd, 2003 13:00

I'm not sure what the hex address was, but it was at essentially: 127.8 MB in the RAM.

I decided to go ahead and give Dell the benefit of the doubt here. I ran through Dell's diagnostics and all passed. Then I fired up the machine w/ Win2k and burnt it in for several days running Seti@Home, UD Cancer Research, and a few programs that leak memory, to make sure I was moving through the various bits of RAM. Certainly not scientific, but not a bad indication of stability either.

The result was that the machine was just fine. We've since moved one of our engineers onto the machine, and he regularly exercises the heck out of it using Visual Studio .NET as well as test installations of our product. No crashes to date.

My guess is that the way this motherboard is built and the way memtest86 tests the RAM are in conflict. I originally thought it was some sort of fundamental MB flaw, but I have since come to the conclusion that it's just memtest86's inability to test this type of MB+RAM combo properly, which was surprising since I've never had a problem w/ memtest86, including w/ other PowerEdge 1600SC systems. Go figure.

 

Louis

February 5th, 2004 16:00

I'm seeing this memtest issue with 4 brand new PE2650s. There's also a discussion of it going on at ArsTechnica, where the problem went away for one user after a motherboard swap.

I know the RAM is not bad in these systems, and the problem always occurs at the last 200K of RAM, whether there's 1, 2 or 4GB in the system. I suspect the errors are a symptom of a motherboard problem. I have an open case with Dell - we'll see what their response is.

 

33 Posts

February 6th, 2004 04:00

Gentlemen, I found some interesting information on the memtest86 site. It states that most errors are legitimate but can be caused by other factors as well. It also says that a lot of vendors like to argue that their chipset is not being tested properly by memtest86, thereby throwing up errors, whereas memtest86 is supposed to be a universal test. But it does make one interesting point: ECC chipsets do require special configuration of memtest86 for the test to be valid some of the time. Please read the quote below; the last sentence is where it is mentioned. I'm also posting the link to the memtest86 site, which goes into great depth about their error reporting. Hope it helps.

http://www.memtest86.com/

"I am often asked about the reliability of errors reported by Memtest86. In the vast majority of cases errors reported by the test are valid. There are some systems that cause Memtest86 to be confused about the size of memory and it will try to test non-existent memory. This will cause a large number of consecutive addresses to be reported as bad and generally there will be many bits in error. If you have a relatively small number of failing addresses and only one or two bits in error you can be certain that the errors are valid. Also intermittent errors are almost without exception valid. Frequently memory vendors question if Memtest86 supports their particular memory type or a chipset. Memtest86 is designed to work with all memory types and all chipsets. Only support for ECC requires knowledge of the chipset."

 

All the servers mentioned (PE600, 2650, etc.) use ECC memory, so keep that in mind.
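For anyone who wants to apply the rule of thumb from that quote to their own error list, here is a minimal Python sketch. The quote gives no numbers, so the thresholds (`min_run`, `min_bad_bits`), the 4-byte stride, and the `MemError` record are all my own assumptions, and the sample errors are made up purely for illustration. It is not how Memtest86 itself classifies anything.

```python
from dataclasses import dataclass

@dataclass
class MemError:
    address: int   # failing physical address
    expected: int  # pattern that was written
    actual: int    # pattern that was read back

def bits_in_error(err: MemError) -> int:
    """Count the bits that differ between the written and read patterns."""
    return bin(err.expected ^ err.actual).count("1")

def longest_consecutive_run(errors, stride=4):
    """Length of the longest run of failing addresses spaced `stride` bytes apart."""
    addrs = sorted({e.address for e in errors})
    best = run = 1 if addrs else 0
    for prev, cur in zip(addrs, addrs[1:]):
        run = run + 1 if cur - prev == stride else 1
        best = max(best, run)
    return best

def looks_like_missing_memory(errors, min_run=64, min_bad_bits=8):
    """True when the list matches the 'non-existent memory' pattern:
    many consecutive bad addresses with many bits wrong (thresholds assumed)."""
    if not errors:
        return False
    avg_bits = sum(bits_in_error(e) for e in errors) / len(errors)
    return longest_consecutive_run(errors) >= min_run and avg_bits >= min_bad_bits

# Hypothetical inputs: one single-bit error vs. a long block of all-bits-wrong errors.
single = [MemError(0x07FFDC80, 0xFFFFFFFF, 0xFFFFFFFE)]
block = [MemError(0x10000000 + 4 * i, 0xFFFFFFFF, 0x00000000) for i in range(100)]
print(looks_like_missing_memory(single))  # False
print(looks_like_missing_memory(block))   # True
```

A `False` result only means the list does not look like a memory-sizing mix-up; by the quoted text that points towards the errors being valid, but it says nothing about whether the RAM or the board is at fault.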

Message Edited by jamesm512 on 02-06-2004 12:54 AM

Message Edited by jamesm512 on 02-06-2004 12:55 AM

February 6th, 2004 12:00

That's interesting stuff. I'll re-run memtest on one of my boxes, but I'm pretty sure that it's failing on a small number of addresses...

and if it was a problem with ECC chipsets, then why would memtest report no errors after a motherboard replacement? That's the one thing that convinces me it's not a problem with memtest or a 'works as intended' bit of motherboard design on Dell's part.

EDIT: Memtest has many config options, including allowing you to specify if you're dealing with ECC memory. I've played around with it a bit, and even allowing for ECC it consistently reports errors in the top 200K of RAM.

Message Edited by MightyTribble on 02-06-2004 08:43 AM

February 10th, 2004 18:00

Well, Dell sent a tech to swap out the mainboard on one of my PowerEdge 2650s. Guess what? Memtest no longer fails!

It's the same motherboard rev (00) and the same bios (A15) as the old board ... same RAM, obviously. But memtest runs without a hitch - no errors, and no lockups after a few minutes. 

I'm convinced there's something wrong with a production batch of mainboards, and that the memtest failure is a symptom of the problem. I don't know if this problem would actually affect the server in production, but I do know that the symptoms disappear if you replace the mainboard.

1 Message

March 18th, 2004 15:00

I was reading this thread because I'm having an issue with a 600SC with 512MB of RAM, and it also fails memtest86 at 511.8MB. I called Dell and they pointed me to a memory diagnostic program on their support site called MPmemory. I ran that diagnostic program without a hitch, but the interesting thing is that the program checks the amount of RAM the BIOS is reporting and comes up correctly with 512MB, but then only tests 510MB of the memory. It says so plain as day on the screen. That leads to one of two conclusions:

One, the last 2 megs of memory is reserved for some reason by Dell and this is the reason for the memtest86 errors

or Two, Dell is trying to cover up an error or design flaw in the motherboard and has rigged the diagnostic program not to test the last two megs of memory.

The issue I am having with my 600SC running Linux is that it crashes every 3 or 4 days. It only started a month ago, but the system just stops. It is still powered on when I physically come in to check it, but there is nothing on the screen. And the many log files Linux keeps show nothing; they just stop.

Thought this would be of interest to some.

7 Posts

November 19th, 2005 11:00

Did anyone ever get to the bottom of this & find out what Dell is actually using the top 2MB of memory for? I've just bought four PowerEdge 1600SC servers from a UK dealer & they all show this problem when tested with the latest versions of MemTest86 & MemTest86+. I also have two PE1500 servers that have been in stock for a while. I've tested one of them & it has the problem as well. The PE1600SCs are due to go to customers in early December & I'd like to be sure that they are going to work reliably.

ADW

July 12th, 2007 17:00

This problem is also present in some PowerEdge 2550s. I have two PowerEdge 2550s. When I run Dell's memory test program, mpmemory, it says:
3072 MB installed 4 dimms
3071 MB available via BIOS
3054 MB selected for testing

On the 2550 w/o the problem it says:
3072 MB installed 4 dimms
3071 MB available via BIOS
3070 MB selected for testing

Not surprisingly, memtest86 fails on the top bit of memory on the first system. The Linux kernel reports a hardware problem with memory. The "spew" program (when run against a large disk) says that the disk is bad, but it does so in an inconsistent manner that suggests that it's its own bookkeeping that is wrong.
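The difference between the two mpmemory readouts above is easier to see as a number. This is just a small Python sketch that pulls the MB figures out of text shaped like those three lines and subtracts "selected for testing" from "available via BIOS"; the line format is assumed from what is transcribed in this post, and the function name is made up.

```python
import re

def untested_mb(readout: str) -> int:
    """MB the BIOS reports as available that mpmemory did not select for testing."""
    values = {}
    for line in readout.strip().splitlines():
        # Assumed line shape: "<number> MB <keyword> ...", as transcribed in this thread.
        match = re.match(r"\s*(\d+)\s*MB\s+(\w+)", line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values["available"] - values["selected"]

failing = """
3072 MB installed 4 dimms
3071 MB available via BIOS
3054 MB selected for testing
"""
healthy = """
3072 MB installed 4 dimms
3071 MB available via BIOS
3070 MB selected for testing
"""
print(untested_mb(failing))  # 17
print(untested_mb(healthy))  # 1
```

So the system with the problem skips 17 MB of what the BIOS says is available, against 1 MB on the good one. Whatever the cause, mpmemory is not testing the region where memtest86 reports the failures.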

I have looked but been unable to find any Dell configuration utility that can modify the amount of "reserved" memory or make any other difference for this. Has anyone found a fix other than swapping motherboards?

Thanks,
-Dave

1 Message

April 23rd, 2008 23:00

Same problem with the latest memtest86 and memtest86+ on a Dell PowerEdge 1750 server with the latest A12 BIOS, showing errors at 1023 MB on a server with 1024 MB of ECC RAM. The same address also gave errors with Microsoft's "Windows Memory Diagnostic" tool (note: all of these memory testing tools are on the "Ultimate Boot CD"). After a bit of testing with these utilities, the machine would sometimes lock up hard. Swapping the two 512 MB RAM modules made no difference to the problem. The "Dell diagnostics" did not report a memory problem, but the test ran for only 2 or 3 seconds, and ran in Windows, so I do not think "Dell diagnostics" is a useful memory testing program.


I too was searching for a solution because this machine is, I'm sure, out of warranty. After searching I eventually found a vague reference to the BIOS USB support being a possible problem. So I tried this on bootup:


Boot -> F2 -> enter bios -> Integrated Devices -> USB controller -> change from "on with BIOS support" to "on without BIOS support".

 

Then rerun memtest. For me this seems to fix it in memtest, and it also fixes it for Microsoft's "Windows Memory Diagnostic" tool. The A11 BIOS log description said "Fixed potential data corruption using BIOS USB support", but I don't think they fully fixed it, so it's best to just disable it. This experience just reinforced my trust in memtest, in vast preference to the Dell-provided tool.

 

Given that it stops the problem for me on the Dell PowerEdge 1750, and given that the problem I experienced sounds identical to the problem described above for various Dell servers (the PowerEdge 1600SC, PowerEdge 2550, PowerEdge 1650, PowerEdge 600SC and PowerEdge 2650 are all listed in this thread), I would suggest trying the same on any Dell PowerEdge server with this problem, and letting us know if it also fixes the problem for those boxes.

 

-- All the best,

Nick.

Message Edited by nickpj on 04-24-2008 10:53 AM

1 Message

May 4th, 2008 02:00

Thanks Nick! I'd picked up what seemed a perfectly excellent PowerEdge 650 and then it failed memtest, windiag etc. Read a lot of email from 2004 about similar problems with no solutions. Tried this fix "On without BIOS support" and magic, everything passes. Thanks very much.

2 Posts

October 13th, 2009 15:00

This is an old thread, I know, but I wanted to confirm that the suggested USB BIOS change also solves the problem on my 600SC servers.
