Unsolved

3 Posts

October 11th, 2008 17:00

6650 CPU IERR

I get this error when trying to install Windows Server 2008, ESX 3.5, or various versions of Linux.  Windows 2003 installed, but it hits this error while booting.

I removed all processors except the CPU in socket 1, and that did not solve it.  I swapped that CPU for another and changed the VRMs.  Nothing.  I pulled out the DRAC in case it was causing an issue, but that made no difference either.  I flashed to the latest BIOS and ESM, and the problem still persists.  Looking for next-step suggestions.

 

Specs:

4x Xeon 2.0 GHz

16 GB RAM

PERC3 (no other PCI cards installed)

 

Thanks

667 Posts

October 12th, 2008 11:00

How about diagnostics?  Can you boot and run those?

 

A quick test would be to download Memtest from here, burn it to a CD-ROM as an image, and run it.  It's a quick test of the system (minus disks) that might point to something.

3 Posts

October 15th, 2008 21:00

I ran Memtest and it spews tons of errors.  I stripped the system down to the minimum required RAM and it was still giving tons of errors.  I swapped DIMMs around and got the same thing.  I don't think 16 1 GB sticks all go bad at once.  Does this point more to the mobo tray or the memory risers?

667 Posts

October 16th, 2008 10:00

Not having seen the innards of the 6650, I'm not sure which is the culprit.  If the riser is passive (no electronics onboard), it's the motherboard.  If the riser has chips on it, the riser itself could be bad.  All of this assumes the system was not struck by a power surge (lightning).  If it was, everything could be fried.

 

I assume you've checked the power supplies: no red lights that shouldn't be on and, if you have a voltmeter, correct voltages.  Also, you're testing this with all the extra cards out of the PCI slots, right?

 

Try placing two 1 GB DIMMs in Bank 1 of each riser card and running the memory diagnostic.  Look for a pattern in the errors.  I'm not sure if the interleaving is 2-way or 4-way (probably 4-way), but look for every other address bad or every 4th address bad.  If every address is bad, it's the motherboard.  Every 2nd or 4th could be the riser, but it could also be the motherboard.  Swap the riser cards and see if the errors change.  If they do, you might be able to just buy a couple of riser cards and see if that fixes the problem.
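
If the diagnostic logs a long list of failing addresses, eyeballing the pattern gets tedious.  Here is a rough Python sketch of the "every other / every 4th address" check.  The function names and the 8-byte interleave granularity are assumptions for illustration only; I don't know the actual interleave unit on the 6650, so adjust `granularity` to whatever the hardware really uses.

```python
from collections import Counter

def lane_histogram(bad_addresses, ways, granularity=8):
    """Count failing addresses per interleave lane.

    granularity is the ASSUMED number of bytes per interleave unit;
    it is a placeholder, not a documented value for any server.
    """
    return Counter((addr // granularity) % ways for addr in bad_addresses)

def classify_errors(bad_addresses, granularity=8):
    """Roughly classify how failing addresses are distributed."""
    if not bad_addresses:
        return "no errors"
    # Check the finer 4-way split first: errors confined to one lane of
    # a 4-way interleave are also confined to one lane of a 2-way split.
    for ways in (4, 2):
        lanes = lane_histogram(bad_addresses, ways, granularity)
        if len(lanes) == 1:
            return f"one lane of {ways}-way interleave"
    return "spread across all lanes"

if __name__ == "__main__":
    # Made-up sample: every 4th 8-byte word starting at 0x1000 is bad.
    sample = [0x1000 + 8 * i for i in range(0, 64, 4)]
    print(classify_errors(sample))
```

A "one lane" result only means much with a large sample of failing addresses; a handful of errors will always look like they sit in one lane.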

 

If I had to pick something, I'd pick the motherboard, but it's a guess.  If you have another system, swap the memory riser cards and see if the error follows the riser cards.  I'm guessing you don't have maintenance, so depending on how much time and money you have, you might want to start by buying the riser cards.  If that doesn't fix it, then replace the motherboard.  About all you can do is swap parts until the problem goes away.

 

There are some third-party maintenance organizations that will maintain your hardware once it's out of warranty.  They might have spares for your server and would be able to diagnose the machine.

Message Edited by jcn77056 on 10-16-2008 06:40 AM
