V

15842

September 25th, 2003 00:00

PowerEdge 1600SC instability RH8 - mtrr: type mismatch

Hi All,

This is a tough one. I've pulled my hair out, and Dell's techs have almost given up.

I've got a PowerEdge 1600SC running Red Hat 8. The machine ran smoothly for almost 6 months with NO problems until the other day, when I could not connect to it using XDMCP (remote X desktop).

Whilst trying to solve the X problem, I ran into the following:

  1. The main partition was completely corrupted (it totally bit the dust) while I was scanning for bad blocks on another disk.
  2. The machine locked up during the SCSI hard-disk detection stage on reboot.
  3. The machine stopped detecting the SCSI drives as 320 MB/s Ultra Wide.
  4. Memtest86 reports multiple memory failures and then locks up, requiring a hard reset.
  5. GDM/KDE locks up and stops responding when logging in or out.
  6. Sometimes I cannot switch between the graphical and text consoles; X goes all funky and displays garbage on screen.
  7. Samba sometimes stalls when copying files to a Windows machine; the copy eventually continues.

Dell diagnostics still tell me my hardware is 'optimal'.

Every time I experience problems 5, 6 or 7, the following line is added to the kernel log:

mtrr: type mismatch for fd000000,800000 old: uncachable new: write-combining
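For anyone decoding that message: the two hex numbers appear to be the physical base address and the length of the region, so the request covers 0xfd000000 to 0xfd7fffff (8 MB). Something (presumably the X server setting up its framebuffer, though that is my assumption) asked for write-combining on a range the kernel already had marked uncachable. Here is a minimal Python sketch that pulls those fields out of the log line. It assumes the exact wording quoted above, and `parse_mtrr_mismatch` is a hypothetical helper, not part of any tool:

```python
import re

# Matches the kernel message quoted above. The field layout is taken from that
# single line; other kernel versions may word it differently (assumption).
MTRR_MISMATCH = re.compile(
    r"mtrr: type mismatch for ([0-9a-fA-F]+),([0-9a-fA-F]+) "
    r"old: (\S+) new: (\S+)"
)

def parse_mtrr_mismatch(line):
    """Return (base, size, old_type, new_type), or None if the line doesn't match."""
    m = MTRR_MISMATCH.search(line)
    if m is None:
        return None
    base = int(m.group(1), 16)  # physical start address (hex in the log)
    size = int(m.group(2), 16)  # region length in bytes (hex in the log)
    return base, size, m.group(3), m.group(4)

line = "mtrr: type mismatch for fd000000,800000 old: uncachable new: write-combining"
base, size, old, new = parse_mtrr_mismatch(line)
end = base + size - 1  # last byte of the region
print(f"range 0x{base:08x}-0x{end:08x} ({size // (1024 * 1024)} MB): {old} -> {new}")
```

For the line above this prints `range 0xfd000000-0xfd7fffff (8 MB): uncachable -> write-combining`.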

Dell have replaced the 2.4 GHz Xeon processor, the motherboard, and one stick of the Dell-supplied memory.

Before the processor was replaced, the machine would not even boot; it failed with something along the lines of 'cannot detect controller or drives or controller failure' (I can't remember the exact wording).

The mtrr error seems to be quite common for the PowerEdge series of servers:

The problems I am experiencing point to an I/O problem, possibly caused by bad power (bits getting flipped) or perhaps there is a design fault or underlying weakness in the 1600/2600 motherboard.

Any ideas? The only things left to replace are the SCSI drives, Network Card, PSU and case.

Thanks in advance...

7 Posts

December 30th, 2003 08:00

Hi,

I'm now having similar problems with my new 1600SC (with Debian and also with Mandrake):

mtrr: type mismatch for fd000000,800000 old: uncachable new: write-combining

I have checked my hardware with some Dell diagnostic tools and with memtest86 for several hours without finding anything.

I get the typical "signal 11" problem when compiling the kernel:
http://www.bitwizard.nl/sig11/
sometimes "signal 8", and sometimes SCSI problems accessing the tape unit.

Yesterday I found that the last untar of the kernel source was corrupted, with ^@^@^@^@^@ in some files of the kernel source but now (after reboot) the same source files are ok.
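Those `^@` characters are NUL bytes. A rough way to check an unpacked tree for that kind of corruption is to scan the source files for runs of NUL bytes, which should not normally appear in C sources or Makefiles. This is only a sketch: `find_nul_runs`, `scan_tree`, the suffix list and the 4-byte threshold are my own assumptions, not an existing tool:

```python
import os

def find_nul_runs(path, min_run=4):
    """Yield (offset, length) for each run of at least min_run NUL bytes in a file."""
    with open(path, "rb") as f:
        data = f.read()
    i, n = 0, len(data)
    while i < n:
        if data[i] == 0:
            start = i
            while i < n and data[i] == 0:  # extend over the whole run of NULs
                i += 1
            if i - start >= min_run:
                yield start, i - start
        else:
            i += 1

def scan_tree(root, suffixes=(".c", ".h", ".S", "Makefile")):
    """Print every source file under root that contains a NUL run (suffix list is an assumption)."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                for offset, length in find_nul_runs(path):
                    print(f"{path}: {length} NUL bytes at offset {offset}")
```

Running `scan_tree("path/to/kernel/source")` right after untarring and again after a reboot should help narrow things down: if the reports differ for the same tarball, that points at the hardware rather than the archive.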

Does the PowerEdge 1600SC have a design problem? Any ideas?

TIA,

Vicente

13 Posts

January 2nd, 2004 01:00

As it turned out with this machine, the instability/data corruption problem was caused by a faulty mobo/mem/CPU.

However, the (seemingly unrelated) mtrr problem persists, and occurs whenever X is started. I believe it may be due either to the (apparently) strange system memory areas allocated by the mobo/BIOS for system functions, or to an incompatibility/design problem with the mobo. I remember reading some earlier posts that pointed to a problem in older kernels with certain PowerEdge mobos.
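To see which existing MTRR the X server's request collides with, you can compare the range from the log message against the kernel's current table in /proc/mtrr. The sketch below assumes the /proc/mtrr line layout shown in its comment, which is what I'd expect from 2.4/2.6-era kernels but may differ on yours; `parse_proc_mtrr` and `overlapping` are hypothetical helpers:

```python
import re

# Assumed /proc/mtrr line layout (2.4/2.6-era kernels), e.g.
#   reg01: base=0xfd000000 (4048MB), size=   8MB: uncachable, count=1
# Spacing, units and field order can differ between kernel versions.
MTRR_ENTRY = re.compile(
    r"reg(\d+): base=0x([0-9a-fA-F]+).*?size=\s*(\d+)([KMG]B): ([\w-]+)"
)
UNITS = {"KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30}

def parse_proc_mtrr(text):
    """Return a list of (reg, base, size_bytes, type) tuples."""
    entries = []
    for m in MTRR_ENTRY.finditer(text):
        reg = int(m.group(1))
        base = int(m.group(2), 16)
        size = int(m.group(3)) * UNITS[m.group(4)]
        entries.append((reg, base, size, m.group(5)))
    return entries

def overlapping(entries, base, size):
    """Entries whose [base, base + size) range intersects the given range."""
    end = base + size
    return [e for e in entries if e[1] < end and base < e[1] + e[2]]
```

Usage, with the range from the log line above:

    with open("/proc/mtrr") as f:
        print(overlapping(parse_proc_mtrr(f.read()), 0xfd000000, 0x800000))

If the entry that comes back is marked uncachable, that would be the pre-existing region the write-combining request is colliding with.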

However, your problem sounds like bad hardware. Even with the mtrr issue, I am still able to get excellent uptime.

 
