4 Posts
0
12244
December 30th, 2004 19:00
PE1750 consistent system lock ups
I'm hoping that someone here will be able to help me with my troubles. Earlier this month, I bought a PowerEdge 1750 and have it in a colocated environment. Ever since I dropped it off at the data center, I've been having a whole mess of system lock ups.
First, a little about the system. It is a basic PE1750 with dual 2.4 GHz Xeons, redundant PSU, and ERA/O. The system came with 1 GB of RAM and one 36 GB 10k rpm drive. I swapped out the RAM and hard drive for 2 GB of SimpleTech DDR266 registered ECC RAM I bought new and three 36 GB 15k rpm drives. The hard drives turned out to be refurbished (which I didn't know when I ordered them; they didn't state it anywhere). I complained to the company, but ended up keeping them. I would have bought the components from Dell, but face it, Dell isn't the cheapest for memory and HD upgrades.
The system is running Windows Small Business Server 2003 Premium edition with Exchange, IIS, and SQL Server running. Aside from those, there isn't really anything else running (besides OpenManage and the Dell utils). I've run this same software config before on other servers, so I am confident the software/OS is kosher.
When the system locks up, it basically just completely stops responding. I can't get direct console access since it is at the data center, and the remote management card won't show anything from the console or an error screen. All I can really do is have it reset the system (which sometimes takes a couple of requests). There is almost nothing in the event log... it is like it just decides it's going to stop everything.
I have managed to locate a couple of errors that might help. First, in the ERA/O, it shows a couple of errors in the hardware log as follows, all at the same time:
- System software event - CPU Bus Parity Error detected.
- System software event - CPU Internal Error detected.
- System software event - CPU Internal Error detected.
I've also got a couple of errors in the system event log that occur sometimes just before it stops responding:
(shows this one twice)
Event Type: Error
Event Source: WMIxWDM
Event Category: None
Event ID: 107
Date: 12/30/2004
Time: 8:43:06 AM
User: N/A
Computer: CHEF
Description:
Machine Check Event reported is a fatal error.
Event Type: Error
Event Source: symmpi
Event Category: None
Event ID: 15
Date: 12/30/2004
Time: 8:41:06 AM
User: N/A
Computer: CHEF
Description:
The device, \Device\Scsi\symmpi1, is not ready for access yet.
Event Type: Error
Event Source: Disk
Event Category: None
Event ID: 11
Date: 12/30/2004
Time: 8:41:06 AM
User: N/A
Computer: CHEF
Description:
The driver detected a controller error on \Device\Harddisk0.
Only the last one, the disk error, has extended info. It says "This problem is typically caused by a failing cable that connects the drive to the computer" and to replace the cable. There is no cable though, since the drives plug straight into the backplane, so it isn't that easy. Maybe the SCSI backplane?
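For anyone comparing timestamps across entries like these, here is a minimal Python sketch that parses pasted Event Viewer text and orders the events by time. It assumes the exact field layout shown above (each record starting with "Event Type:" and ending with a "Description:" block); the function names parse_events and timeline are made up for illustration and are not part of any Dell or Windows tool.

```python
from datetime import datetime

# Sample text copied from the event log entries quoted in this post.
SAMPLE = r"""
Event Type: Error
Event Source: WMIxWDM
Event Category: None
Event ID: 107
Date: 12/30/2004
Time: 8:43:06 AM
User: N/A
Computer: CHEF
Description:
Machine Check Event reported is a fatal error.
Event Type: Error
Event Source: symmpi
Event Category: None
Event ID: 15
Date: 12/30/2004
Time: 8:41:06 AM
User: N/A
Computer: CHEF
Description:
The device, \Device\Scsi\symmpi1, is not ready for access yet.
Event Type: Error
Event Source: Disk
Event Category: None
Event ID: 11
Date: 12/30/2004
Time: 8:41:06 AM
User: N/A
Computer: CHEF
Description:
The driver detected a controller error on \Device\Harddisk0.
"""


def parse_events(text):
    """Split pasted Event Viewer text into dicts, sorted by timestamp.

    Assumes every record begins with an 'Event Type:' line (hypothetical
    helper; the layout is taken from the entries in this post).
    """
    events, current, in_desc = [], None, False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Event Type:"):
            current = {"Description": ""}
            events.append(current)
            in_desc = False
        if current is None or not line:
            continue
        if in_desc:
            # Everything after 'Description:' belongs to the description.
            current["Description"] = (current["Description"] + " " + line).strip()
        elif line == "Description:":
            in_desc = True
        else:
            # Split on the first colon only, so '8:43:06 AM' stays intact.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    for ev in events:
        stamp = f"{ev['Date']} {ev['Time']}"
        ev["When"] = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    # sorted() is stable, so events with equal timestamps keep log order.
    return sorted(events, key=lambda ev: ev["When"])


def timeline(events):
    """Return (seconds since first event, source, event ID) tuples."""
    first = events[0]["When"]
    return [
        (int((ev["When"] - first).total_seconds()), ev["Event Source"], ev["Event ID"])
        for ev in events
    ]


if __name__ == "__main__":
    for offset, source, event_id in timeline(parse_events(SAMPLE)):
        print(f"+{offset:>4}s  {source:<8} ID {event_id}")
```

On the entries above, this puts the symmpi and Disk errors at the same second, with the machine check two minutes later. That is only an ordering, not proof of which component caused which error.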
I have run all of the diagnostic tests in OpenManage and they all have passed. I had the system for about a week before taking it to the data center and didn't have any lock ups, but I hadn't had it running for more than a couple of hours to install/configure the system and run all the diagnostics before dropping it off.
From this info, I'm guessing the culprit is one of the following:
- Bad CPU
- Bad onboard SCSI controller or backplane
- Bad hard drive
Any ideas would be GREATLY appreciated. I'd really like to get this fixed, since I'm not yet comfortable switching entirely over to using it until it is stable.
Thanks,
Ken Robertson
barhampa
718 Posts
0
January 1st, 2005 06:00
Ken_Robertson
4 Posts
0
January 3rd, 2005 03:00
It doesn't have a RAID controller, it just has the onboard SCSI controller.
To see if it was a hacker or anything else, I completely closed the firewall and blocked all traffic to it, and it still locked up. Now I cannot even restart the system. It locked up sometime Friday night, and it won't respond to any of the reset/power cycle/power off commands.
Ken_Robertson
4 Posts
0
October 3rd, 2006 14:00
As for what the problem was, it was a fried CPU. The server came from Dell with hardly any thermal grease on one of the CPUs, causing it to consistently overheat and lock up, until it eventually fried itself. Dell came out and replaced the CPU, and it has been working fine ever since.
Ken
DPYeilding
80 Posts
0
October 3rd, 2006 14:00
DPYeilding
80 Posts
0
October 3rd, 2006 15:00