Got a release window. Applied every last update I could find on Dell's site, ran the Dell 2.1 diagnostics (clean), and reseated the RAM, the PERC card, and the other Adaptec card that was in the box (for the VS160 tape drive). Everything looked good...
...and it did another unexpected reboot tonight. Anyone have any other ideas? There are no other errors on the way down or coming back up, so to me it has to be a hardware issue, not a Windows issue.
I'm basically at the point of having to call Dell support, although I'm not sure how we're going to diagnose this...
I'm also having the exact same problem with my PE1850. It seems to happen whenever I put sustained load on the RAID disks, i.e. when I'm dumping databases to disk.
I also ran full diagnostics when I saw it happen the first time. I assumed it was a one-off, but now it happens almost weekly.
I spoke with Dell Saturday night; their concern was that there was no UPS on the server. They said it might not fix the problem, but they needed a baseline to work from. Sunday I shut the server down to move one power supply to a UPS (wanted to be better safe than sorry). When I brought it back up, it got to the starting Windows 2003 screen and rebooted; same in safe mode. I called Dell and they said it was more likely a software issue, and to install a parallel copy of 2k3. While I was trying to do that, the Windows install CD said it had to do maintenance on partition C: and reboot. I let it, and now the server boots again.
Very odd as far as I'm concerned. I haven't installed the second copy in parallel; I may just take it down next weekend and do a complete reinstall. I still don't believe that an "unexpected reboot" with no other errors anywhere in the event log could be a software issue...
As for load: this box isn't in full production yet, so it really only has a few user profiles on it and about 4 GB of user files (XLS, DOCs, the usual). That and Veritas Backup Exec are about all that's running on it, so load isn't an issue for me anyway. It seems completely random. The odd thing is that it ran fine for a month, then started this...
Is your hardware anything similar to mine? All of us can't be having the same oddball problem; there has to be a common point. When did you guys (all of the responders with the same issue) get your servers? Mine was received mid-Jan 05. And is everyone running 2003 Standard Server?
We got ours late last year; it's a 2850 with Win 2000. However, we had a similar problem once, much earlier in the year. Wiping the machine's OS seemed to fix the problem on that server. However, that's rather drastic for a production server.
Two 1500 Xeons
1 GB RAM
Quad 36 GB disks
PERC 4/DC RAID card (set up to mirror drives)
I just wiped the OS to see if that fixes the problem. Since there weren't any Windows event logs, I was thinking hardware, but there weren't any ESM logs either, so I am not sure.
I'm in the same boat, about to redo the OS. My feeling still is: what kind of app/OS error doesn't create ANY event errors and reboots a machine? I personally haven't seen it. To me that's undoubtedly a hardware issue. I'm going to redo the OS and continue to work with Dell to figure out what is wrong. As an aside, I spoke with a UNISYS repair tech that I know (UNISYS is a licensed Dell repair contractor) and his feeling is the same: it can't be the OS, it has to be hardware. Now the question is what. My feeling is that it can really only be one of a few things:
1) Power Supply Controller/Manager (thing that decides which power supply to take juice from)
2) Motherboard
3) Raid Card.
A failure in anything else shouldn't be completely invisible to the OS. As for the ESM logs, same here: nothing in them.
I have 11 PE 1800s, 10 of them from an image of the first.
3.2GHz/1MB Cache, Xeon, 800MHz
2GB DDR2 400MHz
PERC4/SC
(5) 73GB U320 SCSI: 2 on RAID-1, 3 on RAID-5
W2K3 Server Standard Edition
I have one that has the same problem, randomly shutting down. Sometimes it'll be up 3 days, sometimes 1. How can this be a software issue, unless something was corrupted when the image was being written to the drives by the RAID controller?
I still suspect it has something to do with the hardware. As with everyone else, nothing is being written to the event logs.
I reinstalled the OS from scratch... clean partition, everything... and it's been up for 18 days with not even a whimper...
I really don't get this. Even the UNISYS techs I talked to, who are Dell service people, said it pointed to hardware, which is what I initially told the office that owns the server (I support it, mainly remotely). The only other explanation is, yeah, a bad install image from the start. I haven't even tried going back to Dell with this: I can't prove it, it sounds really far-fetched, and they're not going to believe me no matter what I say at this point. No proof... but it's getting down to the point where there aren't many more avenues...
After the reload of the system mine has been up for a month and still no problems. Weird but I will take it. Would like to know the cause of the problem so I could prevent it in the future, but looks like nobody has figured it out yet.
Worms that attack the server can crash the RPC service and make the system reboot.
A good firewall and antivirus are part of the answer.
This problem occurs because there is a limited amount of kernel stack space available for kernel drivers. When an RPC or other buffer overrun hits the server, it tends to crash the kernel.
Windows 2000 kernel stack
The kernel-mode stack limit is 12 KB (on x86) for kernel drivers.
Windows 2000 running NTFS
Windows 2000 running NTFS examines the available kernel stack before processing an I/O request. If NTFS determines that there is insufficient stack space, then an exception error results. If there is not enough stack space for processing the exception, then a stack overflow occurs and the system double-faults and reboots. The kernel dies and the army gets wiped out.
LOL.
One way to verify the problem is to format and reinstall WITHOUT attaching to the real internet.
You hook up to a personal firewall, like a cheap Linksys box, and install the server and files.
When worms can't get in from the outside, the system runs for MONTHS at a time with no reboots, and yet then mysteriously crashes once the door is left open to the hacker kiddies by attaching back to the internet. Merely having a firewall is NOT any kind of guarantee that you are "safe". I tend to have 3 levels of protection: software firewall, antivirus, and hardware firewall.
I can guarantee that was not my case. The system has no internet connectivity and has up-to-date AV. Anything put on the system is scanned by the machine I download it on, then by another AV app on another machine, and finally by the AV on the server I am putting it on. All three AVs are different.
barhampa
718 Posts
February 27th, 2005 07:00
Tuxman
11 Posts
February 27th, 2005 12:00
Event Type: Error
Event Source: EventLog
Event Category: None
Event ID: 6008
Date: 2/25/2005
Time: 8:28:57 PM
User: N/A
Computer: SERVER
Description:
The previous system shutdown at 8:26:14 PM on 2/25/2005 was unexpected.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: d5 07 02 00 05 00 19 00 Õ.......
0008: 14 00 1a 00 0e 00 de 02 ......Þ.
0010: d5 07 02 00 06 00 1a 00 Õ.......
0018: 01 00 1a 00 0e 00 de 02 ......Þ.
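The Data bytes in this event look like two 16-byte Windows SYSTEMTIME structures (eight little-endian 16-bit fields: year, month, day of week, day, hour, minute, second, milliseconds). That layout is an assumption here, but the first one decodes to the same 8:26:14 PM shutdown time the description reports. A minimal Python sketch to decode it; the names `RAW`, `decode_systemtime` and `decode_event_6008` are made up for illustration:

```python
import struct
from datetime import datetime

# Hex bytes copied from the Data section of the Event ID 6008 record above.
RAW = bytes.fromhex(
    "d5 07 02 00 05 00 19 00 "
    "14 00 1a 00 0e 00 de 02 "
    "d5 07 02 00 06 00 1a 00 "
    "01 00 1a 00 0e 00 de 02"
)


def decode_systemtime(chunk):
    """Decode one 16-byte block, ASSUMING the SYSTEMTIME field layout."""
    year, month, day_of_week, day, hour, minute, second, millis = struct.unpack(
        "<8H", chunk
    )
    # day_of_week uses 0 = Sunday in SYSTEMTIME; returned separately as a check.
    stamp = datetime(year, month, day, hour, minute, second, millis * 1000)
    return stamp, day_of_week


def decode_event_6008(raw):
    """Split the event data into 16-byte blocks and decode each one."""
    return [decode_systemtime(raw[i:i + 16]) for i in range(0, len(raw), 16)]


if __name__ == "__main__":
    for stamp, day_of_week in decode_event_6008(RAW):
        print(stamp.isoformat(), "day_of_week =", day_of_week)
```

The first block comes out as 2005-02-25 20:26:14 and the second as 2005-02-26 01:26:14, exactly five hours later. The second is presumably the same moment in UTC, which would fit a server on US Eastern time, but that part is an inference. The data carries nothing beyond the time of the last recorded heartbeat, which is why this event alone can't tell hardware from software.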
I'll try resetting the controller the next time I can get a release window.
Tuxman
11 Posts
March 12th, 2005 06:00
Ok... here's the latest
Dave
furtle67
2 Posts
March 14th, 2005 08:00
Hi;
Tuxman
11 Posts
March 14th, 2005 12:00
ausphiwvwc
6 Posts
March 16th, 2005 21:00
Tuxman
11 Posts
March 16th, 2005 22:00
Dave
furtle67
2 Posts
March 17th, 2005 07:00
Hi;
ausphiwvwc
6 Posts
March 18th, 2005 02:00
Tuxman
11 Posts
March 18th, 2005 08:00
MJF01
3 Posts
April 20th, 2005 12:00
Tuxman
11 Posts
April 21st, 2005 01:00
ausphiwvwc
6 Posts
April 21st, 2005 06:00
speedstep
9 Legend
47K Posts
April 21st, 2005 12:00
ausphiwvwc
6 Posts
April 21st, 2005 14:00