2 Bronze

C6100 server BSOD

I'm trying to track down why this server is experiencing blue screens.

I see the following errors in the event log:

Error Event ID 6008 "The previous system shutdown at 4:21:28 AM on 6/9/2015 was unexpected."

Critical Event ID 41 "The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly."

Error Event ID 1001 "The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000124 (0x0000000000000000, 0xfffffa800a62e038, 0x0000000000000000, 0x0000000000000000). A dump was saved in: C:\Windows\Minidump\060915-24913-01.dmp. Report Id: 060915-24913-01."

Error Event ID 1500 "The SNMP Service encountered an error while accessing the registry key SYSTEM\CurrentControlSet\Services\SNMP\Parameters\TrapConfiguration."

Error Event ID 7026 "The following boot-start or system-start driver(s) failed to load: ClusDisk"

Error Event ID 18

"A fatal hardware error has occurred.

Reported by component: Processor Core

Error Source: Machine Check Exception

Error Type: Internal Timer Error

Processor ID: 0

The details view of this entry contains further information."
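For anyone reading along, the bugcheck line in Event ID 1001 can be split into its code and four parameters. The following is a minimal Python sketch, not an official tool. The function names `parse_bugcheck` and `describe` are made up for illustration. It assumes the parameter layout Microsoft documents for bug check 0x124 (WHEA_UNCORRECTABLE_ERROR) when parameter 1 is 0: parameter 2 is the address of the WHEA_ERROR_RECORD, and parameters 3 and 4 are the high and low 32 bits of the MCi_STATUS register.

```python
import re

# Matches the "The bugcheck was: 0x... (p1, p2, p3, p4)" part of Event ID 1001.
BUGCHECK_RE = re.compile(r"bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)")


def parse_bugcheck(message):
    """Return (code, [p1, p2, p3, p4]) parsed from an Event ID 1001 message."""
    match = BUGCHECK_RE.search(message)
    if match is None:
        raise ValueError("no bugcheck found in message")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def describe(code, params):
    """Summarize a 0x124 bug check; other codes are out of scope for this sketch."""
    if code != 0x124:
        return f"bug check {code:#x}: not handled by this sketch"
    if params[0] != 0x0:
        return "0x124 with a non-MCE error source; see Microsoft's bug check reference"
    # Assumed layout: p3 = high 32 bits, p4 = low 32 bits of MCi_STATUS.
    mci_status = (params[2] << 32) | params[3]
    return (
        "0x124 WHEA_UNCORRECTABLE_ERROR, machine check exception; "
        f"WHEA_ERROR_RECORD at {params[1]:#018x}; "
        f"MCi_STATUS {mci_status:#018x}"
    )


if __name__ == "__main__":
    event_text = (
        "The computer has rebooted from a bugcheck.  The bugcheck was: "
        "0x00000124 (0x0000000000000000, 0xfffffa800a62e038, "
        "0x0000000000000000, 0x0000000000000000)."
    )
    print(describe(*parse_bugcheck(event_text)))
```

With a dump open in WinDbg, the record address from parameter 2 can be passed to the `!errrec` extension to display the full hardware error record.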

If it helps, here are links to the dump files generated after the BSODs:

First dump file: https://onedrive.live.com/redir?resid=A4C0CA009AA14F65!109&authkey=!AImteEv0JAXxu_k&ithint=f...

Second dump file, from when the BSOD occurred again 5 days later: https://onedrive.live.com/redir?resid=A4C0CA009AA14F65!110&authkey=!AP1c-iA94vV-i0k&ithint=f...

I've checked the web-based MegaRAC software for error logs and for any issues with voltages, temperatures, or fan speeds. Everything appears to be within normal levels, and there are no BMC errors.

MegaRAID storage manager shows an optimal array.

I don't currently have a good monitoring solution for RAM, PSUs or CPUs. Still looking for one.

Can anyone here help point me in the right direction to figure out what's going on?

Accepted Solutions
Moderator

RE: C6100 server BSOD

Hello

CPU machine checks are errors detected by the CPU. They are generally caused by drivers. They can also be caused by memory, system board, or the CPU/memory controller.

The operating system reports this as a hardware issue because the processor halted the system when it encountered the machine check exception. The CPU could not recover from the error, so it halted the system, which produced the blue screen. The dump files only point to the source of the halt, which is the processor.

I would suggest checking the hardware log within the BMC for any memory errors that occurred at the time of the blue screen. If the same memory is producing errors when the blue screens occur, swap the memory around to see whether the errors follow the memory, stay with the slot, or stay with the CPU/memory controller. If you installed a driver for virtual or physical hardware around the time this issue started, that driver is likely the cause. I would update the driver if possible, or uninstall it, to see if the issue stops.
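The swap test above comes down to comparing where errors were logged before and after the modules are moved. Here is a minimal Python sketch of that reasoning. The `ErrorSighting` record, its field names, and the slot labels are hypothetical; the BMC log does not necessarily expose data in this shape.

```python
from typing import NamedTuple


class ErrorSighting(NamedTuple):
    dimm_serial: str  # hypothetical identifier of the physical module
    slot: str         # slot label such as "A1" (naming is an assumption)
    cpu: int          # CPU / memory controller that owns the slot


def diagnose(before: ErrorSighting, after: ErrorSighting) -> str:
    """Compare error locations logged before and after swapping DIMMs."""
    same_dimm = before.dimm_serial == after.dimm_serial
    same_slot = before.slot == after.slot
    if same_dimm and same_slot:
        return "nothing moved: swap the modules and wait for another error"
    if same_dimm:
        return "errors follow the module: suspect the DIMM"
    if same_slot:
        return "errors stay with the slot: suspect the slot or system board"
    if before.cpu == after.cpu:
        return "errors stay with the CPU: suspect the CPU/memory controller"
    return "inconclusive: repeat the swap or test further"


if __name__ == "__main__":
    first = ErrorSighting(dimm_serial="SN-0001", slot="A1", cpu=0)
    second = ErrorSighting(dimm_serial="SN-0001", slot="B1", cpu=1)
    print(diagnose(first, second))
```

The order of the checks matters: the module and slot are ruled out first, and the CPU/memory controller is only blamed when the errors show up on a different module in a different slot behind the same CPU.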

Thanks

Daniel Mysinger
Dell EMC, Enterprise Engineer

4 Replies
2 Bronze

RE: C6100 server BSOD

Thank you, that's the most information I've gotten out of anyone so far about this issue.

Unfortunately the BMC has not logged any error messages when these BSODs have occurred.

The last time I installed a driver was when I installed the OS in December. I installed the MegaRAID Storage Manager in April, and MS .NET Framework 4.5.2 in mid May. The first BSOD was on June 1st.

The server is part of an NLB cluster. The other member of the cluster has not had this issue.

1 Copper

RE: C6100 server BSOD

I have spent 3+ weeks researching this on various websites, many of them claiming to be run by computer experts.

You are the first one to state that the memory can be one of the causes of the BSOD.

I ran a DELL Diagnostic Test this morning and part of the memory failed.

It was suggested that I replace the memory modules and increase it for the Windows 10 installation.

My PC's Windows Update log shows several attempts to install Windows 10, and they all failed.

I ordered replacement modules this afternoon and should have them by the 26th. I'm anxious to see whether Windows 10 will install after I replace the modules and whether the BSODs stop.

2 Bronze

RE: C6100 server BSOD

Blue screens of death can be caused by a multitude of factors. There are many tools on the internet that can analyze these; however, Microsoft has its own tool. When a computer is exhibiting problems, most users are reluctant to download a 3rd party tool that "might make things worse." This is where the Windows Debugging Tools come into play.

http://www.deskdecode.com/how-to-fix-bsod-blue-screen-of-death/
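As a rough illustration of using the Windows Debugging Tools on a minidump, the Python sketch below builds a command line for the `kd.exe` kernel debugger. It relies on kd's `-z` option (open a dump file) and `-c` option (run debugger commands at startup), with `!analyze -v` followed by `q` to quit. The helper name `build_kd_command` and the default executable path are assumptions; the actual location of `kd.exe` depends on where the tools were installed.

```python
from pathlib import PureWindowsPath


def build_kd_command(dump_path, kd_exe="kd.exe"):
    """Return an argument list that analyzes one dump file and then exits.

    kd_exe is an assumption: pass the full path if kd.exe is not on PATH.
    """
    dump = str(PureWindowsPath(dump_path))
    return [kd_exe, "-z", dump, "-c", "!analyze -v; q"]


if __name__ == "__main__":
    # Dump path taken from the Event ID 1001 message in the original post.
    cmd = build_kd_command(r"C:\Windows\Minidump\060915-24913-01.dmp")
    print(cmd)
    # On a machine with the tools installed, the list could be handed to
    # subprocess.run(cmd, capture_output=True, text=True) to capture the report.
```

The sketch only constructs the command and does not run it, so it can be tried on any machine before pointing it at a real dump.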