August 26th, 2013 06:00

The watchdog timer expired.

We have several Dell PowerEdge T620 servers in remote locations throughout our Enterprise.  Each of them is randomly throwing the following event and thus far I've found no information about the message or how to resolve it.  I'm hoping someone here can help me figure this out.


Event Message: The watchdog timer expired.

Severity: Critical

Detailed Description: The operating system or potentially an application failed to communicate to the baseboard management controller (BMC) within the timeout period.

Recommended Action: Check the operating system, application, hardware, and system event log for exception events.

Message ID: ASR0000

System Model: PowerEdge T620

Power State: ON

Operating System: Microsoft Windows Server 2012, Standard x64 Edition

While I've been working in the desktop support world for a very long time, I'm fairly new to Dell servers.  I'm trying to help another highly overworked, overstressed administrator with this issue.  If someone can spare the time to help me learn where to look to gain more insight into what might be going on, I'd really appreciate it.  I'm willing and able to learn so I can help take some work off a co-worker's plate.  Thanks.

990 Posts

August 28th, 2013 14:00

I apologize for the delay in responding.  After reviewing the errors and doing some research, we found that the error is coming from Dell's OpenManage software v7.2.  Our recommendation is to update your OpenManage to version 7.3 and monitor.  This version should address the timeout in the particular service that is triggering the watchdog error.

Regards,

 

990 Posts

August 26th, 2013 07:00

The watchdog timer is used to monitor the status of a component. It operates by monitoring responses. When it stops getting a heartbeat from a component that it is monitoring then the timer expires, and you receive an error in the log. When the timer expires it will initiate whatever action is set. If the operating system stops responding then the timer will expire and restart the server if it is set to perform that action.

The above error doesn't tell us why the timer expired, so you will need to review your hardware and operating system logs to find out what happened when the timer expired.
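To make the mechanism concrete, here is a minimal Python sketch of a heartbeat watchdog. It only illustrates the idea described above and is not how the BMC or OpenManage implements it; the `Watchdog` class, the `timeout` value, and the `on_expire` callback are names made up for this example.

```python
import time


class Watchdog:
    """Toy watchdog: expires when no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout, on_expire, clock=time.monotonic):
        self.timeout = timeout
        self.on_expire = on_expire  # configured action, e.g. log an event or reboot
        self.clock = clock          # injectable so the logic can be tested without sleeping
        self.last_beat = clock()
        self.expired = False

    def heartbeat(self):
        """Called by the monitored component to show it is still responsive."""
        self.last_beat = self.clock()
        self.expired = False

    def poll(self):
        """Called periodically by the monitor; runs the action once per expiry."""
        if not self.expired and self.clock() - self.last_beat > self.timeout:
            self.expired = True
            self.on_expire()
        return self.expired
```

If the monitored side keeps calling `heartbeat()` in time, `poll()` never fires the action. If the monitored side crashes or hangs, the next `poll()` after the timeout runs whatever `on_expire` was set to, which corresponds to the logged error (and the optional restart) on the server.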

Regards,

8 Posts

August 26th, 2013 10:00

Just prior to the error, the following events occurred:

8/23/2013 11:10:06 PM

Faulting application name: dsm_sa_datamgr64.exe, version: 7.2.0.3801, time stamp: 0x50c769ae
Faulting module name: dciemp64.dll, version: 7.2.0.3999, time stamp: 0x50c77d73
Exception code: 0xc0000005
Fault offset: 0x0000000000004038
Faulting process id: 0x8e0
Faulting application start time: 0x01ce9f07bdc36762
Faulting application path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe
Faulting module path: C:\Program Files\Dell\SysMgt\omsa\bin\dciemp64.dll
Report Id: acddebe1-0c6a-11e3-93f9-001018f63d67
Faulting package full name:
Faulting package-relative application ID:

Followed by...

08/23/2013 11:10:06 PM

Fault bucket , type 0
Event Name: APPCRASH
Response: Not available
Cab Id: 0

Problem signature:
P1: dsm_sa_datamgr64.exe
P2: 7.2.0.3801
P3: 50c769ae
P4: dciemp64.dll
P5: 7.2.0.3999
P6: 50c77d73
P7: c0000005
P8: 0000000000004038
P9:
P10:

Attached files:
C:\Windows\Temp\WER1075.tmp.appcompat.txt
C:\Windows\Temp\WER10D4.tmp.WERInternalMetadata.xml
C:\Windows\Temp\WER10D5.tmp.hdmp
C:\Windows\Temp\WER13C2.tmp.dmp

These files may be available here:
C:\ProgramData\Microsoft\Windows\WER\ReportQueue\AppCrash_dsm_sa_datamgr64_6033d1f5754645d6f47ce76327e3cf9364ed73_cab_094e147b

Analysis symbol:
Rechecking for solution: 0
Report Id: acddebe1-0c6a-11e3-93f9-001018f63d67
Report Status: 96
Hashed bucket:


And finally...

08/23/2013 11:10:08 PM

Fault bucket , type 0
Event Name: APPCRASH
Response: Not available
Cab Id: 0

Problem signature:
P1: dsm_sa_datamgr64.exe
P2: 7.2.0.3801
P3: 50c769ae
P4: dciemp64.dll
P5: 7.2.0.3999
P6: 50c77d73
P7: c0000005
P8: 0000000000004038
P9:
P10:

Attached files:
C:\Windows\Temp\WER1075.tmp.appcompat.txt
C:\Windows\Temp\WER10D4.tmp.WERInternalMetadata.xml
C:\Windows\Temp\WER10D5.tmp.hdmp
C:\Windows\Temp\WER13C2.tmp.dmp

These files may be available here:
C:\ProgramData\Microsoft\Windows\WER\ReportQueue\AppCrash_dsm_sa_datamgr64_6033d1f5754645d6f47ce76327e3cf9364ed73_cab_094e147b

Analysis symbol:
Rechecking for solution: 0
Report Id: acddebe1-0c6a-11e3-93f9-001018f63d67
Report Status: 4
Hashed bucket:

Does that help at all?  If not, where specifically should I be looking for logs?  I've checked the iDRAC7 and it had less data than the original message.  The three events above were found in the Windows Event Viewer.
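For comparing these crash entries across several servers, a small Python sketch like the following can pull the fields out of the pasted event text. The function names are my own, and reading "Faulting application start time" as a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC) is an assumption based on the shape of the value. Exception code 0xc0000005 is the Windows access violation status.

```python
import re
from datetime import datetime, timedelta, timezone

# Event text copied from the first crash entry above (shortened).
SAMPLE = r"""Faulting application name: dsm_sa_datamgr64.exe, version: 7.2.0.3801, time stamp: 0x50c769ae
Faulting module name: dciemp64.dll, version: 7.2.0.3999, time stamp: 0x50c77d73
Exception code: 0xc0000005
Fault offset: 0x0000000000004038
Faulting application start time: 0x01ce9f07bdc36762
Faulting module path: C:\Program Files\Dell\SysMgt\omsa\bin\dciemp64.dll"""

# "Key: value" where the key is everything before the first colon.
FIELD_RE = re.compile(r"^(?P<key>[^:]+):\s*(?P<value>.*)$")

# Assumed epoch for the start-time field (Windows FILETIME).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def parse_event_fields(text):
    """Return a dict mapping each 'Key' to its raw 'value' string."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group("key").strip()] = match.group("value").strip()
    return fields


def split_name_version(value):
    """Split 'name, version: X, time stamp: Y' into (name, version, stamp)."""
    name, _, rest = value.partition(", version: ")
    version, _, stamp = rest.partition(", time stamp: ")
    return name, version, stamp


def filetime_to_datetime(filetime):
    """Convert 100 ns ticks since 1601-01-01 UTC to an aware datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


if __name__ == "__main__":
    fields = parse_event_fields(SAMPLE)
    print(split_name_version(fields["Faulting module name"]))
    print(filetime_to_datetime(int(fields["Faulting application start time"], 16)))
```

Grouping the results by faulting module and fault offset makes it easy to see whether every server is hitting the same crash or several different ones.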

8 Posts

August 29th, 2013 06:00

Dell-Geoff P,


I was at one of our facilities yesterday, so I went ahead and ran the latest SUU on the server and got it caught up on all firmware and driver updates.  That included the OpenManage Server Administrator upgrade to 7.3.0.  We'll monitor the server over the next few days and I'll report back with my findings.


Thank you,

Geoff

8 Posts

September 3rd, 2013 06:00

So far I've not seen this message return on the 1 server upgraded.  I will be upgrading a second of seven servers tomorrow.  I'll update you afterward.  Thank you for your patience while we work to get fully updated.  It should go faster after tomorrow's work.

8 Posts

September 18th, 2013 06:00

It appears that upgrading to Dell OpenManage 7.3 has resolved this issue.  Thanks for your help!

1 Message

October 14th, 2013 16:00

Actually, I have the same error, but mine is a brand new server loaded with OM 7.3.

Any thoughts?

-------------------------------

System Host Name: JCMS8BDC01
Event Message: The watchdog timer expired.
Date/Time: Mon Oct 14 2013 16:37:08
Severity: Critical

Detailed Description: The operating system or potentially an application failed to communicate to the baseboard management controller (BMC) within the timeout period.
Recommended Action: Check the operating system, application, hardware, and system event log for exception events. 
Message ID: ASR0000

------------------------
Windows log reads
------------------------

Faulting application name: dsm_sa_datamgr64.exe, version: 7.3.0.350, time stamp: 0x51b23742
Faulting module name: dsm_sa_datamgr64.exe, version: 7.3.0.350, time stamp: 0x51b23742
Exception code: 0xc0000005
Fault offset: 0x0000000000014c77
Faulting process id: 0x5c0
Faulting application start time: 0x01cec924561cbd3a
Faulting application path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe
Faulting module path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe


October 14th, 2013 23:00

I've had the same error on 4 Windows 2008 R2 PE blades with OM 7.3 after installing this month's Microsoft patches, which included numerous .NET updates.  After the reboot, the DSM SA Data Manager service does not start.  Manually starting the service works.  After a second reboot, the service starts on its own.

Faulting application name: dsm_sa_datamgr64.exe, version: 7.3.0.350, time stamp: 0x51b23742
Faulting module name: dsm_sa_datamgr64.exe, version: 7.3.0.350, time stamp: 0x51b23742
Exception code: 0xc0000005
Fault offset: 0x0000000000014c77
Faulting process id: 0x780
Faulting application start time: 0x01cec9552d6c3a8b
Faulting application path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe
Faulting module path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe
Report Id: 9040c032-3548-11e3-8a30-e0db55230842

October 15th, 2013 15:00

Just installed the MS updates on an R620 running Win 2008 R2 with OM 7.3 and had an unexpected ASR watchdog reboot.

Faulting application name: dsm_sa_datamgr64.exe, version: 7.3.0.350, time stamp: 0x51b23742
Faulting module name: dsm_sa_datamgr64.exe, version: 7.3.0.350, time stamp: 0x51b23742
Exception code: 0xc0000005
Fault offset: 0x0000000000014c77
Faulting process id: 0x584
Faulting application start time: 0x01cec9d288eeaa00
Faulting application path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe
Faulting module path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe
Report Id: e3ac4279-35c5-11e3-9bd5-b8ca3af5c99a

7.3 is definitely not the solution here. Does anyone have any ideas?

 

January 18th, 2014 05:00

Encountering the same error from iDRAC on an R710 server running VMware ESXi:

Event: The watchdog timer expired.
Date/Time: Sat Jan 18 2014 10:24:31
Severity: Critical
Model: PowerEdge R710
Service Tag: F931Z4J
BIOS version: 6.3.0
Hostname: left blank intentionally 
OS Name: VMware ESXi 5.1.0 build-1065491.0 build-106549
iDrac version: 1.85

January 21st, 2014 01:00

The problem was solved by upgrading the iDRAC firmware directly from 1.85 to 1.96.

February 4th, 2014 01:00

The same critical iDRAC alert reappeared 2 weeks later, even though iDRAC had been upgraded to the latest version, 1.96.

Message: 
Event: The watchdog timer expired.
Date/Time: Tue Feb 04 2014 01:45:48
Severity: Critical
Model: PowerEdge R710
Service Tag: H4SY75J
BIOS version: 6.3.0
Hostname: 
OS Name: VMware ESXi 5.1.0 build-1065491.0 build-106549
iDrac version: 1.96

743 Posts

February 4th, 2014 13:00

We are getting a LOT of these alerts now... every time we reboot a server after a WSUS update (i.e. monthly). All the boxes are on 7.3.

Is there a way to disable this function/alert?

Thx,

John Bradshaw

5 Posts

October 15th, 2014 12:00

I have the same problem. After the ASR the 2nd restart was stable.

Now there is a flashing red light on the front of the server. How do I clear the flashing light?

ASR is asserted a few minutes after Windows restart although Windows is running and appears to be functioning normally. This was observed after installing Microsoft Updates (FWIW which usually includes long-running high-CPU .Net compiles and high-CPU utilization by the TrustedInstaller service). Since we don't restart our servers "for no reason" and the "reason" is usually Microsoft updates, it is hard to know whether this is a cause-effect relationship.

Dell PowerEdge M600

iDRAC Firmware 1.60 (Build 3)

CPLD Version 1.0.1 
BIOS Version 2.4.0

Dell OpenManage Server Administrator 6.5.0

Windows Server 2008 R2

 

OOPS! I see this forum question is already marked as answered. I will re-post in a new thread since I have a slightly different question. My new thread is here: http://en.community.dell.com/support-forums/servers/f/956/p/19603359/20686914#20686914

2 Posts

January 28th, 2015 14:00

Good day, friend.

Could you please help me by explaining how to perform the upgrade?

I hope you can help. Regards.