IconZ1
1 Copper

The watchdog timer expired.


We have several Dell PowerEdge T620 servers in remote locations throughout our enterprise.  Each of them is randomly logging the following event, and so far I've found no information about the message or how to resolve it.  I'm hoping someone here can help me figure this out.


Event Message: The watchdog timer expired.

Severity: Critical

Detailed Description: The operating system or potentially an application failed to communicate to the baseboard management controller (BMC) within the timeout period.

Recommended Action: Check the operating system, application, hardware, and system event log for exception events.

Message ID: ASR0000

System Model: PowerEdge T620

Power State: ON

Operating System: Microsoft Windows Server 2012, Standard x64 Edition

While I've been working in the desktop support world for a very long time, I'm fairly new to Dell servers.  I'm trying to help another highly overworked, overstressed administrator with this issue.  If someone can spare the time to help me learn where to look to gain more insight into what might be going on, I'd really appreciate it.  I'm willing and able to learn so I can help take some work off a co-worker's plate.  Thanks.

Moderator

RE: The watchdog timer expired.


The watchdog timer is used to monitor the status of a component by watching for its responses. When it stops receiving a heartbeat from the component it is monitoring, the timer expires and an error is written to the log. On expiry, the timer initiates whatever action has been configured. For example, if the operating system stops responding, the timer expires and restarts the server, provided it is set to perform that action.

The above error doesn't tell us why the timer expired, so you will need to review your hardware and operating system logs to find out what happened when the timer expired.
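To make the heartbeat idea concrete, here is a minimal software sketch of a watchdog in Python. It is only an illustration of the general mechanism, not how the BMC or OpenManage implements it; the class name, the `on_expire` callback, and the 5-second timeout are all made up for the example.

```python
import time


class Watchdog:
    """Toy watchdog: expires if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout, on_expire, clock=time.monotonic):
        self.timeout = timeout
        self.on_expire = on_expire  # hypothetical action, e.g. log an event or reset
        self.clock = clock          # injectable so the demo needs no real waiting
        self.last_beat = clock()
        self.expired = False

    def heartbeat(self):
        """Called by the monitored component to show it is still alive."""
        self.last_beat = self.clock()
        self.expired = False

    def poll(self):
        """Called periodically by the monitor; runs the action once on expiry."""
        if not self.expired and self.clock() - self.last_beat > self.timeout:
            self.expired = True
            self.on_expire()
        return self.expired


# Demo with a fake clock: one heartbeat at t=3, then silence.
now = [0.0]
events = []
wd = Watchdog(timeout=5.0, on_expire=lambda: events.append("expired"),
              clock=lambda: now[0])

now[0] = 3.0
wd.heartbeat()   # component checks in
now[0] = 7.0
wd.poll()        # 4 s since last heartbeat: still fine
now[0] = 9.0
wd.poll()        # 6 s since last heartbeat: timer expires, action runs
now[0] = 10.0
wd.poll()        # already expired: action is not repeated
```

Note that the expiry itself only says "no heartbeat arrived in time". As with the ASR0000 event, the reason the component went quiet has to come from other logs.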

Regards,

Geoff P
Dell | Social Outreach Services - Enterprise


Download the Dell Quick Resource Locator app today to access PowerEdge support content on your mobile device!
(iOS, Android, Windows)

IconZ1
1 Copper

RE: The watchdog timer expired.


Just prior to the error, the following events occurred:

8/23/2013 11:10:06 PM

Faulting application name: dsm_sa_datamgr64.exe, version: 7.2.0.3801, time stamp: 0x50c769ae
Faulting module name: dciemp64.dll, version: 7.2.0.3999, time stamp: 0x50c77d73
Exception code: 0xc0000005
Fault offset: 0x0000000000004038
Faulting process id: 0x8e0
Faulting application start time: 0x01ce9f07bdc36762
Faulting application path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe
Faulting module path: C:\Program Files\Dell\SysMgt\omsa\bin\dciemp64.dll
Report Id: acddebe1-0c6a-11e3-93f9-001018f63d67
Faulting package full name:
Faulting package-relative application ID:

Followed by...

08/23/2013 11:10:06 PM

Fault bucket , type 0
Event Name: APPCRASH
Response: Not available
Cab Id: 0

Problem signature:
P1: dsm_sa_datamgr64.exe
P2: 7.2.0.3801
P3: 50c769ae
P4: dciemp64.dll
P5: 7.2.0.3999
P6: 50c77d73
P7: c0000005
P8: 0000000000004038
P9:
P10:

Attached files:
C:\Windows\Temp\WER1075.tmp.appcompat.txt
C:\Windows\Temp\WER10D4.tmp.WERInternalMetadata.xml
C:\Windows\Temp\WER10D5.tmp.hdmp
C:\Windows\Temp\WER13C2.tmp.dmp

These files may be available here:
C:\ProgramData\Microsoft\Windows\WER\ReportQueue\AppCrash_dsm_sa_datamgr64_6033d1f5754645d6f47ce76327e3cf9364ed73_cab_094e147b

Analysis symbol:
Rechecking for solution: 0
Report Id: acddebe1-0c6a-11e3-93f9-001018f63d67
Report Status: 96
Hashed bucket:


And finally...

08/23/2013 11:10:08 PM

Fault bucket , type 0
Event Name: APPCRASH
Response: Not available
Cab Id: 0

Problem signature:
P1: dsm_sa_datamgr64.exe
P2: 7.2.0.3801
P3: 50c769ae
P4: dciemp64.dll
P5: 7.2.0.3999
P6: 50c77d73
P7: c0000005
P8: 0000000000004038
P9:
P10:

Attached files:
C:\Windows\Temp\WER1075.tmp.appcompat.txt
C:\Windows\Temp\WER10D4.tmp.WERInternalMetadata.xml
C:\Windows\Temp\WER10D5.tmp.hdmp
C:\Windows\Temp\WER13C2.tmp.dmp

These files may be available here:
C:\ProgramData\Microsoft\Windows\WER\ReportQueue\AppCrash_dsm_sa_datamgr64_6033d1f5754645d6f47ce76327e3cf9364ed73_cab_094e147b

Analysis symbol:
Rechecking for solution: 0
Report Id: acddebe1-0c6a-11e3-93f9-001018f63d67
Report Status: 4
Hashed bucket:

Does that help at all?  If not, where specifically should I be looking for logs?  I've checked the iDRAC7, and it had less data than the original message.  The three events above were found in the Windows Event Viewer.
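For anyone reading these fault records later, two of the hex fields can be decoded with a few lines of Python. This is a rough sketch under two assumptions: that the exception code is a standard NTSTATUS value (0xC0000005 is the well-known access violation code), and that "Faulting application start time" is a Windows FILETIME, i.e. 100-nanosecond ticks since 1601-01-01 UTC. The helper names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Only the one code seen in this thread; extend the table as needed.
KNOWN_CODES = {0xC0000005: "STATUS_ACCESS_VIOLATION (access violation)"}

# FILETIME counts 100 ns ticks from this epoch (assumed format of the field).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(hex_value):
    """Convert a hex FILETIME string such as '0x01ce...' to a UTC datetime."""
    ticks = int(hex_value, 16)
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def describe_code(hex_value):
    """Map a hex exception code to a short description, if known."""
    return KNOWN_CODES.get(int(hex_value, 16), "unknown")


# Values copied from the first event above.
start = filetime_to_utc("0x01ce9f07bdc36762")
meaning = describe_code("0xc0000005")
print("process started (UTC):", start.isoformat())
print("exception code means:", meaning)
```

If the FILETIME assumption holds, the decoded start time tells you how long the Data Manager process had been running before it crashed, which helps separate a crash at service start from one that happens after long uptime.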

Moderator

RE: The watchdog timer expired.


I apologize for the delay in responding.  After reviewing the errors and doing some research, we found that the error is coming from Dell's OpenManage software v7.2. Our recommendation is to update OpenManage to version 7.3 and monitor.  This version should address the timeout in this particular service that is triggering the watchdog error.

Regards,

 

Geoff P
Dell | Social Outreach Services - Enterprise


IconZ1
1 Copper

RE: The watchdog timer expired.


Dell-Geoff P,


I was at one of our facilities yesterday, so I went ahead and ran the latest SUU on the server and got it caught up on all firmware and driver updates.  That included the OpenManage Server Administrator upgrade to 7.3.0.  We'll monitor the server over the next few days and I'll report back with my findings.


Thank you,

Geoff

IconZ1
1 Copper

RE: The watchdog timer expired.


So far I've not seen this message return on the one server we upgraded.  I will be upgrading the second of seven servers tomorrow and will update you afterward.  Thank you for your patience while we work to get fully updated.  It should go faster after tomorrow's work.

IconZ1
1 Copper

RE: The watchdog timer expired.


It appears that upgrading to Dell OpenManage 7.3 has resolved this issue.  Thanks for your help!

hsakamoto
1 Copper

RE: The watchdog timer expired.


Actually, I have the same error, but mine is a brand-new server loaded with OM 7.3.

Any thoughts?

-------------------------------

System Host Name: JCMS8BDC01
Event Message: The watchdog timer expired.
Date/Time: Mon Oct 14 2013 16:37:08
Severity: Critical

Detailed Description: The operating system or potentially an application failed to communicate to the baseboard management controller (BMC) within the timeout period.
Recommended Action: Check the operating system, application, hardware, and system event log for exception events. 
Message ID: ASR0000

------------------------
Windows log reads
------------------------

Faulting application name: dsm_sa_datamgr64.exe, version: 7.3.0.350, time stamp: 0x51b23742
Faulting module name: dsm_sa_datamgr64.exe, version: 7.3.0.350, time stamp: 0x51b23742
Exception code: 0xc0000005
Fault offset: 0x0000000000014c77
Faulting process id: 0x5c0
Faulting application start time: 0x01cec924561cbd3a
Faulting application path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe
Faulting module path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe


MAG39Marine
1 Copper

RE: The watchdog timer expired.


I've had the same error on 4 Windows 2008 R2 PE blades with OM 7.3 after installing this month's Microsoft patches, which included numerous .NET updates.  After the reboot, the DSM SA Data Manager service does not start.  Manually starting the service works.  After a second reboot, the service starts on its own.

Faulting application name: dsm_sa_datamgr64.exe, version: 7.3.0.350, time stamp: 0x51b23742
Faulting module name: dsm_sa_datamgr64.exe, version: 7.3.0.350, time stamp: 0x51b23742
Exception code: 0xc0000005
Fault offset: 0x0000000000014c77
Faulting process id: 0x780
Faulting application start time: 0x01cec9552d6c3a8b
Faulting application path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe
Faulting module path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe
Report Id: 9040c032-3548-11e3-8a30-e0db55230842

GraphiteDesign
1 Copper

RE: The watchdog timer expired.


I just installed the MS updates on an R620 running Win 2008 R2 with OM 7.3 and had an unexpected ASR watchdog reboot.

Faulting application name: dsm_sa_datamgr64.exe, version: 7.3.0.350, time stamp: 0x51b23742
Faulting module name: dsm_sa_datamgr64.exe, version: 7.3.0.350, time stamp: 0x51b23742
Exception code: 0xc0000005
Fault offset: 0x0000000000014c77
Faulting process id: 0x584
Faulting application start time: 0x01cec9d288eeaa00
Faulting application path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe
Faulting module path: C:\Program Files\Dell\SysMgt\dataeng\bin\dsm_sa_datamgr64.exe
Report Id: e3ac4279-35c5-11e3-9bd5-b8ca3af5c99a

7.3 is definitely not the solution here. Does anyone have any ideas?
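One thing that may be worth noting: the 7.3 reports in this thread all fault inside dsm_sa_datamgr64.exe itself at offset 0x14c77, whereas the original 7.2 crash faulted in dciemp64.dll at offset 0x4038. Below is a small Python sketch for comparing such records. The `crash_signature` helper is hypothetical and only assumes the "Key: value" layout of the event text pasted above.

```python
import re


def crash_signature(event_text):
    """Return (module, module_version, exception_code, offset) from a fault record."""
    fields = {}
    for line in event_text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    module = re.match(r"([^,]+), version: ([^,]+)", fields["Faulting module name"])
    return (
        module.group(1),
        module.group(2),
        fields["Exception code"].lower(),
        int(fields["Fault offset"], 16),
    )


# Lines copied from the OM 7.2 and OM 7.3 reports in this thread.
om72 = """Faulting module name: dciemp64.dll, version: 7.2.0.3999, time stamp: 0x50c77d73
Exception code: 0xc0000005
Fault offset: 0x0000000000004038"""

om73 = """Faulting module name: dsm_sa_datamgr64.exe, version: 7.3.0.350, time stamp: 0x51b23742
Exception code: 0xc0000005
Fault offset: 0x0000000000014c77"""

sig72 = crash_signature(om72)
sig73 = crash_signature(om73)
print("same crash signature:", sig72 == sig73)
```

Same exception code but a different faulting module and offset suggests, though it does not prove, that the 7.3 crashes are a different fault from the one the 7.3 update was meant to address.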
