PowerFlex: Watchdog Timer Expired on ESXi Nodes
Summary: The iDRAC Lifecycle Log shows 'watchdog timer expired' alerts on nodes running ESXi.
Symptoms
The iDRAC Lifecycle Log shows 'watchdog timer expired' alerts on ESXi nodes.
Cause
This issue can occur when the iDRAC Service Module (iSM) is not running properly on ESXi. iSM handles communication between the operating system and iDRAC, so when iSM has a problem, the watchdog timer expired alert is expected.
When you check the iSM service (dcism on vSphere 7.x, dellism on vSphere 8.x), it reports one of these states: active not running, active running with limited functionality, or inactive dead. The status commands are:
vSphere 7.x:
/etc/init.d/dcism-netmon-watchdog status
vSphere 8.x:
/etc/init.d/dellism status
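Because the status command differs between vSphere 7.x and 8.x, a small helper can pick the right init script from the ESXi version. This is a minimal sketch, assuming the version is read from the `vmware -v` banner (for example "VMware ESXi 8.0.2 build-12345678", build number illustrative) and that only 7.x and 8.x need handling; the function names `ism_script_for` and `version_from_banner` are hypothetical.

```shell
#!/bin/sh
# Sketch: choose the iSM init script for this ESXi host's major version.
# Assumption: 7.x uses dcism-netmon-watchdog and 8.x uses dellism, as this
# article describes; other versions are rejected.

# Print the init script path for an ESXi version string such as "7.0.3".
ism_script_for() {
    case "$1" in
        7.*) echo /etc/init.d/dcism-netmon-watchdog ;;
        8.*) echo /etc/init.d/dellism ;;
        *)   echo "unsupported ESXi version: $1" >&2; return 1 ;;
    esac
}

# Print the version field (third word) of a `vmware -v` banner,
# for example "VMware ESXi 8.0.2 build-12345678" -> "8.0.2".
version_from_banner() {
    echo "$1" | awk '{print $3}'
}
```

On the host, `script=$(ism_script_for "$(version_from_banner "$(vmware -v)")") && "$script" status` would then run the matching status command.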
Resolution
- Connect to each affected ESXi node over SSH.
- Check the iSM status:
For vSphere 7.x:
/etc/init.d/dcism-netmon-watchdog status
For vSphere 8.x:
/etc/init.d/dellism status
- If iSM is not running, restart the iSM service and check its status again:
For vSphere 7.x:
/etc/init.d/sfcbd-watchdog stop
/etc/init.d/sfcbd-watchdog start
/etc/init.d/dcism-netmon-watchdog restart
/etc/init.d/dcism-netmon-watchdog status
For vSphere 8.x:
/etc/init.d/dellism stop
/etc/init.d/dellism start
/etc/init.d/dellism status
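The check-and-restart steps above can be sketched as two shell functions. This assumes the status output contains the wording of the problem states listed in the Cause section ("not running", "limited functionality", "dead"); the exact text your iSM version prints may differ, so verify it before relying on the match. The names `ism_needs_restart` and `restart_ism` are hypothetical.

```shell
#!/bin/sh
# Sketch of the check-and-restart steps in this article.
# Assumption: the status text contains one of the problem states listed in
# the Cause section; confirm the real wording on your host first.

# Return 0 (restart needed) if the status text shows a problem state.
ism_needs_restart() {
    case "$1" in
        *"not running"*|*"limited functionality"*|*"dead"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run the restart sequence from this article for major version 7 or 8.
restart_ism() {
    case "$1" in
        7)
            /etc/init.d/sfcbd-watchdog stop
            /etc/init.d/sfcbd-watchdog start
            /etc/init.d/dcism-netmon-watchdog restart
            /etc/init.d/dcism-netmon-watchdog status
            ;;
        8)
            /etc/init.d/dellism stop
            /etc/init.d/dellism start
            /etc/init.d/dellism status
            ;;
        *)
            echo "unsupported major version: $1" >&2
            return 1
            ;;
    esac
}
```

On a vSphere 8.x host, for example: `status=$(/etc/init.d/dellism status 2>&1); if ism_needs_restart "$status"; then restart_ism 8; fi`.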
iSM should now be up and running. If it is still not running:
- Reboot iDRAC
- Wait until iDRAC is fully started (it should take no more than 5 minutes)
- Restart the iSM service again from the ESXi command line, using the commands above.
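If you prefer to reboot iDRAC from a script, remote RACADM offers `racreset` (resets only the iDRAC, not the host) and `getsysinfo` (a simple query you can poll). This is a minimal sketch, assuming remote racadm is installed on a management workstation; `IDRAC_IP`, `IDRAC_USER` and `IDRAC_PASS` are hypothetical variables you set yourself, and `RACADM` can be overridden (for example `RACADM=echo`) to print the command instead of running it. The 5-minute limit comes from this article.

```shell
#!/bin/sh
# Sketch: reset iDRAC, then wait until it answers again.
# Assumptions: remote racadm on a management workstation; IDRAC_IP,
# IDRAC_USER and IDRAC_PASS are hypothetical variables you set yourself.

RACADM=${RACADM:-racadm}

# Run one racadm subcommand against the remote iDRAC.
idrac_cmd() {
    "$RACADM" -r "$IDRAC_IP" -u "$IDRAC_USER" -p "$IDRAC_PASS" "$@"
}

# Reset the iDRAC controller; the host OS keeps running.
reset_idrac() { idrac_cmd racreset; }

# Succeeds once iDRAC answers a basic query again.
idrac_ready() { idrac_cmd getsysinfo >/dev/null 2>&1; }

# Retry a command every $2 seconds until it succeeds or $1 seconds pass.
wait_until() {
    wu_timeout=$1
    wu_interval=$2
    shift 2
    wu_elapsed=0
    while [ "$wu_elapsed" -lt "$wu_timeout" ]; do
        if "$@"; then
            return 0
        fi
        sleep "$wu_interval"
        wu_elapsed=$((wu_elapsed + wu_interval))
    done
    return 1
}
```

For example, `reset_idrac && sleep 30 && wait_until 300 15 idrac_ready` waits up to 5 minutes; the initial 30-second pause is an assumption meant to avoid polling the iDRAC before the reset takes effect. Passing the password with `-p` exposes it in the process list, so adapt this to your site's credential handling.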
If iSM is still not running, perform a power drain on the node and check the iSM status again.
Once iSM shows as running, clear the SEL in iDRAC. The node should then report as healthy, and the watchdog timer expired alerts should no longer be triggered.
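The SEL can also be reviewed and cleared with remote RACADM, which provides the `getsel` and `clrsel` subcommands. This is a minimal sketch, assuming remote racadm is installed on a management workstation; `IDRAC_IP`, `IDRAC_USER` and `IDRAC_PASS` are hypothetical variables, and `RACADM` can be overridden (for example `RACADM=echo`) for a dry run.

```shell
#!/bin/sh
# Sketch: review and clear the iDRAC System Event Log (SEL) remotely.
# Assumptions: remote racadm on a management workstation; IDRAC_IP,
# IDRAC_USER and IDRAC_PASS are hypothetical variables you set yourself.

RACADM=${RACADM:-racadm}

# Run one racadm subcommand against the remote iDRAC.
idrac_cmd() {
    "$RACADM" -r "$IDRAC_IP" -u "$IDRAC_USER" -p "$IDRAC_PASS" "$@"
}

# Print the current SEL entries (review them before clearing).
show_sel() { idrac_cmd getsel; }

# Clear all SEL entries.
clear_sel() { idrac_cmd clrsel; }
```

Running `show_sel` before `clear_sel` keeps a record of the watchdog entries before they are removed; confirm iSM is running first, as described above.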
PowerEdge: How to View or Clear the System event log