October 5th, 2025 12:16

error watchdog timeout

PowerEdge R640, VMware ESXi version 7.

When I upgrade the memory from 256GB (2x 128GB) to 512GB (4x 128GB), I get a watchdog timeout error and ESXi freezes.

Can anyone out there help?

Moderator

October 6th, 2025 06:47

Hi, 

Here are the steps to troubleshoot and resolve this issue:

1. Verify Memory and Seating

The first and most common cause after a hardware change is a physical issue with the new components or their installation.

Check iDRAC/BIOS: Look for Memory Initialization Errors, Memory Population Errors, or Memory Health status in your iDRAC or the system BIOS. This is the fastest way to confirm if the server detected an issue with the new DIMMs or their arrangement.
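If you would rather script this check, iDRAC9 exposes DIMM inventory through the DMTF Redfish API (the memory collection is typically under `/redfish/v1/Systems/System.Embedded.1/Memory`; verify the path on your firmware). The sketch below assumes you have already fetched each DIMM's JSON, for example with `curl`, and only shows how to flag modules whose `Status.Health` is not `OK`. The sample IDs and health values are hypothetical.

```python
from typing import Any


def unhealthy_dimms(dimms: list[dict[str, Any]]) -> list[str]:
    """Return the Id of every populated DIMM whose Redfish Status.Health is not "OK".

    Each dict is assumed to be one Redfish Memory resource fetched from iDRAC.
    """
    bad = []
    for dimm in dimms:
        status = dimm.get("Status", {})
        # If the firmware lists empty slots, Redfish marks them with State "Absent"; skip those.
        if status.get("State") == "Absent":
            continue
        if status.get("Health") != "OK":
            bad.append(dimm.get("Id", "<unknown>"))
    return bad


# Hypothetical sample data, not real iDRAC output.
sample = [
    {"Id": "DIMM.Socket.A1", "Status": {"State": "Enabled", "Health": "OK"}},
    {"Id": "DIMM.Socket.A2", "Status": {"State": "Enabled", "Health": "Critical"}},
    {"Id": "DIMM.Socket.B1", "Status": {"State": "Enabled", "Health": "OK"}},
]
print(unhealthy_dimms(sample))  # ['DIMM.Socket.A2']
```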

Confirm DIMM Compatibility: Ensure the new 128GB DIMMs are supported by the R640 and match your existing modules exactly (type, rank, speed, voltage). Given the capacity, these are likely Load Reduced DIMMs (LRDIMMs) or Intel Optane Persistent Memory (PMem) modules. Mixing memory types or using incompatible modules can cause instability.
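A quick way to spot a mismatch is to list the attributes that are not identical across all installed modules. This is a minimal sketch: the `Dimm` record is a hypothetical structure you would fill in from iDRAC's memory inventory, and whether a given difference is unsupported or merely suboptimal is for Dell's memory guidelines to decide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimm:
    slot: str         # slot label, e.g. "A1"
    dimm_type: str    # e.g. "LRDIMM" or "RDIMM"
    capacity_gb: int
    ranks: int
    speed_mts: int    # rated speed in MT/s
    voltage: float    # operating voltage in volts


def mismatched_fields(dimms: list[Dimm]) -> dict[str, list]:
    """Return every attribute that differs across the DIMMs, with the distinct values found."""
    report: dict[str, list] = {}
    for field in ("dimm_type", "capacity_gb", "ranks", "speed_mts", "voltage"):
        values = sorted({getattr(d, field) for d in dimms})
        if len(values) > 1:
            report[field] = values
    return report


# Hypothetical inventory: the two original modules plus two new ones rated at a different speed.
installed = [
    Dimm("A1", "LRDIMM", 128, 4, 2666, 1.2),
    Dimm("B1", "LRDIMM", 128, 4, 2666, 1.2),
    Dimm("A2", "LRDIMM", 128, 4, 2933, 1.2),
    Dimm("B2", "LRDIMM", 128, 4, 2933, 1.2),
]
print(mismatched_fields(installed))  # {'speed_mts': [2666, 2933]}
```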

Verify Installation Order: Server memory population rules are strict. Re-check the R640 manual or the diagram on the server's lid to confirm that the four 128GB DIMMs are in the correct slots for your CPU count: two DIMMs per CPU in a two-processor system, or four DIMMs on the single CPU in a one-processor system. Incorrect seating or slot order can cause a memory population error that leads to instability.
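As a rough self-check of a slot layout, the sketch below applies a simplified rule assumed here for illustration only: slots are named by socket letter and number (A for CPU1, B for CPU2), each socket should hold the same number of DIMMs, and slots should be filled from 1 upward without gaps. The R640 manual remains the authority on the actual population order.

```python
def check_population(slots: list[str], sockets: tuple[str, ...] = ("A", "B")) -> list[str]:
    """Flag simple population problems under the simplified, assumed rule described above."""
    per_socket: dict[str, list[int]] = {s: [] for s in sockets}
    problems: list[str] = []
    for slot in slots:
        letter, number = slot[0].upper(), int(slot[1:])
        if letter not in per_socket:
            problems.append(f"{slot}: socket {letter} not expected")
            continue
        per_socket[letter].append(number)
    # Every populated system socket should carry the same number of DIMMs.
    counts = {s: len(nums) for s, nums in per_socket.items()}
    if len(set(counts.values())) > 1:
        problems.append(f"unbalanced sockets: {counts}")
    # Within a socket, slots should run 1, 2, 3, ... with no gaps.
    for s, nums in per_socket.items():
        expected = list(range(1, len(nums) + 1))
        if sorted(nums) != expected:
            problems.append(f"socket {s}: has {sorted(nums)}, expected {expected}")
    return problems


print(check_population(["A1", "A2", "B1", "B2"]))  # []
print(check_population(["A1", "A2", "A3", "A4"]))  # ["unbalanced sockets: {'A': 4, 'B': 0}"]
```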

2. Update System Firmware

Outdated firmware often can't properly manage or recognize new, larger capacity memory configurations, leading to system instability and watchdog timeouts.

Update BIOS and iDRAC: Ensure your server's BIOS and iDRAC firmware are updated to the latest available versions from the Dell support website. Dell frequently releases updates to improve memory compatibility and stability. Use the iDRAC interface for the update if possible.

Update the HBA/RAID Controller: While less likely, ensure any Host Bus Adapters (HBAs) or RAID controllers also have the latest firmware, as they are part of the boot and initialization process.
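Once you have the installed versions (for example from the iDRAC firmware inventory page), comparing them against the minimums you pick from Dell's support site is simple tuple comparison. This sketch assumes purely numeric dotted versions; strings such as `51.16.0-4076` would need extra parsing. All version numbers below are placeholders, not Dell recommendations.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a purely numeric dotted version such as "2.19.1" into (2, 19, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def outdated(installed: dict[str, str], minimum: dict[str, str]) -> list[str]:
    """Names of components whose installed version is below the chosen minimum.

    Components missing from `installed` are reported too, so nothing is silently skipped.
    """
    result = []
    for name, required in minimum.items():
        current = installed.get(name)
        if current is None or parse_version(current) < parse_version(required):
            result.append(name)
    return result


# Placeholder versions for illustration only.
installed = {"BIOS": "2.10.2", "iDRAC": "6.10.30.00"}
minimum = {"BIOS": "2.19.1", "iDRAC": "6.10.30.00", "PERC": "51.16.0"}
print(outdated(installed, minimum))  # ['BIOS', 'PERC']
```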

3. Check/Adjust BIOS Memory Settings

The way the BIOS trains and initializes memory might be contributing to the timeout.

Memory Operating Mode: Check the memory settings in the BIOS (usually under System Setup → System BIOS → Memory Settings). If you are using PMem, ensure the memory operating mode is set correctly for your desired configuration (e.g., App Direct or Memory Mode).

Memory Training: Temporarily set the Memory Training option in the BIOS (often called Memory Fast Boot) to a more thorough setting (e.g., Retrain at Every Boot or Disabled for one boot) to force the system to fully re-initialize and validate the new memory configuration. After a successful boot, you can usually set it back to the fast setting.

Disable Watchdog (Temporary): The R640 BIOS includes an OS Watchdog Timer option. As a temporary diagnostic step, try disabling it to see whether the system still freezes without being forcibly reset; leaving the hung system in place may give you a more useful error message. This is not a permanent solution.
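For context on why a freeze shows up as a watchdog timeout: a watchdog is a countdown the OS must keep resetting; if the OS hangs and stops resetting it, the timer expires and the platform acts, for example by resetting the server. The toy model below uses explicit timestamps instead of a real clock so the logic is easy to follow. It is only an illustration of the concept, not how the BIOS or iDRAC implement the feature.

```python
class Watchdog:
    """Toy watchdog: expires if not reset within `timeout` seconds."""

    def __init__(self, timeout: float, start: float = 0.0) -> None:
        self.timeout = timeout
        self.last_pet = start

    def pet(self, now: float) -> None:
        # A healthy OS calls this periodically to show it is still running.
        self.last_pet = now

    def expired(self, now: float) -> bool:
        # True once more than `timeout` seconds pass without a reset.
        return now - self.last_pet > self.timeout


wd = Watchdog(timeout=10)
wd.pet(5)
print(wd.expired(12))  # False: last reset at 5, only 7 s elapsed
print(wd.expired(16))  # True: 11 s without a reset, e.g. the OS froze
```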

4. ESXi-Specific Troubleshooting

Validate ESXi Build: Ensure your specific build of ESXi 7.x is supported by Dell for the R640. Ideally, you should be running the Dell Custom Image of ESXi.
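To compare your running build against the builds Dell lists for the R640, you can pull the version and build number out of the ESXi version banner (for example from `vmware -v` on the host). The banner format below is an assumption, and the build numbers are placeholders; the list of supported builds has to come from Dell's documentation.

```python
import re

# Assumed banner format, e.g. "VMware ESXi 7.0.3 build-12345678" (build number is a placeholder).
BANNER = re.compile(r"ESXi\s+(\d+(?:\.\d+)+)\s+build-(\d+)")


def parse_esxi_banner(banner: str) -> tuple[str, int]:
    """Extract (version, build) from an ESXi version banner."""
    match = BANNER.search(banner)
    if match is None:
        raise ValueError(f"unrecognised ESXi banner: {banner!r}")
    return match.group(1), int(match.group(2))


def build_is_listed(banner: str, supported_builds: set[int]) -> bool:
    """True if the running build appears in a list you copied from Dell's support matrix."""
    _, build = parse_esxi_banner(banner)
    return build in supported_builds


print(parse_esxi_banner("VMware ESXi 7.0.3 build-12345678"))  # ('7.0.3', 12345678)
print(build_is_listed("VMware ESXi 7.0.3 build-12345678", {12345678}))  # True
```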

Review Logs: After the crash, check the System Event Log (SEL) and the Lifecycle Log in iDRAC. Look for memory- or CPU-related errors, or a system halt, logged just before the watchdog reset.
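If you export the logs rather than reading them in the web UI, a keyword filter narrows them down quickly. On iDRAC9 the SEL is typically available over Redfish under `/redfish/v1/Managers/iDRAC.Embedded.1/LogServices/Sel/Entries` (check the path on your firmware); the sketch assumes you already have the `Members` list of that collection and that each entry carries `Created` and `Message` fields. The keyword list and sample messages are made up for illustration.

```python
from typing import Any

# Words suggesting a memory problem or a watchdog event (assumed, non-exhaustive list).
KEYWORDS = ("memory", "dimm", "watchdog", "correctable")


def relevant_entries(entries: list[dict[str, Any]]) -> list[str]:
    """Return "Created  Message" lines for entries that mention any keyword."""
    hits = []
    for entry in entries:
        message = entry.get("Message", "")
        if any(word in message.lower() for word in KEYWORDS):
            hits.append(f"{entry.get('Created', '?')}  {message}")
    return hits


# Made-up sample entries; real iDRAC messages will differ.
sample = [
    {"Created": "2025-10-05T11:58:02", "Message": "The system inlet temperature is within range."},
    {"Created": "2025-10-05T12:01:47", "Message": "Correctable memory error rate exceeded for DIMM A2."},
    {"Created": "2025-10-05T12:03:10", "Message": "The watchdog timer expired."},
]
for line in relevant_entries(sample):
    print(line)
```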
