PowerEdge: XE9680 - GPUs Disappear From Host

Summary: One or more NVIDIA HGX GPUs may disappear from XE9680 hosts with Xid and Bus Fatal Error messages.

Symptoms

Typical events and alerts that indicate this issue include:

  • Host kernel logs show Xid 74 and/or Xid 79 from the NVIDIA driver. On Linux, Xid error messages are written to /var/log/messages.
    Use the command grep "NVRM: Xid" /var/log/messages to locate all Xid messages. Example of an Xid string:
    [...] NVRM: Xid (PCI:0000:1a:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
  • Dell Lifecycle logs may report a PCI1318 Bus Fatal Error:
    PCI1318: A fatal error was detected on a component at bus <X> device <Y> function <Z>
  • The issue may be transient:
    • GPUs may reappear after an AC power cycle.
    • The number of missing GPUs and the slots they disappear from may change.
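
The log signatures above can be checked with a small script. The following is a minimal sketch, not a Dell or NVIDIA tool: the regular expressions are modeled only on the two example strings shown in this article, so other driver or iDRAC versions may format the messages differently. The function names find_xid_events and find_bus_fatal_errors are hypothetical, and the sketch assumes the logs are already available as plain text lines.

```python
import re

# Pattern modeled on the example Xid line in this article (assumption:
# other driver versions may format the message differently).
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+),")

# Pattern modeled on the PCI1318 message text quoted in this article.
PCI1318_RE = re.compile(
    r"PCI1318: A fatal error was detected on a component at "
    r"bus (\w+) device (\w+) function (\w+)"
)

# Xid codes this article associates with the issue.
XIDS_OF_INTEREST = {74, 79}


def find_xid_events(lines):
    """Return (pci_address, xid_code) pairs for Xid 74 and Xid 79 lines."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match and int(match.group(2)) in XIDS_OF_INTEREST:
            events.append((match.group(1), int(match.group(2))))
    return events


def find_bus_fatal_errors(lines):
    """Return (bus, device, function) string triples from PCI1318 entries."""
    return [m.groups() for m in map(PCI1318_RE.search, lines) if m]


# Example usage (the path comes from this article; adjust for your system):
#   with open("/var/log/messages", errors="replace") as log:
#       print(find_xid_events(log))
```

Because the issue can be transient, comparing the returned PCI addresses across reboots may help show whether the affected GPUs change.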

Cause

Xid and Bus Fatal Error messages are often related to PCIe Retimer errors, which can result from signal integrity issues within the PCIe architecture.

Resolution

Dell has released updated GPU accelerator firmware that includes enhancements to the PCIe Retimers. These enhancements improve signal integrity and help prevent GPUs from falling off the bus. If bus fatal errors occur, the recommended first step is to update to the firmware listed below or newer.

GPU Accelerator    Bundle Version            Retimer Version
H100               20.24.07.10 - FW 1.5.0    2.10.42
H200               20.24.07.10 - FW 1.5.0    2.10.42
H800               TBD*                      TBD*
H20                TBD*                      TBD*
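
When checking whether an installed version is "listed below or newer", note that dotted version strings do not compare correctly as plain text (for example, "2.9.50" sorts after "2.10.42" as a string). The following is a minimal sketch of a numeric comparison. It assumes the installed version has already been read from the system inventory by some other means, which this article does not describe, and the names parse_version and meets_minimum are hypothetical.

```python
# Minimum versions copied from the H100/H200 rows of the table in this article.
MINIMUM_RETIMER = "2.10.42"
MINIMUM_FW = "1.5.0"


def parse_version(text):
    """Convert a dotted version string such as '2.10.42' to a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is the minimum version or newer."""
    return parse_version(installed) >= parse_version(minimum)


# Example usage with made-up installed versions:
#   meets_minimum("2.9.50", MINIMUM_RETIMER)   -> older than 2.10.42
#   meets_minimum("2.10.42", MINIMUM_RETIMER)  -> meets the minimum
```

The sketch only handles purely numeric dotted versions; the H800 and H20 rows are still listed as TBD and cannot be compared this way.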

 

IMPORTANT:
  • The server must be powered on for the firmware update to apply.
  • After the update completes, an AC reboot, virtual AC power cycle, or full power cycle is required for the firmware changes to take effect.

Affected Products

PowerEdge XE9680

Article Properties

Article Number: 000230721
Article Type: Solution
Last Modified: 16 Mar 2026
Version: 3