PowerEdge: R7515 - Resolving Critical Xid Error with NVIDIA T4 GPU
Summary: This article describes how to resolve a "Detected Critical Xid Error" condition on a PowerEdge R7515 with an NVIDIA T4 GPU, in which the GPU stops processing.
Symptoms
A PowerEdge R7515 with an NVIDIA T4 GPU reports "Detected Critical Xid Error" while running an application, and the GPU stops processing.
1. CUDA and the GPU driver were updated to the latest versions (CUDA Toolkit 12.2.2 and NVIDIA Data Center GPU Driver 535.129.03 for Linux), and GPU application resilience mode was enabled. The user's application test still failed with the error: Detected Critical Xid Error
2. The nvidia-bug-report log contains many NVRM Xid 8 errors and reports GPU Current Temp: 42 C. Example entry:
Nov 27 10:18:12 dell-PowerEdge-R7515 kernel: NVRM: Xid (PCI:0000:81:00): 8, pid=1020, name=Xorg, Channel 00000002
3. The server was booted from SLI 3.0 and the NVIDIA Tesla GPU 629 Field Diagnostics were run. The result was PASSED.
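When the log contains many Xid entries, it can help to tally them by Xid code and by the process that triggered them. The following is a minimal Python sketch, not a Dell or NVIDIA tool. The function name count_xids and the regular expression are illustrative assumptions based only on the log entry shown in step 2, so other driver versions may format the line differently.

```python
import re
from collections import Counter

# Assumed line format, taken from the example entry in this article:
# NVRM: Xid (PCI:0000:81:00): 8, pid=1020, name=Xorg, Channel 00000002
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+)"
    r"(?:, pid=(?P<pid>[^,]+))?(?:, name=(?P<name>[^,]+))?"
)


def count_xids(lines):
    """Count (Xid code, process name) pairs found in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            # Lines without a name= field are grouped under "unknown".
            key = (int(match.group("xid")), match.group("name") or "unknown")
            counts[key] += 1
    return counts


sample = [
    "Nov 27 10:18:12 dell-PowerEdge-R7515 kernel: "
    "NVRM: Xid (PCI:0000:81:00): 8, pid=1020, name=Xorg, Channel 00000002",
]

for (xid, name), total in count_xids(sample).items():
    print(f"Xid {xid} from {name}: {total}")
```

If the same Xid code keeps appearing with the same process name, as with Xorg in the entry above, that process is a reasonable place to start investigating.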
Cause
Resolution
1. See Working with Xid Errors in the NVIDIA documentation. For Xid 8, the earlier tests account for the driver error, bus error, and thermal issue causes, which leaves a user application error as the likely cause.

2. The user was asked to check the application. After the user disabled Xorg, the issue no longer occurred.