PowerEdge: R7515 - Resolving Critical Xid Error with NVIDIA T4 GPU

Summary: This article describes how to resolve a Detected Critical Xid Error on a PowerEdge R7515 with an NVIDIA T4 GPU, where the GPU stops processing.


Symptoms

A PowerEdge R7515 with an NVIDIA T4 GPU reports "Detected Critical Xid Error" while running an application, and the GPU stops processing.

1. The GPU CUDA toolkit and driver were updated to the latest versions (CUDA Toolkit 12.2.2; NVIDIA Data Center GPU Driver 535.129.03 for Linux) and GPU application resilience mode was enabled. The user's application test still reported: Detected Critical Xid Error.
2. The nvidia-bug-report log contains many NVRM Xid 8 errors. The GPU current temperature is 42 C.

Nov 27 10:18:12 dell-PowerEdge-R7515 kernel: NVRM: Xid (PCI:0000:81:00): 8, pid=1020, name=Xorg, Channel 00000002


3. The server was booted from SLI 3.0 and the NVIDIA Tesla GPU 629 field diagnostics were run; the result was PASSED.
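
To see which process the Xid entries point to, the kernel log lines can be tallied by Xid code and process name. The following is a minimal Python sketch, not a Dell or NVIDIA tool. The regular expression is modeled only on the single log line quoted in this article, so other driver versions may need a different pattern, and the function names parse_xid_line and summarize_xids are hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the one kernel log line quoted in this article
# (assumption: other driver versions may format Xid messages differently).
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9A-Fa-f:.]+)\): (?P<xid>\d+)"
    r"(?:, pid=(?P<pid>[^,]+))?"
    r"(?:, name=(?P<name>[^,]+))?"
)


def parse_xid_line(line):
    """Return a dict describing one Xid log line, or None if it is not one."""
    match = XID_RE.search(line)
    if match is None:
        return None
    return {
        "pci": match.group("pci"),
        "xid": int(match.group("xid")),
        "pid": match.group("pid"),    # None when the line has no pid field
        "name": match.group("name"),  # None when the line has no name field
    }


def summarize_xids(lines):
    """Count Xid events per (xid code, process name) pair."""
    counts = Counter()
    for line in lines:
        event = parse_xid_line(line)
        if event is not None:
            counts[(event["xid"], event["name"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Nov 27 10:18:12 dell-PowerEdge-R7515 kernel: NVRM: Xid "
        "(PCI:0000:81:00): 8, pid=1020, name=Xorg, Channel 00000002",
    ]
    for (xid, name), count in summarize_xids(sample).items():
        print(f"Xid {xid} from process {name}: {count} event(s)")
```

If most Xid 8 entries name the same process, as with Xorg in the log line above, that process is a reasonable first candidate to investigate.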
 

 

Cause

N/A

Resolution

1. Refer to the Xid error listings in NVIDIA's Working with Xid Errors documentation. The earlier tests rule out a driver error, a bus error, and a thermal issue as the cause of Xid 8, so the error appears to come from the user application.
2. Ask the user to check the application program. In this case, the user disabled Xorg and the issue no longer occurred.

Affected Products

PowerEdge R7515
Article Properties
Article Number: 000220148
Article Type: Solution
Last Modified: 16 Jul 2025
Version:  4