PowerEdge: NVIDIA L-series GPUs may fall offline during runtime operations.

Summary: NVIDIA L-series GPUs may fall offline during runtime operations when running vGPU Grid 17.2 drivers.

Symptoms

During runtime, virtual machines (VMs) connected to vGPU instances may experience a Blue Screen error. The following behavior can be seen at the host level:

  • The NVIDIA bug report (nvidia-bug-report.sh) or nvidia-smi -q may report "System is not in ready state."
  • iDRAC UI reports the NVIDIA GPU is unavailable.
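
The host-level check above can be scripted. The following is a minimal sketch, assuming Python 3 is available where nvidia-smi is installed and that the message text matches the wording quoted in this article. The function names are hypothetical and are not part of any NVIDIA or Dell tooling.

```python
import subprocess

# Message text as quoted in this article; the exact wording is an assumption.
NOT_READY_MARKER = "System is not in ready state"


def output_shows_not_ready(text: str) -> bool:
    """Return True if the tool output contains the not-ready message."""
    return NOT_READY_MARKER.lower() in text.lower()


def gpu_not_ready(command=("nvidia-smi", "-q"), timeout=60) -> bool:
    """Run the query command and search stdout and stderr for the message."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        raise RuntimeError(f"could not run {' '.join(command)}: {exc}") from exc
    return output_shows_not_ready(result.stdout + result.stderr)
```

Calling gpu_not_ready() on an affected host would be expected to return True while the GPU is offline. A False result only means that the message was not found in the output, not that the GPU is healthy.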

System Overview GPUs

The following GPUs are impacted:

  • L40
  • L40S
  • L4

Cause

The issue is related to the NVIDIA Grid driver version 17.2.

Resolution

This issue can be resolved by updating the NVIDIA Grid driver to version 17.4 or higher.

NOTE: A Grid package consists of multiple components, and all of them must be updated together. When updating to Grid 17.4 or higher, ensure that the ESXi host driver, ESXi host management daemon, and Guest OS drivers are all updated.
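
The component check in the note can be sketched as a simple version comparison. This is an illustrative sketch only: it assumes the Grid release of each component has already been collected as a string such as "17.2" (nvidia-smi reports a driver version rather than the Grid release, so the mapping must come from NVIDIA's release notes). The function names and the input dictionary are hypothetical.

```python
# Minimum Grid release that resolves the issue, per this article.
MIN_RELEASE = (17, 4)


def parse_release(release: str) -> tuple:
    """Convert a release string such as '17.4' into a tuple of integers."""
    return tuple(int(part) for part in release.strip().split("."))


def components_needing_update(installed: dict) -> list:
    """Return the names of components whose release is below MIN_RELEASE."""
    return sorted(
        name
        for name, release in installed.items()
        if parse_release(release) < MIN_RELEASE
    )


# Hypothetical inventory for one host and its guest VM.
inventory = {
    "ESXi host driver": "17.4",
    "ESXi host management daemon": "17.2",
    "Guest OS driver": "17.2",
}
outdated = components_needing_update(inventory)
```

With the sample inventory above, outdated lists the management daemon and the guest driver, which shows the partially updated state that the note warns against.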

Affected Products

PowerEdge R750, PowerEdge R750XA, PowerEdge R7515, PowerEdge R7525, PowerEdge R760, PowerEdge R760XA, PowerEdge R7615, PowerEdge R7625, PowerEdge R770, PowerEdge R7715

Products

PowerEdge R7725
Article Properties
Article Number: 000250524
Article Type: Solution
Last Modified: 17 Apr 2025
Version: 3