
Dell PowerEdge 14G: ESXi returns "Failed to initialize NVML: Unknown Error" with NVIDIA GPU

Summary: To resolve this issue, please set the Memory Mapped I/O Base setting to 512GB


Article Content


Symptoms

Description

After installing an NVIDIA GPU (for example, an M10) in a supported 14G server (R740 or R740XD) and installing the driver VIB, running the nvidia-smi command can return the following error:

[root@localhost:~] nvidia-smi
Failed to initialize NVML: Unknown Error




In nvidia-bug-report.log, events similar to the following appear in the /var/log/vmkernel.log section:

2017-11-02T18:28:19.707Z cpu45:66263)NVRM: loading NVIDIA UNIX x86_64 Kernel Module  384.73  Mon Aug 21 15:16:25 PDT 2017
2017-11-02T18:28:19.710Z cpu3:66145)NVRM: This is a 64-bit BAR mapped above 16 TB by the system
NVRM: BIOS or the VMware ESXi kernel. This PCI I/O region assigned
NVRM: to your NVIDIA device is not supported by the kernel.
NVRM: BAR1 is 256M @ 0x382fe00$
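To check a host for the same messages without generating a full bug report, the NVRM lines can be filtered directly from the log. This is a minimal sketch, assuming the log path quoted above; the function name nvrm_lines is made up for illustration.

```shell
# Print NVRM driver messages from a vmkernel log.
# The default path is the one named in this article; pass another file to override it.
nvrm_lines() {
    grep 'NVRM:' "${1:-/var/log/vmkernel.log}"
}
```

Running `nvrm_lines | grep 'BAR'` would narrow the output to the BAR mapping complaints.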



 


Solution

The hardware is not faulty. To resolve this issue, change the Memory Mapped I/O Base setting in the BIOS from its default of 56 TB to 512 GB, or to 12 TB if the server has more than 512 GB of RAM.


This issue is documented in the R740 hardware owner's manual:

Memory Mapped I/O above 4 GB - Enables or disables the support for the PCIe devices that need large amounts of memory. Enable this option only for 64-bit operating systems. This option is set to Enabled by default.

Memory Mapped I/O Base - When set to 12 TB, the system will map MMIO base to 12 TB. Enable this option for an OS that requires 44 bit PCIe addressing.
When set to 512 GB, the system will map MMIO base to 512 GB, and reduce the maximum support for memory to less than 512 GB. Enable this option only for the 4 GPU DGMA issue. This option is set to 56 TB by default.

http://topics-cdn.dell.com/pdf/poweredge-r740_owner's%20manual_en-us.pdf (page 52)
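The "44 bit" figure in the manual and the "16 TB" figure in the driver message describe the same boundary: 2^44 bytes is 16 TB when 1 TB is taken as 2^40 bytes. The sketch below works through that arithmetic for the three available base values. It assumes the BIOS uses binary units, and mmio_base_below_44bit is an illustrative helper, not a Dell or VMware tool. It compares only the base address with the limit, not the top of the MMIO window.

```shell
# 44-bit addressing reaches 2^44 bytes, which is 16 TB when 1 TB = 2^40 bytes.
# Assumption: the BIOS values (512 GB, 12 TB, 56 TB) use these binary units.
LIMIT_44BIT=$((1 << 44))

# Succeeds (exit 0) when an MMIO base given in GB lies below the 44-bit limit.
mmio_base_below_44bit() {
    base_bytes=$(( $1 << 30 ))
    [ "$base_bytes" -lt "$LIMIT_44BIT" ]
}

# 512 GB, 12 TB (12288 GB), and the 56 TB default (57344 GB).
for gb in 512 12288 57344; do
    if mmio_base_below_44bit "$gb"; then
        echo "${gb} GB base: below 16 TB"
    else
        echo "${gb} GB base: at or above 16 TB"
    fi
done
```

Only the 56 TB default is reported as at or above 16 TB, which matches the driver's complaint about a BAR mapped above 16 TB.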

 

Note: Setting the Memory Mapped I/O Base to 512 GB limits the system memory to 512 GB.

 

Once this setting is changed and the system has been rebooted, nvidia-smi should list the installed GPU instead of returning the error.
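The post-reboot check can also be scripted. The following is a rough sketch rather than a supported tool: nvml_ok is a hypothetical helper that runs the command it is given and treats either a non-zero exit status or the error text quoted in this article as a failure.

```shell
# Return 0 if the given command succeeds and does not print the NVML error.
# nvml_ok is an illustrative name; it runs whatever command it is passed.
nvml_ok() {
    # A non-zero exit status from the command counts as a failure.
    output=$("$@" 2>&1) || return 1
    case "$output" in
        *"Failed to initialize NVML"*) return 1 ;;
    esac
    return 0
}
```

On the ESXi host, it could be called as `nvml_ok nvidia-smi && echo "NVML initialized"`.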


Article Properties


Affected Product

PowerEdge R740, PowerEdge R740XD, PowerEdge T640

Last Published Date

07 Oct 2021

Version

4

Article Type

Solution