
Dell PowerEdge 14G: ESXi returns "Failed to initialize NVML: Unknown Error" with NVIDIA GPU

Summary: To resolve this issue, set the Memory Mapped I/O Base setting to 512 GB (or to 12 TB if the server has more than 512 GB of RAM).


Article Content


Symptoms

Description

When installing an NVIDIA GPU (for example, an M10) in a supported 14G server (R740 or R740XD), the following error can appear when running the nvidia-smi command after the driver VIB has been installed:

[root@localhost:~] nvidia-smi
Failed to initialize NVML: Unknown Error




In nvidia-bug-report.log, events similar to the following appear in the /var/log/vmkernel.log section:

2017-11-02T18:28:19.707Z cpu45:66263)NVRM: loading NVIDIA UNIX x86_64 Kernel Module  384.73  Mon Aug 21 15:16:25 PDT 2017
2017-11-02T18:28:19.710Z cpu3:66145)NVRM: This is a 64-bit BAR mapped above 16 TB by the system
NVRM: BIOS or the VMware ESXi kernel. This PCI I/O region assigned
NVRM: to your NVIDIA device is not supported by the kernel.
NVRM: BAR1 is 256M @ 0x382fe00$
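Checking a large log for this signature by hand is tedious, so a small script can help. The following is a minimal sketch, assuming the log text has already been read into a string. The function name `find_bar_mapping_errors` and the `SAMPLE` constant are hypothetical and exist only for illustration. The marker string and the BAR line format are copied from the excerpt above; other driver versions may word the message differently.

```python
import re

# Marker text copied from the vmkernel.log excerpt in this article.
BAR_MARKER = "This is a 64-bit BAR mapped above 16 TB"

# Matches lines such as "NVRM: BAR1 is 256M @ <address>".
# The address is captured as-is, even if the log truncated it.
BAR_LINE = re.compile(r"NVRM:\s+(BAR\d+) is (\S+) @ (\S+)")

# Sample input taken from the excerpt above (illustration only).
SAMPLE = """\
2017-11-02T18:28:19.710Z cpu3:66145)NVRM: This is a 64-bit BAR mapped above 16 TB by the system
NVRM: BIOS or the VMware ESXi kernel. This PCI I/O region assigned
NVRM: to your NVIDIA device is not supported by the kernel.
NVRM: BAR1 is 256M @ 0x382fe00$
"""


def find_bar_mapping_errors(log_text):
    """Return one finding per marker line, with the BAR details that follow it (if any)."""
    findings = []
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if BAR_MARKER not in line:
            continue
        finding = {"line_number": index + 1, "bar": None}
        # In the excerpt, the BAR details appear within a few lines of the marker.
        for follow in lines[index + 1 : index + 6]:
            match = BAR_LINE.search(follow)
            if match:
                finding["bar"] = {
                    "name": match.group(1),
                    "size": match.group(2),
                    "address": match.group(3),
                }
                break
        findings.append(finding)
    return findings
```

Calling `find_bar_mapping_errors(SAMPLE)` returns a single finding for BAR1. An empty list means the marker was not found, which does not by itself rule out other causes of the NVML error.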



 


Solution

The hardware is working correctly. To resolve this issue, change the Memory Mapped I/O Base setting in the system BIOS from the default of 56 TB to 512 GB, or to 12 TB if the server has more than 512 GB of RAM.


This issue is documented in the R740 hardware owner's manual:

Memory Mapped I/O above 4 GB - Enables or disables the support for the PCIe devices that need large amounts of memory. Enable this option only for 64-bit operating systems. This option is set to Enabled by default.

Memory Mapped I/O above Base - When set to 12 TB, the system will map MMIO base to 12 TB. Enable this option for an OS that requires 44 bit PCIe addressing.
When set to 512 GB, the system will map MMIO base to 512 GB, and reduce the maximum support for memory to less than 512 GB. Enable this option only for the 4 GPU DGMA issue. This option is set to 56 TB by default.

http://topics-cdn.dell.com/pdf/poweredge-r740_owner's%20manual_en-us.pdf (page 52)
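The "16 TB" in the log message and the "44 bit PCIe addressing" in the manual text describe the same boundary, since 2^44 bytes is 16 TB in binary units. The sketch below works through that arithmetic for the three base options. It assumes "TB" and "GB" mean 2^40 and 2^30 bytes, and it compares only the base address itself. Devices are mapped at or above the base, so this is an illustration of why the 56 TB default is a problem, not a model of how the BIOS assigns addresses. All names in the block are hypothetical.

```python
TIB = 2 ** 40  # assumption: "TB" in the BIOS setting means 2**40 bytes
GIB = 2 ** 30  # assumption: "GB" in the BIOS setting means 2**30 bytes

ADDRESS_BITS = 44                  # from "44 bit PCIe addressing" in the manual text
ADDRESS_LIMIT = 2 ** ADDRESS_BITS  # first address that no longer fits in 44 bits (16 TB)

# The three values the article lists for the Memory Mapped I/O Base setting.
MMIO_BASE_OPTIONS = {
    "56 TB": 56 * TIB,   # default
    "12 TB": 12 * TIB,
    "512 GB": 512 * GIB,
}


def base_fits_in_44_bits(base_bytes):
    """Return True if the MMIO base address is below the 16 TB (2**44) boundary."""
    return base_bytes < ADDRESS_LIMIT


def describe_options():
    """Map each option label to (base address in hex, whether it is below 16 TB)."""
    return {
        label: (hex(base), base_fits_in_44_bits(base))
        for label, base in MMIO_BASE_OPTIONS.items()
    }
```

Under these assumptions, only the 56 TB default places the MMIO base above the 16 TB boundary named in the log message. Both 12 TB and 512 GB fall below it.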

 

Note: When set to 512 GB, this option limits the maximum supported system memory (to less than 512 GB, according to the owner's manual text quoted above).
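The choice between the two values can be sketched as a simple rule based on installed memory. The function name `recommend_mmio_base` is hypothetical. The handling of a server with exactly 512 GB is an assumption: the article says to use 12 TB when the server has more than 512 GB, while the manual says the 512 GB setting supports less than 512 GB, so this sketch takes the more conservative reading and returns 12 TB at the boundary.

```python
def recommend_mmio_base(installed_ram_gb):
    """Suggest a Memory Mapped I/O Base value for a given amount of installed RAM (in GB).

    Assumption: exactly 512 GB is treated like "more than 512 GB", because the
    owner's manual states that the 512 GB setting supports less than 512 GB.
    """
    if installed_ram_gb <= 0:
        raise ValueError("installed_ram_gb must be positive")
    if installed_ram_gb < 512:
        return "512 GB"
    return "12 TB"
```

For example, `recommend_mmio_base(256)` suggests 512 GB and `recommend_mmio_base(768)` suggests 12 TB. Treat the result as a starting point and confirm it against the owner's manual for the specific server.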

 

Once this setting is changed and the system has been rebooted, nvidia-smi should run without returning the "Failed to initialize NVML" error.


Article Properties


Affected Products

PowerEdge R740, PowerEdge R740XD, PowerEdge T640

Last Published Date

07 Oct 2021

Version

4

Article Type

Solution