Dell Automation: Troubleshooting NativeEdge Detection of Multiple NVIDIA GPUs in UEFI VMs

Summary: This article explains an issue where the kernel drivers fail to detect one of the NVIDIA Graphics Processing Units (GPUs) in a Unified Extensible Firmware Interface (UEFI) Virtual Machine (VM) configured with multiple GPUs passed through. ...

This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

The issue occurs when using Ubuntu 22.04 and passing two or more GPUs through to the VM.

The system allocates only 64 GB of memory-mapped I/O (MMIO) space. This is not enough to map the Base Address Registers (BARs) of two or more GPUs, so the BAR allocation fails for one of the devices.

The symptoms of this issue include:

  • The kernel drivers do not detect one of the NVIDIA GPUs
  • The dmesg log shows an error message indicating that the PCI I/O region assigned to the NVIDIA device is invalid
  • The lspci -nnk output shows no "Kernel driver in use" entry for the affected GPU
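The lspci check can be scripted instead of read by eye. The following is a minimal Python sketch, assuming the default lspci -nnk layout (one unindented header line per device, followed by tab-indented detail lines such as "Kernel driver in use:") and identifying NVIDIA devices by their PCI vendor ID 10de. The function name nvidia_devices_without_driver is hypothetical.

```python
NVIDIA_VENDOR_ID = "10de"  # PCI vendor ID assigned to NVIDIA


def nvidia_devices_without_driver(lspci_text: str) -> list[str]:
    """Return addresses of NVIDIA devices that have no 'Kernel driver in use' line.

    Assumes `lspci -nnk` output: each device starts on an unindented header
    line, and its detail lines (Subsystem, Kernel driver in use, Kernel
    modules) follow on indented lines.
    """
    stanzas = []
    for line in lspci_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            stanzas.append([line])              # header line starts a new device
        elif stanzas:
            stanzas[-1].append(line.strip())    # detail line for the current device

    unbound = []
    for header, *details in stanzas:
        # Only the header carries the device's own [vendor:device] IDs.
        if f"[{NVIDIA_VENDOR_ID}:" not in header:
            continue
        if not any(d.startswith("Kernel driver in use:") for d in details):
            unbound.append(header.split()[0])   # PCI address, e.g. "09:00.0"
    return unbound
```

For example, pass it the stdout of subprocess.run(["lspci", "-nnk"], capture_output=True, text=True). An empty result means every NVIDIA device has a driver bound.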

Example dmesg output for the failing device:

[  268.756466] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:09:00.0)
[  268.756481] nvidia: probe of 0000:09:00.0 failed with error -1
[  268.756592] NVRM: The NVIDIA probe routine failed for 1 device(s).
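The failing device and its unassigned BAR can also be extracted from the kernel log. This Python sketch assumes the message wording shown in the example output for this issue (the wording may differ between NVIDIA driver versions) and works on text captured from dmesg. The function name failed_nvidia_probes is hypothetical.

```python
import re

# NVRM line reporting an unusable BAR, e.g. "NVRM: BAR0 is 0M @ 0x0 (PCI:0000:09:00.0)"
BAR_RE = re.compile(
    r"NVRM: BAR(?P<bar>\d+) is (?P<size_mb>\d+)M @ (?P<addr>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<pci>[0-9a-fA-F:.]+)\)"
)
# Probe failure line, e.g. "nvidia: probe of 0000:09:00.0 failed with error -1"
PROBE_RE = re.compile(r"nvidia: probe of (?P<pci>\S+) failed with error (?P<err>-?\d+)")


def failed_nvidia_probes(dmesg_text: str) -> dict:
    """Map each PCI address whose NVIDIA probe failed to the invalid BARs logged for it.

    Each BAR entry is a (bar_index, size_in_MB, base_address) tuple.
    """
    bars = {}
    failed = {}
    for line in dmesg_text.splitlines():
        # finditer also catches BAR messages fused onto a preceding message.
        for m in BAR_RE.finditer(line):
            bars.setdefault(m["pci"], []).append(
                (int(m["bar"]), int(m["size_mb"]), int(m["addr"], 16))
            )
        m = PROBE_RE.search(line)
        if m:
            failed[m["pci"]] = bars.get(m["pci"], [])
    return failed
```

A BAR reported as 0M at address 0x0, as in the example, is consistent with the firmware never having assigned that BAR.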

Cause

The cause of this issue is the default MMIO space allocation of 64 GB, which is not sufficient to support two or more GPUs. In UEFI VMs, the OVMF firmware sets a default MMIO space of 32 GB unless it is overridden.

The affected GPU, the NVIDIA L4, requires a 32 GB BAR. With two or more of these GPUs, the combined BAR requirement exceeds the allocated MMIO space, so the BAR allocation fails for at least one device.
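The arithmetic behind the failure can be checked with a short calculation. PCI BARs are powers of two and must be naturally aligned (a BAR's base address is a multiple of its size), so the question reduces to whether the GPUs' BARs, placed largest first, fit inside the allocated MMIO size. This Python sketch ignores PCI bridge windows and any other devices sharing the space. Only the 32 GB BAR comes from this article; the 32 MiB second BAR per GPU is an assumption for illustration, and the function name fits_in_window is hypothetical.

```python
GIB = 1 << 30
MIB = 1 << 20


def fits_in_window(bar_sizes: list[int], window_size: int) -> bool:
    """Return True if naturally aligned BARs fit in a window of window_size bytes."""
    offset = 0
    for size in sorted(bar_sizes, reverse=True):
        if size <= 0 or size & (size - 1):
            raise ValueError(f"BAR size {size} is not a positive power of two")
        # Align the base to the BAR size. Largest-first placement keeps every
        # offset aligned already, so the total equals the plain sum of sizes.
        offset = (offset + size - 1) // size * size
        offset += size
    return offset <= window_size


# Assumed per-GPU 64-bit BARs: 32 GiB (from the article) plus a hypothetical 32 MiB.
L4_BARS = [32 * GIB, 32 * MIB]

for gpus in (1, 2):
    print(gpus, "GPU(s) fit in 64 GiB:", fits_in_window(L4_BARS * gpus, 64 * GIB))
# 1 GPU(s) fit in 64 GiB: True
# 2 GPU(s) fit in 64 GiB: False
```

With these assumed sizes, one GPU fits in 64 GiB, but two GPUs need 64 GiB plus 64 MiB, which matches one of the GPUs failing BAR allocation.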

Resolution

The issue is resolved in the NativeEdge 4.2 release, which includes reworked logic that sizes the MMIO space dynamically. As a workaround on earlier releases, configure the VM to boot in legacy BIOS mode instead of UEFI.

A future update will also let users override the MMIO space size if further issues arise.
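For context, on a generic QEMU host using OVMF (this is not a NativeEdge setting), the 64-bit MMIO aperture that OVMF reserves can be overridden through the fw_cfg entry opt/ovmf/X-PciMmio64Mb, whose value is given in MiB. Whether a future NativeEdge release uses or exposes this mechanism is not stated in this article. The Python sketch below computes a candidate value from a list of BAR sizes and formats the corresponding QEMU arguments. Rounding the total up to a power of two is a conservative choice made here, not a documented requirement, and the function names are hypothetical.

```python
GIB = 1 << 30
MIB = 1 << 20


def mmio64_override_mb(bar_sizes: list[int]) -> int:
    """Smallest power-of-two aperture, in MiB, large enough for all the BARs."""
    total = sum(bar_sizes)
    aperture = 1
    while aperture < total:
        aperture <<= 1          # double until the aperture covers every BAR
    return max(aperture // MIB, 1)


def qemu_fw_cfg_args(mmio_mb: int) -> list[str]:
    """QEMU command-line arguments that pass the aperture size to OVMF via fw_cfg."""
    return ["-fw_cfg", f"name=opt/ovmf/X-PciMmio64Mb,string={mmio_mb}"]
```

For example, with two GPUs that each need a 32 GB BAR, mmio64_override_mb([32 * GIB] * 2) returns 65536, and qemu_fw_cfg_args(65536) yields the arguments to append to the QEMU command line.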

Users running NativeEdge 4.1 release are advised to upgrade to NativeEdge 4.2 release to resolve this issue.

Affected Products

Dell Automation Platform, NativeEdge Solutions, Dell Automation Platform Components, NativeEdge
Article Properties
Article Number: 000436173
Article Type: Solution
Last Modified: 04 Shawwal 1447 (Hijri calendar)
Version: 1