VxRail: VM Migration Fails with vGPU Enabled
Summary: Virtual Machine (VM) migration fails when vGPU is enabled. The error received is "Currently connected device 'PCI device 0' uses backing 'grid_m60-2q', which is not accessible."
Symptoms
VM migration fails when vGPU is enabled.
Error:
"Currently connected device 'PCI device 0' uses backing 'grid_m60-2q', which is not accessible."
The source ESXi node uses an NVIDIA M60 GPU and the target ESXi node uses an NVIDIA P40 GPU. Both nodes run the same NVIDIA driver version, and the vgpu.hotmigrate.enabled setting is enabled, as required.
See the VMware article Using vMotion to Migrate vGPU Virtual Machines.
Cause
Migrating a vGPU-enabled VM between ESXi nodes that use different GPU types is not supported.
To migrate a vGPU-enabled VM between ESXi nodes, both nodes must use the same type of GPU. The NVIDIA documentation Virtual GPU Software User Guide states:
NVIDIA vGPU software supports VMware vSphere vMotion for VMs that are configured with a vGPU. VMware vSphere vMotion enables you to move a running virtual machine from one physical host machine to another host with little disruption or downtime. For a VM that is configured with vGPU, the vGPU is migrated with the VM to an NVIDIA GPU on the other host. The NVIDIA GPUs on both host machines must be of the same type.
Each GPU type exposes its own set of vGPU profiles. The grid_m60-2q profile exists only on M60 GPUs, so it is not available on the target node's P40 GPU, which offers P40-specific profiles (for example, grid_p40-2q).
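The profile mismatch can be illustrated with a small pre-check. This is a minimal sketch, assuming profile names follow the simple grid_<board>-<suffix> pattern seen in this article (for example, grid_m60-2q); the function names are hypothetical and are not part of any VMware or NVIDIA tool.

```python
# Hypothetical helpers: assume vGPU profile names look like "grid_<board>-<suffix>",
# e.g. "grid_m60-2q" -> board "m60". Other naming schemes are not handled.


def profile_board(profile: str) -> str:
    """Return the GPU board family encoded in a vGPU profile name."""
    prefix = "grid_"
    name = profile.lower()
    if not name.startswith(prefix) or "-" not in name:
        raise ValueError(f"unrecognized vGPU profile name: {profile!r}")
    # Text between "grid_" and the first "-" is the board family.
    return name[len(prefix):].split("-", 1)[0]


def can_host_profile(profile: str, target_board: str) -> bool:
    """True only if the target GPU is the same board family as the profile."""
    return profile_board(profile) == target_board.lower()


if __name__ == "__main__":
    print(can_host_profile("grid_m60-2q", "P40"))  # M60 profile on a P40 target
    print(can_host_profile("grid_m60-2q", "M60"))  # M60 profile on an M60 target
```

In the scenario above, the first call returns False, matching the "not accessible" backing error, while the second returns True.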
Resolution
Ensure that the source and target ESXi nodes use the same type of GPU.
Note: Even when both nodes use the same type of GPU, check the NVIDIA Virtual GPU Software release notes for the following restrictions:
vGPU migration, which includes vMotion and suspend-resume, is supported only on a subset of supported GPUs, VMware vSphere Hypervisor (ESXi) releases, and guest operating systems.
vGPU migration is disabled for a VM for which any of the following NVIDIA CUDA Toolkit features is enabled:
- Unified memory
- Debuggers
- Profilers
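The conditions described in this article can be collected into a simple pre-migration checklist. This is a minimal sketch, assuming the VM's state has already been gathered by some other means; the VgpuVm record and its field names are hypothetical. It does not model the release-notes support matrix for GPUs, ESXi releases, and guest operating systems.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the CUDA Toolkit features that disable vGPU migration.
BLOCKING_CUDA_FEATURES = {"unified_memory", "debuggers", "profilers"}


@dataclass
class VgpuVm:
    name: str
    source_gpu: str            # GPU type on the source node, e.g. "M60"
    target_gpu: str            # GPU type on the target node, e.g. "P40"
    hotmigrate_enabled: bool   # whether vgpu.hotmigrate.enabled is set
    cuda_features: set = field(default_factory=set)


def migration_blockers(vm: VgpuVm) -> list:
    """Return human-readable reasons why vGPU migration would be blocked."""
    blockers = []
    if not vm.hotmigrate_enabled:
        blockers.append("vgpu.hotmigrate.enabled is not enabled")
    if vm.source_gpu.upper() != vm.target_gpu.upper():
        blockers.append(f"GPU type mismatch: {vm.source_gpu} -> {vm.target_gpu}")
    # Sorted so the report order is deterministic.
    for feature in sorted(vm.cuda_features & BLOCKING_CUDA_FEATURES):
        blockers.append(f"CUDA feature enabled: {feature}")
    return blockers


if __name__ == "__main__":
    vm = VgpuVm("vm01", "M60", "P40", True, {"profilers"})
    for reason in migration_blockers(vm):
        print(reason)
```

An empty list means none of the modeled conditions blocks the migration; it is not a guarantee that vMotion will succeed.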