Data corruption may occur on R6415, R7415, and R7425 with PERC H330 running Linux-based and VMware operating systems
Dell PowerEdge R6415, R7415, and R7425 (14G AMD) servers may experience data corruption when running a Linux OS with an H330 controller, or when running ESXi with the H330 configured as a VMDirectPath I/O pass-through device to a Linux VM.
Keywords: data corruption, Linux, IOMMU, AMD, filesystem
Article Content
Symptoms
Under the following conditions, you may experience data corruption while carrying out heavy I/O on storage attached to a PERC H330 controller on 14th generation PowerEdge AMD-based servers:
A Linux-based OS is installed on a server with an H330, with the CPU Virtualization Technology (VT) function enabled in the system BIOS.
VMware ESXi is installed with the H330 storage controller configured as a VMDirectPath I/O pass-through device (PCI passthrough) to a Linux virtual machine (VM). Only the VM that has the H330 attached as a pass-through device is exposed to the risk of data corruption.
What is affected?
All 14G AMD servers (Single or Dual Processor)
R6415
R7415
R7425
Linux-Based Operating Systems including but not limited to
Red Hat Enterprise Linux 7.5
Red Hat Enterprise Linux 7.6
Ubuntu 16.04
Ubuntu 18.04 LTS
CentOS 7.5
CentOS 7.6
SLES 12 SP3/SP4
SLES 15
All current versions of ESXi hypervisor
ESXi 6.5.x
ESXi 6.7.x
Storage controller:
PERC H330 in RAID or Non-RAID mode
Summary: This issue occurs only with the following specific configurations:
14G AMD server + Linux OS + H330 controller
14G AMD server + ESXi + H330 configured as VMDirectPath I/O pass-through to a Linux VM
What is not affected?
14G Intel Platforms
Any storage controller (HBA330/H730/H740/H840, etc.) other than H330
Windows operating systems
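The affected and unaffected lists above can be condensed into a small predicate. This is only an illustrative sketch: the function name `is_affected`, its parameters, and the string values it accepts are hypothetical and are not part of any Dell tool. It treats the Virtualization Technology setting from the Symptoms section as a condition for bare-metal Linux.

```python
# Server models named in this article as affected (14G AMD).
AFFECTED_MODELS = {"R6415", "R7415", "R7425"}


def is_affected(model: str, controller: str, os_family: str,
                vt_enabled: bool = True,
                passthrough_to_linux_vm: bool = False) -> bool:
    """Return True if the configuration matches one described in the article.

    All parameter names and accepted values are assumptions for illustration.
    """
    if model.upper() not in AFFECTED_MODELS:
        return False              # Intel platforms and other models are not affected
    if controller.upper() != "H330":
        return False              # HBA330, H730, H740, H840, etc. are not affected
    os_family = os_family.lower()
    if os_family == "linux":
        return vt_enabled         # bare-metal Linux with VT enabled in the BIOS
    if os_family == "esxi":
        return passthrough_to_linux_vm  # only the Linux VM that owns the H330
    return False                  # Windows is not affected
```

For example, `is_affected("R7425", "H330", "linux")` evaluates to `True`, while the same server with an H730 evaluates to `False`.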
Cause
-
Resolution
Do not replace hardware.
Dell EMC engineering is aware of the issue, and a workaround is available in BIOS version 1.8.7 and later.
Dell recommends that you update the BIOS to version 1.8.7 or later.
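As a rough way to check whether an installed BIOS already meets the recommendation, the version string can be compared against 1.8.7. The sketch below is a minimal example; the helper names `parse_version` and `has_workaround` are hypothetical, and it assumes the BIOS reports a dotted numeric version.

```python
import re

# First BIOS version this article names as carrying the workaround.
MIN_BIOS = (1, 8, 7)


def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '1.8.7' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def has_workaround(bios_version: str) -> bool:
    """Return True if the version is 1.8.7 or later (numeric, not string, comparison)."""
    return parse_version(bios_version) >= MIN_BIOS
```

On Linux, the version string can typically be read with `dmidecode -s bios-version` (run as root) and passed to `has_workaround`. Comparing tuples of integers avoids the trap where a plain string comparison would rank "1.10.0" below "1.8.7".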
Linux vendors and VMware are also working on a kernel fix. Once an updated kernel package is available from the Linux vendors and from VMware, it may provide an alternative solution to this problem. Dell intends to note information about these fixes here as it becomes available.
The Linux AMD IOMMU driver (AMD_IOMMU) uses the memory range that the BIOS reserved for the H330 both as an I/O data buffer and as I/O virtual addresses (IOVAs) that map to different areas of physical memory, which results in file system corruption. In addition, the IVRS table in the BIOS provides the starting address and length of the exclusion range for the H330. When the driver sets up the exclusion range, it adds the length to the starting address and programs the result into the exclusion range limit register of the IOMMU. The correct ending address is the starting address plus the length, minus one. As a result, the exclusion range covers one extra page past the end of the range that the BIOS specified. If the kernel uses an address in this extra page as an IOVA, data corruption results.
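The off-by-one in the ending address can be illustrated with a short calculation. This is a sketch of the arithmetic only, not the driver's code: the function names and the example start address and length are hypothetical, and it assumes 4 KiB pages and a limit register that is inclusive at page granularity (the page containing the limit address is part of the range).

```python
PAGE_SIZE = 4096                # assumed 4 KiB pages
PAGE_MASK = ~(PAGE_SIZE - 1)    # clears the in-page offset bits


def buggy_limit(start: int, length: int) -> int:
    """Limit as described for the affected driver: start + length."""
    return (start + length) & PAGE_MASK


def fixed_limit(start: int, length: int) -> int:
    """Limit as it should be: the page holding the last byte of the range."""
    return (start + length - 1) & PAGE_MASK


def excluded_pages(start: int, limit: int) -> int:
    """Count pages from the page holding start through the page holding limit."""
    return ((limit & PAGE_MASK) - (start & PAGE_MASK)) // PAGE_SIZE + 1


# Hypothetical example values, not taken from any real IVRS table.
start = 0x7F000000
length = 16 * PAGE_SIZE

print(excluded_pages(start, buggy_limit(start, length)))  # 17: one page too many
print(excluded_pages(start, fixed_limit(start, length)))  # 16: matches the BIOS range
```

With a 16-page range, the uncorrected sum lands on the first byte of the page after the range, so that page is swept into the exclusion range as well. Subtracting one keeps the limit inside the last page the BIOS specified.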
VMware/ESXi: Configuring a VM to use the H330 controller in VMDirectPath I/O mode may result in storage and memory corruption for that VM.