PowerFlex VMware ATS Miscompare Errors When Deploying Multiple VMs
Summary: Deploying virtual machines (VMs) from a Content Library (CL) takes a long time, and Atomic Test and Set (ATS) miscompare errors appear in vmkernel.log.
Instructions
Scenario
This situation may occur when deploying multiple VMs to a single datastore simultaneously, or when running any other vSphere operation that requires extensive locking on a Virtual Machine File System (VMFS) datastore.
For example:
- Creating, growing, or locking a virtual machine file
- Changing attributes of a file
- Powering a virtual machine on or off
- Creating or deleting a VMFS datastore
- Expanding a VMFS datastore
- Creating a template
- Deploying a virtual machine from a template
- Migrating a virtual machine with vMotion
- And so on
Symptoms
vmkernel.log shows a high number of similar errors against a datastore:
2021-09-10T09:59:39.080Z cpu27:2098773)NMP: nmp_ThrottleLogForDevice:3872: Cmd 0x89 (0x459b0a7d8500, 3268605) to dev "eui.768dd94c75fbb70f058911e70000001f" on path "vmhba64:C0:T0:L31" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE
2021-09-10T09:59:39.080Z cpu27:2098773)ScsiDeviceIO: 3483: Cmd(0x459b0a7d8500) 0x89, CmdSN 0x1c73c4e from world 3268605 to dev "eui.768dd94c75fbb70f058911e70000001f" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0.
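To gauge how often these errors occur, the log can be filtered for the signature shown above: SCSI opcode 0x89 (COMPARE AND WRITE) failing with sense data 0xe 0x1d 0x0 (sense key MISCOMPARE, additional sense MISCOMPARE DURING VERIFY OPERATION). The following is a minimal sketch, not a supported tool. The function name count_ats_miscompares and the regular expression are assumptions based only on the two sample lines in this article, so other ESXi builds may format these messages differently. It counts matching log lines per device, so one failed command that is logged by both NMP and ScsiDeviceIO is counted twice.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the two sample lines in this article:
# opcode 0x89, a quoted device ID, and sense data 0xe 0x1d 0x0.
ATS_MISCOMPARE = re.compile(
    r'\b0x89\b.*?dev "(?P<dev>[^"]+)".*?Valid sense data: 0xe 0x1d 0x0',
    re.IGNORECASE,
)

def count_ats_miscompares(lines):
    """Return a Counter of matching log lines per device ID."""
    counts = Counter()
    for line in lines:
        match = ATS_MISCOMPARE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts

# The two sample lines from this article, one log entry per string.
SAMPLE_LINES = [
    '2021-09-10T09:59:39.080Z cpu27:2098773)NMP: nmp_ThrottleLogForDevice:3872: '
    'Cmd 0x89 (0x459b0a7d8500, 3268605) to dev '
    '"eui.768dd94c75fbb70f058911e70000001f" on path "vmhba64:C0:T0:L31" '
    'Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0. Act:NONE',
    '2021-09-10T09:59:39.080Z cpu27:2098773)ScsiDeviceIO: 3483: '
    'Cmd(0x459b0a7d8500) 0x89, CmdSN 0x1c73c4e from world 3268605 to dev '
    '"eui.768dd94c75fbb70f058911e70000001f" failed H:0x0 D:0x2 P:0x0 '
    'Valid sense data: 0xe 0x1d 0x0.',
]

for device, hits in count_ats_miscompares(SAMPLE_LINES).items():
    print(device, hits)
```

To run this against a real host, pass the lines of a copy of vmkernel.log instead of SAMPLE_LINES, for example by opening the file and handing the file object to count_ats_miscompares.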
Impact
Planned operations may run more slowly than expected.
Root Cause
This is working as designed. Certain VMFS operations (such as creating or powering on a VM) must take a lock on the datastore to update the VMFS metadata. When such operations run simultaneously, they compete for access to the datastore, which may delay them. With ATS locking enabled, only part of the datastore is locked (unlike the legacy SCSI reservation mechanism, which reserves the whole device), but the result is the same: operations of the same type try to lock the same metadata and compete for access. A miscompare is reported when a host's compare-and-write request fails because the on-disk data no longer matches what that host expected, typically because another request changed it first.
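The contention described above can be illustrated with a toy model of compare-and-write locking. This is a simplified sketch under stated assumptions, not the VMFS on-disk format or the ESXi implementation: the LockRecord class, the host names, and the "free" marker are all hypothetical. It shows why a second requester sees a miscompare when it acts on a stale view of the lock, and why it must re-read and retry, which adds latency.

```python
FREE = "free"  # hypothetical marker for an unowned lock record

class LockRecord:
    """Toy stand-in for a single lock record on shared storage."""

    def __init__(self):
        self.owner = FREE

    def compare_and_write(self, expected, new):
        """Write `new` only if the current value equals `expected`.

        Returns False (a "miscompare") when the record has changed
        since the caller last read it.
        """
        if self.owner != expected:
            return False
        self.owner = new
        return True

def simulate_contention():
    """Two hosts race for one lock; returns (events, final_owner)."""
    record = LockRecord()
    events = []

    # Both hosts read the record before either one writes.
    seen_by_a = record.owner
    seen_by_b = record.owner

    # host-a wins; host-b's view is now stale, so it gets a miscompare.
    events.append(("host-a", record.compare_and_write(seen_by_a, "host-a")))
    events.append(("host-b", record.compare_and_write(seen_by_b, "host-b")))

    # host-a finishes its metadata update and releases the lock.
    record.compare_and_write("host-a", FREE)

    # host-b re-reads the record and retries, this time successfully.
    seen_by_b = record.owner
    events.append(("host-b", record.compare_and_write(seen_by_b, "host-b")))
    return events, record.owner

events, final_owner = simulate_contention()
for host, succeeded in events:
    print(host, "acquired lock" if succeeded else "miscompare, must retry")
```

With many hosts or many simultaneous operations targeting the same metadata, each failed attempt costs an extra read and retry, which is consistent with the slowdown described in this article.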
Workaround
There is no workaround; this is working as designed.
Impacted Versions
This is not a PowerFlex issue.