PowerFlex VMware Replication Causes High CPU Utilization And IO Errors

Summary: During the initial replication of VMs with VMware Replication on a PowerFlex cluster accessed through PowerFlex SDCs, the ESXi host experiences high CPU utilization and IO errors.

This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

 - VMware Replication 8.4 and below
 - Initial replication on a VM or VMs
 - Replicated VM has many VMDK disks (15+)
 - High CPU utilization on the ESXi host running the VM once replication begins.
 - Latency on volumes mapped from the PowerFlex cluster increases to 20-30 ms, possibly more.
 - Other VMs on the same host that are not being replicated may see decreased performance and/or IO errors from the application perspective.
 - The disk queue view in "esxtop" shows the host queuing IO to the backend volumes.
 - The backend components (MDM/SDS) are healthy and show no performance issues or errors.

 - Shortly after replication begins, the ESXi host running the replicating VMs logs messages such as:

2021-05-19T17:58:08.413Z cpu70:2098596)WARNING: ScsiDeviceIO: 1564: Device eui.1309fbc714390806ba291d4e0000001b performance has deteriorated. I/O latency increased from average value of 796 microseconds to 25965 microseconds.
2021-05-19T17:58:10.048Z cpu70:2098596)WARNING: ScsiDeviceIO: 1564: Device eui.1309fbc714390806ba2944570000005d performance has deteriorated. I/O latency increased from average value of 799 microseconds to 26019 microseconds.
2021-05-19T17:58:12.060Z cpu70:2098596)WARNING: ScsiDeviceIO: 1564: Device eui.1309fbc714390806ba291d3d0000000a performance has deteriorated. I/O latency increased from average value of 676 microseconds to 23641 microseconds.
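To check whether a host is logging warnings like the ones above, the log lines can be filtered with a short script. The sketch below is a minimal Python example, assuming the relevant ESXi log has already been copied off the host and is read as lines of text. The regular expression is written only against the three sample messages in this article, and the 20 ms threshold, `parse_latency_warnings`, and `LatencyEvent` are hypothetical names and values chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Matches the ScsiDeviceIO latency warning format shown in this article.
LATENCY_RE = re.compile(
    r"(?P<ts>\S+) cpu\d+:\d+\)WARNING: ScsiDeviceIO: \d+: "
    r"Device (?P<device>\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<avg_us>\d+) microseconds "
    r"to (?P<new_us>\d+) microseconds\."
)


@dataclass
class LatencyEvent:
    timestamp: str
    device: str
    avg_us: int  # previous average latency reported in the message, microseconds
    new_us: int  # new latency reported in the message, microseconds

    @property
    def new_ms(self) -> float:
        return self.new_us / 1000.0

    @property
    def factor(self) -> float:
        # How many times higher the new latency is than the previous average.
        return self.new_us / self.avg_us


def parse_latency_warnings(lines: Iterable[str], threshold_ms: float = 20.0) -> List[LatencyEvent]:
    """Return latency warnings whose new latency is at least threshold_ms.

    The 20 ms default is an assumed cut-off based on the 20-30 ms range
    described in this article, not a VMware or Dell threshold.
    """
    events = []
    for line in lines:
        m = LATENCY_RE.match(line.strip())
        if m is None:
            continue  # not a ScsiDeviceIO latency warning
        event = LatencyEvent(m["ts"], m["device"], int(m["avg_us"]), int(m["new_us"]))
        if event.new_ms >= threshold_ms:
            events.append(event)
    return events


# Sample lines copied from the Symptoms section of this article.
SAMPLE = [
    "2021-05-19T17:58:08.413Z cpu70:2098596)WARNING: ScsiDeviceIO: 1564: Device eui.1309fbc714390806ba291d4e0000001b performance has deteriorated. I/O latency increased from average value of 796 microseconds to 25965 microseconds.",
    "2021-05-19T17:58:10.048Z cpu70:2098596)WARNING: ScsiDeviceIO: 1564: Device eui.1309fbc714390806ba2944570000005d performance has deteriorated. I/O latency increased from average value of 799 microseconds to 26019 microseconds.",
    "2021-05-19T17:58:12.060Z cpu70:2098596)WARNING: ScsiDeviceIO: 1564: Device eui.1309fbc714390806ba291d3d0000000a performance has deteriorated. I/O latency increased from average value of 676 microseconds to 23641 microseconds.",
]

if __name__ == "__main__":
    for ev in parse_latency_warnings(SAMPLE):
        print(f"{ev.timestamp} {ev.device}: "
              f"{ev.avg_us / 1000:.1f} ms -> {ev.new_ms:.1f} ms ({ev.factor:.0f}x)")
```

Filtering on the new latency rather than on the message alone keeps minor, short-lived latency warnings out of the result; lowering `threshold_ms` to 0 returns every matching warning.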

Impact

Performance degradation and IO errors, as seen by applications, on VMs running on the affected ESXi host.

Cause

During the initial replication of a VM, VMware Replication computes a checksum of every block of each .vmdk disk configured on the VM. The checksum IO is sent through a single thread on the ESXi host, so it is serialized. Because this thread also services other IO on the host, the serialized checksum work drives abnormal CPU utilization and disk latency, which in turn slows down other VMs on the same host.
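The effect of routing many disks' checksum IO through one thread can be illustrated with a toy single-worker FIFO queue model. This is not VMware's implementation: the request counts, service times, and arrival intervals below are hypothetical values chosen only to show how a serialized backlog inflates the latency of unrelated IO that shares the same worker. The 15-disk case mirrors the "15+ VMDK disks" symptom.

```python
from typing import List, Tuple

# (arrival time in ms, service time in ms, request kind); all values hypothetical.
Request = Tuple[float, float, str]


def fifo_single_worker(requests: List[Request]) -> List[Tuple[str, float]]:
    """Serve requests one at a time in arrival order on a single worker.

    Returns (kind, latency_ms) per request, where latency is completion
    time minus arrival time. Requests with equal arrival times keep their
    input order because sorted() is stable.
    """
    clock = 0.0
    results = []
    for arrival, service, kind in sorted(requests, key=lambda r: r[0]):
        start = max(clock, arrival)  # wait until the single worker is free
        clock = start + service
        results.append((kind, clock - arrival))
    return results


def build_workload(num_disks: int, reads_per_disk: int, checksum_ms: float,
                   fg_count: int, fg_interval_ms: float, fg_ms: float) -> List[Request]:
    """Checksum reads for every disk queued at t=0, plus periodic IO from another VM."""
    requests: List[Request] = []
    for _ in range(reads_per_disk):
        for _ in range(num_disks):
            requests.append((0.0, checksum_ms, "checksum"))
    for k in range(fg_count):
        requests.append((k * fg_interval_ms, fg_ms, "foreground"))
    return requests


def mean_latency(results: List[Tuple[str, float]], kind: str) -> float:
    values = [lat for k, lat in results if k == kind]
    return sum(values) / len(values)


if __name__ == "__main__":
    for disks in (0, 15):
        results = fifo_single_worker(build_workload(disks, 4, 0.25, 10, 1.0, 0.5))
        print(f"{disks:2d} disks checksumming: mean foreground latency "
              f"{mean_latency(results, 'foreground'):.2f} ms")
```

With these made-up numbers, the other VM's mean latency rises from 0.50 ms to 13.25 ms because every one of its requests waits behind the whole checksum backlog. Only the shape of the effect is meaningful, not the values.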

Resolution

VMware plans to fix this in a later version of VMware Replication; the target version has not yet been determined.


Affected Products

VxFlex Product Family
Article Properties
Article Number: 000203238
Article Type: Solution
Last Modified: 10 Jun 2025
Version: 3