PowerProtect: Virtual Machine loses network connectivity during backup execution
Summary: The backup causes the Virtual Machine (VM) to lose network connectivity for approximately one minute.
Symptoms
PowerProtect Data Manager is used to protect the vSphere environment with the VM Direct Engine solution. A virtual machine (VM) with multiple terabyte (TB)-sized disks is configured for protection. During the scheduled backup window, the VM intermittently loses network connectivity for approximately one minute, which causes some application connections to time out. The backup session log does not show any errors during the snapshot creation:
...
YYYY-MM-DD 19:00:14.890Z TRACE: [b721339cb9a6e1fa;9cb66c72289daafc] Checking for old snapshots to cleanup ...
YYYY-MM-DD 19:00:14.890Z TRACE: [b721339cb9a6e1fa;9cb66c72289daafc] No pre-existing snapshots found.
YYYY-MM-DD 19:00:14.890Z TRACE: [b721339cb9a6e1fa;9cb66c72289daafc] Sending status 'Creating snapshot of virtual machine ...' ...
YYYY-MM-DD 19:00:14.890Z INFO: [b721339cb9a6e1fa;9cb66c72289daafc] Creating snapshot of virtual machine ...
YYYY-MM-DD 19:00:14.890Z TRACE: [b721339cb9a6e1fa;9cb66c72289daafc] Sending create snapshot request to Snapshot Manager ...
YYYY-MM-DD 19:00:42.423Z TRACE: [b721339cb9a6e1fa;9cb66c72289daafc] Sending state of 'Running' (last: state=Queued, progress=0).
YYYY-MM-DD 19:01:12.433Z TRACE: [b721339cb9a6e1fa;9cb66c72289daafc] Sending state of 'Running' (last: state=Running, progress=0).
YYYY-MM-DD 19:01:14.361Z INFO: [b721339cb9a6e1fa;9cb66c72289daafc] Create snapshot request succeeded.
YYYY-MM-DD 19:01:14.364Z INFO: [b721339cb9a6e1fa;9cb66c72289daafc] Found snapshot "snapshot-16906".
...
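Although the session log reports no errors, its timestamps show how long the snapshot request took. The following is a minimal sketch, assuming both timestamps fall on the same day and use the HH:MM:SS.mmmZ form shown in the excerpt above; the function name elapsed_seconds is hypothetical and not part of any PowerProtect tooling.

```python
from datetime import datetime

# Time-of-day format used in the session log excerpt (date part omitted
# because the excerpt masks it as YYYY-MM-DD).
TIME_FORMAT = "%H:%M:%S.%fZ"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the seconds between two same-day log timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


# "Sending create snapshot request to Snapshot Manager" entry
request_sent = "19:00:14.890Z"
# "Create snapshot request succeeded" entry
request_done = "19:01:14.361Z"

duration = elapsed_seconds(request_sent, request_done)
print(f"Snapshot request took {duration:.3f} s")
```

For the excerpt above, the snapshot request takes a little under one minute, which lines up with the length of the reported connectivity loss.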
During the connectivity loss, remote desktop sessions to the VM also drop and ping requests go unanswered.
Cause
The VM Direct Engine solution requires a VM snapshot at the start of the backup workflow. The VM snapshot request runs using a VMware Simple Object Access Protocol (SOAP) call to the vCenter server. The vCenter then works with the hosting ESXi server to perform the VM snapshot.
The VM vmware*log file shows that the snapshot request had to stun the VM for 59,022,933 microseconds, which converts to 59.022933 s.
YYYY-MM-DD 19:01:14.087Z| vcpu-0| I005: CPT: vm was stunned for 59022933 us
The stun is by design: VMware vSphere snapshot creation uses it to flush any disk changes held in memory to disk. In most instances, the stun is short enough that it does not cause any network drops in the environment.
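To check the stun duration on other VMs, the value can be extracted from the VM log and converted to seconds. The following is a minimal sketch, assuming the log line matches the "vm was stunned for N us" wording shown above; the pattern is derived from that single example line only, and the names STUN_PATTERN and stun_seconds are hypothetical.

```python
import re

# Pattern based only on the example line in this article; other
# vmware log variants may word the stun message differently.
STUN_PATTERN = re.compile(r"vm was stunned for (\d+) us")


def stun_seconds(log_line: str):
    """Return the stun duration in seconds, or None if the line has no stun entry."""
    match = STUN_PATTERN.search(log_line)
    if match is None:
        return None
    microseconds = int(match.group(1))
    return microseconds / 1_000_000  # 1 s = 1,000,000 us


line = "YYYY-MM-DD 19:01:14.087Z| vcpu-0| I005: CPT: vm was stunned for 59022933 us"
print(stun_seconds(line))
```

Applied to each line of the VM log file, this returns None for unrelated entries and the stun length in seconds for matching entries, which makes it easier to compare stun times across backup runs.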
Resolution
The stun that the affected VM experiences during VM snapshot creation is by design for this protection mechanism.
Workaround:
The PowerProtect Data Manager VMware Protection solution offers the Transparent Snapshot Data Mover (TSDM) solution, which does not require a VM snapshot to perform the backup. The TSDM solution uses a Lightweight Delta (LWD) filter, at the datastore level, to track the blocks that have changed since the last backup. The affected VM may be protected with the TSDM solution to prevent the stun during the backup. The "Protecting Virtual Machines Using the Transparent Snapshot Data Mover" section of the PowerProtect Data Manager Virtual Machine User Guide provides an overview of the protection mechanism.
Additional Information
If additional recommendations are required to minimize the VM snapshot stun time, a Broadcom Support case may be opened for further guidance.