NVP-vProxy: Backups stop responding and eventually fail with resource not found code -404
Summary: The backups intermittently fail with a resource not found message in the NetWorker Management Console (NMC) and the backup session logs.
Symptoms
The NetWorker VMware Protection integration is configured with the vProxy Appliance. The backups intermittently fail with a resource not found message in the NetWorker Management Console (NMC) and the backup session logs. A subsequent backup completes for the previously affected virtual machine (VM), but it may affect other VMs. Evaluating the failed backup sessions shows that the affected VM backups are running on a specific vProxy appliance.
The VM backup session logs under /nsr/logs/policy/[policy_name]/[workflow_name] shows:
{"Text":"Resource not found: 44ab916f-7519-40f9-9d2e-e90394782ed9","Code":-404}
The backup action log under /nsr/logs/policy/[policy_name]/[workflow_name] shows:
MM/DD/YYYY HH:MM:SS [NSR_name] nsrvproxy_save NSR error [vm_name]: Unable to obtain backup status from vProxy after 5 attempts, failing the backup: Received an HTTP code: 404, libCURL message: "", vProxy message: "Error received from vProxy ="-404: Resource '2b53b0ea-dbd9-4438-ac3a-d2bea9e5242b' not found for 'GetSession' request". ", url: "https://[vproxy_name]:9090/api/v1/BackupVmSessions/2b53b0ea-dbd9-4438-ac3a-d2bea9e5242b", body: " ".
The failed backup session logs in the vProxy /opt/emc/vproxy/runtime/logs/recycle/vbackupd directory show most of the backups paused (stopped responding) at different points in the workflow. There is one backup session workflow that shows the following error message:
YYYY-MM-DD HH:MM:SS INFO: [@(#) Build number: 191] Data Mover: Moving data for 1 virtual disks, 1 at a time ...
YYYY-MM-DD HH:MM:SS INFO: [@(#) Build number: 191] Data Mover: Hard disk 1: Preparing for data movement.
YYYY-MM-DD HH:MM:SS ERROR: [@(#) Build number: 191] VDDK config call timed out after 1h0m0s, exiting 'vbackupd', PID=15426.Cause
The vProxy appliance is not able to resolve the target ESXi host, and it is causing the VDDK call to timeout after an hour.
Resolution
Ensure that the vProxy appliance can properly resolve (FQDN, shortname, IP) all the ESXi hosts in the vSphere inventory that are hosting protected VMs.