DPE vApp Backups Fail with Error "vmwappimage Error <19591>" Due to Load Balancer Issues
Summary: DPE vApp backups fail with error "vmwappimage Error <19591>: httpPost: http_code: 500 sending to url 'https://localhost:8080/vcp-ba-vappplugin-ws/vapp'" due to load balancer issues. Load balancers such as the CloudFlare load balancer are known to return HTTP 524 Origin Time-out errors, which cause the backups to fail. ...
Symptoms
The backup log may show the following error message:
2021-02-08 10:02:50 vmwappimage Info <19594>: httpPost: url https://localhost:8080/vcp-ba-vappplugin-ws/vapp
2021-02-08 10:03:22 vmwappimage Error <19591>: httpPost: http_code: 500 sending to url 'https://localhost:8080/vcp-ba-vappplugin-ws/vapp'
2021-02-08 10:03:22 vmwappimage Info <9772>: Starting graceful (staged) termination, Prep-for-backup message to ADS failed (wrap-up stage)
2021-02-08 10:03:22 vmwappimage Error <0000>: Prep-for-backup message to ADS failed
2021-02-08 10:03:22 vmwappimage Info <19594>: httpPost: url https://localhost:8080/vcp-ba-vappplugin-ws/vapp
2021-02-08 10:03:45 vmwappimage Error <19591>: httpPost: http_code: 500 sending to url 'https://localhost:8080/vcp-ba-vappplugin-ws/vapp'
2021-02-08 10:03:45 vmwappimage Error <17707>: Post backup-complete message to ADS failed.
2021-02-08 10:03:45 vmwappimage Info <16038>: Final summary, cancelled/aborted 0, snapview 0, exitcode 157: miscellaneous error
The vcdsdk.log on the VPA at "/var/log/vcp/srv/vcdsdk.log" shows the following error:
2021-02-12 11:58:37,368 [AMQP listener 9] INFO (RestUtil.java:329) - Response - <html>
<head><title>524 Origin Time-out</title></head>
<body bgcolor="white">
<center><h1>524 Origin Time-out</h1></center>
<hr><center>cloudflare-nginx</center>
</body>
</html>
Steps to enable Cloud API Debug:
- Log in to the VCP primary node.
- Edit the following file:
/etc/vcp/srv/vcpsrv-log4j2.xml
Original contents:
<Logger name="com.vmware" level="info" additivity="true" />
Change this line to read:
<Logger name="com.vmware" level="debug" additivity="true" />
- Restart the bg and srv services:
vcp-cli bg update -p <MASTER-PASSWORD> <BG-INSTANCE-NAME>
vcp-cli srv update -p <MASTER-PASSWORD> <SRV-CELL-INSTANCE-NAME>
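The log-level edit above can also be scripted. The following is a minimal sketch, not a vendor-supplied tool: the function name `set_vmware_log_level` is hypothetical, and it assumes the `com.vmware` Logger line has exactly the form shown in this article. It keeps a `.bak` copy of the file so the change can be reverted once debugging is finished.

```shell
#!/bin/sh
# Hypothetical helper: set the level of the com.vmware Logger in a log4j2 file.
# $1 = new level (for example debug or info)
# $2 = config file; defaults to the path given in this article
set_vmware_log_level() {
    level="$1"
    config_file="${2:-/etc/vcp/srv/vcpsrv-log4j2.xml}"

    # Keep a copy of the current file so the change can be rolled back.
    cp -p "$config_file" "$config_file.bak" || return 1

    # Rewrite only the level attribute of the com.vmware Logger line,
    # reading from the backup and writing the result to the live file.
    sed 's|\(<Logger name="com\.vmware" level="\)[a-z]*"|\1'"$level"'"|' \
        "$config_file.bak" > "$config_file"
}
```

For example, `set_vmware_log_level debug` enables debug logging and `set_vmware_log_level info` restores the original level. The bg and srv services still have to be restarted with the vcp-cli commands above for the change to take effect.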
Performing an nslookup on the vCloud Director FQDN from the VPA shows multiple IP addresses.
Customer confirms that they have a load balancer configured.
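The multiple-address check can be reduced to a single number. The sketch below is only an illustration: `count_unique_addresses` is a hypothetical helper and `vcd.example.com` is a placeholder FQDN. It assumes `getent` is available on the VPA and counts the distinct addresses in the first column of `getent ahosts` output. More than one address is consistent with a load balancer or round-robin DNS in front of vCloud Director, but does not prove it.

```shell
#!/bin/sh
# Hypothetical helper: read resolver output on stdin (one "ADDRESS ..." record
# per line, as printed by `getent ahosts`) and print how many distinct
# addresses it contains.
count_unique_addresses() {
    awk '{ print $1 }' | sort -u | grep -c .
}

# Usage (vcd.example.com is a placeholder for the vCloud Director FQDN):
#   getent ahosts vcd.example.com | count_unique_addresses
```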
Cause
Backup Gateways trying to connect to vCloud Director are hitting the CloudFlare load balancer. The API request fails on the CloudFlare load balancer, which causes the backup to fail.
The CloudFlare load balancer returns the following response:
2021-02-12 11:58:37,368 [AMQP listener 9] INFO (RestUtil.java:329) - Response - <html>
<head><title>524 Origin Time-out</title></head>
<body bgcolor="white">
<center><h1>524 Origin Time-out</h1></center>
<hr><center>cloudflare-nginx</center>
</body>
</html>
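To confirm this cause on a given VPA, the log can be searched for the 524 response instead of being read by hand. This is a minimal sketch: the function name `has_524_response` is hypothetical, and the default path is the vcdsdk.log location given in this article.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when the given log contains a
# "524 Origin Time-out" response, fail otherwise.
# $1 = log file; defaults to the vcdsdk.log path given in this article
has_524_response() {
    log_file="${1:-/var/log/vcp/srv/vcdsdk.log}"
    grep -q '524 Origin Time-out' "$log_file"
}

# Usage:
#   has_524_response && echo "524 response found in vcdsdk.log"
# To see the matching lines with their line numbers:
#   grep -n '524 Origin Time-out' /var/log/vcp/srv/vcdsdk.log
```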
Resolution
- Confirm with the customer whether a load balancer is configured between the VPA components (Backup Gateways and so on) and vCloud Director.
- Add local hosts file entries for all docker container VMs to force DPE components to connect to vCloud Director using its private IP address, bypassing the load balancer.
  - On DPE versions below 19.4, hosts file entries can be created on the individual DPE VMs, such as the SRV-Cell VM, Backup Gateway VM, and so on.
  - On DPE versions 19.4 and above, changes must be made in each docker container:
    - Open an SSH session to the VPA VM.
    - Run the following command to fetch the docker container list:
      docker ps
    - Run the following command to enter the docker container:
      docker exec -it <first 2 digits of containerid> /bin/bash
    - Run the following command to update the local hosts file records:
      echo "IP_ADDRESS FQDN SHORTNAME" >> /etc/hosts
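Because the plain `echo ... >> /etc/hosts` command appends a new line every time it is run, repeating the procedure can leave duplicate records. The following sketch shows one way to make the update repeatable. The function name `add_hosts_entry` is hypothetical, and the optional fourth argument exists only so the function can be tried against a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: append "IP FQDN SHORTNAME" to a hosts file unless the
# FQDN is already listed on a non-comment line.
# $1 = IP address, $2 = FQDN, $3 = short name, $4 = hosts file (default /etc/hosts)
add_hosts_entry() {
    ip="$1"
    fqdn="$2"
    short="$3"
    hosts_file="${4:-/etc/hosts}"

    # awk exits 0 when the FQDN appears as a whole host-name field.
    if awk -v name="$fqdn" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }' "$hosts_file"; then
        return 0
    fi

    printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$hosts_file"
}
```

Inside a container this would be called as `add_hosts_entry IP_ADDRESS FQDN SHORTNAME`, using the same values as the echo command above. An existing record for the FQDN is left untouched, so a stale address still has to be corrected by hand.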