Avamar: VMware image restore failed "Timeout on wait for spawned restore metadata"
Summary: VMware image restores failed with "Timeout on wait for spawned restore metadata avtar process to complete"
Symptoms
VMware image restores to new virtual machine, or image restore to new disk fails with the following error:
avvcbimage Error <0000>: [IMG0011] Timeout on wait for spawned restore metadata avtar process to complete
When restoring to a new VM or disk, the normal restore workflow must perform a "mini-restore" of the VM configuration files from the user-selected backup. These files are labeled as .vmx and .vmdk descriptor files.
Once the data is restored, the proxy takes the restored VM configuration and build a new VM virtual disk. It can then proceed with the image restore of the virtual disk. The mini-restore job called a "metadata restore" has a default timeout of 5 minutes. Since the vmx restore is small (typically ~15 kb), this should only take a few seconds to restore.
However where this behavior occurs in the restore log that we see:
1. 2017-04-14 12:50:21 avvcbimage Info : Process 4486 (/usr/local/avamarclient/bin/avtar) for workorder MOD-1492174248297#99 started
2. 2017-04-14 12:55:21 avvcbimage Error : [IMG0011] Timeout on wait for spawned restore metadata avtar process to complete
3. 2017-04-14 12:59:51 avvcbimage Info : Process 4486 (/usr/local/avamarclient/bin/avtar) finished (code 0: success)
The log messages show the total time taken was 9 mins and 30 s, but eventually the metadata restore succeeded. However the image restore task has failed because the default subprocess time of 5 mins or 300 s was exceeded.
Cause
The Data Domain is configured with an interface group (ifgroup) which tells the ddboost client (avtar) to connect to an interface. This interface is being blocked between the client (proxy)
From the Data Domain command line or webpage, review dd settings.
Example settings:
sysadmin@ddve# Ifgroup Show Config Group-name Status Interfaces ---------- ------- ---------- default enabled 192.168.69.80 ---------- ------- ---------- sysadmin@ddve net show config ethV0 Link encap:Ethernet HWaddr 00:50:56:98:A6:66 inet addr:10.240.196.80 Bcast:10.240.196.255 Mask:255.255.255.0 ethV1 Link encap:Ethernet HWaddr 00:50:56:98:56:8F inet addr:192.168.69.80 Bcast:192.168.69.255 Mask:255.255.255.0 In the example above the default Ifgroup setting will cause the client to initially connect to primary/management interface of 10.240.196.80 but will see the ifgroup settings and try to connect with 192.168.69.80. RPC has a default time-out value of 180 seconds but the client will try multiple times (3 in this case) which would add up to a total of 540 seconds or 9.5 mins. After trying three times the connection will fallback to 10.240.196.80 but by this time the image restore job would have timed out. From the proxy run this series of commands to confirm the issue replacing the IP addresses as appropriate:
rpcinfo -p 192.168.69.80 (command stops responding for 3 min)rpcinfo -p <data domain fully qualified hostname>rpcinfo -p 10.120.24.49
Resolution
Resolution:
Open up the connection between client and Data Domain ifgroup interfaces.
Another option is to disable the ifgroup.
Alternative workaround:
Increase the timeout value for the subprocess, for example, to 10 mins (600 s). This allows the restore to complete.
To do this, add a parameter --subprocesstimeoutsecs=600 to the avvcbimageAll.cmd file, located in /usr/local/avamarclient/var/ on the proxy
For more information about editing the avvcmimage.cmd file, see
Additional Information
Related article: Avamar - VMware Image restore failed with FATAL error 'GetDiskAttributed Failed' (may require login to Dell Support)