IDPA: IDPA deployment on DP8300 or DP8800 fails on Avamar with the error "FAILED: Configuring Avamar server. ERROR: Avamar configuration failed"
Summary: This KB addresses IDPA deployment failures on DP8300 or DP8800 with physical Avamar servers that are caused by SSH issues with one or more storage nodes.
Symptoms
On the ACM UI, the Avamar deployment/configuration fails at 11% with the error "FAILED: Configuring Avamar server. ERROR: Avamar configuration failed". The diagnostic report on the ACM UI shows the same error message.
In the server.log, the following failure messages for the Avamar configuration are seen:
2019-10-18 15:02:19,458 INFO [Thread-3605]-avadapter.ConfigAvamarServerTask$StreamWrapper: Avamar config CLI status :
2019-10-18 15:02:19,458 INFO [Thread-3605]-avadapter.ConfigAvamarServerTask$StreamWrapper: Avamar config CLI status : Available Transitions:
2019-10-18 15:02:19,458 INFO [Thread-3605]-avadapter.ConfigAvamarServerTask$StreamWrapper: Avamar config CLI status : Retry Current Task
2019-10-18 15:02:19,458 INFO [Thread-3605]-avadapter.ConfigAvamarServerTask$StreamWrapper: Avamar config CLI status : Call EMC Support
2019-10-18 15:02:19,458 INFO [Thread-3605]-avadapter.ConfigAvamarServerTask$StreamWrapper: Avamar config CLI status : Abort Workflow
2019-10-18 15:02:19,461 INFO [pool-63-thread-2]-avadapter.ConfigAvamarServerTask: Execution of Avamar config CLI completed. Exit Value = 0
2019-10-18 15:02:19,461 INFO [pool-63-thread-2]-avadapter.ConfigAvamarServerTask: Error while executing Avamar config CLI :
2019-10-18 15:02:19,462 ERROR [pool-63-thread-2]-avadapter.ConfigAvamarServerTask: Exception occurred while executing Avamar config server task. com.emc.vcedpa.common.exception.ApplianceException: Avamar configuration failed.
Review the following logs on the Avamar Utility Node: "/usr/local/avamar/var/avi/server_log/avinstaller.log.0" and "/data01/avamar/repo/<package_name>/tmp/workflow.log".
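For example, these logs can be reviewed from an SSH session on the Utility Node; a quick sketch, where <package_name> is the Avamar install package directory under /data01/avamar/repo:
less /usr/local/avamar/var/avi/server_log/avinstaller.log.0
grep -i exception /data01/avamar/repo/<package_name>/tmp/workflow.log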
The following errors are seen on the Avamar side:
2019-10-18 21:02:18 (+0000) 4107124 INFO: The following exeception was raised while running the command "sudo -H -u root bash -c \"ssh-agent bash /usr/local/avamar/var/run_command-tmpscript.18709.4107124\" </dev/null >/usr/local/avamar/var/run_command-outer-output.18709.4107124 2>&1": #<SignalException: SIGTERM>
2019-10-18 21:02:18 (+0000) 4107124 INFO: The tmpscript at the time of the exception:
2019-10-18 21:02:18 (+0000) 4107124 INFO: ["#!/bin/bash\n", "# Temporary script\n", "[ -r /etc/profile ] && . /etc/profile\n", "export LOGNAME=root USERNAME=root\n", "ssh-add /home/admin/.ssh/dpnid && (ssh -q root@x.x.x.x env) >/usr/local/avamar/var/run_command-sysout.18709.4107124 2>&1 ; RC=$?\n", "[ -d /usr/local/avamar/var/run_command-keys.18709.4107124 ] && rm -rf /usr/local/avamar/var/run_command-keys.18709.4107124\n", "exit $RC\n"]
2019-10-18 21:02:18 (+0000) 4107124 INFO: The sysout at the time of the exception:
2019-10-18 21:02:18 (+0000) 4107124 INFO: []
2019-10-18 21:02:18 (+0000) 4107124 INFO: The outer_output at the time of the exception:
2019-10-18 21:02:18 (+0000) 4107124 INFO: ["Identity added: /home/admin/.ssh/dpnid (/home/admin/.ssh/dpnid)\n"]
The above exception shows that the Avamar configuration workflow was not able to SSH to one of the storage nodes (IP shown as "x.x.x.x"). This caused the pre-checks to fail on Avamar, which halted the Avamar Install SLES workflow and caused the Avamar configuration to be marked as failed on the ACM.
Cause
The Avamar pre-check task in the Install SLES workflow runs a few commands to verify that it can log in to all storage nodes with the SSH keys loaded. In this case, it was not able to SSH to one of the nodes, which caused the workflow to halt on the Avamar AVI and the Avamar configuration to be marked as failed on the ACM. The SSH issue on that node must be fixed.
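The failing check can be reproduced manually from the Utility Node. The commands below are a minimal sketch mirroring the tmpscript in the workflow log above; replace x.x.x.x with the storage node IP from the exception:
ssh-agent bash
ssh-add /home/admin/.ssh/dpnid
ssh -q root@x.x.x.x env
If the last command hangs or returns an error instead of printing the remote environment, it reproduces the pre-check failure.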
Resolution
- Log in to the Avamar Utility Node as the 'admin' user.
Note: The password for the Avamar nodes is still set to the default at this point, which is "changeme".
- Load the SSH keys on the Utility node:
ssh-agent bash
ssh-add /home/admin/.ssh/dpnid
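To confirm the key was added before testing the nodes, the identities held by the agent can be listed (an optional quick check):
ssh-add -l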
- SSH to each storage node to confirm that all nodes are reachable and accept SSH logins.
For DP8300:
Test the following commands and make sure we can SSH to all required nodes:
From Utility Node: mapall date
ssn 0.0
ssn 0.1
ssn 0.2
For DP8800:
Test the following commands and make sure we can SSH to all required nodes:
From Utility Node: mapall date
ssn 0.0
ssn 0.1
ssn 0.2
ssn 0.3
- If any node does not return the date or throws an error when running the date command above, that is the affected node. The following output may be seen for a bad node:
admin@xxxxxxxxxxxx:~/.ssh/> mapall date
Using /usr/local/avamar/var/probe.xml
(0.0) ssh -q -x -o GSSAPIAuthentication=no admin@x.x.x.x 'date'
Tue Nov 12 16:22:43 UTC 2019
(0.1) ssh -q -x -o GSSAPIAuthentication=no admin@x.x.x.x 'date'
Tue Nov 12 16:23:38 UTC 2019
(0.2) ssh -q -x -o GSSAPIAuthentication=no admin@x.x.x.x 'date'
In the above example, note that node 0.2 did not return the date output, confirming the issue with that node. We can manually SSH to the node as well to confirm the issue:
ssn 0.2
Using /usr/local/avamar/var/probe.xml
ssh -x admin@x.x.x.x ''
^CERROR: command "ssh -x admin@x.x.x.x ''" failed, return: 2, exitcode: 0, signal: 2, dumped core: 0
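If a session hangs like this, each node can also be probed with a timeout so the unresponsive node is reported instead of blocking the terminal. The loop below is only a sketch (the node IPs are placeholders; the actual data-node IPs are listed in /usr/local/avamar/var/probe.xml) and assumes the dpnid key is already loaded in the ssh-agent:
for ip in x.x.x.1 x.x.x.2 x.x.x.3; do
    if timeout 15 ssh -q -x -o GSSAPIAuthentication=no admin@${ip} date; then
        echo "OK: ${ip}"
    else
        echo "FAILED: ${ip} (no response or SSH error)"
    fi
done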
- Reboot the affected node, or restart the SSHD service on that node, to fix the issue (a service restart sketch follows the IPMI commands below).
The IPMItool utility can be used from the ACM to remotely reboot the node.
IPMITOOL IP Addresses on each Avamar Node:
Utility Node:192.1X8.1X0.1X4
Storage Node 1:192.1X8.1X0.1X5
Storage Node 2:192.1X8.1X0.1X6
Storage Node 3:192.1X8.1X0.1X7
Storage Node 4:192.1X8.1X0.1X8
From the ACM, issue the following commands to reboot the affected node:
- Check the power status via the command line:
ipmitool -I lanplus -H 192.168.100.11x -U root -P Idpa_1234 power status
- Power off the node via the command line:
ipmitool -I lanplus -H 192.168.100.11x -U root -P Idpa_1234 power off
- Power on the node via the command line:
ipmitool -I lanplus -H 192.168.100.11x -U root -P Idpa_1234 power on
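If console access to the affected node is available (for example, through its remote console), restarting the SSH daemon may clear the condition without a full power cycle. The exact command depends on the SLES release running on the node, so both variants are shown as a sketch:
# systemd-based SLES releases
systemctl restart sshd
systemctl status sshd
# older SysV-init SLES releases
service sshd restart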
Note: Test a manual SSH connection to the affected node after it is rebooted, and repeat the SSH checks above to confirm the issue is fixed.
- On the Avamar AVI Installer UI, abort the Avamar Install workflow. On the ACM UI, hit Retry to roll back the configuration.
- Retry the configuration.