Avamar: How to test if ASCD is working correctly
Summary: If the Avamar Services Control Daemon (ASCD) service is not functioning correctly, issues related to maintenance, licensing, or Data Domain integration may occur.
Symptoms
Examples of Issues that may arise when there are issues with Avamar Services Control Daemon (ASCD):
-
First-time connections from new clients are failing.
-
During a restart of an Avamar grid, these errors (or similar) are seen in the
/data01/cur/gsan.logon a data node:2010/09/30-17:17:05.79301 {0.1} [licensevalidator:177] ERROR: <0001> licensevalidator::validate utility node cannot be reached 2010/09/30-17:17:05.79312 {0.1} [licensevalidator:177] FATAL ERROR: <0001> licensevalidator::body no valid license found-- Or --
2010/12/19-01:17:39.49253 {0.0} [licensevalidator:643] ERROR: <0001> licensevalidator::validate license server cannot be contacted 2010/12/19-01:17:39.49266 {0.0} [licensevalidator:643] FATAL ERROR: <0001> licensevalidator::body no valid license found
- Checkpoint validation (
Hfscheck) is failing with the same error messages as above but in the/data01/hfscheck/gsan.logon the data nodes.
Cause
The Avamar Server Connection Daemon (ASCD) is a process that runs on the utility node.
The first time a client (avtar, avmaint, or avmgr) connects to the Avamar server it connects to the utility node. This is because the work order or flag file (depending on the situation) typically specifies that the "--hfsaddr" or the "--server" is the fully qualified domain name of the utility node. The ASCD process accepts the connection and redirects the connection to one of the data nodes. When the client connects to the first data node, the GSAN process on the data node sends the client a full list of data nodes on the grid. This is how the .dtb, a plain text file, is created in the var directory of the client.
The dtb file list has details of the Avamar server such as the utility node address, the hfscreate time, and the IP addresses of each data node. The name of the .dtb file corresponds to the "--hfsaddr" or "--server" and the unencrypted "--hfsport", which by default is 27000. As an example, if the "--hfsaddr" or "--server" is set to "avmxxxx", then the .dtb file name is "avmxxxx-27000.dtb".
On subsequent connections, every time the "--hfsaddr" or "--server" is specified to be, in this example, "<servername>", the avtar attempts to read the .dtb file whose name is " <servername> -27000.dtb", and uses the list of data nodes in this file to bypass the ASCD process and connect directly to a randomly selected data node from the .dtb file. If the node selected does not respond it cycles through the data nodes until it finds one that does respond. If none of the nodes respond, it then contacts the utility node to see if the dtb file must be updated."
If the "--hfsaddr" or "--server" is changed, for example, from the fully qualified domain name (FQDN) to the hostname, or to the IP address of the utility node, and there is no corresponding .dtb file that is appropriately named, then the client does not use the .dtb file and connects to the specified "--hfsaddr" or "--server". The process for the "first time" connection then repeats and a new .dtb file is created.
ASCD is also responsible for checking whether a valid license is installed.
So, if ascd is not running correctly, the license may not validate or connections may not correctly be made to the Avamar server.
Resolution
To confirm if ASCD is working correctly:
1. Log in to the Avamar server as admin. See Avamar: How to Log in to an Avamar Server and Load Various Keys for more information.
2. Check if the ascd process is running. This checks to see if the process has started:
ps -eaf | grep ascd
admin 29044 1 0 15:01 ? 00:00:00 ascd.bin --hfsport=27000 --probefile=/usr/local/avamar/var/probe.xml --sysdir=/usr/local/avamar/etc --windir=/usr/local/avamar/etc/gsan-maint-scripts --keyfile=/usr/local/avamar/etc/license.xml --scanfile=/usr/local/avamar/etc/serverlogscanners.xml
3. If there are no ASCD processes running, go to "Steps to resolve ASCD issues - Starting ASCD". If ASCD processes are running, g to step 4 to confirm that it is working correctly by testing functionality.
4. On the Avamar Utility Node, run the following command as admin:
avmaint datacenterlist --debug
The output from the above command shows that avmaint loads a .dtb file:
...
2010/10/02-13:41:17.27833 [avmaint] tabfilename = /usr/local/avamar/var/<servername>-27000.dtb
2010/10/02-13:41:17.27833 [avmaint] tabfilename = /usr/local/avamar/var/<servername>-27000.dtb
avmaint Info <5561>: - - Loaded dispatcher table /usr/local/avamar/var/<servername>-27000.dtb
avmaint Info <5562>: - - Connect: Trying 10.x.x.x:27000
...
The command did not attempt to contact the ascd service as it already had a dtb file.
The dtb file must be removed to force the command to contact ASCD on the utility node:
a. As admin, move the specified dtb file:
mv /usr/local/avamar/var/<servername>-27000.dtb /usr/local/avamar/var/<servername>-27000.dtb.backup
b. Rerun the command:
avmaint datacenterlist --debug
If the ASCD service is hung, then ASCD will not redirect the avmaint process to one of the data nodes.
If the ASCD service is hung, "Steps to resolve ASCD issues - Forcefully starting ASCD"
Example of a hung ASCD service:
2010/10/02-13:59:45.31522 [avmaint] sock::libinit(enc="proprietary", encrypt_strength="medium", verify=0) socktype="sock_tcp"
2010/10/02-13:59:45.31534 [avmaint] sockimpl::libinit(usessl=0, verify=0, encrypt_strength="medium", pemdir="")
2010/10/02-13:59:45.31569 [avmaint] maint::command::connect addr=10.x.x.100:27000 expert=false
2010/10/02-13:59:45.31575 [avmaint] dpnconnect::create((none), 2, -1)
2010/10/02-13:59:45.31578 [avmaint] dpnconnect::create hfsaddr=<servername>
2010/10/02-13:59:45.31582 [avmaint] tabfilename = /usr/local/avamar/var/<servername>-27000.dtb
2010/10/02-13:59:45.31587 [avmaint] filedisk::doopen, fdesc == -1 after open attempt
2010/10/02-13:59:45.31597 [avmaint] tcpsock::createclient ipaddr:10.x.x.100:27000
avmaint Info <6550>: - - Requesting address from 10.x.x.100:27000
2010/10/02-13:59:45.31614 [avmaint] connect timeout = 60 ipaddr:10.x.x.100:27000 netbind:
2010/10/02-13:59:45.31626 [socktimer] Startup, pid=28831 tid=32 stack=0x409fffaf td=0x2a9673d080
2010/10/02-13:59:45.31635 [avmaint] back from connect
2010/10/02-13:59:45.31639 [socktimer] Thread finished socktimer
2010/10/02-13:59:45.31644 [avmaint] res = -1, expired = 0
avmaint Info <5562>: - - Connect: Trying 10.x.x.100:27000
2010/10/02-13:59:45.31655 [avmaint] tcpsock::createclient ipaddr:10.x.x.100:27000
2010/10/02-13:59:45.31662 [avmaint] connect timeout = 60 ipaddr:10.x.x.100:27000 netbind:
2010/10/02-13:59:45.31668 [socktimer] Startup, pid=28832 tid=32 stack=0x409fffaf td=0x2a9673d080
2010/10/02-13:59:45.31675 [avmaint] back from connect
2010/10/02-13:59:45.31680 [socktimer] Thread finished socktimer
2010/10/02-13:59:45.31684 [avmaint] res = -1, expired = 0
avmaint Info <5694>: - Failed initial handshake, trying again
2010/10/02-13:59:50.31651 [avmaint] dpnconnect::create((none), 2, -1)
2010/10/02-13:59:50.31653 [avmaint] dpnconnect::create hfsaddr=<servername>
2010/10/02-13:59:50.31655 [avmaint] tabfilename = /usr/local/avamar/var/<servername>-27000.dtb
2010/10/02-13:59:50.31657 [avmaint] filedisk::doopen, fdesc == -1 after open attempt
2010/10/02-13:59:50.31663 [avmaint] tcpsock::createclient ipaddr:10.x.x.100:27000
avmaint Info <6550>: - - Requesting address from 10.x.x.100:27000
2010/10/02-13:59:50.31669 [avmaint] connect timeout = 60 ipaddr:10.x.x.100:27000 netbind:
2010/10/02-13:59:50.31679 [socktimer] Startup, pid=28833 tid=32 stack=0x409fffaf td=0x2a9673d080
2010/10/02-13:59:50.31687 [avmaint] back from connect
2010/10/02-13:59:50.31692 [socktimer] Thread finished socktimer
2010/10/02-13:59:50.31696 [avmaint] res = -1, expired = 0
avmaint Info <5562>: - - Connect: Trying 10.x.x.100:27000
2010/10/02-13:59:50.31707 [avmaint] tcpsock::createclient ipaddr:10.x.x.100:27000
2010/10/02-13:59:50.31712 [avmaint] connect timeout = 60 ipaddr:10.x.x.100:27000 netbind:
2010/10/02-13:59:50.31716 [socktimer] Startup, pid=28834 tid=32 stack=0x409fffaf td=0x2a9673d080
2010/10/02-13:59:50.31722 [avmaint] back from connect
2010/10/02-13:59:50.31727 [socktimer] Thread finished socktimer
2010/10/02-13:59:50.31730 [avmaint] res = -1, expired = 0
avmaint Info <5694>: - Failed initial handshake, trying again
2010/10/02-13:59:55.31667 [avmaint] dpnconnect::create((none), 2, -1)
2010/10/02-13:59:55.31669 [avmaint] dpnconnect::create hfsaddr=<servername>
2010/10/02-13:59:55.31671 [avmaint] tabfilename = /usr/local/avamar/var/<servername>-27000.dtb
2010/10/02-13:59:55.31674 [avmaint] filedisk::doopen, fdesc == -1 after open attempt
2010/10/02-13:59:55.31679 [avmaint] tcpsock::createclient ipaddr:10.x.x.100:27000
avmaint Info <6550>: - - Requesting address from 10.x.x.100:27000
2010/10/02-13:59:55.31685 [avmaint] connect timeout = 60 ipaddr:10.x.x.100:27000 netbind:
2010/10/02-13:59:55.31696 [socktimer] Startup, pid=28835 tid=32 stack=0x409fffaf td=0x2a9673d080
2010/10/02-13:59:55.31703 [avmaint] back from connect
2010/10/02-13:59:55.31709 [socktimer] Thread finished socktimer
2010/10/02-13:59:55.31713 [avmaint] res = -1, expired = 0
avmaint Info <5562>: - - Connect: Trying 10.x.x.100:27000
2010/10/02-13:59:55.31724 [avmaint] tcpsock::createclient ipaddr:10.x.x.100:27000
2010/10/02-13:59:55.31729 [avmaint] connect timeout = 60 ipaddr:10.x.x.100:27000 netbind:
2010/10/02-13:59:55.31733 [socktimer] Startup, pid=28836 tid=32 stack=0x409fffaf td=0x2a9673d080
2010/10/02-13:59:55.31740 [avmaint] back from connect
2010/10/02-13:59:55.31746 [socktimer] Thread finished socktimer
2010/10/02-13:59:55.31751 [avmaint] res = -1, expired = 0
avmaint Info <5694>: - Failed initial handshake, trying again
ERROR: avmaint: datacenterlist: cannot connect to server <servername> at 10.x.x.100:27000
-
-
- In this example,
avmaintrepeatedly tries to connect to 10.x.x.100:27000 where the utility node is listening. - It is supposed to respond on the port on which the ASCD process is listening, but since the ASCD process is hung, ASCD does not respond.
- Eventually, the
avmaintcommand fails.
- In this example,
-
Example of a working ASCD service:
2010/10/02-14:02:21.37572 [avmaint] sock::libinit(enc="proprietary", encrypt_strength="medium", verify=0) socktype="sock_tcp"
2010/10/02-14:02:21.37583 [avmaint] sockimpl::libinit(usessl=0, verify=0, encrypt_strength="medium", pemdir="")
2010/10/02-14:02:21.37619 [avmaint] maint::command::connect addr=10.x.x.100:27000 expert=false
2010/10/02-14:02:21.37625 [avmaint] dpnconnect::create((none), 2, -1)
2010/10/02-14:02:21.37628 [avmaint] dpnconnect::create hfsaddr=<servername>
2010/10/02-14:02:21.37632 [avmaint] tabfilename = /usr/local/avamar/var/<servername>-27000.dtb
2010/10/02-14:02:21.37639 [avmaint] filedisk::doopen, fdesc == -1 after open attempt
2010/10/02-14:02:21.37651 [avmaint] tcpsock::createclient ipaddr:10.x.x.100:27000
avmaint Info <6550>: - - Requesting address from 10.x.x.100:27000
2010/10/02-14:02:21.37668 [avmaint] connect timeout = 60 ipaddr:10.x.x.100:27000 netbind:
2010/10/02-14:02:21.37680 [socktimer] Startup, pid=29074 tid=32 stack=0x409fffaf td=0x2a9673d088
2010/10/02-14:02:21.37693 [avmaint] back from connect
2010/10/02-14:02:21.37698 [socktimer] Thread finished socktimer
2010/10/02-14:02:21.37702 [avmaint] res = 0, expired = 0
2010/10/02-14:02:21.37729 [avmaint] dpnconnect::getserveraddr trying 10.x.x.100:27000
2010/10/02-14:02:21.37755 [avmaint] dpnconnect::getserveraddr received newaddr=10.00.00.102:27000 srcaddr=10.x.x.100:27000
avmaint Info <5562>: - - Connect: Trying 10.x.x.102:27000
2010/10/02-14:02:21.37772 [avmaint] tcpsock::createclient ipaddr:10.x.x.102:27000
2010/10/02-14:02:21.37778 [avmaint] connect timeout = 60 ipaddr:10.x.x.102:27000 netbind:
2010/10/02-14:02:21.37785 [socktimer] Startup, pid=29076 tid=32 stack=0x409fffaf td=0x2a9673d088
2010/10/02-14:02:21.37800 [avmaint] back from connect
2010/10/02-14:02:21.37805 [socktimer] Thread finished socktimer
2010/10/02-14:02:21.37809 [avmaint] res = 0, expired = 0
avmaint Info <5993>: - Connect: Connected to 10.x.x.102:27000, Priv=2, TCP
2010/10/02-14:02:21.37941 [avmaint] getdpninfo: servtime=1286028141.3800 clienttime=1286028141.3793 catserver=0 compressed=1 dpmmaxmessage=405
2010/10/02-14:02:21.37949 [avmaint] dpnconnect::getdpninfo serverversion=5.0.1-32 hfscreatetime=1275056884 maxmessagedatasize=65412 maxatomchunksize=65382 maxcompchunksize=32691
avmaint Info <5564>: - - Requesting dispatcher table and create time
avmaint Info <5566>: - - Table: 8 nodebits, 26 bytes, create time: 1275056884
2010/10/02-14:02:21.37997 [avmaint] checkrightdpn: olddc=0 newdc=1 numnodes=3 oldcreatetime=1275056884 newcreatetime=1275056884
avmaint Info <5568>: - Writing dispatcher table /usr/local/avamar/var/<servername>-27000.dtb
2010/10/02-14:02:21.38025 [avmaint] maint::command::connect socket=tcp servaddr=10.x.x.102:27000 fd=3 bufsize=60000 isclosed=0
2010/10/02-14:02:21.38121 [avmaint] maint::command::connect hasstatusexts=1
2010/10/02-14:02:21.38229 [avmaint] maint::command::execute setting DB to not flush streams
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<datacenterlist count="1">
<datacenterinfo
id="0"
timestamp="1286028141"
nnodes="3"
ismaster="true"
naddrs="1"
valid="true">
<gateway ipaddr="10.x.x.100"/>
</datacenterinfo>
</datacenterlist>
-
-
- In this example, the ASCD process redirected the
avmaintcommand to one of the data nodes (10.x.x.102) - When the
avmaintsuccessfully connects to the data node, a newdtbfile is created:
- In this example, the ASCD process redirected the
-
avmaint Info <5568>: - Writing dispatcher table /usr/local/avamar/var/<servername>-27000.dtb
Steps to Resolve ASCD Issues - Restarting ASCD:
If the ASCD process is not running, it must be restarted:
1. As admin, from the /home/admin directory on the Avamar Utility Node, type the following:
ascd start
dpnctl stop) cannot shut down the ASCD process.
2. Ensure that the ASCD process has started as an admin-owned process:
ps -eaf | grep ascd
3. Check the /usr/local/avamar/var/ascd-27000.log file to ensure that ASCD has properly started. (If a custom unencrypted port other than 27000 has been specified, then check the "ascd-<hfsport>.log file.)
Steps to Resolve ASCD Issues - Forcefully Restarting ASCD:
If the ASCD process is in a hung state, it must be restarted forcefully:
1. As admin, from the /home/admin directory on the Avamar Utility Node, type the following:
ps -eaf | grep ascd
2. Note the process ID (PID) associated with ASCD.
3. Kill the process:
kill -9 <ascd_pid>
4. Ensure that the process has stopped:
ps -eaf | grep ascd
5. As admin, from the /home/admin directory on the Avamar Utility Node, type the following:
ascd start
dpnctl stop) cannot shut down the ASCD process.
6. Ensure that the ASCD process has started as an admin-owned process:
ps -eaf | grep ascd
7. Check the /usr/local/avamar/var/ascd-27000.log file to ensure that ASCD has properly started. (If a custom unencrypted port other than 27000 has been specified, then check the "ascd-<hfsport>.log file.)
avmaint datacenterlist --debug) and confirm that the ASCD is responding correctly.