Avamar: Communications fail when the MTU is undersized
Summary: An undersized Maximum Transmission Unit (MTU) anywhere on the network path can cause Avamar communications to fail.
Symptoms
Replication jobs hang. Alternatively, when you configure replication, the Avamar Administrator UI does not respond after you click 'Verify Authentication'.
Cause
All network traffic sent from the Avamar nodes has the Don't Fragment (DF) flag set.
The TCP layer assumes that an MTU of 1500 bytes is available along the entire network path.
With the DF flag set, if any device on the path (such as a router or gateway) has an MTU smaller than 1500 bytes, that device drops the packet instead of fragmenting it.
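The drop behavior described above can be modeled in a few lines. This is a minimal illustrative sketch, not an Avamar tool: the function name first_dropping_hop and the per-hop MTU values are hypothetical. It walks the hops in path order and reports the first one whose MTU is smaller than a packet that has the Don't Fragment flag set.

```bash
# Hypothetical helper: model where a packet with the Don't Fragment (DF) flag
# set is dropped. Arguments: packet size in bytes, then the MTU of each hop
# in path order. The MTU values passed in are illustrative, not measured.
first_dropping_hop() {
  local packet_size=$1 hop=1 mtu
  shift
  for mtu in "$@"; do
    # With DF set, a hop cannot fragment an oversized packet, so it drops it.
    if [ "$packet_size" -gt "$mtu" ]; then
      echo "dropped at hop $hop (MTU $mtu)"
      return 1
    fi
    hop=$((hop + 1))
  done
  echo "delivered"
}
```

For example, first_dropping_hop 1500 1500 1462 1500 prints "dropped at hop 2 (MTU 1462)", while first_dropping_hop 1458 1500 1462 1500 prints "delivered".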
Resolution
To determine whether the issue is MTU-related:
1. If the Avamar Administrator UI stops responding after you click 'Verify Authentication', review the Management Console Server (MCS) log, mcserver.log.0:
02/20-18:27:26.00857 com.avamar.mc.dpn.ExecuteCommand.exec
WARNING: avtar Info <5551>: Command Line: /usr/local/avamar/bin/avtar.bin --flagfile=/usr/local/avamar/etc/usersettings.cfg
--password=**************** --vardir=/usr/local/avamar/var --server=ava-internal --id=root --bindir=/usr/local/avamar/bin
--vardir=/usr/local/avamar/var --bindir=/usr/local/avamar/bin --sysdir=/usr/local/avamar/etc --backups --account=/MC_BACKUPS
--count=1 --id=repluser --password=**************** --server=ava-repl.company.com --hfsport=27000 --conntimeout=120
--vardir=/usr/local/avamar/var
avtar Info <7977>: Starting at 2015-02-20 17:27:26 EST [avtar Jul 1 2014 18:55:49 7.0.102-47 Linux-x86_64]
avtar Info <6555>: Initializing connection (Avamar Deduplication Engine v2.0.0)
avtar Info <5552>: Connecting to Avamar Server (ava-repl.company.com)
avtar Info <5554>: Connecting to one node in each datacenter
avtar Info <5583>: Login User: "repluser", Domain: "default", Account: "/MC_BACKUPS"
avtar Info <5580>: Logging in on connection 0 (server 0)
avtar Info <5582>: Avamar Server login successful
avtar Info <7694>: Server(ava-repl.company.com) not responding (possible network congestion?) (300 seconds)
avtar Info <7694>: Server(ava-repl.company.com) not responding (possible network congestion?) (600 seconds)
avtar Info <7694>: Server(ava-repl.company.com) not responding (possible network congestion?) (900 seconds)
avtar Info <7694>: Server(ava-repl.company.com) not responding (possible network congestion?) (1200 seconds)
This output shows that avtar logs in to the target server successfully but then receives no response when it tries to list the backups of /MC_BACKUPS on the target server.
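To check a long MCS log for this pattern without reading it line by line, a small helper can count the avtar "not responding" warnings (message <7694>). This is a sketch: the function name is hypothetical, and the path to mcserver.log.0 must be supplied, because its location on the MCS node is not given here.

```bash
# Hypothetical helper: count avtar "Server(...) not responding" warnings
# (message <7694>) in an MCS log. Argument: path to mcserver.log.0.
count_unresponsive_warnings() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  # -c prints the number of matching lines; "|| true" stops a zero count
  # from being reported as a failure exit status.
  grep -c 'avtar Info <7694>: Server(.*) not responding' "$1" || true
}
```

A non-zero count, particularly with the waiting time climbing in 300-second steps as in the excerpt above, matches the symptom described in this step.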
2. Review the network settings:
ifconfig | grep "Link encap\|MTU"
Example output from a multinode Gen4T grid:
bond0 Link encap:Ethernet HWaddr 00:60:16:16:16:00
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
bond1 Link encap:Ethernet HWaddr 00:60:16:16:16:02
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
eth0 Link encap:Ethernet HWaddr 00:60:16:16:16:34
UP BROADCAST MULTICAST MTU:1500 Metric:1
eth1 Link encap:Ethernet HWaddr 00:60:16:16:16:C4
UP BROADCAST MULTICAST MTU:1500 Metric:1
eth2 Link encap:Ethernet HWaddr 00:60:16:16:16:C5
UP BROADCAST MULTICAST MTU:1500 Metric:1
eth3 Link encap:Ethernet HWaddr 00:60:16:16:16:C6
UP BROADCAST MULTICAST MTU:1500 Metric:1
eth4 Link encap:Ethernet HWaddr 00:60:16:16:16:C7
UP BROADCAST MULTICAST MTU:1500 Metric:1
eth5 Link encap:Ethernet HWaddr 00:60:16:16:16:00
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
eth6 Link encap:Ethernet HWaddr 00:60:16:16:16:00
UP BROADCAST SLAVE MULTICAST MTU:1500 Metric:1
eth7 Link encap:Ethernet HWaddr 00:60:16:16:16:02
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
eth8 Link encap:Ethernet HWaddr 00:60:16:16:16:02
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
UP LOOPBACK RUNNING MTU:65536 Metric:1
This output confirms that all bond and Ethernet interfaces have an MTU of 1500, so any undersized MTU must be on another device along the network path.
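On a grid with many interfaces, the check in step 2 can be automated. This sketch assumes the older net-tools ifconfig format shown above ("Link encap" lines followed by "MTU:NNNN"); newer ifconfig builds and the ip command print the MTU in a different format, so the pattern would need adjusting there. The function name report_mtu_mismatches is hypothetical.

```bash
# Hypothetical helper: list interfaces whose MTU differs from an expected
# value (default 1500). Reads net-tools ifconfig text on stdin, in the
# "Link encap ... MTU:NNNN" format shown above. Loopback (lo) is skipped,
# because its MTU is normally 65536.
report_mtu_mismatches() {
  awk -v expected="${1:-1500}" '
    # An interface block starts with "<name> Link encap:...".
    /Link encap/ { iface = $1; next }
    # The flags line carries "MTU:<value>"; compare it numerically.
    match($0, /MTU:[0-9]+/) {
      mtu = substr($0, RSTART + 4, RLENGTH - 4)
      if (iface != "lo" && mtu + 0 != expected + 0)
        printf "%s MTU=%s (expected %s)\n", iface, mtu, expected
    }
  '
}
```

Running ifconfig | report_mtu_mismatches on a node prints nothing when every non-loopback interface has an MTU of 1500.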
3. Perform ping tests:
ping -M do -s <size> <destination>
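where:
- "-M do" prohibits fragmentation (the DF flag is set)
- "-s <size>" is the ICMP payload size in bytes; the IP packet on the wire is 28 bytes larger (20-byte IPv4 header plus 8-byte ICMP header)
- "<destination>" is, in this scenario, the Avamar replication target grid.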
Example of a successful ping:
ping -M do -s 1430 ava-repl.company.com
PING system (10.x.xxx.xxx) 1430(1458) bytes of data.
1438 bytes from system (10.x.xxx.xxx): icmp_seq=0 ttl=63 time=11.3 ms
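(1458-byte IP packet - 28 bytes of IPv4 and ICMP headers = 1430-byte payload; the 1438 bytes in the reply are the payload plus the 8-byte ICMP header)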
Example of an unsuccessful ping:
ping -M do -s 1472 ava-repl.company.com
PING ava-repl.company.com (10.x.xxx.xxx) 1472(1500) bytes of data.
From host (10.x.xxx.xxx) icmp_seq=0 Frag needed and DF set (mtu = 1462)
From host (10.x.xxx.xxx) icmp_seq=0 Frag needed and DF set (mtu = 1462)
--- ava-repl.company.com ping statistics ---
0 packets transmitted, 0 received, +2 errors
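(1500-byte IP packet - 28 bytes of headers = 1472-byte payload. A device on the path reports a next-hop MTU of only 1462 bytes, so the 1500-byte packet cannot pass.)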
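The MTU that a device on the path reports can be extracted from the ping output and converted to the largest usable payload. This is a minimal sketch: it assumes the "Frag needed and DF set (mtu = N)" wording shown above, which can differ between ping versions, and the function name path_mtu_from_ping is hypothetical.

```bash
# Hypothetical helper: print the path MTU reported in a
# "Frag needed and DF set (mtu = N)" line of ping output read on stdin.
# Returns 1 if no such line is found.
# Usage: ping -M do -c 2 -s 1472 ava-repl.company.com 2>&1 | path_mtu_from_ping
path_mtu_from_ping() {
  local mtu
  # Keep only the number inside "(mtu = N)"; take the first occurrence.
  mtu=$(sed -n 's/.*Frag needed and DF set (mtu = \([0-9][0-9]*\)).*/\1/p' | head -n 1)
  [ -n "$mtu" ] || return 1
  echo "$mtu"
}
```

Subtract 28 from the result to get the largest payload; for the 1462-byte MTU in the example above, that is 1434 bytes.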
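4. Repeat the test with smaller sizes to find the largest payload that succeeds. The path MTU is that payload plus 28 bytes; in the example above, the reported MTU of 1462 bytes corresponds to a maximum payload of 1434 bytes.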
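Instead of adjusting the size by hand, step 4 can be run as a binary search. This sketch assumes Linux iputils ping (the -M do, -c, -W and -s options); the function names probe_payload and find_max_payload and the PROBE override are hypothetical, added so the search can be tested without a network.

```bash
# Sketch: binary-search the largest ICMP payload that reaches a host with
# fragmentation prohibited. Assumes Linux iputils ping (-M do, -c, -W, -s).

# Succeed if one DF-flagged echo request with this payload size gets a reply.
probe_payload() {
  ping -M do -c 1 -W 2 -s "$2" "$1" >/dev/null 2>&1
}

# find_max_payload HOST [LOW] [HIGH]
# LOW must be a size known to pass (default 56, ping's default payload);
# HIGH is the largest size to try (default 1472, i.e. a 1500-byte packet).
# Set PROBE to another function name to substitute the probe (e.g. in tests).
find_max_payload() {
  local host=$1 lo=${2:-56} hi=${3:-1472} probe=${PROBE:-probe_payload} mid
  if ! "$probe" "$host" "$lo"; then
    echo "no reply at ${lo}-byte payload; check basic connectivity" >&2
    return 1
  fi
  # Invariant: lo always passes, and the answer is never above hi.
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))
    if "$probe" "$host" "$mid"; then
      lo=$mid
    else
      hi=$((mid - 1))
    fi
  done
  echo "$lo"
}
```

On a path like the one in the example above, where a device reports mtu = 1462, find_max_payload ava-repl.company.com would be expected to print 1434; adding 28 gives the path MTU. Each probe waits up to 2 seconds (-W 2), and the search needs roughly a dozen probes, so it finishes far sooner than a byte-by-byte scan.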
5. If the MTU is undersized, engage the local network team to address the issue.
If the MTU issue cannot be resolved, create a service request with the Dell Technologies Avamar Support team to ask whether a workaround is available.