VNX: NDMP 3-Way Backup Fails to start - Medium Error (User Correctable)
Summary: VNX 3-way NDMP Backup fails to start data transfer.
Symptoms
Debug Level 3 Logging
2020-10-07 08:13:21: 30336548864: NDMP: 7: Session 123 (thread ndmp123) < ip=#####, port=0x269c >
2020-10-07 08:13:21: 43221450752: NDMP: 10: connect: check concurrent data streams.
2020-10-07 08:13:21: 26041581568: NDMP: 6: Active NDMP backup/restore streams: 2, system configured concurrent streams: 4, maximum concurrent sessions supported: 8.
2020-10-07 08:13:21: 30336548864: NDMP: 7: Session 123 (thread ndmp123) NdmpdMover::getBuffer() success, buf 0x001ecf2000, CALLER_ADDRESS 0x0001c27253, CALLERCALLER_ADDRESS 0x0001c1e46b
2020-10-07 08:14:33: 13156679680: NDMP: 3: Session 122 (thread ndmp122) NdmpdMover::connect: connect error: Connection timed out
2020-10-07 08:14:33: 26041581568: NDMP: 6: mover connect failed: NDMP_CONNECT_ERR.
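When a large debug log has to be searched for this symptom, a short script can pull out the mover connect errors together with the session number. The following is a minimal sketch in Python, assuming every line follows the "timestamp: thread id: NDMP: level: message" layout seen in the sample above; the function name find_connect_errors and the regular expressions are illustrative and are not part of any VNX tool.

```python
import re

# Layout assumed from the sample lines: timestamp, thread id, facility, level, message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<tid>\d+): NDMP: (?P<level>\d+): (?P<msg>.*)$"
)
SESSION_RE = re.compile(r"Session (?P<session>\d+) ")


def find_connect_errors(lines):
    """Return (timestamp, session, message) for each line reporting a connect error."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or "connect error" not in m.group("msg"):
            continue
        s = SESSION_RE.search(m.group("msg"))
        session = int(s.group("session")) if s else None
        hits.append((m.group("ts"), session, m.group("msg")))
    return hits


# Three lines copied from the sample log above.
sample = [
    "2020-10-07 08:13:21: 43221450752: NDMP: 10: connect: check concurrent data streams.",
    "2020-10-07 08:14:33: 13156679680: NDMP: 3: Session 122 (thread ndmp122) "
    "NdmpdMover::connect: connect error: Connection timed out",
    "2020-10-07 08:14:33: 26041581568: NDMP: 6: mover connect failed: NDMP_CONNECT_ERR.",
]

for ts, session, msg in find_connect_errors(sample):
    print(ts, session, msg)
```

Only the line containing "connect error" is reported. The NDMP_CONNECT_ERR line that follows carries no session number, so the script does not try to associate it with a session.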
Cause
- No route to the Backup Application or Media Server from the data mover
- NIC Teaming for Load Balancing enabled on the Backup Application Media Server
- Firewall in the network or on the host blocking the incoming data path connection from the data mover
Route
To check the route from the data mover to the Backup Application Media Server, run the following from the control station as nasadmin:
server_ping server_x <media server ip>
If the data mover cannot reach the Backup Application's media server, a new network or host route is required. See KB article 008516, How to configure routes for a VNX Data Mover?
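Before adding a route, it can help to confirm whether the media server sits on a subnet that the data mover is directly attached to. The following is a minimal sketch using Python's standard ipaddress module. The interface addresses are hypothetical values from the documentation ranges, and on_attached_subnet is an illustrative helper, not a VNX command.

```python
import ipaddress


def on_attached_subnet(media_server_ip, dm_interfaces):
    """Return True when the media server IP falls inside any data mover interface subnet."""
    target = ipaddress.ip_address(media_server_ip)
    networks = [ipaddress.ip_interface(i).network for i in dm_interfaces]
    return any(target in net for net in networks)


# Hypothetical data mover interface addresses in CIDR form.
dm_interfaces = ["192.0.2.10/24", "198.51.100.10/24"]

print(on_attached_subnet("192.0.2.50", dm_interfaces))   # on the first interface's subnet
print(on_attached_subnet("203.0.113.7", dm_interfaces))  # outside both subnets
```

When the result is False, the data mover can reach the media server only through a default, network, or host route, which is the case the KB article above addresses.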
NIC Teaming
NIC Teaming on the media server should be set to Fault Tolerant, which some documentation refers to as active/passive NIC Teaming.
Firewall
In the control path, the Backup Application specifies an IP address and port number to be used for the Data Connect phase. This IP address and port number belong to the media server that receives the backup data. The data mover establishes an outbound TCP connection to this IP address and port number, so any firewall in the network or on the Media Server needs a rule that allows this inbound connection from the data mover. The Backup Application chooses the TCP port number, usually at random from a configurable port range.
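A plain TCP connection attempt against the media server's IP address and port can show whether something in the path is dropping or refusing the connection. The following is a minimal sketch in Python, assuming it is run from a host on the same network segment as the data mover, since it cannot run on the data mover itself. The helper name can_connect is illustrative, and the self-check uses a local listener only so that the example runs anywhere.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Attempt a TCP connection; return (True, None) or (False, error text)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # covers timeouts, refusals, and unreachable hosts
        return False, str(exc)


# Self-check against a local listener on an ephemeral port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
open_port = listener.getsockname()[1]

print(can_connect("127.0.0.1", open_port))
listener.close()
```

In practice the host and port would be replaced with the media server IP address and a port from the Backup Application's configured range, while the Backup Application is listening on it. A timeout is consistent with the "Connection timed out" error in the data mover log, but a success from another host does not prove that the data mover's own path is open. If the port is read from a log field written in hexadecimal, such as port=0x269c in the sample session line, int("0x269c", 16) converts it to decimal.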
The IP address and port number can be seen in a network capture, or by Dell Support using debug level 3 logging on the data mover. Do not leave debug level logging enabled after reproducing the issue. To troubleshoot with a network capture, take a bidirectional capture on both the data mover and the Media Server.