Dell VNX: Unable to perform file replication operations (user correctable)
Summary: Unable to perform file replication operations between two VNX arrays in one direction; replication tasks may stop responding and fail to complete, while replication appears to work in the reverse direction. The data mover replication IPs may be pingable between the arrays, yet replication operations still stop responding. The error seen in the source data mover server logs is "connection down with status No Response on node 2." ...
Symptoms
While attempting to create a new replication session (VDM or File System), you may see the following messages in the server logs of the affected data movers:
Source VNX:
2023-06-15 14:19:55: REP: 1: 56: TaskID CETV21604000542007_2_27137:0 networkAddress 10.xx.xxx.xxx TargetNetworkAddress 10.xx.xx.xxx connection down with status No Response on node 2. Retry.
Where:
"networkAddress 10.xx.xxx.xxx" = source data mover replication IP
"TargetNetworkAddress 10.xx.xx.xxx" = destination data mover replication IP
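When scanning many log lines, the fields of the source-side message above can be extracted programmatically. The following is a minimal Python sketch, assuming the message wording matches the example exactly; the regular expression, the RepConnectionDown type, and the parse_rep_line helper are hypothetical illustrations, not part of any Dell or VNX tool.

```python
import re
from typing import NamedTuple, Optional

# Pattern assumed from the example REP message above; other VNX log
# messages may use different wording.
REP_PATTERN = re.compile(
    r"TaskID (?P<task>\S+) "
    r"networkAddress (?P<source>\S+) "
    r"TargetNetworkAddress (?P<target>\S+) "
    r"connection down with status (?P<status>.+?) on node (?P<node>\d+)"
)


class RepConnectionDown(NamedTuple):
    task_id: str
    source_ip: str  # source data mover replication IP
    target_ip: str  # destination data mover replication IP
    status: str
    node: int


def parse_rep_line(line: str) -> Optional[RepConnectionDown]:
    """Return the fields of a 'connection down' REP message, or None if the
    line is not such a message."""
    match = REP_PATTERN.search(line)
    if match is None:
        return None
    return RepConnectionDown(
        task_id=match["task"],
        source_ip=match["source"],
        target_ip=match["target"],
        status=match["status"],
        node=int(match["node"]),
    )
```

Applied to the source VNX line above, parse_rep_line returns source_ip "10.xx.xxx.xxx", target_ip "10.xx.xx.xxx", status "No Response", and node 2, which gives the replication IP pair to check against the firewall and NAT configuration.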
Destination VNX:
2023-06-15 14:19:46: HTTP_CLIENT: 3: Connection failed to xx.xx.128.10 address through interface 10.xx.xx.xxx error: 'Connection timed out '
2023-06-15 14:19:46: HTTP_CLIENT: 3: The IP address xx.xx.128.10 is unavailable
(Note: IP addresses have been masked with 'x'.)
You may encounter this issue when performing various replication operations, including:
- Create new VDM/File System replication
- Reverse
- Switchover
- Failover
Running nas_task -list shows that the replication task was created but stops responding and never completes.
Cause
From the destination VNX data mover server logs, we see the following:
2023-06-15 14:19:46: HTTP_CLIENT: 3: Connection failed to xx.xx.128.10 address through interface 10.xx.xx.xxx error: 'Connection timed out '
2023-06-15 14:19:46: HTTP_CLIENT: 3: The IP address xx.xx.128.10 is unavailable
An existing NAT (Network Address Translation) rule was found configured for the destination VNX. This rule translated the destination data mover replication IP to a different NAT address, the xx.xx.128.10 address shown in the "Connection failed to xx.xx.128.10" message above.
The replication ports were opened only for the source and target replication IP pair, not for the NAT IP.
As a result, the source and destination data movers could not establish a replication session or perform replication operations in one direction.
According to page 15 of the VNX Replicator white paper, "VNX Replicator is not supported with Network Address Translation (NAT)."
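The diagnosis above comes down to comparing the address the destination data mover failed to reach with the replication IPs that the replication ports were opened for. A minimal Python sketch of that comparison follows, assuming the HTTP_CLIENT wording shown above; the unexpected_remote helper and the documentation-range IPs in the usage example are hypothetical. A non-None result only flags a mismatch worth investigating (such as a NAT rule); it does not prove one.

```python
import re
from typing import Iterable, Optional

# Pattern assumed from the HTTP_CLIENT message above; adjust it if your
# data mover logs use different wording.
HTTP_FAIL_PATTERN = re.compile(
    r"Connection failed to (?P<remote>\S+) address "
    r"through interface (?P<local>\S+)"
)


def unexpected_remote(line: str, replication_ips: Iterable[str]) -> Optional[str]:
    """Return the remote address from an HTTP_CLIENT connection failure if it
    is NOT one of the expected replication IPs; otherwise return None.

    A returned address only indicates that traffic is being sent to an
    address outside the configured replication pair (for example, because
    of a NAT rule); it does not prove that NAT is the cause.
    """
    match = HTTP_FAIL_PATTERN.search(line)
    if match is None:
        return None  # not a connection-failure message
    remote = match["remote"]
    return None if remote in set(replication_ips) else remote
```

For example, with replication IPs {"192.0.2.20", "198.51.100.20"} and a log line reporting "Connection failed to 203.0.113.10 address through interface 192.0.2.20", the helper returns "203.0.113.10", an address outside the pair the replication ports were opened for.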
Resolution
Disable the NAT rule for both the source and destination VNX arrays so that replication traffic can flow directly between the source and destination data mover replication IPs.