PowerScale: Clients disconnect after IP address moves when using Cisco ACI network
Summary: This article describes client disconnects that occur after a node IP address moves on a Cisco Application Centric Infrastructure (ACI) network. The disconnects happen when Flooding Mode and Gratuitous Address Resolution Protocol (GARP)-based detection are not enabled and data-plane learning is not disabled, so GARPs are not propagated properly.
Symptoms
Cause
When an IP address moves to a PowerScale node, the node sends a GARP to notify the switch and all other devices on the network of the change. The GARP allows these devices to update their Address Resolution Protocol (ARP) tables with the new IP-to-Media Access Control (MAC) address mapping. If all of the following conditions are met, the GARPs do not pass through the switch:
- Flooding mode is disabled.
- GARP-based detection is disabled.
- Rogue-IP detection is disabled.
- Data-plane learning is enabled.
As a result, no device learns that the IP address that moved is now associated with a different MAC address. All packets sent to that IP address are delivered to the wrong device and ultimately dropped. Clients that were communicating with that IP address lose connectivity to the node, and any sessions to that IP address are disconnected.
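The failure condition above can be expressed as a small model. This is a hypothetical sketch, not a PowerScale or Cisco tool: the class and function names are illustrative, and it simplifies by assuming a GARP is delivered whenever at least one of the four conditions is not met. It shows why a client keeps sending to the old MAC address when the GARP is blocked.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BridgeDomainState:
    # Hypothetical model of the four switch settings named in this article.
    flooding_enabled: bool
    garp_detection_enabled: bool
    rogue_ip_detection_enabled: bool
    data_plane_learning_enabled: bool

    def garps_blocked(self) -> bool:
        # The article's failure case: all four conditions must hold at once.
        return (not self.flooding_enabled
                and not self.garp_detection_enabled
                and not self.rogue_ip_detection_enabled
                and self.data_plane_learning_enabled)


def client_next_hop_mac(arp_cache: Dict[str, str], ip: str, new_mac: str,
                        bd: BridgeDomainState) -> str:
    """Return the MAC a client uses for `ip` after the IP moves to `new_mac`."""
    if not bd.garps_blocked():
        # Simplifying assumption: the GARP reaches the client and updates its cache.
        arp_cache[ip] = new_mac
    # If the GARP was blocked, the stale entry (old MAC) is still used.
    return arp_cache[ip]
```

With the misconfigured state (flooding, GARP detection, and rogue-IP detection disabled, data-plane learning enabled), `client_next_hop_mac({"10.0.0.5": "aa:aa"}, "10.0.0.5", "bb:bb", state)` keeps returning the old MAC, which matches the dropped traffic described above.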
NOTE: PowerScale requires that standard Layer 2 switching functionality be enabled on the switch in order to operate properly.
Resolution
There are four components to this change:
- Enable - ARP Flooding
- Enable - GARP-based Detection
- Disable - Rogue-IP Detection
- Disable - Data-Plane Learning
All four components must be changed for this to work properly.
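The four required changes can be treated as a checklist. The following is a minimal sketch that compares a dictionary of current settings against the required state and lists what still needs to change. The setting names are hypothetical labels chosen for readability; they are NOT Cisco APIC attribute names, and how you read the actual values from the fabric is outside the scope of this sketch.

```python
from typing import Dict, List

# Hypothetical setting names; these are NOT Cisco APIC attribute names.
REQUIRED_SETTINGS: Dict[str, bool] = {
    "arp_flooding": True,
    "garp_based_detection": True,
    "rogue_ip_detection": False,
    "data_plane_learning": False,
}


def settings_to_change(current: Dict[str, bool]) -> List[str]:
    """Return the required actions for every setting not in its required state."""
    changes = []
    for name, required in REQUIRED_SETTINGS.items():
        # A missing setting is treated as not matching the required state.
        if current.get(name) != required:
            action = "Enable" if required else "Disable"
            changes.append(f"{action} - {name}")
    return changes
```

An empty result means all four components are in the required state; any non-empty result means the configuration is still incomplete.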
Additional Information
Floating IP Address Considerations
In some deployments, an IP address may exist on multiple servers and, as a result, be associated with multiple MAC addresses. For example:
In a cluster, an IP address may move from one server to another. This changes the MAC address associated with that IP address, and the new mapping is announced with a GARP request. All hosts that had the IP address cached in their ARP tables receive this notification.
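To make the GARP concrete, the following sketch builds the bytes of a common form of gratuitous ARP request: an Ethernet broadcast frame whose ARP sender and target protocol addresses are both the moved IP. This is an illustration of the standard ARP packet layout, not the exact frame PowerScale emits; the function name is hypothetical, the target hardware address is assumed to be zeros, and the sketch only builds bytes (sending it would require a raw socket and elevated privileges).

```python
import ipaddress
import struct


def build_garp_frame(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for `ip` at `mac`."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    ip_bytes = ipaddress.IPv4Address(ip).packed
    broadcast = b"\xff" * 6
    # Ethernet header: destination (broadcast), source MAC, EtherType 0x0806 (ARP).
    eth_header = broadcast + mac_bytes + struct.pack("!H", 0x0806)
    # ARP fixed header: htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=1 (request).
    arp_header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, then target MAC (zeros here) and target IP (the same moved IP).
    arp_body = mac_bytes + ip_bytes + b"\x00" * 6 + ip_bytes
    return eth_header + arp_header + arp_body
```

The result is 42 bytes (14-byte Ethernet header plus 28-byte ARP payload); a network interface normally pads it to the Ethernet minimum frame size on transmit.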
Cisco ACI learns the MAC and IP address association for a leaf node, bridge domain, and port number based on data-plane or control-plane traffic.
Suppose an IP address that was originally associated with one MAC address (MAC 1) moves to a different server and is now associated with MAC 2. The new server sends a GARP request to update the ARP caches of the other servers in the same bridge domain. For this scenario to work, ARP flooding must be enabled in the bridge domain. In addition, the host that the IP address moved from must reset its existing connections, so it sends TCP RST packets to the clients that were connected to it. Because these RST packets still carry the moved IP address as their source, they may cause the mapping database to point back to the host where the IP address was previously located. For the mapping database to point to the correct host and port, endpoint IP data-plane learning should be disabled, and only control-plane learning should be used.
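The effect of the TCP RST on the mapping database can be illustrated with a toy simulation. This is a deliberately simplified, hypothetical model of endpoint learning, not how Cisco ACI is implemented: each event is a packet seen by the fabric, tagged as control-plane (ARP or GARP) or data-plane traffic, and the last learned event wins. Port names and MAC labels are placeholders.

```python
from typing import Dict, Iterable, Tuple

# Each event: (source IP, source MAC, ingress port, kind); kind is "control" or "data".
Event = Tuple[str, str, str, str]


def learn_endpoints(events: Iterable[Event],
                    data_plane_learning: bool) -> Dict[str, Tuple[str, str]]:
    """Return a simplified IP -> (MAC, port) mapping after processing events in order."""
    mapping: Dict[str, Tuple[str, str]] = {}
    for ip, mac, port, kind in events:
        # Control-plane traffic is always learned; data-plane traffic only if enabled.
        if kind == "control" or data_plane_learning:
            mapping[ip] = (mac, port)
    return mapping


events = [
    ("10.0.0.5", "MAC1", "port1", "control"),  # IP first announced on the old host
    ("10.0.0.5", "MAC2", "port2", "control"),  # GARP after the IP moves
    ("10.0.0.5", "MAC1", "port1", "data"),     # TCP RST from the old host
]
```

With `data_plane_learning=True`, the RST overwrites the entry and the IP maps back to `("MAC1", "port1")`; with it set to `False`, the entry stays at `("MAC2", "port2")`, which is why the article recommends control-plane learning only.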
Direct questions about the proper network configuration for supporting PowerScale cluster requirements to Cisco.