Unsolved
SyncIQ failover and nodes not responding
During a DR test we performed a SyncIQ failover, and some or all nodes are not responding. Where should I begin troubleshooting, given that the Isilon cluster configuration is healthy?
Anonymous
June 30th, 2013 22:00
Two things to look for:
1. Static Routes
Be sure you understand the VLANs and how clients connect to the Isilon cluster in both the Production and DR scenarios. Several times I have found that what was thought to be an Isilon issue turned out to be missing routes to other VLANs used by the NFS/SMB clients. For example, we were able to ping the Isilon from a server on the same VLAN, but NFS clients on other VLANs could not ping the Isilon until the routes were added.
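As a rough illustration of the same-VLAN versus other-VLAN case above, here is a minimal Python sketch that reports which clients sit outside the cluster's subnet and therefore depend on a route. The addresses, prefix lengths, and client names are hypothetical placeholders, and the function name needs_route is my own; substitute the values from your Production and DR networks.

```python
import ipaddress


def needs_route(client_ip: str, client_prefix: int, cluster_ip: str) -> bool:
    """Return True when cluster_ip lies outside the client's own subnet,
    so traffic between the two must pass through a gateway or static route."""
    client_net = ipaddress.ip_interface(f"{client_ip}/{client_prefix}").network
    return ipaddress.ip_address(cluster_ip) not in client_net


# Hypothetical addresses, for illustration only.
DR_CLUSTER_IP = "10.20.30.50"
clients = {
    "server-same-vlan": ("10.20.30.11", 24),
    "nfs-client-other-vlan": ("10.20.40.12", 24),
}

for name, (ip, prefix) in clients.items():
    verdict = "needs a route" if needs_route(ip, prefix, DR_CLUSTER_IP) else "same subnet"
    print(f"{name}: {verdict}")
```

Any client reported as needing a route is a candidate for the missing-route problem; the check does not tell you whether the route actually exists, only where to look.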
2. ARP Cache
When an IP address fails over from one node to another, the interface that takes over the IP address sends a gratuitous Address Resolution Protocol (ARP) reply. The purpose of this packet is to make the other hosts on the network update their ARP caches so that the failed-over IP address is associated with the new interface.
The packet carries the failed-over IP address as its source IP address and the MAC address of the new interface as its source MAC address, and it is sent to the broadcast MAC address.
Sending the packet to the broadcast MAC address tells every host in the "neighborhood" to update its local ARP cache so that the source IP address maps to the source MAC address. This replaces the previously cached association between that IP address and the MAC address of the interface the IP address failed away from.
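To make the packet layout concrete, here is a minimal Python sketch that builds the bytes of a gratuitous ARP reply using the standard Ethernet and ARP field layout. It only constructs the frame and does not send it. The MAC and IP values are hypothetical, and setting the target hardware address to broadcast is an assumption: implementations differ on that field, so do not read this as exactly what OneFS emits.

```python
import socket
import struct

BROADCAST_MAC = bytes.fromhex("ffffffffffff")


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def gratuitous_arp_reply(new_mac: str, failed_over_ip: str) -> bytes:
    """Build an Ethernet frame holding a gratuitous ARP reply."""
    sender_mac = mac_to_bytes(new_mac)
    sender_ip = socket.inet_aton(failed_over_ip)
    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP).
    ethernet = BROADCAST_MAC + sender_mac + struct.pack("!H", 0x0806)
    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # address lengths 6 and 4, operation 2 (reply).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    # Sender and target IP are both the failed-over address.
    # Target MAC set to broadcast here (assumption; implementations vary).
    arp += sender_mac + sender_ip + BROADCAST_MAC + sender_ip
    return ethernet + arp


# Hypothetical values: MAC of the interface taking over, and the moved IP.
frame = gratuitous_arp_reply("02:00:00:aa:bb:02", "10.20.30.50")
print(len(frame))        # 14-byte Ethernet header + 28-byte ARP payload
print(frame[:6].hex())   # destination MAC
```

A host that honors the packet replaces its cached MAC address for 10.20.30.50 with the sender MAC address in the frame.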
By default, some switches do not update their ARP cache in response to gratuitous ARP replies. In that case the cache keeps the old entry, and the failed-over IP address is not associated with the new interface. Please confirm that your network infrastructure supports this.
To check for a stale entry, run the 'netstat -ar' command on the client while it is in a stalled state, then compare the MAC address it lists for the failed-over IP address with the MAC address of the interface that now holds that IP address.
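Different tools print MAC addresses in different notations (colon or hyphen separated, dotted groups, sometimes without leading zeros), so comparing them by eye is error prone. Here is a small Python sketch that normalizes two MAC strings before comparing them. It does not parse command output; you copy the two addresses in by hand. The function names and sample values are hypothetical.

```python
def normalize_mac(mac: str) -> str:
    """Return a MAC address as 12 lowercase hex digits with no separators."""
    text = mac.strip().lower()
    if ":" in text or "-" in text:
        # Colon or hyphen notation; some tools drop leading zeros (0:c:29:...).
        parts = text.replace("-", ":").split(":")
        digits = "".join(part.zfill(2) for part in parts)
    else:
        # Dotted groups (000c.29ab.cdef) or bare hex digits.
        digits = text.replace(".", "")
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def arp_entry_is_stale(cached_mac: str, current_mac: str) -> bool:
    """True when the client's cached MAC differs from the MAC of the
    interface that currently holds the IP address."""
    return normalize_mac(cached_mac) != normalize_mac(current_mac)


# Hypothetical values: what the stalled client shows vs. the new interface.
print(arp_entry_is_stale("2:0:0:aa:bb:1", "02:00:00:AA:BB:02"))
```

If the result is True, the client is still sending traffic to the interface the IP address failed away from, which points at the gratuitous ARP handling described above.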