
June 30th, 2013 22:00

SyncIQ failover and nodes not responding

During a DR test we performed a SyncIQ failover, and some or all nodes are not responding to clients.  Where should I begin troubleshooting, given that the Isilon cluster configuration itself is healthy?


June 30th, 2013 22:00

Two things to look for:

1. Static Routes

Be sure you understand the VLANs and how clients connect to the Isilon cluster in both the Production and DR scenarios.  Several times I have found that what was thought to be an Isilon issue turned out to be missing routes to other VLANs used by the NFS/SMB clients.  For example, we were able to ping the Isilon from a server on the same VLAN, but NFS clients on other VLANs could not ping the Isilon until the routes were added.
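As a rough way to reason about this check, here is a minimal Python sketch that lists which client subnets are neither on the cluster interface's own subnet nor covered by a static route. The function name and all addresses are hypothetical, and it deliberately ignores any default gateway, so treat it as a planning aid rather than a view of what the cluster actually has configured.

```python
import ipaddress


def unreachable_client_subnets(local_subnet, routed_subnets, client_subnets):
    """Return client subnets that are neither local nor covered by a static route."""
    local = ipaddress.ip_network(local_subnet)
    routed = [ipaddress.ip_network(r) for r in routed_subnets]
    missing = []
    for candidate in client_subnets:
        net = ipaddress.ip_network(candidate)
        # Covered if it sits inside the local subnet or inside any routed subnet.
        covered = net.subnet_of(local) or any(net.subnet_of(r) for r in routed)
        if not covered:
            missing.append(candidate)
    return missing


# Hypothetical DR layout: one local VLAN, one static route, three client VLANs.
missing = unreachable_client_subnets(
    "10.1.10.0/24",
    ["10.1.20.0/24"],
    ["10.1.10.0/24", "10.1.20.0/24", "10.1.30.0/24"],
)
print(missing)
```

Any subnet it reports is a candidate for a missing route in the DR configuration.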

2. ARP Cache

When an IP address fails over from one node to another, the interface obtaining the IP address sends a gratuitous Address Resolution Protocol (ARP) reply. The purpose of this packet is to ensure that the other nodes on the network update their ARP cache by associating the failed-over IP address with the new interface.

This packet contains the following information:

  • Source IP address: the failover IP address.
  • Source MAC address: the MAC address for the interface that just obtained the source IP address.
  • Target IP address: the failover IP address again.
  • Broadcast MAC address: the destination for the packet.

Sending the packet to the broadcast MAC address tells every node in the "neighborhood" to update its local ARP cache so that the source IP address maps to the source MAC address. This replaces the previously cached association between that IP address and the MAC address of the interface the IP address failed away from.
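To make the packet layout concrete, here is a minimal Python sketch that builds the bytes of such a gratuitous ARP reply using the standard ARP-over-Ethernet field layout. It only constructs the frame and does not send it. The function names are hypothetical, and setting the target hardware address field to broadcast is an assumption, since implementations differ on that field.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_gratuitous_arp_reply(failover_ip, new_mac):
    """Build an Ethernet frame carrying a gratuitous ARP reply (not sent anywhere)."""
    ip = socket.inet_aton(failover_ip)
    mac = mac_to_bytes(new_mac)
    # Ethernet header: broadcast destination, new interface as source, EtherType ARP.
    eth = BROADCAST_MAC + mac + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), lengths 6 and 4, opcode 2 = reply.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    arp += mac + ip            # sender MAC and sender IP (the failover IP)
    arp += BROADCAST_MAC + ip  # target MAC (assumed broadcast) and target IP (same IP again)
    return eth + arp


# Hypothetical failover IP and interface MAC.
frame = build_gratuitous_arp_reply("10.1.10.50", "00:50:56:aa:bb:01")
print(len(frame), frame.hex())
```

The sender IP and target IP fields carry the same address, which is what marks the packet as gratuitous.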

By default, some switches do not update their ARP cache in response to gratuitous ARP replies. In that case the ARP cache does not contain the information needed to associate the failed-over IP address with the new interface.  Please confirm that your network infrastructure supports gratuitous ARP.

To check, you can run the 'netstat -ar' command while the client is in a stalled state, then compare the MAC address listed with the MAC address on the client.
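If you want to script that comparison, here is a minimal Python sketch that pulls the cached MAC address for a given IP out of captured table text and compares it with the MAC address you expect. The helper names are hypothetical, and the sample text is a made-up layout, not real 'netstat -ar' output. The parser only assumes that an IPv4 address and a MAC address appear on the same line.

```python
import re

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{1,2}[:-]){5}[0-9A-Fa-f]{1,2}\b")


def normalize_mac(mac):
    """Lowercase, colon-separated, zero-padded form so different spellings compare equal."""
    return ":".join(f"{int(part, 16):02x}" for part in re.split("[:-]", mac))


def cached_mac_for(table_text, ip):
    """Return the normalized MAC on the first line that mentions ip, or None."""
    for line in table_text.splitlines():
        if ip in IP_RE.findall(line):
            match = MAC_RE.search(line)
            if match:
                return normalize_mac(match.group(0))
    return None


def arp_entry_is_stale(table_text, ip, expected_mac):
    """True when an entry exists for ip but points at a different MAC."""
    cached = cached_mac_for(table_text, ip)
    return cached is not None and cached != normalize_mac(expected_mac)


# Made-up capture: one IP and one MAC per line, layout is illustrative only.
sample = """
10.1.10.50   0:50:56:aa:bb:1
10.1.10.51   00:50:56:aa:bb:02
"""
print(cached_mac_for(sample, "10.1.10.50"))
print(arp_entry_is_stale(sample, "10.1.10.50", "00:50:56:AA:BB:09"))
```

A stale result for the failed-over IP address suggests the gratuitous ARP reply was not honored somewhere along the path.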
