Unsolved
10 Posts
0
May 3rd, 2013 07:00
IP addresses not accessible after SmartConnect failover
We have four X200 nodes in our cluster running 6.5.5.11. I had to temporarily disconnect the 10Gbit interfaces on nodes #2 and #4 yesterday. I monitored our SmartConnect pools during the maintenance, and the IP addresses on those interfaces dynamically failed over to the other two nodes correctly. Once the interfaces were reconnected, I ran a manual rebalance of addresses on the pool, and the IPs rebalanced correctly.
However, I then noticed we could no longer reach the IP addresses on three of the four nodes. Currently, only IPs on node 1 can be reached. Nodes 2, 3, and 4 are all unreachable on the 10Gb interfaces. The links are green, and the IPs are assigned to them, but they are unreachable.
To troubleshoot, I created a new test pool and moved the non-working interfaces into it with a different IP range. The IPs were successfully assigned to the interfaces; however, they are still unreachable on the network.
I verified the ports on the switch are configured correctly, and verified from the switch logs that no changes occurred except that the port went down and came back up during the maintenance.
I also rebooted a node and then tested connectivity, but it's still unreachable.
Node 1 is the only one working; it is also the node that hosts the SmartConnect service IP. I'm not sure whether that has anything to do with it, though. It really feels as if the cluster doesn't know which IPs are on which interfaces, even though everything looks right in the GUI.
Can anyone suggest any CLI troubleshooting options? I have a ticket open, but it's been 12 hours with no response, so I'm looking for troubleshooting ideas in the meantime. Thanks!
timgone1
10 Posts
0
May 3rd, 2013 09:00
Yeah, I tried that, but still no connectivity. I even created a separate test pool and moved the problematic interfaces there, but still no luck.
christopher_ime
6 Operator
•
2K Posts
1
May 3rd, 2013 09:00
I would try removing the interface from the pool and adding it back. If that works, get Isilon support to debug it; it may be a bug in the earlier code.
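A rough sketch of what that could look like on the OneFS 6.5 CLI. The pool name subnet0:pool0 and interface ID 2:10gige-1 below are placeholders, and I'm writing the flags from memory, so verify them against `isi networks modify pool --help` and `isi networks ls pools` on your cluster first:

```shell
# Remove the suspect interface from the pool, then add it back.
# Pool and interface names are examples only -- substitute the
# values shown by `isi networks ls pools` on your cluster.
isi networks modify pool --name subnet0:pool0 --remove-ifaces 2:10gige-1
isi networks modify pool --name subnet0:pool0 --add-ifaces 2:10gige-1
```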
Peter_Sero
6 Operator
•
1.2K Posts
1
May 5th, 2013 23:00
Has the issue been solved?
If not, here are some suggestions for the CLI:
# isi networks ls ifaces
Log in to the unresponsive node from a responsive one, then run
# ifconfig
The output should be consistent with the 'isi networks' output above.
On the unresponsive node, can you resolve a client's hostname?
# host 'clienthostname'
Can you reach the client? (Use the client's IP address instead of its hostname if name resolution failed.)
# ping 'clienthostname'
# traceroute 'clienthostname'
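If ping fails, it may also be worth checking whether ARP is resolving at all, which would separate a layer-2 problem from a routing problem. These are standard FreeBSD tools available on the nodes; the interface name cxgb0 is only an example, so use the name from your own ifconfig output:

```shell
# On the unresponsive node: is there an ARP entry for the client?
arp -a
# Watch ARP traffic on the suspect interface while pinging from the
# client side. If requests arrive but no replies go out, the node's
# view of its own IPs is likely stale. (Interface name is an example.)
tcpdump -n -i cxgb0 arp
```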
Peter
cincystorage
2 Intern
•
467 Posts
1
May 7th, 2013 08:00
You can try downing the interfaces from the local nodes in question - I've seen something similar work before.
timgone1
10 Posts
0
May 7th, 2013 09:00
Is there a way to down the interfaces in the CLI? I did it from the switch side, but the issue remained.
timgone1
10 Posts
0
May 7th, 2013 09:00
Yeah, when I did this, things did not look right: there were pool IPs still assigned to nodes that I had removed from the pool to troubleshoot.
I'll be calling in today to follow up on the ticket as I missed their callback on Friday.
Peter_Sero
6 Operator
•
1.2K Posts
0
May 7th, 2013 09:00
Yes, the classic UNIX way,
# ifconfig 'interfacename' down (or up, respectively)
is available on the CLI, but no guarantee that it won't worsen the mess here...
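For example (the interface name cxgb0 is a placeholder; take the real name from ifconfig, and note that downing the interface will also drop any dynamic pool IPs currently on it, which may trigger another failover):

```shell
# Bounce the suspect 10GbE interface on the local node.
# 'cxgb0' is an example name -- check `ifconfig` for the real one.
ifconfig cxgb0 down
sleep 5
ifconfig cxgb0 up
# Confirm the link is up and the pool IPs came back:
ifconfig cxgb0
```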
Peter
pb_07
1 Message
0
September 14th, 2016 21:00
Hi guys,
Was the above issue ever resolved? I am facing a similar crisis in a new cluster, and any assistance would be great. Only the external IPs on nodes 1 and 2 are pingable; the other three nodes do not respond to pings. ICMP is not disabled, and I have rebooted the nodes, still no luck. All the interfaces are up and the IPs are assigned properly.