Unsolved
10 Posts
0
May 3rd, 2013 07:00
IP addresses not accessible after SmartConnect failover
We have four X200 nodes in our cluster running 6.5.5.11. I had to temporarily disconnect the 10Gbit interfaces on nodes #2 and #4 yesterday. I monitored our SmartConnect pools during the maintenance, and the IP addresses on those interfaces dynamically failed over to the other two nodes correctly. Once the interfaces were reconnected, I ran a manual rebalance of addresses on the pool, and the IPs rebalanced correctly.
However, I then noticed we could no longer reach the IP addresses on three of the four nodes. Currently, only IPs on node 1 can be reached. Nodes 2, 3, and 4 are all unreachable on the 10Gb interfaces. The links are green, and the IPs are assigned to them, but they are unreachable.
To troubleshoot, I created a new test pool and moved the non-working interfaces into it with a different IP range. The IPs were successfully assigned to the interfaces; however, they are still unreachable on the network.
I verified the ports on the switch are configured correctly, and verified from the switch logs that no changes occurred except that the port went down and came back up during the maintenance.
I also rebooted a node and then tested connectivity, but it's still unreachable.
Node 1 is the only one working; it is also the node that hosts the SmartConnect service IP. I'm not sure whether that has anything to do with it, though. It really feels as if the cluster doesn't know which IPs are on which interfaces, even though everything looks right in the GUI.
Can anyone suggest any CLI troubleshooting options? I have a ticket open, but it's been 12 hours with no response, so I'm looking for troubleshooting ideas in the meantime. Thanks!
timgone1
10 Posts
0
May 3rd, 2013 09:00
Yeah, I tried that, but still no connectivity. I even created a separate test pool and moved the problematic interfaces there, but still no luck.
christopher_ime
6 Operator
•
2K Posts
1
May 3rd, 2013 09:00
I would try removing the interface from the pool and adding it back. If that works, get Isilon support to debug it; it may be a bug in the earlier code.
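A rough sketch of what that could look like on the OneFS 6.5 CLI. The pool name subnet0:pool0 and interface ID 2:10gige-1 below are placeholders, and I'm writing the flags from memory, so verify them against `isi networks modify pool --help` and `isi networks ls pools` on your cluster first:

```shell
# Remove the suspect interface from the pool, then add it back.
# Pool and interface names are examples only -- substitute the
# values shown by `isi networks ls pools` on your cluster.
isi networks modify pool --name subnet0:pool0 --remove-ifaces 2:10gige-1
isi networks modify pool --name subnet0:pool0 --add-ifaces 2:10gige-1
```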
Peter_Sero
6 Operator
•
1.2K Posts
1
May 5th, 2013 23:00
Has the issue been solved?
If not, here are some suggestions for the CLI:
# isi networks ls ifaces
Log in to the unresponsive node from a responsive one, then run
# ifconfig
The output should be consistent with the 'isi networks' output above.
On the unresponsive node, can you resolve a client's hostname?
# host 'clienthostname'
Can you reach the client? (Use the client's IP address instead of its hostname if name resolution failed.)
# ping 'clienthostname'
# traceroute 'clienthostname'
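If ping fails, it may also be worth checking whether ARP is resolving at all, which would separate a layer-2 problem from a routing problem. These are standard FreeBSD tools available on the nodes; the interface name cxgb0 is only an example, so use the name from your own ifconfig output:

```shell
# On the unresponsive node: is there an ARP entry for the client?
arp -a
# Watch ARP traffic on the suspect interface while pinging from the
# client side. If requests arrive but no replies go out, the node's
# view of its own IPs is likely stale. (Interface name is an example.)
tcpdump -n -i cxgb0 arp
```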
Peter
cincystorage
2 Intern
•
467 Posts
1
May 7th, 2013 08:00
You can try downing the interfaces from the local nodes in question - I've seen something similar work before.
timgone1
10 Posts
0
May 7th, 2013 09:00
Is there a way to down the interfaces in the CLI? I did it from the switch side, but the issue remained.
timgone1
10 Posts
0
May 7th, 2013 09:00
Yeah, when I did this, things did not look right: there were pool IPs still assigned to nodes that I had removed from the pool to troubleshoot.
I'll be calling in today to follow up on the ticket as I missed their callback on Friday.
Peter_Sero
6 Operator
•
1.2K Posts
0
May 7th, 2013 09:00
Yes, the classic UNIX way,
# ifconfig 'interfacename' down (or up, respectively)
is available on the CLI, but no guarantee that it won't worsen the mess here...
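For example (the interface name cxgb0 is a placeholder; take the real name from ifconfig, and note that downing the interface will also drop any dynamic pool IPs currently on it, which may trigger another failover):

```shell
# Bounce the suspect 10GbE interface on the local node.
# 'cxgb0' is an example name -- check `ifconfig` for the real one.
ifconfig cxgb0 down
sleep 5
ifconfig cxgb0 up
# Confirm the link is up and the pool IPs came back:
ifconfig cxgb0
```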
Peter
pb_07
1 Message
0
September 14th, 2016 21:00
Hi guys,
Was the above issue ever resolved? I am facing a similar crisis in a new cluster, and any assistance would be great. Only the external IPs on nodes 1 and 2 are pingable; the other three nodes do not respond to pings. ICMP is not disabled, and I have rebooted the nodes, still no luck. All the interfaces are up and the IPs are assigned properly.