May 11th, 2016 05:00
Smartfailing Node1 of a Cluster
We have older IQ nodes in our Isilon cluster that we are currently smartfailing out. We are down to our last IQ node, which is the first node of the cluster (node 1). Is there anything special I need to do beyond running SmartFail on it when removing it from the cluster? From reading about smartfailing I know that node 1 has special roles assigned to it, but I can't find any articles that mention anything specific about removing node 1.
We have plenty of spare capacity for the SmartFail but I just want to make sure I'm not missing anything before I do this.
Thanks.
crklosterman
May 11th, 2016 08:00
shawns
Personally I would do it in 3 steps:
1. Suspend that node from any SmartConnect pools. This doesn't disrupt any existing client connections; it just stops SmartConnect from handing out this node's IPs for new connections. This is CLI only, so look in the CLI manual for your version for the syntax.
2. Once you are comfortable that the number of users connected to the node is at an absolute minimum (use InsightIQ to track this), remove the node's network interfaces from the SmartConnect pools. This will cause a group merge, which is more or less an election where the SmartConnect service decides where to move. It usually moves to the node with the lowest NodeID, not the lowest LNN. If you're curious about the difference (usually there is none), run isi config, then status advanced, and look for the table at the bottom of the output. During the group merge, new nslookups against the cluster will have to wait a little bit. This has no effect at all on existing clients, only on clients trying to establish a new connection in the few seconds after you remove the interfaces, and if they try again it'll work just fine. The SmartConnect service will then be running on your new lowest node, the SSIP (SmartConnect service IP) will move over and send gratuitous ARPs to the switch, and you're golden again.
3. Now smart fail out that last node. Odds are that if you used smartpools to empty out the node pool first, the node is mostly empty anyway, so it should be relatively quick, but it really doesn't matter much how long it takes because that has zero impact on the rest of the cluster.
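On OneFS 7.x the three steps above might look roughly like the sketch below. This is an illustration, not a definitive procedure: the subnet/pool name (subnet0:pool0), interface name (ext-1), and LNN 1 are example values, and the exact flag syntax differs between OneFS releases (OneFS 8.x uses "isi network pools modify" and "isi devices node smartfail"), so verify against the CLI guide for your version first.

```shell
# 1. Suspend the node's IPs in the SmartConnect pool. Existing client
#    connections are untouched; SmartConnect just stops handing out
#    this node's IPs for new connections.
isi networks modify pool --name subnet0:pool0 --sc-suspend-node 1

# 2. Once client counts on the node are minimal (watch in InsightIQ),
#    pull its interfaces from the pool. Expect a brief group merge while
#    the SmartConnect service and SSIP move to the next-lowest NodeID.
isi networks modify pool --name subnet0:pool0 --remove-ifaces 1:ext-1

# Check LNN vs NodeID (table at the bottom of the output):
isi config
>>> status advanced

# 3. Smartfail the node itself (LNN only, no bay number).
isi devices -a smartfail -d 1
```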
~Chris
Anonymous User
May 12th, 2016 06:00
There's really nothing special about node 1. The lowest-numbered node runs some services, and that's usually node 1, but it doesn't have to be. For example, the SmartConnect DNS service runs on the lowest-numbered node, typically node 1. If you lose node 1, or its network interface (e.g., ifconfig bxe0 down), the service immediately moves to the lowest-numbered node that is available, which is likely node 2. Some of the cron jobs use isi_ropc, which tells cron to Run Once Per Cluster, usually on the lowest-numbered node.
For faster smartfails, I typically create a new storage pool and then use SmartPools to move the data. That's much faster, with less impact on cluster operations, than letting FlexProtect do all the work. Once SmartPools has moved most of the data, I use SmartFail to finish the job. While SmartPools is moving the majority of the data, other jobs can still run, whereas FlexProtect blocks everything else (SnapshotDelete, etc.).
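A rough sketch of that drain-then-smartfail approach on OneFS 7.x is below. The policy name, target pool name (new_nodepool), and filter syntax are all illustrative assumptions, so check "isi filepool policies create --help" and the CLI guide for your release before running anything like this.

```shell
# Create a file pool policy that steers data at the new node pool
# (names and filter syntax are examples; adjust for your cluster).
isi filepool policies create evacuate_old_nodes \
    --begin-filter --path=/ifs --end-filter \
    --data-storage-target new_nodepool

# Kick off a SmartPools job to move the data; other jobs keep running.
isi job jobs start SmartPools

# When the old node is mostly empty, smartfail it. FlexProtect then has
# far less data to reprotect, so the smartfail finishes quickly.
isi devices -a smartfail -d 1
```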