August 27th, 2017 11:00

Isilon Infiniband inter-switch cabling

Hello all Community Members:

Could you please help me with the following question?

The customer already has an Isilon cluster made up of:

8 X400 nodes and 24 NL-400 nodes. All of them are connected to two 36-port InfiniBand switches.

This cluster will be replaced by a new one built on Gen6 nodes. We are thinking of doing a non-disruptive migration, that is, adding the new Gen6 nodes to the current cluster and migrating data from the old nodes to the new ones using SmartPools policies. Along with the new Gen6 nodes, we are bringing in new InfiniBand switches.

Here is my question:

Could we interconnect the old InfiniBand switches with the new InfiniBand switches for the duration of the technology refresh? Would that be supported as a temporary configuration?

The idea is to temporarily connect one of the old InfiniBand switches to one of the new ones, and likewise the other old switch to the other new one. Once all the new nodes are in the cluster and the old nodes have been freed up, we would remove these inter-switch connections. We have already been advised to upgrade OneFS to version 8.1 or above.

Thank you in advance for your answer.

1.2K Posts

August 28th, 2017 02:00

Ask about leasing a switch (or two identical switches) large enough to fit the old and new nodes at the same time.
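To put rough numbers on "large enough", here is a minimal Python sketch of the port arithmetic. The old node counts come from the question above. The number of new Gen6 nodes is not given in this thread, so the value used here is a hypothetical placeholder. It also assumes each node uses one port on each of the two back-end switches.

```python
# Back-end switch sizing during the migration window.
# Assumption: every node takes one port on EACH of the two switches.

OLD_NODES = 8 + 24  # 8 x X400 + 24 x NL-400, from the original question


def ports_needed(old_nodes: int, new_nodes: int) -> int:
    """Ports required per switch while old and new nodes coexist."""
    return old_nodes + new_nodes


def fits(switch_ports: int, old_nodes: int, new_nodes: int) -> bool:
    """True if one switch of the given size can hold every node at once."""
    return ports_needed(old_nodes, new_nodes) <= switch_ports


if __name__ == "__main__":
    new_nodes = 8  # hypothetical: the thread does not say how many Gen6 nodes
    need = ports_needed(OLD_NODES, new_nodes)
    print(f"Ports needed per switch: {need}")
    print(f"Fits on a 36-port switch: {fits(36, OLD_NODES, new_nodes)}")
```

With 32 old nodes already on the 36-port switches, only four ports per switch remain free, which is why a larger temporary pair is suggested.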

450 Posts

August 28th, 2017 06:00

Peter is correct. The right way to do this is:

1. Get 2 new IB switches that are large enough to accommodate all the new nodes + all the old nodes.

2. Get enough IB cables (sometimes on loan) to connect the old nodes to the new switches (these are only temporary), but make sure you measure the lengths before you order them.

3. Swap out one side of the IB fabric at a time to the new switches, following the procedure from the Procedure Generator.

4. Now plug in the cables from the new nodes to those same switches.

5. Remove any SmartConnect rules that exist on the cluster, so that nothing gets assigned to pools automatically and you can handle the next few steps by hand.

6. Add all the new nodes to the existing cluster. The old nodes may need to be upgraded to OneFS 8.1.1 or later first (a guess, but check the compatibility guide to be sure): https://www.emc.com/collateral/TechnicalDocument/docu44518.pdf

7. Change the default file pool policy to write all new data to the new node pools.

8. Now make sure each SmartConnect Pool has enough IP addresses to add in the new nodes properly.

9. Manually add the network interfaces from the new nodes to the existing SmartConnect pools.

10. Find out which node has the lowest NodeID (not LNN). To see the difference, use isi_config -> status advanced, which shows a table of the mappings. This is necessary because the SmartConnect Service IP will live on the node with the lowest NodeID by default, not the lowest LNN.

11. Suspend the old nodes in the SmartConnect pools (this won't affect any existing clients). Suspending a node just means it stops being handed new client connections; new connections go only to the new nodes. You can let it run like this for weeks or months, so that the old connections age off and trickle over, which minimizes client impact.

12. Create a File Pool Policy to move the data over (if necessary), or update any existing policies to point to the new node pools or capacity tiers.

13. Sit back and watch.

14. Once the number of connections to the old nodes drops to zero, or near zero, plan a time to remove those interfaces from the SmartConnect pools, and then do so according to your plan.

15. Once the old node pools empty out, remove the old nodes from the cluster, one at a time. Make sure the node you identified in step #10 is the last one you remove. The reason is that when the SmartConnect Service IP moves over, it causes a group merge event, which could briefly delay new connection requests. Usually, the larger the cluster, the longer this takes. That said, I'm sure this is far less impactful than it used to be.

16. Now that all the old nodes are removed from the cluster (in software), power them off, and then remove from the racks and the datacenter when appropriate.
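The ordering rule from steps #10 and #15 can be sketched in a few lines of Python. This is only an illustration of the logic, not an Isilon tool: the Node record and the sample NodeID/LNN values are hypothetical, and in practice you would fill them in by hand from the isi_config status advanced table.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    lnn: int       # logical node number
    node_id: int   # NodeID, which is what decides where the Service IP lives
    is_old: bool   # True for nodes being retired


def removal_order(nodes: List[Node]) -> List[Node]:
    """Return the old nodes in an order that removes the lowest NodeID last.

    Sorting by NodeID in descending order is one simple way to guarantee
    that; the order of the other nodes does not matter for this rule.
    """
    old = [n for n in nodes if n.is_old]
    return sorted(old, key=lambda n: n.node_id, reverse=True)


if __name__ == "__main__":
    # Hypothetical mapping: note that LNN 2 holds the lowest NodeID.
    cluster = [
        Node(lnn=1, node_id=5, is_old=True),
        Node(lnn=2, node_id=2, is_old=True),
        Node(lnn=3, node_id=9, is_old=True),
        Node(lnn=4, node_id=12, is_old=False),
    ]
    for n in removal_order(cluster):
        print(f"remove LNN {n.lnn} (NodeID {n.node_id})")
```

The sample data shows why the distinction matters: picking by lowest LNN would choose the wrong node to leave until last.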

Anyway, hope this helps. There should be a better procedure than mine in the Isilon Procedure Generator, but this should cover the high-level plan.

Chris Klosterman

Principal Pre-Sales Consultant, Datadobi

chris.klosterman@datadobi.com
