Scaling Isilon up to 144 nodes

January 19th, 2016 01:00

Hi everyone,

I read in the Isilon documents that a cluster can scale up to 144 nodes, but I can't find a document that explains how to do that. Do you know of any reference?

Thanks and regards,

Long Chu

450 Posts

January 19th, 2016 11:00

Just keep buying bigger IB switches, or leaf modules for your existing switches, until you have 144 ports, and keep buying nodes until all the ports are used. By the way, a little tidbit of knowledge: the 144-node figure is simply because that was historically the largest number of ports you could buy in an InfiniBand switch; the OS doesn't have any fixed, hard limit.

If you have a use case for a cluster larger than this, certainly have that discussion with your account team. Typically what we see is that a customer gets to 70 nodes with, say, 3TB drives and needs substantial growth; but instead of buying more nodes with 3TB drives, they now buy them with 6TB drives, so the node-count growth slows down. Or we look at the math and figure out that the heat and power draw of the 70 existing nodes would give you a 12-month ROI on simply moving that data to 35 new nodes with 6TB drives and retiring the old 70 nodes.
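
If you want to sanity-check that kind of consolidation argument yourself, here's a quick back-of-the-envelope sketch in Python. The drive counts, wattages, and power cost below are placeholder assumptions, not Isilon specs, so substitute your own figures:

```python
# Back-of-the-envelope consolidation math for the 70-node -> 35-node example
# above. Every per-node figure here is a placeholder assumption, not an
# Isilon spec -- substitute real values from your own environment.

OLD_NODES, NEW_NODES = 70, 35
DRIVES_PER_NODE = 36            # assumption: same drive count per chassis
OLD_DRIVE_TB, NEW_DRIVE_TB = 3, 6
OLD_NODE_WATTS = 800            # assumption: measured draw of an old node
NEW_NODE_WATTS = 900            # assumption: measured draw of a new node
KWH_COST = 0.12                 # assumption: $/kWh, ignoring cooling overhead

old_raw_tb = OLD_NODES * DRIVES_PER_NODE * OLD_DRIVE_TB
new_raw_tb = NEW_NODES * DRIVES_PER_NODE * NEW_DRIVE_TB

old_kw = OLD_NODES * OLD_NODE_WATTS / 1000.0
new_kw = NEW_NODES * NEW_NODE_WATTS / 1000.0
annual_savings = (old_kw - new_kw) * 24 * 365 * KWH_COST

print(f"raw capacity: {old_raw_tb} TB old vs {new_raw_tb} TB new")
print(f"power draw:   {old_kw:.1f} kW old vs {new_kw:.1f} kW new")
print(f"rough annual power savings: ${annual_savings:,.0f}")
```

With those placeholder numbers the raw capacity comes out the same (half the nodes, double the drive size) while the power draw roughly halves, which is the whole point of the exercise.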

Hope this helps add some perspective, but the beauty of all of this is that it's non-disruptive and online.

Chris Klosterman

Advisory Solution Architect; EMC Enablement Team

chris.klosterman@emc.com

twitter: @croaking

1.2K Posts

January 19th, 2016 05:00

It's a highly automated process. In a nutshell, you connect new nodes to the internal InfiniBand switches and just let the magic happen. There is more documentation available on racking, cabling, and manually confirming a node joining a cluster, but those docs are probably accessible to customers only.
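
If you want to script a quick sanity check after cabling a node, something along these lines works. It's only a sketch that shells out to `isi status` and assumes the new node's (hypothetical) name shows up verbatim in that output:

```python
# Minimal sketch: poll `isi status` until a newly cabled node shows up in the
# cluster. Run it from an existing node; the node name below is hypothetical,
# and this simply assumes the new node's name appears verbatim in the output.
import subprocess
import time

NEW_NODE_NAME = "mycluster-9"   # hypothetical name of the node being added

def node_visible(name):
    result = subprocess.run(["isi", "status"], capture_output=True, text=True)
    return name in result.stdout

while not node_visible(NEW_NODE_NAME):
    print(f"waiting for {NEW_NODE_NAME} to join the cluster...")
    time.sleep(60)

print(f"{NEW_NODE_NAME} is now reporting in `isi status`")
```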

You can play with the FREE Isilon virtual/simulator nodes to get an idea of how to build and extend a cluster with up to six nodes. Step-by-step instructions are included!

https://www.emc.com/products-solutions/trial-software-download/isilon.htm

Have fun

-- Peter

450 Posts

January 19th, 2016 12:00

Ha ha ha... point taken. The actual answer is that you don't smartfail them out until the nodes are empty: you use a filepool policy to move all the data to the new node pool, and then smartfail the nodes once they're empty, which is much faster.

~Chris

January 19th, 2016 12:00

There are faster ways to fail out nodes. It used to take me quite a while, until my SE told me to create a new storage pool, move the nodes into it, and let SmartPools do all the work. It's a LOT faster.

My sweet spot for cluster sizes is under 35 nodes (a 36-port switch with one free port). Preferably keep them under 32 so that you can refresh the cluster without running out of ports. Assuming a 32-node S2x0 cluster, you'll probably have +2 protection, so you want to add at least 4 nodes at a time: add 4, fail out 4, repeat. If your nodes have more capacity, you might get away with failing out more than 4 at a time.
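
To make the port arithmetic concrete, here's a tiny sketch using just the numbers from this post; adjust for your own switch size, batch size, and protection level:

```python
# Port arithmetic for the rolling refresh described above: a 36-port IB switch,
# a 32-node cluster, and 4 nodes swapped per batch. These are just the numbers
# from this post; adjust for your own environment.
SWITCH_PORTS = 36
CLUSTER_NODES = 32
BATCH_SIZE = 4                  # add 4 new nodes, then smartfail 4 old ones

free_ports = SWITCH_PORTS - CLUSTER_NODES
assert free_ports >= BATCH_SIZE, "not enough spare ports to land a full batch"

batches = CLUSTER_NODES // BATCH_SIZE
print(f"{free_ports} spare ports; a full refresh is {batches} add-4/fail-4 cycles")
```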

I had a cluster with 39 nodes and had the huge IB switches. They're not worth it, IMO; clusters that large are too hard to maintain through firmware and OneFS updates.

20.4K Posts

January 19th, 2016 12:00

Chris,

I know it's going to be "it depends", but as a ballpark number, what copy rates have you seen in the field when doing filepool moves like that?

20.4K Posts

January 19th, 2016 12:00

I can only imagine how long it would take to smartfail out 70 nodes; it took 3 weeks to smartfail 2 x 108NLs.

205 Posts

January 19th, 2016 15:00

Typically when we bulk fail out nodes (i.e. 12-20 at a time), it takes a few days to run the SmartPools job. I'm about to do it again, pulling six NL410-6TB nodes out (1PB total), so I'll let you know how that looks.

450 Posts

January 22nd, 2016 22:00

Without changing the impact policy or priority of the SmartPools job, I've seen 100-200TB/week without an issue. If you want to tune that job to get it done faster, I have no doubt that with the right node types you could beat this by quite a bit.
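
If you want to turn that into a planning estimate, the arithmetic is trivial; the 1PB dataset size below is only an example, not a figure from anyone's environment:

```python
# Turning the observed 100-200 TB/week SmartPools rate into a planning number.
# The 1 PB dataset size is only an example.
DATA_TB = 1000                         # example amount of data to drain
LOW_RATE_TB_WK, HIGH_RATE_TB_WK = 100, 200

print(f"best case:  {DATA_TB / HIGH_RATE_TB_WK:.0f} weeks")
print(f"worst case: {DATA_TB / LOW_RATE_TB_WK:.0f} weeks")
```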

~Chris

205 Posts

January 23rd, 2016 05:00

I've just had my SmartPools job load 200TB onto my new X410s, pulling from NL410-6TB/NL400-4TB pools, in around a day and a half. The job runs at Low impact during the day and Medium at night.

It's pulled around 250TB off of the NL410-6TBs.

That said, it says it's only 20% done.
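
For what it's worth, a rough extrapolation from those numbers (assuming percent-done tracks data moved, which it often doesn't) looks like this:

```python
# Rough extrapolation from the figures above, treating the job's percent-done
# as proportional to data moved (it often isn't), so take this as ballpark only.
MOVED_TB = 250
PERCENT_DONE = 0.20
DAYS_SO_FAR = 1.5

total_tb = MOVED_TB / PERCENT_DONE
rate_tb_per_day = MOVED_TB / DAYS_SO_FAR
remaining_days = (total_tb - MOVED_TB) / rate_tb_per_day

print(f"implied total to move: ~{total_tb:.0f} TB")
print(f"rate so far:           ~{rate_tb_per_day:.0f} TB/day")
print(f"time remaining:        ~{remaining_days:.1f} days at that rate")
```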
