hersh1

Testing new node types in existing cluster

What type of testing do others perform when adding new node types to an existing cluster? Do you typically just add the new nodes, wire up the external connections, and open the gates?

My tentative plan was to set up the external connections to confirm network connectivity, then reboot the node(s) before moving data to them. The reboot is mainly for a bug that we discovered on another cluster, which should be fixed now.
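For the connectivity-confirmation step, a quick reachability sweep from an admin workstation can save a trip back to the rack. The sketch below is just a generic TCP probe in Python, not an Isilon tool; the node addresses and SSH port are hypothetical placeholders for whatever external interfaces you have cabled up:

```python
import socket

def check_reachable(hosts, port=22, timeout=1.0):
    """Return {host: True/False} for a TCP connect to `port` on each host."""
    results = {}
    for host in hosts:
        try:
            # A successful TCP handshake is enough to confirm basic
            # network connectivity; close the socket immediately.
            with socket.create_connection((host, port), timeout=timeout):
                results[host] = True
        except OSError:
            results[host] = False
    return results

# Hypothetical addresses for the not-yet-joined nodes -- substitute your own.
print(check_reachable(["10.0.0.101", "10.0.0.102"], timeout=0.5))
```

If every new interface answers, the cabling and VLANs are probably in order before you start the join and any data movement.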

24 Replies

Re: Testing new node types in existing cluster

Hello Hershal,

For the most part, no real testing is needed. In the best-case scenario you plug the node into power (give it at least an hour to charge the NVRAM batteries), then attach the network cables and join it to the cluster.

I am aware of some customers that allow a "burn-in" period where they stand the new nodes up as a cluster of 1, meaning they run through the wizard and set up a dummy cluster with no network connectivity. This powers on all of the hardware and spins up the disks. The customers I am aware of doing this generally run in this configuration for three days.

Advantage:

Any hardware that is DOA or was handled roughly in shipment can burn out or fail before it is added to the production cluster, where a failure could impact production.

Pitfall:

It takes longer to add capacity, RAM, and CPU to the cluster. (This method assumes you are not in a rush to add capacity or CPU cycles to the existing cluster immediately.)

LeoWski

We tend to add new nodes yearly; for the most part we just plug in and go live.

They pretty much *just* work without any issues (*jinxing myself before this year's upgrade*).

Good luck though!

_LEO_

docmikewoods

Have you ever had any issues with the nodes when you plug in and go live?

Also, is there a reason you only add them yearly?

I am getting ready to do the same, so any information helps.

Thanks

-MW

LeoWski

Never any problems, just plug and go.

Oh, once we had a quick quirk where one new node needed to be properly formatted rather than being added in automatically.

We only get funding once a year to add nodes.

So far we've added a bank of NL400 108s, then a bank of NL400 144s; this year hopefully a bank of HD400s goes into the cluster.

They just work... it's great!

carlilek

Plug in and go.

Our cluster is made up of 2 types of S200s, 2 types of S210s, X410s, and NL400s. We have had in this same cluster...

10000X-SSDs

72NLs

S200s (config 1)

X200s

S200s (config 2)

NL400s (config 1)

A100s (terrible)

NL400s (config 2)

S210s (config 1)

NL410s (disastrous; do not use these if you need performance and rely on GNA, it does not work with them)

S210s (config 2)

X410s (pretty damn awesome)

Not all at the same time, of course, but we have typically run between 3 and 7 node types at any given time.

carlilek

If you currently have no metadata acceleration on SSDs, the HD400s should be just fine for you. If you do, be aware that it does not work and will never work with the HD400s and NL410s: those nodes' SSDs are mandatory metadata-only L3 cache, and GNA will not use them.

LeoWski

Ahh, OK. I forgot that we've also got 3x S200s for acceleration.

carlilek

Yeah, we found out the really hard way, when people at all levels started complaining that everything was ridiculously slow, including tab completion and logins. Metadata-only L3 is evil. Needless to say, the NL410s are not in that cluster any more.

We have HD400s in our backup target cluster, and they seem just fine there, albeit pretty darn slow.

We have been told by Isilon that the metadata-only L3 SSDs are there pretty much exclusively to make background jobs not suck so badly.

sluetze

carlilek,

Why were the A100s terrible? We may want to spice up our clusters with some of them...

We usually also add the nodes, but do not give them IP configs until the disk and node firmware is updated. Once everything that might impact users is finished, the ports get an IP and become available to users.

We mostly rely on SMB and have no transparent failover (SMB3) yet. Once we achieve that, we might be able to just plug and go, too.
