What type of testing do others perform when adding new nodes to a cluster? Do you typically just add the new nodes, wire up the external connections, and open the gates?
My tentative plan was to set up the external connections to confirm network connectivity and reboot the node(s) before moving data to them. The reboot is mainly to work around a bug we discovered on another cluster, which should be fixed now.
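That connectivity check is easy to script before any data moves. A minimal sketch, assuming you just want to confirm each new node answers on its data-serving port; the node addresses and the SMB port (445) here are hypothetical placeholders, not details from this thread:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical front-end addresses for the new nodes; substitute your own
# management or data IPs, and whatever port your clients actually use.
new_nodes = ["10.0.0.21", "10.0.0.22"]
for node in new_nodes:
    state = "reachable" if port_open(node, 445, timeout=1.0) else "UNREACHABLE"
    print(f"{node}: {state}")
```

A check like this won't catch everything (firmware mismatches, NVRAM battery state), but it does confirm cabling and addressing before the cluster starts rebalancing data onto the node.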
For the most part there is no real testing needed. In the best case, you plug the node into power (give it at least an hour to charge the NVRAM batteries), then attach the network cables and join it to the cluster.
I am aware of some customers that allow a "burn in" period where they stand new nodes up in a cluster of one, meaning they run through the wizard and set up a dummy cluster with no network connectivity. This powers on all the hardware and spins up the disks. The customers I know of doing this generally run in that configuration for three days.
This lets any hardware that is DOA or was damaged by rough handling in shipment fail before the node is added to a production cluster, where that failure could impact production.
The trade-off is that it takes more time to add capacity, RAM, and CPU to the cluster. (This method assumes you are not in a rush to add capacity or CPU cycles to the existing cluster immediately.)
We tend to add new node(s) yearly, and for the most part we just plug in and go live.
They pretty much *just* work without any issues. *Jinxing myself before this year's upgrade.*
Good luck though!
Have you ever had any issues with the nodes when you plug in and go live?
Also, is there a reason you only add them yearly?
I am getting ready to do the same, so any information helps.
Never any problems, just plug and go.
Oh, once we had a quick quirk where one new node needed to be formatted manually rather than being added in automatically by the system.
We only get funding each year to add more.
So far we've added more NL400 108s, then a bank of NL400 144s; this year, hopefully, a bank of HD400s goes into the cluster.
They just work... it's great!
Plug in and go.
Our cluster is made up of 2 types of S200s, 2 types of S210s, X410s, and NL400s. We have had in this same cluster...
S200s (config 1)
S200s (config 2)
NL400s (config 1)
NL400s (config 2)
S210s (config 1)
NL410s (disastrous; do not use them if you want performance and rely on GNA, because it doesn't work)
S210s (config 2)
X410s (pretty damn awesome)
Not all at the same time, of course, but we have typically run between 3 and 7 node types at any given time.
If you currently have no SSD metadata acceleration, the HD400s should be just fine for you. If you do, be aware that it doesn't work and will never work with the HD400s and NL410s: those nodes' SSDs are mandatory metadata-only L3, and GNA will not be used with them.
Yeah, we found that out the really hard way when people started complaining at all levels about everything being ridiculously slow, including tab completes and logins. Metadata-only L3 is evil. Needless to say, the NL410s are not in that cluster any more.
We have HD400s in our backup target cluster, and they seem just fine there, albeit pretty darn slow.
We have been told by Isilon that the metadata-only L3 SSDs are there pretty much exclusively to make background jobs not suck so badly.
Why were the A100s terrible? We may want to spice up our clusters with some of them...
We usually also add the nodes, but do not give them IP configs until the disk and node firmware are updated. Once everything that might impact users is finished, the ports get an IP and are made available.
We mostly rely on SMB and have no transparent failover (SMB3) yet. Once we achieve that, we might be able to just plug and go as well.