
Unsolved



August 12th, 2014 05:00

Any benefits to isolating Isilon S200 nodes from the front-end network?

My customer has a 9-node cluster that is expanding to 19 nodes.  5 of the new nodes are S200s, and they are considering not putting them on the network in order to leverage all the I/Os for SSD GNA (global namespace acceleration).  Is this necessary?  Are they gaining anything from this?

Gabe

1.2K Posts

August 12th, 2014 06:00

One simple rule is not to extend a SmartConnect zone across different node types, for reasons of consistent performance.

So if there are no plans for a dedicated SmartConnect zone on the S200 nodes, just provide the external network for AD/LDAP/DNS, the WebUI, etc.

If there is a possible use case for an extra S200 SmartConnect zone, it could be considered. Just make sure the CPU load on the S200 nodes doesn't max out (a rough monitoring sketch follows below); and as for the other "I/O" resources under consideration:

- the external network and the HDD I/O don't affect GNA per se

- the internal IB network should have plenty of capacity (bandwidth at low latency)
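If you'd rather script that CPU check than watch InsightIQ or the WebUI, something along these lines could work against the OneFS Platform API. This is only a sketch: treat the endpoint path, the statistics key name, the value scaling, the cluster address/credentials and the 80% threshold as assumptions to verify on your OneFS version.

```python
# Rough sketch only: poll per-node CPU usage through the OneFS Platform API.
# Assumptions to verify for your OneFS release: the /platform/1/statistics/current
# endpoint, the 'node.cpu.idle.avg' key, and whether the value is reported in
# percent or tenths of a percent.  Cluster address, credentials and the 80%
# threshold are placeholders.
import requests

CLUSTER = "https://cluster.example.com:8080"   # any node IP / SmartConnect name
AUTH = ("monitor", "password")                 # use a read-only account
BUSY_THRESHOLD = 80.0                          # percent CPU considered "maxed out"

def cpu_busy_per_node():
    """Return {node_devid: percent_busy}, derived from average idle CPU."""
    resp = requests.get(
        f"{CLUSTER}/platform/1/statistics/current",
        params={"key": "node.cpu.idle.avg", "devid": "all"},
        auth=AUTH,
        verify=False,   # clusters often run self-signed certs; fix properly in production
        timeout=10,
    )
    resp.raise_for_status()
    busy = {}
    for stat in resp.json().get("stats", []):
        idle = float(stat["value"])   # assumed to be percent idle; adjust if in tenths
        busy[stat["devid"]] = 100.0 - idle
    return busy

if __name__ == "__main__":
    for node, pct in sorted(cpu_busy_per_node().items()):
        note = "  <-- little headroom, don't add client load here" if pct > BUSY_THRESHOLD else ""
        print(f"node {node}: {pct:.1f}% CPU busy{note}")
```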

99 Posts

August 13th, 2014 02:00

Interesting question.  Ironically, not only is the answer 'no', but the premise of the question is upside down.  The S200 nodes (now the S210 as well) are the most efficient nodes at handling protocol ops.  If you read the recent SpecSFS findings for the S210, for example, you will see the effect.

If you must isolate nodes from the network - never a good idea in general - it would be better to isolate the NL nodes (for example) from the network and let the S nodes handle the protocol ops, since their core-to-drive ratio is much better than the NLs'.  For example, the S200 has 8 cores and 24 drives; the NL400 has 8 cores and 36 drives.  The S nodes have more 'headroom' to service the ethernet ports.  As Peter said, monitor your CPU with InsightIQ and the web interface, as is good practice.

It is also good practice to construct SmartConnect zones to allow clients with high ops requirements to connect to the S nodes while clients with low ops requirements connect to other nodes.
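As a toy illustration of that zoning practice, the client-side mapping can be as simple as handing each client class a different SmartConnect DNS name. The zone names, client list and ops cutoff below are made-up examples, not anything from this cluster.

```python
# Toy illustration of per-tier SmartConnect zones: high-ops clients mount the
# S-node zone, everything else mounts the NL/X zone.  The zone names, the
# client list and the 5,000 ops/sec cutoff are made-up examples.
S_TIER_ZONE = "s200.cluster.example.com"
ARCHIVE_ZONE = "nl.cluster.example.com"
HIGH_OPS_CUTOFF = 5000   # sustained protocol ops/sec per client; pick your own number

clients = {
    "db-reporting-01": 12000,   # heavy random I/O
    "render-node-07": 8000,
    "home-dirs-smb": 900,
    "backup-target": 150,
}

for host, ops in clients.items():
    zone = S_TIER_ZONE if ops >= HIGH_OPS_CUTOFF else ARCHIVE_ZONE
    print(f"{host:>16}  ->  mount via {zone}")
```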

August 13th, 2014 06:00

> the S200 has 8 cores and 24 drives; the NL400 has 8 cores and 36 drives

It's even more lop-sided than that.  The S200 has 2.4GHz CPUs - the NL400s are only 1.6GHz.
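To put rough numbers on the headroom argument, here's a quick back-of-the-envelope calculation using only the specs quoted above (8 cores / 24 drives / 2.4 GHz for the S200 versus 8 cores / 36 drives / 1.6 GHz for the NL400):

```python
# "CPU headroom per drive" comparison, using only the node specs quoted above.
nodes = {
    "S200":  {"cores": 8, "drives": 24, "ghz": 2.4},
    "NL400": {"cores": 8, "drives": 36, "ghz": 1.6},
}

for name, spec in nodes.items():
    cores_per_drive = spec["cores"] / spec["drives"]
    ghz_per_drive = spec["cores"] * spec["ghz"] / spec["drives"]
    print(f"{name}: {cores_per_drive:.2f} cores/drive, {ghz_per_drive:.2f} GHz/drive")

# S200:  0.33 cores/drive, 0.80 GHz/drive
# NL400: 0.22 cores/drive, 0.36 GHz/drive  (less than half the S200's headroom)
```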

You didn't say what the remaining nodes in the cluster are, and it makes a difference whether they're NL or X nodes and how they're used.  It also makes a difference how much memory you have on your nodes.

There are a couple of viable solutions:

1.  Put ALL of the traffic through the S200 nodes and don't have a SmartConnect zone on the other nodes at all.  In many cases, the user can get I/O serviced faster by going through the S200, having it suck the data from an NL node, and back to the user than if you were pointing the user directly at an NL node.  This is simply because of the huge CPU and cache differences between S and NL nodes.  An S200 node with 96GB of RAM will have a major cache benefit over an NL that might only have 12 or 24GB of memory.

2.  Create a separate SmartConnect zone for each tier - one for your S200 nodes and one for your NL nodes.  Have the users mount the data from the appropriate tier.  This could be used if you can't afford to slow down your S200 traffic while waiting for an NL node to respond and your S200 CPUs are being saturated under load.  Under very heavy NL load, you could potentially impact the latencies of your S200 traffic if you have your SmartConnect zone exclusively on your S nodes.  I'm in this situation now with short-term peaks of over 100K NFS ops trying to be serviced by only 3 NL nodes (see the quick arithmetic below).
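To make that last point concrete, here's the back-of-the-envelope math on a peak like that. Only the 100K-ops figure and the "3 NL nodes" come from my case above; the 5-S200 and 19-node counts just reuse the figures from the original question.

```python
# Quick arithmetic on a 100K-ops peak: how many protocol ops each node must
# absorb, depending on which nodes sit behind the SmartConnect zone.  Only the
# "3 NL nodes" case is mine; the 5-S200 and 19-node counts reuse the figures
# from the original question.
peak_ops = 100_000

layouts = {
    "3 NL nodes (zone on NL tier only)": 3,
    "5 S200 nodes (zone on S tier only)": 5,
    "all 19 nodes in one zone": 19,
}

for layout, node_count in layouts.items():
    print(f"{layout}: ~{peak_ops // node_count:,} ops per node")

# 3 NL nodes:   ~33,333 ops per node  (painful on 1.6 GHz archive nodes)
# 5 S200 nodes: ~20,000 ops per node
# 19 nodes:     ~5,263 ops per node (but then slow NLs sit in the client path)
```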

NL nodes are NOT performance nodes.  You won't even see them on performance comparison slides because they're so slow.  They're great as SyncIQ replication targets or for other long-term archives.  For user traffic, not so much.  Good, fast, cheap - pick at most two.

1.2K Posts

August 13th, 2014 07:00

> 1.  Put ALL of the traffic through the S200 nodes and don't have a SmartConnect zone on the other nodes at all.  In many cases, the user can get I/O serviced faster by going through the S200, having it suck the data from an NL node, and back to the user than if you were pointing the user directly at an NL node.  This is simply because of the huge CPU and cache differences between S and NL nodes.  An S200 node with 96GB of RAM will have a major cache benefit over an NL that might only have 12 or 24GB of memory.


CPU-wise, not necessarily in this case: 5 S200 nodes facing 14 X/NL? nodes...

More importantly, while some RAM in the S200 will act as L1 cache when sucking blocks from the X/NL nodes, that size will be pretty small, because the storage nodes (unlike the backup accelerator!) use only a tiny fraction of memory for L1, due to the "aggressive drop-behind" policy.


> 2.  Create a separate SmartConnect zone for each tier - one for your S200 nodes and one for your NL nodes.  Have the users mount the data from the appropriate tier.


Yes, ideally some rationale for creating multiple zones can be found: either by data (each share placed on one node type only) or by client (while all data is auto-tiered across the different node types), or any mixture thereof, if you like it messy.


2 Intern • 20.4K Posts

August 13th, 2014 07:00

Different strokes for different folks.  We run all of our PeopleSoft reporting, student collaboration, Cascade websites, traditional office shares... you name it, on 8 x 108NL.  Users are happy, no complaints.

