
August 30th, 2013 16:00

What SmartConnect connection policy should I use?


The responses are generalizations, but should be good reference points.

1) Connection Count: Allocates connections by counting TCP sessions. The count may also include management and other TCP-based connections.

Connection Count is useful when the mounts are fairly persistent and the workload is similar across connections. If a client has to be unmounted for some reason, it is remounted to the node with the fewest connections and continues to get the same level of service as before.

2) CPU Utilization: Allocates connections by assessing the CPU load of each node.

CPU Utilization is useful when there are lots of short-lived connections that each transfer a relatively small amount of data, after which the connection is torn down. Establishing and tearing down a TCP socket requires CPU resources. As many hosts attempt to connect to the Isilon, they may experience latency or inconsistent responses when some nodes' CPUs are at high utilization and others are not. By directing clients to the node with the lowest CPU utilization at that time, the clients will experience consistent response times, and CPU utilization over the entire cluster should be more even.

3) Network Throughput: Allocates connections based on the network throughput of each node.

Network Throughput is useful when the mount pattern is dynamic, the IO is sequential, and the application is throughput-centric and perhaps bursty. When the application or user requests data and requires high bandwidth, you want to direct them to the node that can provide the most bandwidth, while also preventing the new connection from interfering with an existing connection that has similar requirements. A dynamic mount pattern helps because it avoids the case where a connection is made to a node that has low network throughput at the time, only for an idle connection on that node to wake up and start making aggressive use of bandwidth.

4) Round Robin: Allocates connections by order of IP, regardless of other activities on the cluster.

Round Robin is useful in many cases for general-purpose access. It is particularly useful in some HPC-like applications where many clients mount at almost the same time.

Why wouldn't one of the more granular metrics above be more appropriate? Because those policies need some amount of time to assess the cluster condition in order to choose the appropriate IP. During this time, the same IP will be handed out, even while the conditions are changing. In a batch mount environment this causes many clients to be directed to some nodes but not others.

Round Robin does not need to assess the cluster condition and can issue unique IPs in rapid succession, ensuring an even distribution of connections.
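To make the batch mount point concrete, here is a minimal sketch in Python. It is a toy model, not how SmartConnect is implemented: the node IPs are made up, and the "re-assess every N requests" behaviour (the `refresh_every` parameter) is an assumption standing in for the time a metric-based policy needs to assess the cluster.

```python
from collections import Counter
from itertools import cycle

# Hypothetical node IPs, for illustration only.
NODE_IPS = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]


def round_robin_batch(node_ips, n_clients):
    """Hand out IPs strictly in order, one per request."""
    ips = cycle(node_ips)
    return Counter(next(ips) for _ in range(n_clients))


def metric_policy_batch(node_ips, n_clients, refresh_every):
    """Toy metric-based policy (here: fewest connections).

    Assumption: the policy only re-assesses the cluster every
    `refresh_every` requests and keeps returning the same IP
    in between, even though the load is changing.
    """
    load = {ip: 0 for ip in node_ips}  # live connection count per node
    handed_out = Counter()
    choice = None
    for i in range(n_clients):
        if i % refresh_every == 0:
            # Re-assess: pick the node with the fewest connections right now.
            choice = min(node_ips, key=lambda ip: load[ip])
        handed_out[choice] += 1
        load[choice] += 1
    return handed_out


if __name__ == "__main__":
    # 100 clients mounting almost at once.
    print("round robin :", dict(round_robin_batch(NODE_IPS, 100)))
    print("stale metric:", dict(metric_policy_batch(NODE_IPS, 100, refresh_every=50)))
```

In this toy run Round Robin gives every node 25 connections, while the stale metric sends 50 clients each to two nodes and none to the other two. The actual assessment interval on a cluster will differ, so treat the numbers as an illustration of the effect only.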

1.2K Posts

September 2nd, 2013 00:00

Great background and advice!

We are using Round Robin for exactly that reason (batch behaviour),

but it would be cool to dynamically EXclude some top-XX%-loaded nodes,

based on CPU and/or throughput. Would that make sense to you?

-- Peter

1 Message

September 4th, 2013 19:00

Nice article Colby.  The point about round robin is very important in batch mount environments. 

John

PS: We miss you as our SE


592 Posts

September 30th, 2013 23:00

Is there a doc that states this, so I can give it to a customer?

1.2K Posts

October 1st, 2013 10:00

It seems the SmartConnect WP Dec 2011 unfortunately does not explicitly cover this point

http://www.emc.com/collateral/hardware/white-papers/h8316-wp-smartconnect.pdf

However this more recent WP

http://www.emc.com/collateral/white-papers/h11909-emc-isilon-best-practices-eda-wp.pdf

has the very nice Table 4... which summarizes the recommendations in general,

and mainly independent of the actual industry (EDA in this WP). I'll try to copy-paste:

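Table 4: Example usage scenarios and recommended balancing options

Usage scenarios (table columns):
    Usage patterns unknown
    Heavy activity on a few clients
    Large number of persistent NFS & SMB connections
    Large number of transitory connections (HTTP, FTP)
    NFS automount or UNC paths are used

Load-balancing policy (table rows):
    Round robin:         X in all five scenarios
    CPU usage:           X in two scenarios
    Network throughput:  X in two scenarios
    Connection count:    X in two scenarios

(The column alignment got lost in the copy-paste, so please check Table 4 in the WP for which two scenarios are marked for the last three policies.)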

Strong points for round-robin, including for use with automount.

-- Peter

2 Posts

October 23rd, 2013 10:00

We have tested the NFS connection, but at the end of the day the client gets only one IP address, which it keeps until you remount, so the CPU and Throughput policies don't really help.

When does the Isilon roadmap plan to deliver pNFS for better performance?

4 Posts

October 25th, 2013 17:00

This is typical with only a few connections; once you have many more connections, you will see the policies work the way they are supposed to.
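A rough way to see why the number of connections matters is the toy calculation below. It assumes each client stays pinned to the IP it got at mount time, that clients are spread in round-robin order, and that every 7th client is ten times heavier than the rest. All of these numbers are made up for illustration and are not measurements from a cluster.

```python
def node_loads(n_clients, n_nodes=4):
    """Total load per node when each client stays on the node it mounted."""
    loads = [0] * n_nodes
    for i in range(n_clients):
        # Hypothetical workload mix: every 7th client is heavy.
        weight = 10 if i % 7 == 0 else 1
        # The client keeps this node until it remounts.
        loads[i % n_nodes] += weight
    return loads


def imbalance(loads):
    """Busiest node's load relative to the average node load (1.0 = even)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    for n in (4, 400):
        loads = node_loads(n)
        print(n, "clients:", loads, "imbalance = %.2f" % imbalance(loads))
```

With 4 clients, the one heavy client dominates a single node and the busiest node carries about three times the average load. With 400 clients, the heavy ones are spread across all nodes and the busiest node ends up within a few percent of the average, even though no client ever moves.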
