July 4th, 2013 16:00

ESXi and MEM - Question on Utilization

I am hoping some experienced engineers can help me out. We have been having internal discussions on reducing the number of connections we have configured with MEM for our ESX infrastructure.

We are currently running 10 ESXi 5.1 servers and six EqualLogic members in two pools, supporting just shy of 700 VMs in this environment. We are bumping up against the 1024 connection limit. We have already increased the size of half our datastores and reduced their number. There is internal debate about the actual performance contribution of the MEM plugin and what it brings to the table. I know what it does, but I am looking for tests and ways to validate the performance improvements it provides, whether by monitoring datastore latency, throughput, MB/s transfer speeds, network utilization on the iSCSI vmkernel interfaces, or something else. Where should I be looking, and what metrics should I be analyzing, to show the benefits? I need real evidence.

This would help confirm its value and allow us to gauge the impact of reducing the connection counts.

7 Technologist • 729 Posts

July 5th, 2013 08:00

One of the simplest ways is to use the SANHQ "Live View" and look for any latency increase as you lower/adjust the MEM connection counts.  This is under the IO section (you need to configure the login and settings to use it first).
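If you want to see how close each host is getting to the connection limit while you adjust things, you can also count the iSCSI sessions directly from the ESXi shell. This is only a rough sketch; the exact output fields of esxcli vary a little between builds, and vmhba38 is just an example adapter name (use "esxcli iscsi adapter list" to find yours):

# Count the iSCSI sessions this host currently has open
esxcli iscsi session list | grep -c "Target:"

# Or list the sessions on the software iSCSI adapter only
esxcli iscsi session list --adapter=vmhba38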

Regarding the connection count, you can bring down the number of connections by modifying the three MEM connection limit parameters on each ESX server running MEM.

totalsessions: The ESX server running MEM cannot create more connections than this, unless it needs more in order to create at least one connection to each volume. Maximum 1024, minimum 64, default 512.

volumesessions: The ESX server running MEM cannot create more connections to any one volume than this. Maximum 12, minimum 3, default 6.

membersessions: The ESX server running MEM cannot create more connections per member to any one volume than this. Maximum 4, minimum 1, default 2.

If the pool connection count just needs to be reduced a bit, then volumesessions and membersessions are the best choices.

If it needs to be reduced substantially, then totalsessions may be the best short-term choice.

There are two ways to change the MEM configuration values.  The preferred method is to use the MEM setup.pl script to modify the values; within 4 minutes the connections will adjust themselves.  You can also edit the /etc/cim/dell/ehcmd.conf file directly on the ESX server, but you will then need to do the following to make MEM re-read the configuration file.

For MEM 1.0.0, you just need to restart the ehcmd service with the command:
service ehcmd restart

MEM 1.0.1 does not have an ehcmd service, so you will need to restart all the CIM providers with this command to make MEM re-read the ehcmd.conf file:
/etc/init.d/sfcbd-watchdog restart
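
For reference, a very rough sketch of what that edit might look like from the ESXi shell. The key=value lines are only my assumption about the file format; check the entries that already exist in your own ehcmd.conf (and back the file up) before changing anything:

# Back up the current MEM configuration file first
cp /etc/cim/dell/ehcmd.conf /etc/cim/dell/ehcmd.conf.bak

# Adjust the connection limit parameters -- illustrative values only,
# and the exact spelling/format must match what is already in the file:
#   totalsessions=512
#   volumesessions=6
#   membersessions=2
vi /etc/cim/dell/ehcmd.conf

# Then restart ehcmd (MEM 1.0.0) or sfcbd-watchdog (MEM 1.0.1) as above
# so the new values are picked up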

 

-joe

5 Practitioner • 274.2K Posts

July 5th, 2013 10:00

Hello,

To add some detail to this.  Without MEM, a multi-member pool requires an additional connection to every member in that pool that holds data for a volume.  So take a two-member pool, MemA and MemB.  ESX connects to MemA, and a hidden connection is then opened to MemB.  When an I/O request comes in for data residing on MemB, that data is shipped to MemA over the hidden connection, and MemA then sends it to the server.

The reverse is also true: when ESX connects to MemB, MemA ships its data over to MemB to reach the ESX server connected to MemB.  This is how EQL has scaled members since day one.  At some point, though, this additional overhead reduces efficiency as you add more members to the pool.

Enter MEM (or HIT/ME, HIT/LE).  Now each NIC ESX is using for iSCSI makes its own connection to every member in that pool holding data for that volume, removing the need for the hidden connection.  I/O requests now go directly to the member with the data, for both writes and reads.  MEM knows which blocks are accessible via which paths.  This is a great improvement over the standard behavior.

If you are running EQL FW 6.0.x with MEM 1.1.1+, they will negotiate connection counts automatically.  I.e., if you are using 4x NICs for iSCSI, it won't use all paths to every volume by default; extra connections that don't prove beneficial are dropped.  It can do this by looking at the current data rates over the paths.
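If you want to see that behavior from the ESX side, you can list the paths each host has built. A rough sketch only; the naa.6090a0 prefix is the usual EqualLogic device ID prefix, so confirm it matches your volumes, and note that output formats differ slightly between ESXi builds:

# Total iSCSI paths on this host (each iSCSI path generally maps to one session)
esxcli storage core path list | grep -c "^iqn"

# Paths for a single volume -- substitute a device ID of your own
# taken from "esxcli storage nmp device list"
esxcli storage core path list --device naa.6090a0XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX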

iSCSI is usually not network-bandwidth bound.  It's rare that a server needs hundreds of MB/s of sustained throughput, especially with ESX.  IO rate and low latency are more critical, and MEM is very effective in those cases.

The documentation with MEM covers this in more detail.

As Joe mentioned, SANHQ is helpful here.

If you switch to VMware's Round Robin instead, the default IOs per path is 1000, which has the effect of only using one path most of the time.  If you do switch, make sure you change the IOs per path from 1000 to 3.
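If you do go the Round Robin route, here is a rough sketch of how that change is commonly made from the ESXi shell. The naa.6090a0 prefix is the usual EqualLogic device ID prefix, and the devices must already be claimed by the Round Robin PSP; confirm both before looping over anything:

# Set the IO operation limit from the default 1000 down to 3
# on every EqualLogic device this host sees
for DEV in $(esxcli storage nmp device list | grep -E '^naa\.6090a0'); do
    esxcli storage nmp psp roundrobin deviceconfig set --device="$DEV" --type=iops --iops=3
done

# Spot-check one device afterwards
esxcli storage nmp psp roundrobin deviceconfig get --device=<one of the naa IDs>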

You can also open a support case.   They can review the diags, SANHQ, and VMware support files to get a better picture of what's happening and make sure the ESX servers are set to best practices.  

E.g. (a rough way to check a few of these from the ESXi shell is sketched after the list):

1.) Delayed ACK disabled and the change in effect

2.) Large Receive Offload disabled

3.) Login_timeout set to 60 seconds (default is 5)

4.) VMDKs and Raw Device Mapped LUNs each given their own virtual SCSI adapter

5.) A current build of ESXi
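Here is that sketch; vmhba38 is just an example adapter name, and parameter availability varies by ESXi build, so treat it as a starting point and verify against the Dell and VMware documentation:

# Find the software iSCSI adapter name
esxcli iscsi adapter list

# Show the adapter parameters so you can confirm DelayedAck is false
# and LoginTimeout is 60
esxcli iscsi adapter param get --adapter=vmhba38

# Large Receive Offload: 0 = disabled
esxcli system settings advanced list -o /Net/TcpipDefLROEnabled
esxcli system settings advanced set -o /Net/TcpipDefLROEnabled -i 0

# Confirm the ESXi version and build
vmware -v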

The VMware support files also provide VMFS datastore IO data, so you can get an idea of whether there are too many busy VMs on a particular datastore.
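For that per-datastore and per-VM view, esxtop on the host is also handy. A quick sketch using the standard esxtop key bindings:

# Interactive: run esxtop, then press
#   d - storage adapter view
#   u - storage device view (watch the DAVG/KAVG/GAVG latency columns)
#   v - per-VM storage view, to spot the busiest VMs on a datastore
esxtop

# Or capture a batch sample to CSV for offline review
# (5-second samples, 60 iterations = about 5 minutes)
esxtop -b -d 5 -n 60 > /tmp/esxtop-sample.csv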

102 Posts

July 6th, 2013 08:00

Thanks guys. So is it in fact considered a best practice to disable Delayed ACK and Large Receive Offload in an ESX/EqualLogic environment? I know it is suggested as a possible solution to performance problems, but my understanding was that it was meant to be applied only if particular issues were occurring. If there are no adverse effects to making these configuration changes, then it may be worth doing so just to rule this out.

5 Practitioner • 274.2K Posts

July 6th, 2013 11:00

Yes, these are what we suggest all EQL customers running ESX 4.x through 5.x do.  These changes won't change how MEM works or the iSCSI connection counts.
