canada-it
2 Iron

Achieving Best Performance Possible

Looking for some opinions from experienced EqualLogic owners. Our reworked corporate environment consists of the following: 2 x PS6000XV and 3 x PS6500X arrays, and 11 x R710/R720 hosts with 288 GB and 384 GB of RAM respectively, each with 8 to 16 1 Gbps interfaces. Our switches are 4 x 3750s connected via a backend 36 Gbps stacking cable.

Traditionally we have been running 2 x NICs for service console and VM network, 1 x NIC for vMotion, 2 x NICs for VMFS and 2 x for guest iSCSI. We have also experimented with consolidating iSCSI and VMFS on the same two NICs with little noticeable impact.

Since our machines are taking on more load, I wanted to rule out possible bottlenecks at the guest iSCSI and VMFS levels, so I decided to go with 2 x VMFS and 2 x iSCSI interfaces. Here is my question: what is the advantage of using four NICs in a single vSwitch handling both VMFS (SAN) and guest iSCSI with the EqualLogic MEM, versus one vSwitch with 2 x VMFS and another vSwitch with 2 x iSCSI, also using the EqualLogic MEM?

Second question: 11 hosts, each with four connections to the SAN, plus the default of 6 connections per interface, eats up the connection count quickly, so I wanted to drop the SessionCounts to 2 or 3. What kind of impact is that going to have with four adapters, versus 2 adapters with 6 connections per interface in the MEM's default configuration?
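For a rough feel of the numbers, here is a back-of-the-envelope sketch in Python. It assumes the simple model described in this post (every host connects to the volume, and every NIC opens the full per-interface session count), so it gives an upper bound rather than what the MEM would necessarily open. The function name is made up for illustration.

```python
def sessions_per_volume(hosts: int, nics_per_host: int, sessions_per_nic: int) -> int:
    """Upper bound on iSCSI sessions to one volume under the simple model:
    every host connects, and every NIC opens the full per-NIC session count."""
    return hosts * nics_per_host * sessions_per_nic


if __name__ == "__main__":
    hosts = 11  # host count from the post
    # (NICs per host, sessions per NIC) combinations being compared
    for nics, sessions in [(2, 6), (4, 6), (4, 3), (4, 2)]:
        total = sessions_per_volume(hosts, nics, sessions)
        print(f"{hosts} hosts x {nics} NICs x {sessions} sessions/NIC = {total}")
```

Under this model, four NICs at 3 sessions each cost the same number of sessions as two NICs at the default 6, so halving the per-interface count roughly offsets doubling the adapters.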

I am looking for the best performance, obviously, and thought that pooling four NICs together, since each array controller has 4 interfaces, might improve performance through appropriate load and session balancing. But maybe the performance will not justify the increase in connections?

Anyone care to share some opinions?

9 Replies

Re: Achieving Best Performance Possible

Re: 1 vSwitch vs 2. The only advantage would be a small memory overhead savings on the ESX server.

What model 3750?  Sounds like a "G" model.   Are these dedicated to iSCSI or shared with other duties?

Ideally, the SAN should have its own switches.

The only effective way to troubleshoot your performance issue is by opening a support case. Getting all member diags, switch logs, SANHQ archive and vm-support will allow for a complete ecosystem analysis.

Social Media and Community Professional
#IWork4Dell
Get Support on Twitter - @dellcarespro

canada-it
2 Iron

Re: Achieving Best Performance Possible

Yes, 3750Gs (48 ports), and yes, these are dedicated strictly to iSCSI: 2 x for SAN connectivity and 2 x for iSCSI/VMFS host connectivity, stacked. There are 2 more 3750Gs for the VM network.

I realize each environment is unique, but are there any general guidelines on whether 4 NICs versus 2 NICs with the EqualLogic MEM enabled would offer a noticeable improvement? And 6 versus fewer session connections?

Re: Achieving Best Performance Possible

One issue with 3750G is that all packets are broadcast to all ports on every member.   Newer models with the faster stackwise+ don't.   Also some IOS firmware versions have issues that can cause excessive retransmits.

MEM is only going to work with the 2x NICs bound to the ESX iSCSI initiator, for VMFS/RDM traffic. You would also need to use the HIT/ME for the storage direct volumes to maximize the other two NICs.

MEM optimizes VMFS/RDM traffic automatically, compared to VMware Round Robin, where you have to tweak the settings for every new volume/RDM. The benefit gets even larger when you have multiple members in a pool. Each NIC will create a connection to every member having data for that volume. VMware RR leverages inter-member MESH connections, which can require an extra copy of data from one member to the member where the iSCSI connection has been established. MEM 1.1.x will work with EQL FW 6.0.x to automatically trim connections to prevent the pool limit from being exceeded.

The benefit of splitting NICs for VMFS/RDM traffic from Storage Direct traffic is that during higher IO periods there won't be any contention at the NIC level, and you are leveraging more ports on the switch for increased buffering.

sketchy00
2 Iron

Re: Achieving Best Performance Possible

+1 on what Don said.  The flexibility of having guest iSCSI in its own vSwitch is the biggest advantage.  (Many would love to have this luxury, but because of form factors, do not).  For instance, if your requirements dictated so, one vSwitch can be set for jumbos while another may be set to standard, etc.  It just depends.

From a performance standpoint, the best thing I can suggest is to absolutely use the MEM. You'll see some pretty significant performance improvement just by running this on all of your vSphere hosts. Then of course, for the VMs that use guest attached volumes, just make sure you've used the HIT/ME or the HIT/LE to achieve that. It will take care of the grunt work for you.

The other thing that you might consider doing is looking at how you have your storage pools arranged. It might be advantageous for you to have some of these arrays in the same pool so that it can do some intelligent sub-volume tiering of data (putting the hot blocks where they need to be). I believe the ability to do sub-volume tiering is currently limited to 3 or 4 arrays per pool (correct me if I'm wrong, Don). But if you do choose that route, I'd maybe hold off on using any Storage DRS if you are using that feature in vSphere. Choose one or the other. Not both.

My post on the MEM:  vmpete.com/.../multipathing-in-vsphere-with-the-dell-equallogic-multipathing-extension-module-mem

Re: Achieving Best Performance Possible

re: 3 or 4.  That's not quite correct. There's no strict limit like that.   There are different balance routines going at same time.  By default we only use three members to handle any one volume, up to the limit of 12 members in a pool.  If more members were required because of volume size then the volume would spread to other members.  So those members would try to balance the IO load efficiently across that set of members.

In order to see what kind of configuration would be best, you have to gather data on what the IO pattern is now. Sometimes it's better to isolate SAS and SATA members into their own pools. Mixing a 6500 with 3TB drives and a small SSD hybrid array in the same pool might not be advisable, since the majority of space for volumes is going to be on the 6500. So you miss out on the full IO benefit of the SSDs. Things like that.

One vSwitch or two it doesn't matter, since you can set up the bindings as needed. E.g. with 4x NICs, you can bind two to the ESX SW iSCSI initiator and the other two to the VM network ports.   On the VM network ports I would suggest that one VMNIC be set as ACTIVE and the other(s) as STANDBY.   This way the MPIO layer inside the VM isn't aware of the path loss.  Many OS, in fact most, pause ALL I/O when a path is lost, then timeout that path and resume on the remaining.  It's typically not a huge wait, but Linux had some bugs where it would wait forever, and in Windows you can set the link wait time to a high value.   Note: This is ONLY for the VM Guest iSCSI VM Network Portgroups.   ESX iSCSI still must be one ACTIVE and all other paths must be UNUSED.  
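The failover rule above can be written down as a small check. This is only a sketch in Python: the function name, the `kind` labels and the dictionary layout are invented for illustration and do not correspond to any VMware API. It just encodes "exactly one ACTIVE uplink, the rest UNUSED for ESX iSCSI VMkernel ports, the rest STANDBY for guest iSCSI portgroups".

```python
from typing import Dict, List


def check_teaming(kind: str, nic_states: Dict[str, str]) -> List[str]:
    """Return a list of problems for one portgroup's uplink states.

    kind: 'esx_iscsi' (VMkernel iSCSI port) or 'guest_iscsi' (VM Network portgroup).
    nic_states: maps an uplink name to 'active', 'standby' or 'unused'.
    """
    if kind not in ("esx_iscsi", "guest_iscsi"):
        raise ValueError(f"unknown portgroup kind: {kind}")

    problems = []
    states = {nic: state.lower() for nic, state in nic_states.items()}

    # Both kinds of portgroup should have exactly one ACTIVE uplink.
    if list(states.values()).count("active") != 1:
        problems.append("expected exactly one ACTIVE uplink")

    # The remaining uplinks must be UNUSED for ESX iSCSI, STANDBY for guest iSCSI.
    expected_other = "unused" if kind == "esx_iscsi" else "standby"
    for nic, state in states.items():
        if state not in ("active", expected_other):
            problems.append(
                f"{nic} should be {expected_other.upper()}, not {state.upper()}"
            )
    return problems


if __name__ == "__main__":
    print(check_teaming("esx_iscsi", {"vmnic2": "active", "vmnic3": "standby"}))
    print(check_teaming("guest_iscsi", {"vmnic4": "active", "vmnic5": "standby"}))
```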

Other best practices for the ESX initiator are setting Delayed ACK OFF (disabled) and Large Receive Offload (LRO) OFF (disabled). If not using MEM, then set MPIO to VMware Round Robin and change the default IOs per path from 1000 to 3.
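As a sketch of how those settings could be audited across hosts, here is a small Python check. The dictionary keys (`delayed_ack`, `lro`, `path_policy`, `rr_iops`) are hypothetical labels for values you would collect yourself; they are not real esxcli or vSphere option names.

```python
RR_IOPS_PER_PATH = 3  # Round Robin IOs per path suggested above (default is 1000)


def audit_initiator(settings: dict, using_mem: bool) -> list:
    """Compare one host's collected settings against the suggestions above.

    settings uses hypothetical keys: 'delayed_ack' (bool), 'lro' (bool),
    'path_policy' (str) and 'rr_iops' (int).
    """
    findings = []

    # Delayed ACK and LRO should both be disabled on the ESX initiator.
    for key in ("delayed_ack", "lro"):
        if settings.get(key, False):
            findings.append(f"{key} should be disabled")

    # The Round Robin tuning only applies when MEM is not installed.
    if not using_mem:
        if settings.get("path_policy") != "round_robin":
            findings.append("path_policy should be round_robin")
        if settings.get("rr_iops") != RR_IOPS_PER_PATH:
            findings.append(
                f"rr_iops should be {RR_IOPS_PER_PATH}, found {settings.get('rr_iops')}"
            )
    return findings


if __name__ == "__main__":
    host = {"delayed_ack": True, "lro": False,
            "path_policy": "round_robin", "rr_iops": 1000}
    for finding in audit_initiator(host, using_mem=False):
        print(finding)
```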

Hopefully, you have SANHQ installed, as that's a key tool in monitoring the group performance.   The newest version v2.5 with SAN Assist will send data, including SANHQ data,  to Dell about the arrays if you authorize it.  

Re: Achieving Best Performance Possible

Beware when mixing guest iSCSI and VMware iSCSI traffic on the same vSwitch: you MUST ensure all of them are configured to the same MTU.

All too often I have seen VMs with guest/direct iSCSI run SQL servers where the "SQL expert" has demanded that the MTU be set to 1500 for higher SQL performance. While that may be true, this will force the VMware iSCSI connections to ALSO drop down to MTU 1500, since the NICs cannot mix MTU 1500 and 9000 packets.

If this is the case, you will see a clear benefit from creating a separate vSwitch and using that for guest/direct iSCSI traffic.

The 2nd use case: if VMware is putting a heavy load on the iSCSI NICs, it will not be able to control this properly when guest/direct iSCSI traffic is mixed in. In that case I would recommend you move it to a separate vSwitch. If the load is low or moderate, it will not make a big difference.

Re: Achieving Best Performance Possible

Each individual iSCSI session can be jumbo or standard frames. You have to set jumbo on the vSwitch, but a VMkernel port's MTU size will NOT affect an iSCSI connection from the guest. The guest VM NIC settings have to be set to 9000. You CAN mix jumbo and standard frames all you want. In fact, if you don't set all your VMkernel ports to jumbo, you will have a mix of standard and jumbo frame sessions. You lose efficiency, not functionality. I see that when ISL trunks aren't configured correctly to support jumbo frames: connections that don't cross the ISL are jumbo, and those that do are standard. The array, too, has no issue handling the mix. When an iSCSI session is created, the array sends out a SYN packet using jumbo frames. If that is ack'd, then that iSCSI session is in jumbos. If not, it times out and another SYN packet in standard frames is sent. When that's ack'd, you get a standard frame connection.
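The fallback described above can be illustrated with a toy model in Python. This is not array firmware or a real network probe: it simply treats a path as a list of per-hop MTU values (an assumption for illustration) and says a probe is acknowledged only if it fits through every hop. All names are hypothetical.

```python
JUMBO_MTU = 9000
STANDARD_MTU = 1500


def negotiate_frame_size(hop_mtus):
    """Try a jumbo-sized probe first; fall back to standard if it is not ack'd.

    hop_mtus: MTU of each hop between array and initiator (NIC, switch ports, ISL...).
    """
    path_limit = min(hop_mtus)  # the smallest hop decides what gets through
    for label, size in (("jumbo", JUMBO_MTU), ("standard", STANDARD_MTU)):
        if size <= path_limit:
            # The probe fits every hop, so it is acknowledged at this size.
            return label
        # Otherwise the probe is dropped, the attempt times out, and we retry smaller.
    raise ConnectionError("no frame size fits the path")


if __name__ == "__main__":
    # Path that stays on one switch, jumbo enabled end to end.
    print(negotiate_frame_size([9000, 9000, 9000]))
    # Path that crosses an ISL left at the standard MTU.
    print(negotiate_frame_size([9000, 1500, 9000]))
```

In this model two sessions from the same host can end up with different frame sizes purely because of the path they take, which matches the ISL example above.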

Re: Achieving Best Performance Possible

Hi Don,

Try setting this up and see what happens for yourself.

vSwitch MTU 9000

VMkernel iSCSI0 MTU 9000

VMkernel iSCSI1 MTU 1500

Connect iSCSI and then check EQL logs. You will see both connections running standard-sized frames. Not one session jumbo and one session standard.

I agree the EQL is able to handle mixed connections, but VMware is not, at least within the same vSwitch.

-Rasmus

Re: Achieving Best Performance Possible

That is different from what I was referring to. iSCSI1 is likely the default route for that subnet, so in ESX v4->5.0 that is the port that responds to the SYN packets used to negotiate jumbo vs. standard. But why would you set one iSCSI VMkernel port to standard and the other to jumbo anyway?

What I was referring to was using BOTH VMkernel ports for iSCSI controlled by the ESX kernel, and VM Network portgroups for GUEST iSCSI traffic on the same vSwitch.   Those can be a mixture of standard and jumbo frames.
