June 21st, 2012 10:00

Equallogic - W2K8 latency with IO larger than 64KB

Hello,

I was wondering if anybody has experienced an issue like this. The EQL firmware tested was 5.2.2 and is now 5.2.4. I have tried with both a physical and a virtual machine, with the same results.

Client - Windows 7 64-bit

Server - W2K8 R2 Standard

EQL PS6100XV

Whenever I copy a large file, such as a DVD ISO, I see around 100 ms latency and the queue jumps to 30-40. IO size is 128 KB in SANHQ. Throughput is around 100 MB/s.

If I set the maximum disk IO size under ESX to 64 KB, down from the default of 32 MB, latency is basically 0 with the same throughput.
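As a rough illustration of what capping the maximum IO size does, the sketch below splits one large request into pieces no bigger than the cap. It assumes the ESX setting simply breaks larger guest IOs into cap-sized chunks; the function name `split_io` is made up for this example and is not a VMware API.

```python
KIB = 1024


def split_io(request_bytes, max_io_bytes):
    """Return the chunk sizes one request is split into under a size cap."""
    if request_bytes < 0 or max_io_bytes <= 0:
        raise ValueError("request must be >= 0 and the cap must be > 0")
    full_chunks, remainder = divmod(request_bytes, max_io_bytes)
    chunks = [max_io_bytes] * full_chunks
    if remainder:
        # The last chunk carries whatever does not fill a whole cap-sized piece.
        chunks.append(remainder)
    return chunks


# A 128 KB request under a 64 KB cap becomes two 64 KB requests.
print(split_io(128 * KIB, 64 * KIB))
```

Under that assumption the array never sees a single IO larger than 64 KB, which matches the change in behavior described above.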

I have been searching for where to change this setting in W2K8. Is this latency to be expected with W2K8? I guess SMB2 sends larger IO sizes; is this overwhelming the EQL?

Thank you

7 Technologist • 729 Posts

June 21st, 2012 12:00

As far as I know, the array can handle the SMB2 I/O traffic and keep up. Typically we see the bottleneck on the host, NIC/HBA or switch (drivers, cache, settings, etc.).

I don't know of any way to change the maximum disk IO size in Windows either. I'm not sure whether that is even an option in the OS or whether it would be a NIC feature.

You can check or try the following:

Switches and NICs: enable flow control. On switches, set receive on; on NICs, set send on (transmit enabled).

You can also try:

netsh int tcp show global

(Take a screen shot or copy for reference to reset back if needed)

netsh interface tcp set global autotuning=disabled

To verify that it is disabled:

netsh interface tcp show global

To set back to the default Windows behavior:

netsh interface tcp set global autotuning=normal

To disable TCP Chimney Offload:

netsh interface tcp set global chimney=disabled

To set it back:

netsh int tcp set global chimney=enabled (to set enabled mode)

netsh int tcp set global chimney=automatic (to set automatic mode; available only in Windows Server 2008 R2 and Windows 7)

or

netsh int tcp set global chimney=default

Also not sure if you took a look at this link:

www.speedguide.net/.../windows-7-vista-2008-tweaks-2574

Disable jumbo frames (at least on your NIC), then add them back in once you are satisfied the performance is where it should be.

-joe

11 Posts

June 21st, 2012 13:00

I have tried those suggestions. Traffic goes through a pair of stacked Force10 S25Ns from a DL2200 with 24 GB of RAM and 8 cores, over a single gigabit iSCSI connection. I had to set DiskMaxIOSize in ESX to 64 KB or our latencies would go through the roof on large sequential writes. The EqualLogic seems to struggle a bit with any IO over 64 KB. 2003 and Linux are fine. Is the latency that we are seeing to be expected?

7 Technologist • 729 Posts

June 21st, 2012 14:00

For the Force10 S25N:

Minimum firmware is 7.7.2.0. Version 8.2.1.0 had (or has) DHCP issues, so for now use 7.8.1.0 or a version higher than 8.2.1.0.

Flow control should be tweaked as follows: "flowcontrol rx on tx on threshold 2047 2013 2013" for all ports that connect to the array(s).

What NIC/HBA are you using?

-joe

11 Posts

June 22nd, 2012 07:00

Thanks Joe. The NIC is a Broadcom BCM5716C with the latest driver. The S25N switches are at firmware 8.3.2.0. I tried it again and am still getting the latency. The switch ports are configured like this:

interface GigabitEthernet 0/1

no ip address

mtu 9252

switchport

flowcontrol rx on tx on threshold 2047 2013 2013

spanning-tree rstp edge-port

no shutdown

7 Technologist • 729 Posts

June 22nd, 2012 09:00

Take a look at the attached file.

-joe

1 Attachment

7 Technologist • 729 Posts

June 22nd, 2012 09:00

Here is additional information I have on the Broadcom NIC; you can try these settings as well.

-joe

1 Attachment

11 Posts

June 25th, 2012 07:00

Thanks Joe. I will check out those documents. My situation must be unique; I assume that nobody else is getting these types of latencies with a W2K8 file transfer.

7 Technologist • 729 Posts

June 26th, 2012 12:00

If you are still stuck on this, please open a support case and ask to be escalated to the EqualLogic Performance experts.

-joe

11 Posts

June 26th, 2012 13:00

Thanks Joe. I have a case open and I'm waiting to hear back.

5 Practitioner • 274.2K Posts

July 2nd, 2012 09:00

One thing to remember: when you greatly increase the block size, latency WILL go up. It has to, because you are sending more data before an ACK is received. However, you get more MB/sec that way. That's just a general rule, not specific to EQL.
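To put rough numbers on that rule, here is a back-of-the-envelope sketch using the figures reported earlier in the thread (128 KB IOs, about 100 MB/s, a queue of 30-40). It assumes a steady state where Little's law applies (outstanding IOs = IOPS x latency) and ignores protocol overhead. The constants and function names are only for this example.

```python
KIB = 1024
MB = 1_000_000
GBE_BYTES_PER_S = 125 * MB  # 1 Gb/s line rate, ignoring framing overhead


def wire_time_ms(io_bytes, link_bytes_per_s=GBE_BYTES_PER_S):
    """Time to push one IO's payload across the link, in milliseconds."""
    return io_bytes / link_bytes_per_s * 1000


def littles_law_latency_ms(queue_depth, io_bytes, throughput_bytes_per_s):
    """Average latency implied by a queue depth at a given throughput."""
    iops = throughput_bytes_per_s / io_bytes
    return queue_depth / iops * 1000


for io_size in (64 * KIB, 128 * KIB):
    print(f"{io_size // KIB} KB IO: {wire_time_ms(io_size):.2f} ms on the wire")

for depth in (30, 40):
    latency = littles_law_latency_ms(depth, 128 * KIB, 100 * MB)
    print(f"queue {depth}: about {latency:.0f} ms average latency")
```

Under these assumptions, a queue of 30 to 40 outstanding 128 KB IOs at about 100 MB/s already implies roughly 40 to 52 ms of average latency, so the queue depth matters as much as the IO size.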

It also seems like you have a single GbE iSCSI connection?  Is that correct?   ESX should have at least two GbE connections and either VMware Round Robin (with IOPs set to 3) or preferably MEM 1.1.0 installed.

Regards,

11 Posts

July 3rd, 2012 08:00

Thanks Don. There are 3 iSCSI connections from the ESX server with MEM 1.1.0 working. I've just been looking everywhere for comparable stats on a file copy between a Windows 7 and a 2008 R2 machine, and the associated latencies, to compare against. Now generally we do have many large sequential writes so it would probably not be an issue. ESX is throwing some latency warnings for those datastores while the transfer is happening.

5 Practitioner • 274.2K Posts

July 3rd, 2012 10:00

How many physical NICs are you using to connect to the SAN network?   MEM can run multiple sessions over a single physical link.  

Here are some common causes for those alerts.  (just explained to someone else a few mins ago)

Over 95% of the time the solution is following our best practices with ESX.

1.)  Delayed ACK DISABLED

2.)  Large Receive Offload DISABLED

3.)  Make sure they are using either VMware Round Robin (with IOs per path changed to 3), or preferably MEM 1.1.0.

4.)  If they have multiple VMDKs (or RDMs) in their VM, each of them (up to 4) needs its own Virtual SCSI adapter.

5.)  Update to latest build of ESX, to get latest NIC drivers.  

6.)  Too few volumes for the number of VMs they have.

Of course, it is also possible to overrun an EQL group, or for something to be amiss in the network, causing delays, retransmits, etc.

Here's some source material for the Delayed ACK

kb.vmware.com/.../search.do

virtualgeek.typepad.com/.../a-multivendor-post-on-using-iscsi-with-vmware-vsphere.html

How to disable LRO

Solution Title: HOWTO: Disable Large Receive Offload (LRO) in ESX v4/v5

Solution Details: Within VMware, the following command will query the current LRO value.

# esxcfg-advcfg -g /Net/TcpipDefLROEnabled

To set the LRO value to zero (disabled):

# esxcfg-advcfg -s 0 /Net/TcpipDefLROEnabled

NOTE: a server reboot is required.

Info on changing LRO in the Guest network.

docwiki.cisco.com/.../Disable_LRO

How to add virtual SCSI adapter:

 kb.vmware.com/.../search.do

11 Posts

July 3rd, 2012 13:00

I am on ESX 5.0.0 build 721882. I have tried those recommended settings and still have the latency.

5 Practitioner • 274.2K Posts

July 3rd, 2012 13:00

Thanks. Definitely ensure that Delayed ACK and LRO are disabled then. I'm starting to find that you need to remove the discovery address and rescan. Make sure the static discovery table is empty. Then disable Delayed ACK and LRO, change the login timeout (see below), and reboot. Then add the discovery address back in and rescan. I'm seeing cases where existing volumes don't get the proper settings otherwise, because the DB ESX uses to store the info isn't updated.

What's the Dell case #?

Solution Title: HOWTO: Change Login timeout value ESXi v5.0 // Requires ESX patch ESXi500-201112001

Solution Details: Setting iSCSI Login Timeout Values

When using VMware vSphere 5 software iSCSI initiator to attach to a Dell EqualLogic PS Series group, the iSCSI option: Login Timeout value defaults to 5 seconds and cannot be modified.

Dell EqualLogic has observed that in large configurations and under certain workloads this 5 second timeout is not long enough to process the large number of iSCSI login requests that occur after a network failure, switch reboot, or controller failover.

VMware has released a patch that enables GUI/CLI editing of the iSCSI option: Login Timeout value on the ESXi 5.0 host. Dell recommends that customers apply this patch and increase this value to 60 seconds. Doing so ensures that there is sufficient time for all iSCSI logins to occur, regardless of the size of the environment.

You can set this value in two ways:

• The first way to set it is using the Advanced Properties of the iSCSI Software Adapter in the vSphere Client GUI.

1. Click on the ESX server in the vSphere Client and choose the Configuration tab.

2. In the Configuration tab, find the iSCSI Software Adapter in the top right-hand window. Right-click on the vmhba for the iSCSI Software Adapter and choose Properties.

3. On the General Tab of the properties click on the Advanced button near the bottom of the tab.

4. If you scroll down through the Advanced Settings you will find the setting for LoginTimeout. If the patch has not been installed it will be set to 5 and grayed out. You must install the ESX patch listed below before you can change this setting to 60.

• The second way is to open an SSH connection to the ESX server and change it with an ESX CLI command.

Enter the following vSphere CLI command: esxcli iscsi adapter param set -A -k LoginTimeout -v 60

***REMINDER: Requires ESXi v5 patch ESXi500-201112001 in order to allow the modification of this field. Without this patch, the field is greyed out in the ESX GUI and the CLI option will not work. ***

The minimum build number that includes this patch is: 515841

Checking the VMware patches, I see a new build for ESXi 5 (see below):

ESXi500-201112001

Download here: hostupdate.vmware.com/.../ESXi500-201112001.zip

Product: ESXi (Embedded and Installable) 5.0.0

md5sum:107ec1cf6ee1d5d5cb8ea5c05b05cc10

sha1sum:aff63c8a170508c8c0f21a60d1ea75ef1922096d

Download Size: 297.7 MB

Build Number: 515841

See VMware KB 2007680 kb.vmware.com/.../2007680

Release Date: 12/15/2011

System Impact : VM Shutdown & Host Reboot
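Before installing, it may be worth checking the downloaded bundle against the md5sum and sha1sum listed above. This is a minimal sketch using Python's standard `hashlib`; the helper names and the local file path in the usage comment are assumptions for illustration, so adjust the path to wherever the zip was saved.

```python
import hashlib

# Checksums as listed in the patch details above.
EXPECTED = {
    "md5": "107ec1cf6ee1d5d5cb8ea5c05b05cc10",
    "sha1": "aff63c8a170508c8c0f21a60d1ea75ef1922096d",
}


def file_digest(path, algorithm, chunk_size=1024 * 1024):
    """Hash a file in chunks so a ~300 MB download is not read into memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected=EXPECTED):
    """Return {algorithm: True/False} for each expected checksum."""
    return {
        algorithm: file_digest(path, algorithm) == wanted.lower()
        for algorithm, wanted in expected.items()
    }


# Example (hypothetical local path):
# print(verify("ESXi500-201112001.zip"))
```

Both entries should come back `True` for an intact download; a `False` suggests the file should be downloaded again before patching.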

11 Posts

July 3rd, 2012 13:00

No problem :)  

vmkiscsid --dump-db | grep Delayed | more

iSCSI MASTER Database opened. (0xffd71008)

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

  `node.conn[0].iscsi.DelayedAck`='0'

~ #

vmkiscsid --dump-db | grep login_timeout  | more

iSCSI MASTER Database opened. (0xffd71008)

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `discovery.sendtargets.timeo.login_timeout`='5'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'

  `node.conn[0].timeo.login_timeout`='60'
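With this many connections, a short script can make the odd one out easier to spot. The sketch below assumes the output format shown above (a backtick-quoted key, `=`, then a single-quoted value) and simply counts each key/value pair. `summarize` is a name made up for this example, and the sample text is a shortened copy of the output.

```python
import re
from collections import Counter

# Matches lines such as:  `node.conn[0].timeo.login_timeout`='60'
LINE_RE = re.compile(r"`([^`]+)`='([^']*)'")


def summarize(dump_text):
    """Count how often each (key, value) pair appears in the dump output."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


SAMPLE = """iSCSI MASTER Database opened. (0xffd71008)
  `node.conn[0].timeo.login_timeout`='60'
  `node.conn[0].timeo.login_timeout`='60'
  `discovery.sendtargets.timeo.login_timeout`='5'
  `node.conn[0].timeo.login_timeout`='60'
"""

for (key, value), count in sorted(summarize(SAMPLE).items()):
    print(f"{count:3d} x {key} = {value}")
```

In the output above, every node connection reports 60, while the single `discovery.sendtargets` entry still reports 5.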

I've also opened a ticket with Force10 so they can review the logs, but I will have to wait to hear back.
