




September 21st, 2017 07:00

iSCSI spikes normal?

I'm noticing a lot of spikes on our EqualLogic iSCSI storage. When I go to Advanced > Disk/Virtual disk monitor, we seem to be getting a lot of spikes ranging from 22-160 ms. The average numbers look good, 2 ms and such, but in the course of a few minutes we'll have a number of these (usually read) latencies on a number of VMs on different hosts. I fired up esxtop, and looking at the disk view under DAVG I'm seeing a few of my LUNs hitting anywhere from 14 all the way up to 166. It's not constant, but I can usually find a few within a few-minute period. This seems to happen on a few different SANs. Before I go opening support cases, I just wanted to see whether this is normal behavior and I'm making a mountain out of a molehill, or whether it warrants further investigation.

* Our SANs are all a few firmware versions behind, except one. And it's having similar issues, though not as extreme. We'll be updating the controllers over the weekend.

* We are using the EQL MEM. SAN HQ is showing that everything is fine, not generating any alarms, and our IOPS counts are low/medium.

* I've been over the EQL best practices guide a few times and verified that we have everything configured correctly per that document.

* Our NIC firmware and drivers all appear to be up to date on the hosts.

* We do notice a performance problem on the servers a lot of the time; they can be very sluggish when you are RDP'd into them.

* I'm having our network guy check the switches this traffic passes through, but they say all their metrics look good. Our entire network is 10 GbE, with the iSCSI on its own dedicated NICs/switches.

* Weirdly, though, the average rate for the iSCSI data is only 10-15 MB/s, at least per what Operations Manager is reporting on the DVS.
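For what it's worth, intermittent DAVG spikes like the ones described above can be logged for later review by running esxtop in batch mode. A minimal sketch; the interval, sample count, and output path here are arbitrary choices, not required values:

```shell
# Log all esxtop counters (including per-LUN DAVG) every 5 seconds,
# for 120 samples (~10 minutes), to a CSV for offline review.
# Run from an SSH session on the ESXi host; the file can get large.
esxtop -b -d 5 -n 120 > /tmp/esxtop-latency.csv
```

The resulting CSV can be opened in Windows Performance Monitor or a spreadsheet to correlate the spikes across LUNs and time.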

5 Practitioner • 274.2K Posts

September 21st, 2017 07:00

Hello, 

Performance cases require a great deal of information to be gathered to resolve them. In SANHQ, when the latency is high, what's the IOPS level?

Regarding best practices: if you had to change anything, for example Delayed ACK, did you remove all the targets and discovery addresses and put them back with the proper settings, or just change the settings after? If the latter, then only newly discovered volumes will be running with the proper settings; the existing ones have the incorrect parameters stored in an iSCSI database. So tearing that all down, rebooting, then re-adding the discovery IPs and rescanning is the only way to be sure all entries are correct.

Here is the latest version of the best practices guide, just released the other day. No major changes; it's updated for recent versions of ESXi, with some clarification of the text.

When you say 'all' the best practices, does that include not sharing multiple VMDKs/RDMs on a single virtual SCSI adapter in all your VMs? This is a VERY common cause of VM latency.

http://en.community.dell.com/techcenter/extras/m/white_papers/20434601

To answer your question: typically yes, iSCSI traffic is rarely a sustained average, especially in virtualized environments.

If you want to be sure, my best and strongest suggestion is to open a support case. They will need diags from all EQL array members, a SANHQ archive, VM support bundles from the nodes showing the problems, and switch configuration information, i.e. a "show tech" or equivalent.

So if you gather all of that ahead of time, it will help speed up triage.
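On the ESXi side, the per-host support bundles mentioned above can be generated from the host shell with the stock vm-support tool. A sketch; the output location is a placeholder, and flag behavior can vary slightly between ESXi versions:

```shell
# Generate a VMware support bundle on each affected ESXi host.
# -w sets the working/output directory; pick a datastore or /tmp
# with enough free space. The result is a .tgz archive to attach
# to the support case.
vm-support -w /tmp
```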

 Regards, 

Don 

57 Posts

September 21st, 2017 08:00

Thanks, I will probably do so once we get the firmware updated, as that is typically support's go-to answer for everything, even if the release notes don't mention the specific issue we are having at the time.

Yeah, that's the guide I used to check our settings. We are seeing this on both newer and older volumes, and after I last updated the MEM I went back and verified that all the older datastores had the proper configuration. However, I will try removing the targets again on a few hosts and see if we notice any difference after adding them back in.

Most of our servers have the C:\ on one SCSI controller and the other disks on a different one or two, depending on the server. Thank you for your reply.

5 Practitioner • 274.2K Posts

September 21st, 2017 09:00

Hello, 

 

RE: Firmware. I know at times it seems so. Many times this comes from the knowledge that newer versions have fixes that are important in general. Gather at least the diags before you upgrade the firmware: on restarts/failovers, a lot of data on how the arrays have been performing is lost. The cache data, for example, only shows activity since the last reboot/failover, and gathering the diagnostic files clears these counters as well. If you can't wait, do the upgrade, then wait a week or so before opening a service request. That will give them some data to look at in the array diags.

Did you notice whether the IOs are low when the latency is high, as seen in SANHQ?

Regards,

Don 

5 Practitioner • 274.2K Posts

September 21st, 2017 09:00

Hello, 

Re: Spike. What I'm looking for is whether, when you see latency spike, the reported IOs for that volume are very low at the same time.

Re: DB. Absolutely! A change you make in the GUI only impacts volumes created later, not any existing volumes. When you reboot, those connections are restored with the settings that were in place when each was first connected; the change in the GUI does not go out and modify existing entries in the iSCSI DB.

The only resolution is to put the node in maintenance mode, remove all the static mapping entries and discovery addresses, then reboot that node. Add the discovery address back in, but do NOT rescan.

Then you can set Delayed ACK off, LoginTimeout to 60, and iSCSI NOOP to 30, then rescan. That will repopulate the database correctly.
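The tear-down/re-add sequence above can also be scripted from the ESXi shell with esxcli. This is only a sketch: the adapter name (vmhba64) and discovery address are placeholders for your own values, and parameter key names can vary by ESXi version, so verify them first with `esxcli iscsi adapter param get -A <adapter>`:

```shell
# Host should already be in maintenance mode.

# 1. Remove the existing discovery address (repeat per address),
#    remove static target entries similarly via
#    "esxcli iscsi adapter discovery statictarget remove", then reboot.
esxcli iscsi adapter discovery sendtarget remove -A vmhba64 -a 10.10.10.10:3260

# 2. After the reboot, re-add the discovery address, but do NOT rescan yet.
esxcli iscsi adapter discovery sendtarget add -A vmhba64 -a 10.10.10.10:3260

# 3. Apply the recommended parameters before any targets are rediscovered.
esxcli iscsi adapter param set -A vmhba64 -k DelayedAck -v false
esxcli iscsi adapter param set -A vmhba64 -k LoginTimeout -v 60
esxcli iscsi adapter param set -A vmhba64 -k NoopOutTimeout -v 30

# 4. Rescan so the iSCSI database is repopulated with these settings.
esxcli storage core adapter rescan -A vmhba64
```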

 Don 

57 Posts

September 21st, 2017 09:00

The IOs seem to be fairly consistent. Even when I see an IO spike I don't always see a latency increase, or at least not much of one, in either VMware or SANHQ.

So for the old iSCSI settings in the DB: even if I changed those in the vSphere client, are you saying they might still persist on the hosts?

57 Posts

September 21st, 2017 11:00

Like I said, they are about normal. The IOPS on all our SANs except one are "low" by SANHQ's measurements, and usually only a fraction of the estimated total IOPS they can max out at. The other one has only gotten to about 50-60% of its IOPS max.

I guess I don't understand why it would still say those settings are configured properly after a reboot if it was populating that data from a database that had the old settings in it. I'm 99% sure we used a script after the last MEM upgrade to make sure all our older datastores were given the new settings. But it won't cost us much to try it, so I'll give it a go. And I believe the MEM auto-configures those settings for any EqualLogic arrays, no?

5 Practitioner • 274.2K Posts

September 21st, 2017 11:00

RE: GUI. It's only showing you what the new default will be, since the DB can have records with different values.

Re: MEM. If you used setup.pl with MEM and included the --bestpractice option, it doesn't configure all the settings. MEM by itself will set the login timeout on the fly.

Regards, 

Don 

 

