February 11th, 2015 09:00

VMware ESX latency

Hi, I notice that an ESXi 5.1 host is showing a latency of 30 ms for DAVG, while KAVG does not show anything. However, when I check the response time in Unisphere for VMAX, it does not go over 5 ms.

Does anyone know what might be causing this discrepancy between both values?

226 Posts

February 11th, 2015 09:00

Hi,

This is probably because of the reporting intervals being used by the two tools. I'm guessing you're using esxtop to view the DAVG from the ESX hosts, which defaults to 5-second intervals. Assuming you're using the defaults in Unisphere for VMAX, the diagnostic interval is 5 minutes -- 60 times longer. Longer sampling intervals mean that the averaging effect will hide peak values.
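To illustrate the averaging effect, here is a minimal sketch in Python. The numbers are made up for illustration (one 5-second sample at 30 ms, the rest at 1 ms), and it assumes every 5-second sample carries the same number of IOs so that a plain mean is a fair stand-in for the longer interval's average.

```python
from statistics import mean

SAMPLE_SECONDS = 5      # esxtop default refresh interval
WINDOW_SECONDS = 300    # 5-minute diagnostic interval
samples_per_window = WINDOW_SECONDS // SAMPLE_SECONDS  # 60 samples

# Hypothetical data: 59 quiet samples at 1 ms and a single 30 ms spike.
latencies_ms = [1.0] * samples_per_window
latencies_ms[30] = 30.0

peak_ms = max(latencies_ms)          # what a 5-second view can show
window_avg_ms = mean(latencies_ms)   # what a 5-minute average shows

print(f"samples per window: {samples_per_window}")
print(f"peak 5-second value: {peak_ms:.1f} ms")
print(f"5-minute average:    {window_avg_ms:.2f} ms")
```

With these assumed values, the single spike barely moves the 5-minute average, so a short 30 ms peak on the host and a sub-5 ms average on the array are not necessarily in conflict.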

It is possible in Unisphere to get data for shorter intervals (Real Time Traces), but these traces do not include LUN / Storage Group level metrics, as there are just too many objects in a VMAX to reasonably track this data at that granularity. An ESX host has far fewer objects to manage, which makes it possible to provide more granular metrics at shorter intervals.

The values you see from the host also include the time IOs spend traversing the SAN, whereas the values in Unisphere for VMAX show the time that IOs spend inside the VMAX (exclusive of any time spent within the host/kernel/HBA/SAN).

Thanks,

- Sean

465 Posts

February 11th, 2015 14:00

ESX and Unisphere are taking measurements at two different points along the IO path, so it is possible for both to be correct. A fully subscribed ISL or a slow-drain device in the SAN, for example, can produce the differences you are seeing.

Perhaps you can:

- Health check the SAN

- Take a third set of performance numbers at the guest layer (perfmon for Windows or iostat for Unix) to verify that the numbers at the ESX level are really seen by the guest(s).
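As a rough way to reason about the measurement points, the sketch below subtracts the array's response time from the host's device latency to estimate how much time is spent outside the array (HBA, fabric, ISLs). The function name is hypothetical, and the subtraction is only meaningful if both values were captured over the same time window and interval length, which is an assumption here.

```python
def latency_outside_array_ms(host_davg_ms, array_response_ms):
    """Estimate time spent between the host driver and the array front end.

    Assumes both inputs cover the same period at the same sampling interval.
    Clamped at zero because measurement noise can make the array figure
    exceed the host figure.
    """
    return max(0.0, host_davg_ms - array_response_ms)


# Figures quoted in this thread, treated as if they were comparable samples.
host_davg_ms = 30.0
array_response_ms = 5.0

outside_ms = latency_outside_array_ms(host_davg_ms, array_response_ms)
share = outside_ms / host_davg_ms

print(f"estimated latency outside the array: {outside_ms:.1f} ms")
print(f"share of host-observed latency:      {share:.0%}")
```

If guest-level numbers (perfmon or iostat) are collected as well, the same kind of comparison can be made between the guest and ESX layers.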

33 Posts

February 12th, 2015 05:00

@JasonC

How can I health check the SAN? Using Brocade SAN Health? Would this really show up any latency?

33 Posts

February 12th, 2015 05:00

Okay, I will need to find out why there is such a big difference between the VMAX 'response time' and the ESX host 'DAVG'. I have no idea how to figure out where this latency is being introduced. Basically, it could be the SAN, or it could be the array, with the spike not showing up because of the way the diagnostic interval is set?

@Sean Cummins

Yes, I am using esxtop.

But do you think that the VMAX is missing these spikes because of the diagnostic interval? So the array could still be the cause of the host latency?

I have CMCNE here. Can that be used to check for SAN latency? I have Fabric Watch installed and none of the ports report high utilisation.

226 Posts

February 12th, 2015 07:00

Yes, I think it's very likely you're not seeing these spikes in Unisphere because of the huge difference in time intervals. Any time you compare metrics using dramatically different intervals like that, you're going to see differences. And the differences are going to be compounded in this case because you're viewing these metrics at two different layers in the stack.

If the 5-second ESX 30ms DAVG you're seeing is periodic rather than sustained, that isn't really unusual -- I wouldn't be terribly concerned about it, unless you have app owners / users complaining.
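One way to tell periodic spikes from sustained latency is to look at how many consecutive 5-second samples stay above a threshold. The sketch below is only an illustration: the 20 ms threshold, the 12-sample (one minute) cutoff, and the sample data are all assumptions, not VMware or EMC guidance.

```python
def longest_run_above(samples_ms, threshold_ms):
    """Length of the longest run of consecutive samples at or above threshold."""
    longest = current = 0
    for value in samples_ms:
        if value >= threshold_ms:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def classify(samples_ms, threshold_ms=20.0, sustained_samples=12):
    """Label a series of DAVG samples as normal, periodic, or sustained."""
    run = longest_run_above(samples_ms, threshold_ms)
    if run == 0:
        return "normal"
    return "sustained" if run >= sustained_samples else "periodic"


# Hypothetical 5-minute windows of 5-second DAVG samples (60 values each).
periodic_samples = [1.0] * 20 + [30.0] + [1.0] * 20 + [30.0] + [1.0] * 18
sustained_samples = [1.0] * 10 + [30.0] * 24 + [1.0] * 26

print(classify(periodic_samples))
print(classify(sustained_samples))
```

Samples like these could come from esxtop batch output, but the parsing step is left out here.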

33 Posts

February 13th, 2015 02:00

[Attached image: Latency_VMAX_ESX_host.PNG.png]
I guess I just have to try to explain the cause of the spike. The spike shows up in an alert generated by the ESX host saying latency has jumped from 1 ms to 30 ms.
