111 Posts

January 20th, 2008 17:00

I am curious about that myself...

This is how I look at it, but I am not sure if I am correct.

Like all metrics, the system is sampling.
For queue length, suppose I sample 5 times and get:
1: 3 I/O
2: 0 I/O
3: 2 I/O
4: 5 I/O
5: 0 I/O

The average queue length is 10/5 = Queue Length 2

Same metrics, but for Busy Queue Length you throw out samples 2 and 5 (the idle ones).

The busy queue length is 10/3 = Busy Queue Length 3.33
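The arithmetic above can be sketched in a few lines of Python (the sample values are the hypothetical ones from this example, not real Analyzer output):

```python
# Hypothetical queue depths observed at 5 sample points
samples = [3, 0, 2, 5, 0]

# Average queue length: every sample counts, idle or not
avg_queue = sum(samples) / len(samples)      # 10 / 5 = 2.0

# Average busy queue length: idle (zero) samples are thrown out
busy = [s for s in samples if s > 0]
avg_busy_queue = sum(busy) / len(busy)       # 10 / 3 ≈ 3.33
```

The same total work (10 I/Os) spread over fewer busy samples is what pushes the busy figure above the plain average, which is why a large gap between the two suggests bursts.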

Average Busy Queue Length
Average number of requests waiting at a busy system component to be serviced, including the request that is currently in service. Since the queue length is counted only when the component is busy, the value indicates the frequency variation (burst frequency) of incoming requests. The higher the value, the bigger the burst, and the longer the average response time at this component

10 Posts

January 20th, 2008 21:00

Sounds good to me. You couldn't have explained it any better.

38 Posts

January 22nd, 2008 01:00

There are a couple of things to consider when looking at the queue reported within Navisphere Analyzer. The general idea is that if the average busy queue (AvByQ) is greater than the average queue, then we have bursty activity; well, yes and no.

Let me explain.
Typically, with steady read and write activity to a LUN, the values of Queue and AvByQ will be relatively close, which is a good indication we have steady activity. Where that philosophy shifts somewhat is when you have a higher level of write activity and the array is under a reasonable load, one where high watermark flushing is being used.
Writes will typically go into cache; for optimization, the array wants to keep as much data in write cache as possible so it can coalesce it into larger I/Os or even full stripe writes, reducing backend activity at the disks. In doing this, it will periodically write data out based on flushing algorithms. Where this gets busy is in high watermark flushing, the point at which higher priority processes are started to flush dirty pages from cache to disk more aggressively, bringing the dirty page level of cache back down to the low watermark setting.

During flushing there may be in excess of 10 I/Os at the disk, sent in bursts from cache; thus the actual disk activity will appear bursty, whereas the higher level LUN activity may not be bursty at all, but relatively steady.

If the LUN AvByQ is much higher than the average queue, this is normally an indication the host is bursty, which could be limiting your IOPS capability and will certainly be impacting I/O response time. Since the average response time reported by Analyzer is for the sample period, typically 2 minutes, you should check host reporting mechanisms (Perfmon, iostat, sar) to determine what the response time actually is.

The desired values for Queue and AvByQ are relative, i.e. if I have a queue of 4 on a component and that component's service time is 8ms, my I/O will take 4*8 = 32ms. If I am happy with that, fine. If I cannot accept a 32ms response time, I need to keep the queue down, i.e. less busy, fewer IOPS.
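That queue-times-service-time estimate is simple enough to express directly (a back-of-envelope Python sketch; the function name and figures are illustrative, not an Analyzer formula):

```python
def estimated_response_ms(queue_length, service_time_ms):
    """Rough response-time estimate: each I/O waits behind
    the I/Os queued ahead of it, each taking service_time_ms."""
    return queue_length * service_time_ms

# A queue of 4 at a component with 8ms service time
print(estimated_response_ms(4, 8))  # 32 ms
```

Halving the queue (by spreading load or reducing IOPS) halves the estimated response time under this simple model, which is the trade-off described above.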

If the queue at the disk is driving the host I/O response time up consistently, more disks in that LUN may help distribute the load, lower the queue, and therefore improve response time. If we are consistently bursty with a higher AvByQ at the LUN level, you may need to look at the host configuration to determine why it is exhibiting bursty behaviour (host buffers, maybe).

Thus, interpretation of Queue and AvByQ will depend on which component you are looking at, the I/O distribution, cache utilization, and correlation to host-reported metrics as necessary.
