Unsolved

January 15th, 2010 14:00

Low BW and throughput...where does the high utilization come from?

In Navisphere Analyzer I have some LUNs that periodically show very high utilization that doesn't correlate to high bandwidth or throughput -- read or write -- at those times. In fact I don't see any other metrics that show corresponding spikes.

Is there any way to figure out what's causing those spikes in utilization? Can activity on other LUNs that are on the same RAID group influence a LUN's Utilization metric? Is it due to activity related to the LUNs being the primary images in MirrorView/A mirrors?

If so, I'm wondering how I can determine that for certain, rather than assuming that it must be MirrorView activity since none of the LUN-specific metrics under Performance Detail for the LUN seem to account for the high utilization.

34 Posts

January 16th, 2010 03:00

Your observation is accurate. LUN utilization is not an indicator of how hard the RAID group or LUN is being pushed in terms of throughput or bandwidth. A LUN is said to be utilized when, at the sampling instant, it is servicing IO. Whether that is one IO or 100, the result is the same -- the LUN is 100% utilized. Basically, at the sampling instant it is a binary function -- 100% if the LUN is servicing IO, or 0% if not.

Analyzer depicts a data point which is an average of many such samples. For example, if Analyzer averages over 10 sampling instants and at 5 of them the LUN is servicing IO, the LUN would be 5/10 busy, i.e., 50% busy.
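To make the averaging concrete, here is a minimal Python sketch of the model described above. It is only an illustration of the idea, not how Analyzer is actually implemented; the function name and the sample lists are made up for the example.

```python
def lun_utilization(outstanding_io_samples):
    """Return percent utilization from per-instant counts of IOs in flight.

    Hypothetical model: at each sampling instant the LUN counts as busy (1)
    if it is servicing at least one IO, and idle (0) otherwise. The reported
    value is the average of those binary samples.
    """
    busy_flags = [1 if n > 0 else 0 for n in outstanding_io_samples]
    return 100.0 * sum(busy_flags) / len(busy_flags)


# Two made-up LUNs sampled at the same 10 instants.
light_lun = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]        # one IO when busy
heavy_lun = [100, 0, 80, 0, 120, 0, 90, 0, 100, 0]  # many IOs when busy

print(lun_utilization(light_lun))
print(lun_utilization(heavy_lun))
```

Both lists are busy at 5 of 10 instants, so both come out at 50% in this model even though the second LUN is doing far more work per busy instant.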

At the same time, two busy LUNs may not be servicing the same amount of IO, yet end up being similarly utilized. Use the utilization metric for LUNs to check which LUNs are active for a particular time period. Once you have zeroed in on those LUNs, check specific characteristics such as BW, throughput, response times, etc.

The amount of throughput the LUN can absorb while being busy, or the highest sustained bandwidth at which the LUN can service IO, is largely a function of the state and number of disks in the underlying RAID group (excuse thin LUNs from this discussion -- they are a little more complicated). So, if the disks are capable of being pushed harder without affecting response time, the LUN can do more while being busy.

Analyzer help has very good documentation about the metrics.

From Analyzer Help:

Utilization

Describes the fraction of a certain observation period that the system component is busy serving incoming requests. An SP or disk that shows 100% (or close to 100%) utilization is a system bottleneck since an increase in the overall workload will not affect the component throughput; the component has reached its saturation point. Since a LUN is considered busy if any of its disks is busy, LUN utilization usually presents a pessimistic view. That is, a high LUN utilization value does not necessarily indicate that the LUN is approaching its maximum capacity.

When the LUN becomes the bottleneck, the utilization will be at or close to 100%. However, since I/Os can get serviced by multiple disks an increase in workload might still result in a higher throughput.
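The "pessimistic view" in the help text can be sketched in a few lines of Python. This is a toy model built only from the sentence "a LUN is considered busy if any of its disks is busy"; the function names and the four-disk sample data are assumptions for illustration, not Analyzer internals.

```python
def disk_utilization(busy_samples):
    """Percent of sampling instants at which a single disk was busy."""
    return 100.0 * sum(busy_samples) / len(busy_samples)


def lun_utilization_from_disks(per_disk_samples):
    """Percent of instants at which ANY disk under the LUN was busy.

    per_disk_samples holds one list of 0/1 busy flags per disk, all taken
    at the same sampling instants (hypothetical input format).
    """
    instants = list(zip(*per_disk_samples))
    busy_instants = sum(1 for flags in instants if any(flags))
    return 100.0 * busy_instants / len(instants)


# Made-up example: four disks, each busy at a different instant.
disks = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

print([disk_utilization(d) for d in disks])
print(lun_utilization_from_disks(disks))
```

In this model each disk is only 25% busy, yet the LUN reports 100%, which is why a high LUN utilization by itself does not mean the LUN has run out of headroom.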

As for the utilization of one LUN being influenced by another, I believe that Analyzer shows a LUN as utilized only when IO is being serviced by that LUN, not by any other LUN.

To be sure whether MV activity is causing issues, look at the disk statistics.

Cheers,

-joji

Message was edited by: joej

4.5K Posts

January 18th, 2010 12:00

Layered application IO is not reflected at the LUN level, only on the disks. If you have a LUN that is the mirror for a source LUN, you will see the IO on the disks, but not on the LUN. Of course, at the disk level the IO is for all the LUNs, so it may be harder to see a particular LUN's IO if there is more than one LUN in a RAID group.

glen
