ironcheflouie

76 Posts

2073

August 19th, 2008 11:00

Disk IOPS - Navi Analyzer

Hi. I started a Navi Analyzer session and noticed my dedicated 3+1 R5 300 GB FC 15k LUN was averaging over 1000 IOPS. This is on a CX3-80. How is this possible when the max disk IOPS is (150 x 4) = 600 disk IOPS total?

I quickly looked over the other counters and they looked fine, especially Utilization, which was under 40%. I would assume these are tied together.

Thanks.

Responses(12)

Anonymous

5 Practitioner

•

274.2K Posts

0

August 22nd, 2008 13:00

Navi Analyzer counts utilization of a resource as busy time / (busy time + idle time). Busy is defined as when one or more requests are active or queued at the resource. So 40% utilization means the LUN has 1 or more active requests 40% of the time. If a burst of requests arrives "all at once", they may be queued briefly at the LUN level, then distributed to the drives that support that LUN, so each drive gets several requests and each drive starts to work on the first one in its queue. So the "LUN" at that moment is very busy working on several requests in parallel, and also has several other requests queued up. Once that burst of requests is completed, the LUN is counted as idle, in your case, 60% of the time. Key point: utilization is NOT showing the percent of maximum performance capability.
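
As a rough illustration of the definition above, here is a small Python sketch that computes utilization from busy intervals inside one sample window. The function name, the interval representation, and all the numbers are assumptions made up for the example; this is not how Analyzer is implemented internally.

```python
def utilization(busy_intervals, window_seconds):
    """Return busy time / (busy time + idle time) for one sample window.

    busy_intervals: non-overlapping (start, end) times in seconds during
    which at least one request was active or queued (hypothetical input).
    """
    busy = sum(end - start for start, end in busy_intervals)
    idle = window_seconds - busy
    return busy / (busy + idle)


# Hypothetical 10 s window: the LUN is busy during two bursts totalling 4 s.
bursts = [(0.0, 2.5), (6.0, 7.5)]
util = utilization(bursts, window_seconds=10.0)

# Assume 10,000 I/Os completed in the window (made-up figure).
completed_ios = 10_000
avg_iops = completed_ios / 10.0   # rate averaged over the whole window
busy_iops = completed_ios / 4.0   # rate while the LUN was actually busy
```

With these made-up numbers the LUN reports 40% utilization and an average of 1000 IOPS, while the rate during the bursts themselves is 2500 IOPS. Utilization only says how often the LUN had work outstanding, not how close it was to its limit.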

As previously mentioned, drive IOPS rules of thumb like "150" are only guidelines, not maximums or limits. The intention is to give you a single number to use for conservative estimates of how many drives are needed for your workload, with preference for best response time. As such, that number must cover all cases. It is generally based on the capability of the drive when presented with a low thread count small block random workload spread across at least 1/2 of the drive GB capacity. This IO profile requires the drive to do significant seeks, and seeks take time.
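
A minimal Python sketch of how such a rule of thumb is typically used for planning, assuming the 150 IOPS per-drive figure quoted in this thread. The function names are hypothetical, and the estimate deliberately ignores RAID write overhead and cache effects.

```python
import math

# Guideline figure quoted in this thread for a 15k FC drive (not a hard limit).
RULE_OF_THUMB_IOPS = 150


def drives_needed(workload_iops, per_drive_iops=RULE_OF_THUMB_IOPS):
    """Conservative drive count for a given back-end IOPS target."""
    return math.ceil(workload_iops / per_drive_iops)


def planning_iops(drive_count, per_drive_iops=RULE_OF_THUMB_IOPS):
    """Planning figure for a group of drives; real drives can burst higher."""
    return drive_count * per_drive_iops


four_drive_estimate = planning_iops(4)   # the 3+1 group from the question
drives_for_1000 = drives_needed(1000)    # drives to plan for a 1000 IOPS load
```

Here `four_drive_estimate` is the 600 IOPS from the original question, and `drives_for_1000` comes out at 7 drives. Both are planning numbers for sustained load, not ceilings on what the drives can deliver in a burst.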

If your workload is more sequential, or highly localized, there may be little or no seeking. Higher thread counts at the drive allow the drive to reorder the requests to minimize seek distance, which minimizes service time and maximizes throughput. With a pure sequential small block workload, with a few threads so the drive always has the next request in queue as it executes the current request, it can read those small blocks as fast as the platter turns, no missed spins, no seeks. With a workload like that the drive can deliver each piece of data in 1 ms without too much strain, and 1 ms service time means 1000 IOPS.
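
The link between service time and IOPS above is simple arithmetic. A short Python sketch, assuming a single drive servicing one request at a time (the function names are illustrative only):

```python
def iops_from_service_time(service_time_ms):
    """I/Os per second when each I/O takes service_time_ms to complete."""
    return 1000.0 / service_time_ms


def service_time_from_iops(iops):
    """Average service time in ms implied by a given IOPS figure."""
    return 1000.0 / iops


sequential_iops = iops_from_service_time(1.0)        # 1 ms per I/O
rule_of_thumb_ms = service_time_from_iops(150.0)     # implied by 150 IOPS
```

A 1 ms service time gives 1000 IOPS, while the 150 IOPS rule of thumb corresponds to roughly 6.7 ms per I/O, which is the seek-heavy random case the guideline is built around.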

A pure load like this is rare and this performance should not be expected, but it can occur briefly during the natural bursts of a normal workload. That is why you may see disk IOPS far higher than the rules of thumb, even though utilization is much less than 100%.

Allen Ward

2.1K Posts

1

August 19th, 2008 11:00

The number you are talking about (150 IOPS) is not a maximum; it is a recommended sustained maximum. This means you should plan to never exceed this number for any extended period of time.

The drives themselves can burst to much higher rates, but it can have a serious impact on your performance if you rely on it.

I've seen a 146GB 15k drive sustain 300+ IOPS for minutes at a time, but once we found it we redesigned the disk layout to prevent it happening again.

ironcheflouie

76 Posts

0

August 19th, 2008 11:00

Hi. Thanks for the feedback. How does this correlate to Utilization, which was only 40%?

Thanks.

Allen Ward

2.1K Posts

0

August 19th, 2008 11:00

I'll have to defer to one of the performance experts from EMC on this, but my understanding is that utilization is one of those things that has a complex algorithm calculating it in the background and it cannot be relied on entirely. Usually it is a good indicator, but it isn't an absolute number.

garcd7

26 Posts

0

August 19th, 2008 12:00

You overloaded a LUN (3+1). That does not mean the SP is busy, just the back end of the CLARiiON.
Here are some parameters recommended by EMC CLARiiON Best Practices:
FC disks (I/Os): threshold value = 160
ATA disks (I/Os): threshold value = 70
Seeking FC disks: threshold value = 10% of disk capacity or > 30GB/s
Seeking ATA disks: threshold value = 10% of disk capacity or > 30GB/s
FC disks response time: threshold value = 15ms (total IOPS > 20)
ATA disks response time: threshold value = 15ms (total IOPS > 20)
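
A rough Python sketch of checking per-disk samples against the IOPS and response time thresholds listed above. The dictionary layout and function name are hypothetical, the seek thresholds are left out, and reading "(total iops > 20)" as "only check response time when the disk is doing more than 20 IOPS" is an assumption.

```python
# Threshold values as quoted in this post; the structure is illustrative.
THRESHOLDS = {
    "FC": {"iops": 160, "response_ms": 15},
    "ATA": {"iops": 70, "response_ms": 15},
}
MIN_IOPS_FOR_RESPONSE_CHECK = 20  # assumed meaning of "(total iops > 20)"


def check_disk(disk_type, iops, response_ms):
    """Return a list of warning strings for one disk sample."""
    limits = THRESHOLDS[disk_type]
    warnings = []
    if iops > limits["iops"]:
        warnings.append(f"IOPS {iops} above threshold {limits['iops']}")
    if iops > MIN_IOPS_FOR_RESPONSE_CHECK and response_ms > limits["response_ms"]:
        warnings.append(
            f"response time {response_ms} ms above threshold "
            f"{limits['response_ms']} ms"
        )
    return warnings


busy_fc = check_disk("FC", iops=250, response_ms=8)      # high IOPS only
slow_fc = check_disk("FC", iops=100, response_ms=20)     # slow response only
quiet_ata = check_disk("ATA", iops=10, response_ms=40)   # too few I/Os to judge
```

In this sketch a disk at 250 IOPS with an 8 ms response time raises only the IOPS warning, which matches the situation in this thread: the rate is above the guideline, but the response time is still healthy.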

ironcheflouie

76 Posts

0

August 19th, 2008 14:00

The utilization graph was for the LUN, not the SP. With that said, what does a med-low utilization but HIGH IOPS for a LUN tell me?

Thanks.

kelleg

4.5K Posts

1

August 20th, 2008 15:00

Please see the documents available at PowerLink:

EMC CLARiiON Best Practices for Fibre Channel Storage: FLARE Release 26 Firmware Update - Best Practices Planning

http://powerlink.emc.com/km/live1/en_US/Offering_Technical/White_Paper/H2358_clariion_best_prac_fibre_chnl_wp_ldv.pdf

EMC CLARiiON Fibre Channel Storage Fundamentals - Technology Concepts and Business Considerations

http://powerlink.emc.com/km/live1/en_US/Offering_Technical/White_Paper/H1049_emc_clariion_fibre_channel_storage_fundamentals_ldv.pdf

In addition, if you click on HELP in Navisphere and look for the Analyzer section, most of your questions are covered.

glen

ironcheflouie

76 Posts

0

August 22nd, 2008 09:00

I'm working with a Support guy to verify that Analyzer is installed correctly. The stats seem wrong for some reason. We'll see.

ironcheflouie

76 Posts

0

August 22nd, 2008 14:00

Thanks for the nice feedback.

RRR

2 Intern

•

5.7K Posts

0

August 31st, 2008 12:00

Smaller I/Os can be serviced at a higher rate than very large ones, so you might see 1000 IOPS with 1kB I/Os and 30 IOPS with 1MB I/Os. Bandwidth and throughput are ... how do you say this in English? ... each other's opposites. If one is high, the other is low.
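
To put numbers on that trade-off, here is a small Python sketch, assuming bandwidth is simply IOPS multiplied by I/O size. The function name is illustrative, and the two cases reuse the example figures from this post.

```python
def bandwidth_mb_per_s(iops, io_size_kb):
    """Bandwidth in MB/s for a given I/O rate and I/O size (1 MB = 1024 kB)."""
    return iops * io_size_kb / 1024.0


small_io = bandwidth_mb_per_s(iops=1000, io_size_kb=1)     # many small I/Os
large_io = bandwidth_mb_per_s(iops=30, io_size_kb=1024)    # few large I/Os
```

With these figures the small-block case has the higher IOPS but moves just under 1 MB/s, while the large-block case has far fewer IOPS but moves 30 MB/s.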

Hope it makes some sense.

SKT2

2 Intern

•

1.3K Posts

0

August 31st, 2008 13:00

That is called being inversely proportional to each other.

RRR

2 Intern

•

5.7K Posts

0

September 10th, 2008 13:00

Sounds ok to me
