Confused on IOPS count in Analyzer

Question

Hello All,

I'm a bit confused about the IOPS count in my Analyzer readings. I'm on a CX-500 (FLARE 2.19). When I open some Analyzer archive files, I see average sample points of 7 and 14 million I/O/sec, whereas I believe the CX-500 is rated at 120,000 IOPS total.

Is there a difference between I/O/sec and IOPS? Or do I have to divide by my sample interval or some other number? My archive interval is 600 seconds, so the numbers would make sense if I divided them by 600. But since the sample points appear to be labeled in I/O per second, I took each one to mean the rate at that very second.

Thanks in advance, and sorry if this is a dumb question.

RyanP2 · Accepted Answer

If you review the SP logs and find LUNs trespassing every few seconds, that is probably part of the problem, and maybe more.

I know that with PVLinks, every setting needs to be correct to fully stop trespassing. If you can pull up Primus, here are a few solutions to take a look at:

emc21180, emc143405, emc3038, emc44736, and emc99467.

-Ryan

RyanP2 · Answer

Not a dumb question at all.

First off, if you go into Help in Navisphere, there is a section called 'Analyzing storage system performance using Analyzer' (or something along those lines). It includes a basic information section with background on the counters.

As for I/O/s and IOPS, they are the same (both mean I/Os per second). The archive interval of 600 seconds gives a pretty good broad look at performance, and the points in the archive do represent IOPS, so no dividing is needed. One thing to keep in mind when reviewing them is that each point represents 10 minutes of averaging, so they may not give as detailed a picture as you want.

Now, as far as the 7 million IOPS goes, this is a 'feature' of Analyzer. Some things that can cause these huge numbers are a trespass during that 10-minute interval, or simply a missed poll because the array is busy (I/O takes priority over Analyzer). I can't speak to any known issues in Analyzer since I don't know your patch level (if you pull up the Analyzer release notes, you can check for those).

I would disregard the points that just don't seem possible.

-Ryan
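To apply Ryan's advice to disregard impossible points across many samples, here is a minimal shell sketch. It assumes the archive data has been exported to plain text as whitespace-separated "timestamp iops" pairs, one per line (a hypothetical format; adjust the field numbers to whatever your export actually produces). The function name filter_iops is hypothetical, and the default ceiling is the 120,000 IOPS rating quoted in the question; substitute the figure for your own array.

```shell
#!/bin/sh
# Hypothetical helper: read "timestamp iops" pairs from stdin and print only
# the samples at or below a plausibility ceiling (default 120000, the CX-500
# rating quoted in the question; an assumption, adjust for your array).
# Samples above the ceiling are written to stderr for review, not discarded.
filter_iops() {
    ceiling=${1:-120000}
    awk -v max="$ceiling" '
        NF < 2 { next }                  # skip blank or malformed lines
        ($2 + 0) > (max + 0) {
            # route implausible samples to stderr via a portable pipe
            printf "implausible: %s %s\n", $1, $2 | "cat 1>&2"
            next
        }
        { print $1, $2 }                 # plausible sample: pass through
    '
}
```

For example, filter_iops 120000 < samples.txt > plausible.txt 2> flagged.txt keeps the believable samples and collects the flagged ones in a separate file (file names are placeholders). Sending flagged points to stderr rather than dropping them makes it easy to line them up against trespass times in the SP logs.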

JubJub1 · Answer

Hmmm... There are just too many points to ignore. I am running 2 HP-UX hosts with PVLinks as opposed to PowerPath, and I do get many trespasses even though I have set the PV timeout to 180. I wonder if that is skewing the IOPS count?

JubJub1 · Answer

Thanks much, Randy. I have seen these tech notes before, but I will look into changing other values beyond the PVLinks timeouts.

JubJub1 · Answer

Hi everyone. I just wanted to mention that I solved the issue. I figured that something had to be polling the paths every so often for them to be trespassing so much. I then noticed that 'proactive polling' was turned on (as it is by default). I turned it off (on HP-UX 11.11) with pvchange -p n /dev/dsk/cXtXdX and voila! No more trespasses, none!

I waited a few days, then checked my archive and was happy to find that I was only peaking at 1,536 I/O/sec.

My users have also noticed an improvement in performance, which is also a plus.

After searching Powerlink again, I noticed that this issue is documented in emc143405. Interestingly, my dev/test box did not have the -p option until I happened to upgrade to the patches mentioned in that Powerlink article.

Hopefully this might help someone.

Take care and thanks for all the help!

SKT2 · Answer

Proactive polling was added in an LVM patch several months ago. Unfortunately, it was turned on automatically for every disk in the system. For internal disks it's fine: the LVM code periodically polls all the disks to see whether they are still alive, with the idea of catching an error condition before real data is affected. So for JBOD (non-array disks, such as internal disks), it is fine.

However, for smart disk arrays such as EMC, Hitachi, IBM, Axiom, HP, etc, for path managers such as PowerPath, DynaPath, etc and for specialized SAN appliances such as products from FalconStor, this proactive polling interferes with performance and is a waste of CPU and I/O cycles. In a smart array, all management of disk failures is completely hidden from the host (as it should be).

You can remove proactive polling with pvchange -p n. The pvchange command will not have the -p option if the LVM patch level is quite old. You can do it during production, but for volumes with lots of alternate paths it can take 3-6 seconds per lvol. This is a one-time change. Unfortunately, there is no way to change the default setting, so any new disks (LUNs) you add will require turning the polling off.
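Since every newly added LUN needs polling turned off by hand, a small wrapper can make that change repeatable. This is a minimal sketch assuming a POSIX shell on the HP-UX host and an LVM patch level whose pvchange supports -p; the function name disable_polling and the device paths in the usage line are hypothetical. It only wraps the pvchange -p n command described above.

```shell
#!/bin/sh
# Hypothetical helper: turn off LVM proactive polling on each physical
# volume path passed as an argument. With DRY_RUN=1 it only prints the
# pvchange commands so they can be reviewed before running for real.
disable_polling() {
    for pv in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "pvchange -p n $pv"
        else
            # report failures (e.g. an older LVM without -p) and keep going
            pvchange -p n "$pv" || echo "pvchange failed for $pv" >&2
        fi
    done
}
```

For example, DRY_RUN=1 disable_polling /dev/dsk/c4t0d1 /dev/dsk/c6t0d1 prints the two commands without running them; drop DRY_RUN=1 to apply the change.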
