Queued I/O's

Question

Ok, so an AIX 5.3 host running PowerPath 5.1 has dual paths to a MetaLUN on a CX700. The MetaLUN is a two-component, 110 GB LUN distributed across three RAID 1/0 groups with four 73 GB spindles apiece. Navisphere Analyzer shows very low utilization on the MetaLUN itself, but the PowerPath utility shows queued I/Os (from 4 to 20 in a powermt watch) down both IBM HBAs, even though the Cisco MDS 2 Gb/s directors show very low utilization on the associated ports.

If the disk is bored and the fabrics are yawning for these connections, why are I/Os queuing at the HBAs? Does anybody know what level of queued I/Os at the HBA should be treated as a warning threshold?

thanks!

msumner1 · Answer

Negative, no ISLs are used; these are all local resources on the same switch.

SKT2 · Answer

Does this go through any ISL connections? If so, check the utilization of the ISLs involved in the path to the hosts.

bodnarg · Answer

You looked at the MetaLUN, but did you look at the actual utilization for the entire RAID group where the MetaLUN lives? There may be other applications sharing the physical disks that are causing performance issues. This seems like a stretch given that you are using multiple RAID groups, but it is worth checking if you have not done so already.

As for PowerPath, can you confirm that the mode is set correctly for CLARiiON, that you are not trying to use the inactive paths, and that no configuration setting is doing something odd? A good way to check is to see whether you have LUNs doing a lot of trespassing, which can show up as a "stealth" performance issue that is hard to track at the CLARiiON level.

Lastly, have you checked your OS-level performance statistics? If you are getting good service times on your I/O requests, it is possible that the PowerPath queue display is just a bug. If you are seeing queuing or poor response times at the OS level, then you should be more concerned.

Short answer: queuing in PowerPath is definitely not good. Start either at the bottom (storage) or at the top (host) and work your way toward the other end. It sounds like you don't have one of the "common" causes of performance problems...

msumner1 · Answer

Thanks for the info. CLAROpt is the PowerPath policy, and there are currently no trespassed LUNs on the frame. I just pulled Navisphere Analyzer data for the RAID groups that make up the MetaLUN: all the other LUNs within these groups are nominal (average utilization 32%), but the MetaLUN in question is currently running at around 96% utilization. I asked the applications group about this, and from the host perspective they are seeing 100% utilization with an expected I/O load currently underway. This is an Oracle DB, so I believe this is expected behavior, and we may simply be outgrowing the current CX700.

I'll keep an eye on it, but if they are seeing utilization maxed out locally with regular service times, then I might expect to see some queued I/Os, depending on the rest of the local resources and how they are being utilized.
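The expectation that a MetaLUN running near 96% utilization will accumulate queued I/Os, even while the fabric ports sit idle, can be illustrated with textbook queueing theory. The sketch below treats the device as an idealized M/M/1 queue, where the mean number of requests in the system is ρ/(1 − ρ) for utilization ρ. This is an assumption for illustration only: real array I/O is neither Poisson nor single-server, so the numbers show the shape of the curve, not a prediction for the CX700.

```python
def mm1_mean_in_system(utilization: float) -> float:
    """Mean number of requests (waiting plus in service) in an M/M/1 queue.

    Idealized assumption: Poisson arrivals, exponential service times,
    a single server. Mean count = rho / (1 - rho).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1.0 - utilization)


if __name__ == "__main__":
    # 32% is the nominal LUN utilization and 96% the MetaLUN utilization
    # reported in this thread; 50% and 80% are shown for comparison.
    for rho in (0.32, 0.50, 0.80, 0.96):
        print(f"utilization {rho:.0%}: ~{mm1_mean_in_system(rho):.1f} requests in system")
```

The curve is nearly flat at low utilization and climbs steeply as ρ approaches 1, which is why a saturated LUN can show queuing while upstream components look quiet.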

Any other ideas to help track this down from the storage level are definitely appreciated!

msumner1 · Answer

Thanks all, appreciate the input.

JoeO2 · Answer

You stated the following: 'MetaLUN on a Cx700. the MetaLUN is a two component 110GB distributed across three RAID 1/0 groups with four spindles(73GB) apiece.' You have to divide the number of queued I/Os by the number of spindles per path (in your case 4 x 3 = 12 spindles, assuming the RAID groups use separate spindles). 20/12 ≈ 1.67, which is less than 2, the threshold for a disk bottleneck.
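The per-spindle arithmetic above can be sketched as a small helper. It assumes, as the reply does, that the three RAID groups use separate spindles, and it uses the threshold of 2 queued I/Os per spindle quoted in this thread as a rule of thumb, not an official EMC figure. The function names are hypothetical.

```python
# Rule-of-thumb threshold quoted in this thread (assumption, not a vendor spec).
DISK_BOTTLENECK_THRESHOLD = 2.0


def queued_per_spindle(queued_ios: int, raid_groups: int, spindles_per_group: int) -> float:
    """Spread the observed queue across all spindles behind the LUN.

    Assumes each RAID group contributes its own, unshared spindles.
    """
    spindles = raid_groups * spindles_per_group
    if spindles <= 0:
        raise ValueError("need at least one spindle")
    return queued_ios / spindles


def is_disk_bottleneck(queued_ios: int, raid_groups: int, spindles_per_group: int,
                       threshold: float = DISK_BOTTLENECK_THRESHOLD) -> bool:
    """True when the per-spindle queue reaches the rule-of-thumb threshold."""
    return queued_per_spindle(queued_ios, raid_groups, spindles_per_group) >= threshold


if __name__ == "__main__":
    # 4 and 20 are the low and high queue depths seen in powermt watch;
    # 3 RAID 1/0 groups x 4 spindles = 12 spindles.
    for queued in (4, 20):
        per = queued_per_spindle(queued, raid_groups=3, spindles_per_group=4)
        print(f"{queued} queued -> {per:.2f} per spindle, "
              f"bottleneck: {is_disk_bottleneck(queued, 3, 4)}")
```

Even at the peak of 20 queued I/Os, the per-spindle figure stays below 2, which matches the conclusion in the reply.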
