8 Posts
July 26th, 2010 09:00
SAR Reports in AIX Server Using CLARiiON
I'm collecting SAR data on a number of AIX servers. I can see disk activity for the hdiskpower device and the two related hdisk(s) or paths, but I'm uncertain how to interpret the data.
As an example, hdisk351 and hdisk385 are showing 87% and 92% busy respectively - but the hdiskpower device shows 99% busy.
Also, I'm seeing "avwait" and "avserv" reported for hdisk351 and hdisk385 but not for the hdiskpower device.
Very confusing - any insight on these discrepancies is appreciated.
Thank you.
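For reference, AIX `sar -d` prints one line per device with %busy, avque, r+w/s, Kbs/s, avwait, and avserv columns. A minimal Python sketch (the sample numbers are invented, and the column layout is assumed from that format) that parses such output and illustrates why the hdiskpower pseudo-device can report a higher %busy than either member path: the multipath device counts an interval as busy whenever at least one path is servicing I/O, so the paths' busy intervals overlap rather than add.

```python
# Hypothetical sample of AIX `sar -d` output (values invented):
# device        %busy  avque  r+w/s  Kbs/s  avwait  avserv
SAMPLE = """\
hdisk351      87    1.2    410   5240    4.1    11.8
hdisk385      92    1.5    460   5890    4.8    12.3
hdiskpower12  99    2.6    870  11130    0.0     0.0
"""

def parse_sar_d(text):
    """Parse `sar -d`-style device lines into a dict keyed by device name."""
    stats = {}
    for line in text.strip().splitlines():
        dev, busy, avque, tps, kbs, avwait, avserv = line.split()
        stats[dev] = {"busy": float(busy), "avwait": float(avwait),
                      "avserv": float(avserv)}
    return stats

stats = parse_sar_d(SAMPLE)
# The hdiskpower device is busier than either path, and its avwait/avserv
# show as zero, matching what the original post describes.
print(stats["hdiskpower12"]["busy"])   # 99.0
```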
Parks2
116 Posts
July 28th, 2010 06:00
We usually look at Navisphere Analyzer data, but I did find several documents on Google that might help you. Search for "interpret sar reports aix"; I believe that will help you understand the SAR report.
SKT2
2 Intern
1.3K Posts
July 30th, 2010 05:00
%busy shows the trend of how busy your disks are. But that does not always mean there is a performance issue; it just means there was at least one I/O being served by the disk during the interval. sar reports data for the individual paths (pvlinks), not the *power* devices. Also, avwait should be less than avserv, and IMHO avserv below 20 ms is good for CLARiiON devices.
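The rules of thumb above (avwait staying below avserv, and avserv under roughly 20 ms for CLARiiON-backed devices) can be sketched as a simple check. The function name and sample values below are illustrative, not from the thread:

```python
def check_path(avwait, avserv, threshold_ms=20.0):
    """Apply the rules of thumb from the thread: avwait should stay below
    avserv, and avserv under ~20 ms is considered healthy for CLARiiON
    devices. Returns a list of warnings, or a single 'looks healthy' note."""
    notes = []
    if avwait >= avserv:
        notes.append("avwait >= avserv: I/Os are spending too long queued")
    if avserv >= threshold_ms:
        notes.append("avserv >= 20 ms: slow service time")
    return notes or ["looks healthy"]

print(check_path(avwait=4.1, avserv=11.8))   # ['looks healthy']
print(check_path(avwait=25.0, avserv=22.0))  # two warnings
```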
ejk67
8 Posts
July 30th, 2010 07:00
Thanks to everyone for their assistance. I agree that a disk can be very busy, but I wouldn't expect 100% busy 100% of the time. I reached out to IBM and they did not see an issue; IBM focused on response times. The SAR report does show statistics for both the power device and the underlying paths, and I can't believe the number reported for the power device is invalid. I have a server similar to the one in question, though its file system sits on a LUN spread over several disks; it is never more than 50% busy. The server in question has a LUN created from a single drive and mirrored on the back end.
ejk67
8 Posts
August 2nd, 2010 06:00
I appreciate everyone's responses; they were helpful. I don't know that there is one answer to this, though I'm fairly certain it's tuning on the AIX side. According to Navisphere, the RAID group and associated LUNs are not heavily utilized. That suggests the OS is the bottleneck: either it is not allowing more data to be pushed, or it needs a larger buffer. I'm continuing to investigate whether that lies with the HBA, a "vmo" setting, or the way the file system is set up. In summary, it's not an EMC problem.
SKT2
2 Intern
1.3K Posts
August 8th, 2010 04:00
Do you see queued I/Os in `powermt watch` during the problem window?
Nollaig1
1 Rookie
137 Posts
August 9th, 2010 02:00
Just to add: if there are queued I/Os at any time, this count is stored and not cleared until a reboot. So `powermt display dev=all` will show any I/O queued since the last reboot. If the number is incrementing, then you have an issue.
Hope this helps,
Nollaig
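Since the counter is cumulative since boot, the way to tell a live problem from history is to take two snapshots and look for growth. A sketch of that comparison (the per-path numbers below are made up, standing in for values read off successive `powermt display dev=all` runs):

```python
def incrementing_paths(before, after):
    """Compare two snapshots of cumulative queued-I/O counters, e.g. from
    two successive `powermt display dev=all` runs. A counter that grows
    between snapshots points at a live issue; a static nonzero value may
    just be history accumulated since the last reboot."""
    return [path for path in after if after[path] > before.get(path, 0)]

# Invented example snapshots taken a few minutes apart:
snap1 = {"hdisk351": 3, "hdisk385": 0}
snap2 = {"hdisk351": 3, "hdisk385": 2}
print(incrementing_paths(snap1, snap2))  # ['hdisk385']
```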
ejk67
8 Posts
August 10th, 2010 12:00
Yes, anywhere from 0 to 5. Is that bad?
SKT2
2 Intern
1.3K Posts
August 10th, 2010 17:00
Ideally you won't see a high number of queued I/Os here, but more than 10 is definitely bad.
Nollaig1
1 Rookie
137 Posts
August 12th, 2010 02:00
Hello,
It depends on the numbers as well: 0 is fine (no queued I/O), and if a path is stable at 5 but not incrementing, that could indicate an old issue that is no longer current. It's really when the numbers are continually increasing that there is an issue to be addressed.
You never said whether there are errors reported for the paths. Are you seeing issues on the links/SAN?
For the paths with the higher queued I/O, what do they have in common? Are they through the same HBA, path, or zone, if anything?
Hope this helps,
Nollaig
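The "what do the affected paths have in common" question above lends itself to a quick tally. A sketch of that triage step, assuming you have a path-to-HBA mapping (the hdisk and `fcs*` adapter names below are hypothetical):

```python
from collections import Counter

def shared_hbas(path_hba, affected_paths):
    """Count which HBAs the affected paths go through. If one HBA
    dominates the tally, that shared component is the first suspect."""
    return Counter(path_hba[p] for p in affected_paths)

# Hypothetical mapping of paths to host adapters:
path_hba = {"hdisk351": "fcs0", "hdisk385": "fcs1",
            "hdisk402": "fcs0", "hdisk417": "fcs0"}
print(shared_hbas(path_hba, ["hdisk351", "hdisk402", "hdisk417"]))
```

The same tally works for zones or array ports by swapping in a different mapping.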