Are there any best-practice guides that explain how best to use and understand the Analyzer graphs and metrics?
I'm looking for what my queue time, response time, etc. should stay under, so I can tell whether there is a bottleneck. Also, do these numbers change based on how many disks are in the LUN/RAID group?
I wrote up some blog posts that may help you: http://storagesavvy.com/2011/03/30/performance-analysis-for-clariion-and-vnx-part-1/
Also, you should look for the Best Practices for Performance and Availability doc on PowerLink, which has some information about the performance of various components as well.
Richard, great blog! I've got a weird scenario that I'm not sure how to understand.
A RAID group with four 450GB FC disks in RAID 5. There are two LUNs in this RG. One LUN (the LUN in question) is showing response times between 30 and 80ms. Each disk in the RG is showing under 16ms. The second LUN's response time is well under 16ms. The other metrics for the LUN in question show the following:
queue - under 1.5
total bandwidth - under 0.5
total throughput - under 32
read size - 16 avg
write size - 13 avg
service time - 9
Why is the response time so high on this LUN? The SP that it's running on looks fine...
Look at two values for the disks - Total IOPS and Total Bandwidth. Are the disks 10K or 15K? Use the following numbers for best practice:
Each 10K FC disk - 120 IOPS and 10MB/s
Each 15K FC disk - 180 IOPS and 12MB/s
Make sure that the disks are not exceeding this.
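As a rough illustration of that check, here is a minimal Python sketch. It only encodes the rule-of-thumb figures quoted above; the function name and the sample values are made up for the example and are not taken from Analyzer.

```python
# Rule-of-thumb per-disk ceilings quoted in this thread: (IOPS, MB/s).
DISK_LIMITS = {
    "10K": (120, 10.0),
    "15K": (180, 12.0),
}

def disk_over_limit(rpm, total_iops, total_mb_s):
    """Return the metrics on which a single disk exceeds its rule-of-thumb ceiling.

    Hypothetical helper: rpm is "10K" or "15K"; the other two arguments are
    the disk's Total IOPS and Total Bandwidth (MB/s) as read off the graphs.
    """
    max_iops, max_mb_s = DISK_LIMITS[rpm]
    exceeded = []
    if total_iops > max_iops:
        exceeded.append("iops")
    if total_mb_s > max_mb_s:
        exceeded.append("bandwidth")
    return exceeded

# Illustrative values only.
print(disk_over_limit("15K", 90, 3.0))    # []
print(disk_over_limit("15K", 200, 5.0))   # ['iops']
print(disk_over_limit("10K", 150, 11.0))  # ['iops', 'bandwidth']
```

An empty list means the disk is within both ceilings.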
Also, when IOPS are low (under about 100 IOPS), response times can be artificially high. Zoom in on the part that shows high response times and select both Total IOPS and Response Time, then see whether the points with high response time fall at low-IOPS points.
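If you have the samples outside of the GUI, the same comparison can be sketched in a few lines of Python. This assumes you have already got the data into a list of records; the field names (`time`, `iops`, `rt_ms`), the sample values, and the 16ms "high" threshold are all assumptions for the example. Only the 100 IOPS cutoff comes from the advice above.

```python
def suspect_samples(samples, low_iops=100, high_rt_ms=16.0):
    """Return samples whose high response time coincides with low IOPS.

    These are the points where the reported response time may be
    artificially high rather than a sign of a real bottleneck.
    """
    return [
        s for s in samples
        if s["iops"] < low_iops and s["rt_ms"] > high_rt_ms
    ]

# Hypothetical samples, not real Analyzer output.
samples = [
    {"time": "10:00", "iops": 38,  "rt_ms": 75.0},  # low IOPS, high RT
    {"time": "10:05", "iops": 450, "rt_ms": 9.0},   # busy, healthy RT
    {"time": "10:10", "iops": 30,  "rt_ms": 8.0},   # low IOPS, normal RT
    {"time": "10:15", "iops": 600, "rt_ms": 40.0},  # busy AND slow: worth a look
]

flagged = suspect_samples(samples)
print([s["time"] for s in flagged])  # ['10:00']
```

A sample that is slow while IOPS are high (10:15 here) is deliberately not flagged, since that is the case where the response time reading should be taken at face value.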
Glen, thank you for your reply. They are 15K FC disks. When we hear complaints, this is what we see on the SAN:
Total IOPS < 38
Total MB < 0.5
Response Time - Between 28ms and 75ms
Why is there a high response time during low IOPS? Can you explain what you mean by "artificially"?
This is how the response time algorithm was written: the higher the IOPS, the more accurate the reported response time, because at high IOPS you want it to be as accurate as possible. At low IOPS, response time is not normally a problem, so the accuracy is lower. This is documented in the online help under the Analyzer section:
Managing Block-based systems / Analyzing Storage-system Performance using Analyzer / Troubleshooting performance metrics / LUN Response Time:
Under a light workload, which is low utilization and low throughput, the LUN response time may be reported as being very high even though the actual response times for requests to this LUN are normal.
Thank you, Glen. So, if the SP isn't loaded and all the metrics are that low, then there are no problems with the LUN/disks, correct?
If the utilization/IOPS/bandwidth are that low, is it even possible to increase the speed?
When you look at the LUN using Analyzer and all the metrics (Total IOPS, Queue Length, Average Busy Queue Length, Service Time, and Response Time) are low, then the LUN is not busy.
To increase IO at the LUN, the host needs to send more IO.