10 Posts
September 10th, 2010 13:00
Understanding LUN Performance
Could someone help me understand how to interpret LUN performance in Navi Analyzer? I have read some past discussions, but I'm looking for a clearer answer if there is one. My situation involves a SQL 2005 DB cluster. There have been about three instances over the last few months where the cluster has failed over from the active node to the passive node. Each time, storage performance has come into question, citing disk response time as reported by the software used to monitor the cluster (Idera). According to the Analyzer data, I don't see any major problems. The last time the failover occurred, I did notice that the LUN containing the DB data (RAID 1/0, 4+4, dedicated) appeared to experience a high response time (64 ms) around the time of the failover. The LUN utilization and queue length (25% and 4.7, respectively) are rather low, at least as far as I know. Forced flushes for that LUN were practically nil during the same period. The SPs' performance during that period was as follows: utilization 14% and 28%; queue length 2.5; response time 1.75 ms; dirty pages SPA 71%, SPB 89% (watermarks 80/60).
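For readers less familiar with the SP cache figures above: as I understand CLARiiON write-cache watermark processing, once dirty pages rise past the high watermark the SP flushes more aggressively until it gets back down to the low watermark, and forced flushes happen only when the cache fills completely. The sketch below just classifies the snapshot numbers from this post against the 80/60 watermarks. The function name is hypothetical, and it ignores hysteresis (whether the SP is already flushing down from the high watermark), which a single snapshot can't show.

```python
# Hypothetical helper: classify a single snapshot of write-cache dirty pages
# against the SP watermarks (80/60 in this post). Real watermark processing is
# hysteretic; this snapshot view ignores whether flushing is already under way.
def cache_state(dirty_pct: float, high_wm: float = 80.0, low_wm: float = 60.0) -> str:
    if dirty_pct >= 100.0:
        return "full (forced flushes likely)"
    if dirty_pct > high_wm:
        return "above high watermark (high-water flushing)"
    if dirty_pct > low_wm:
        return "between watermarks"
    return "below low watermark"


# Dirty-page percentages reported for each SP around the failover.
for sp, dirty in {"SPA": 71.0, "SPB": 89.0}.items():
    print(f"{sp}: {dirty:.0f}% dirty -> {cache_state(dirty)}")
```

On these numbers SPB was above its high watermark at that moment, while the near-zero forced flushes suggest the cache never actually filled.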
Now that I've said all that: I've read that LUN response time can be, or is likely to be, falsely reported when the utilization and total throughput for that LUN are low, and that you should only be concerned when all three of those metrics are high at the same time. Is this true? If so, is LUN response time a reliable metric to monitor? Some of these metrics seem hard to understand, at least for me.
I apologize for the lengthy explanation, but I wanted to be sure that whoever takes this on has the necessary details. Any help will be appreciated.
Thanks,
Tim


driskollt1
131 Posts
September 28th, 2010 08:00
LUN response time is usually my starting point for troubleshooting. It gives a good indication of what the host is experiencing. If your LUN response time (response time = queue length * service time) is good, then the host usually isn't seeing issues. High response times don't necessarily mean that the CLARiiON is busy; they can also indicate that you're having issues with your host or fabric.
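Taking the rule of thumb in this reply at face value (response time = queue length × service time), you can back out the average service time implied by the numbers in the original post. This is a sketch: the function names are hypothetical, and Analyzer's queue-length counters are averages over a polling interval, so treat the result as approximate.

```python
# Rule of thumb quoted in the reply: response time = queue length * service time.
def response_time_ms(queue_length: float, service_ms: float) -> float:
    return queue_length * service_ms


def service_time_ms(response_ms: float, queue_length: float) -> float:
    # Invert the rule of thumb to estimate the average service time per request.
    if queue_length <= 0:
        raise ValueError("queue length must be positive")
    return response_ms / queue_length


# Data LUN figures from the original post: 64 ms response time, queue length 4.7.
svc = service_time_ms(64.0, 4.7)
print(f"implied service time: {svc:.1f} ms")  # 64 / 4.7 is about 13.6 ms
```

Running the same arithmetic on other intervals shows whether a spike came from a longer queue or from slower individual requests.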
LUN Utilization % is not a good metric to use: it doesn't report accurately and tends to show utilization as higher than it actually is. It can also be deceptive, as it may say a LUN is 100% utilized when another LUN is causing your problem.
64 ms is fairly high, but I don't think a 64 ms spike would cause a failover. It might just be an indicator of the host having issues. The host could have been having issues and responding slowly before it finally died and initiated a failover, which would also drive LUN response time up (a write isn't a write until the host acknowledges it). LUN response time would also probably go up during the failover process.
At least half of your job managing storage is proving that the array/SAN is not the problem. Provide graphs to people who don't believe you. If you've laid everything out properly, then it's most likely a host issue (bad code, OS problems, HBA drivers, a bad HBA, etc.).
Usually if I see high response times on a LUN, I check these things on the array
Parks2
116 Posts
September 13th, 2010 18:00
I do not know the answer to your question, but maybe talk to your sales rep, who might have someone who can help you. EMC does offer classes as well.
kelleg
6 Operator
4.5K Posts
September 20th, 2010 15:00
Tim,
In the online Help there is a lot of good information about using Analyzer, including a statement about the accuracy of certain values (response time being the major one) when IOPS and queue length are both low. As I/O rises above about 100 IOPS, response times become more accurate. But at under about 100 IOPS with a low queue length (1-3), response times will usually also be low.
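Glen's rule of thumb can be written down as a simple filter to apply to Analyzer samples before worrying about response time. The ~100 IOPS and queue-length-of-3 thresholds come from the post above; the `LunSample` type, the function name, and the example values are hypothetical, not real Analyzer output.

```python
from dataclasses import dataclass


@dataclass
class LunSample:
    # Hypothetical per-interval LUN figures, as you might export them from Analyzer.
    iops: float
    queue_length: float
    response_ms: float


def response_time_is_meaningful(s: LunSample, min_iops: float = 100.0,
                                low_queue: float = 3.0) -> bool:
    # Below roughly 100 IOPS with a short queue (1-3), the reported response
    # time is less trustworthy, so flag it rather than alarm on it.
    return not (s.iops < min_iops and s.queue_length <= low_queue)


samples = [
    LunSample(iops=40.0, queue_length=1.2, response_ms=64.0),   # light load
    LunSample(iops=450.0, queue_length=4.7, response_ms=64.0),  # busier interval
]
for s in samples:
    print(s, "->", "meaningful" if response_time_is_meaningful(s) else "treat with caution")
```

Both conditions must hold before a sample is discounted, matching the "IOPS and queue length are both low" wording in the Help.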
The Best Practice Guides in the Documents section go into greater detail about each of these topics.
glen
tthomas1
10 Posts
October 5th, 2010 09:00
It was. I just haven’t made it back out there to mark it as answered. I appreciate your answer. That was what I was looking for.
Thanks,
Tim
kelleg
6 Operator
4.5K Posts
October 5th, 2010 09:00
Was your question answered? If so, please remember to mark the question as answered and to award points to the person with the correct answer.
glen
kelleg
6 Operator
4.5K Posts
October 5th, 2010 11:00
Tim,
Don't forget to mark it answered.
glen