Performance issues with MS SQL cluster

Question

A customer of ours is having performance issues with his SQL cluster, connected to a DMX3.
First of all, he did something terribly wrong: he placed the redo logs and the database on the same drive letter. That drive letter is, however, made up of 51 LUNs, held together with Veritas VM.
We tried everything, except separating the logs and database, to get better response times. We upgraded the HBA BIOS, upgraded PowerPath from 431 to 501, and even installed STORport drivers (9.1.4.15) instead of SCSIport drivers, but so far nothing has helped at all. I remembered from a Symmetrix Performance Class I once took that there's something called queue depth or, to speak in QLogic terms, execution throttle. The formula I got from the lady teaching this class was as follows: queue depth (qd) = (total number of devices, including the ones in metas, times 4) divided by number of HBAs

So our 51 LUNs, made up of 31 devices and 20 metadevices (with 4 meta members each), put into the equation: qd = ((31 + 20 x 4) x 4) / 2 = 222. We set the execution throttle to 256, but this didn't help either.
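For anyone who wants to check the arithmetic, here is a minimal sketch of the formula as quoted from the class. The function and parameter names are made up for illustration, and the factor of 4 per device is simply the value given above, not something taken from vendor documentation.

```python
def queue_depth(standalone_devices, metadevices, members_per_meta, hbas, per_device=4):
    """Queue depth per HBA, using the formula quoted in the post.

    All names here are hypothetical; per_device=4 is the multiplier
    mentioned in the class, not a documented constant.
    """
    # Count every physical device, including each member of every meta.
    total_devices = standalone_devices + metadevices * members_per_meta
    return total_devices * per_device / hbas


# 31 plain devices, 20 metas of 4 members each, 2 HBAs
print(queue_depth(31, 20, 4, 2))  # 222.0
```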

Guesses, anyone?

PS: I noticed that a registry setting described at "http://download.qlogic.com/drivers/42545/Readme_FC_StorPort_9-1-2-16.htm#61" does not reflect the value of 256 we entered in each HBA using SANsurfer. Does anyone know which value takes precedence: the registry value or the value in the HBA itself?

GlenH · Answer

RRR,

Changing the queue depth will not improve your response times. The queue depth is the number of SCSI requests that the host can queue to the storage, and if the underlying storage is not keeping up, the host will simply queue more requests and actually increase your response times (remember: response time = queue time + service time).

Your best bet is to look a little closer at the read and write response times in WLA to see whether it's just one of these or both that are high. Look at the sampled average read and write times for the devices and let us know how they look.

You might also look at the read and write IOPS on the devices to see if there are any hot spots or busy hypers that may be causing the issue.

Do you have SRDF/S on these devices?

Glen.
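To illustrate the point about queueing, here is a rough sketch using Little's law (average response time = outstanding requests / throughput). The 5000 IOPS ceiling is a made-up number, not a measurement from this system, and the function name is hypothetical. It assumes the storage is already saturated, so a deeper queue only adds waiting time.

```python
def response_time_ms(outstanding_ios, iops):
    """Average response time in ms via Little's law: R = N / X.

    outstanding_ios: requests in flight (the queue kept full by the host)
    iops: throughput the storage actually sustains (assumed fixed here)
    """
    return outstanding_ios * 1000.0 / iops


# Hypothetical array that tops out at 5000 IOPS regardless of queue depth.
for depth in (32, 64, 128, 256):
    print(depth, response_time_ms(depth, 5000))
```

With throughput fixed, doubling the number of outstanding requests doubles the average response time, which is why raising the execution throttle cannot help when the back end is the bottleneck.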

RRR · Answer

No SRDF here. About the response times: I already noticed those are a bit high, but I haven't had time to dig any deeper yet.
I will have a look at the read and write IOPS and try to find hot spots.
I've already asked our account manager why we don't have symoptimizer in place....

dynamox · Answer

Hi RRR, I am curious whether you have made any progress on this. Performance issues are always interesting, yet painful.

RRR · Answer

I've even started a new thread in the Clariion section (which came down to almost the same discussion). For this particular case, the next thing we're going to propose to the customer (the user of the machine we're hosting for them) is to get the logs and the data separated and to measure again once we have accomplished that.

Measurement of performance is most of all a matter of touch and feel... it's a human thing. For one customer 20 ms response times might be OK, while for another even 5 ms isn't good enough.

dynamox · Answer

I know... I cringe when I hear a customer complain about 'slow disk'.

