dajonx · 2 Intern · 294 Posts · June 16th, 2014 07:00
I have also noticed that when I am executing a full backup on a large database, it takes six hours to complete whereas it used to take only two hours on the PS6000XV.
dajonx · 2 Intern · 294 Posts · June 16th, 2014 08:00
No, different environment. We are using 6048s, and they are dedicated to iSCSI use. I have opened a support case, and they have looked through the diags, switch output, and SAN HQ. The last thing they told me was that the switch and SAN looked good and that I should disable delayed ACK and Nagle's algorithm. After adding those entries to the iSCSI interfaces in the registry (and rebooting), I still see the same I/O messages. I haven't heard back since...
dajonx · 2 Intern · 294 Posts · June 16th, 2014 08:00
When looking at Group Manager volumes, I noticed that there are only two connections being utilized instead of four. Is this normal?
(Well, actually there are four since the volumes are part of a cluster, but the server where the volumes reside only uses two connections, and the other two belong to the other server.)
dajonx · 2 Intern · 294 Posts · June 16th, 2014 09:00
Do you think the volumes are not using MPIO? There should be four connections from one server instead of two, correct?
dajonx · 2 Intern · 294 Posts · June 16th, 2014 10:00
Dell Networking N3048 Switch
www.dell.com/.../fs
I'm sorry... I added an "S" by mistake...
Yes, I did install HIT/ME on both servers.
dajonx · 2 Intern · 294 Posts · June 16th, 2014 10:00
Oops, I meant to say two NS3048.
I disabled the delayed ACK/Nagle Algorithm via the registry and rebooted. This is what I added:
To disable delayed ACK and Nagle's algorithm, create the following entries under each SAN interface subkey in the Windows Server 2012 registry:

Subkey location:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<interface GUID>

Entries: TcpAckFrequency, TcpNoDelay
Value type: REG_DWORD
Value to disable: 1
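In case it helps anyone else hitting this thread, those per-interface values can also be scripted rather than typed in by hand. Here's a minimal sketch that generates a .reg file with the two entries for each SAN interface; the interface GUIDs in the example are hypothetical placeholders, so substitute the GUIDs of your actual iSCSI NICs from the Interfaces registry key:

```python
# Sketch: generate a .reg file that sets TcpAckFrequency and TcpNoDelay to 1
# (disabling delayed ACK and Nagle's algorithm) for each SAN interface.
# The GUIDs passed in below are hypothetical placeholders.

BASE = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
        r"\Tcpip\Parameters\Interfaces")

def build_reg(iface_guids):
    """Return .reg file text disabling delayed ACK/Nagle for each interface."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for guid in iface_guids:
        lines.append(f"[{BASE}\\{guid}]")
        lines.append('"TcpAckFrequency"=dword:00000001')  # 1 = ACK every segment
        lines.append('"TcpNoDelay"=dword:00000001')       # 1 = disable Nagle
        lines.append("")
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical GUIDs for two iSCSI interfaces
    print(build_reg(["{11111111-2222-3333-4444-555555555555}",
                     "{66666666-7777-8888-9999-000000000000}"]))
```

Importing the resulting file takes effect only after the TCP/IP stack reloads, so a reboot is still needed either way.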
We are not using ESXi. This is strictly a SQL Server 2008 R2 database server (in a cluster). The servers are connecting to the SAN via HIT Kit 4.6. Is the information pertaining to ESXi still relevant to a SQL Server database server?
Thank you.
dajonx · 2 Intern · 294 Posts · June 16th, 2014 10:00
Yes, I believe the firmware is up-to-date as my coworker installed it about two weeks ago.
In iSCSI Initiator, if I don't check MPIO when connecting to the volume, would there only be two connections instead of four? I just find it odd that there are only two connections on most of the volumes.
I've emailed him several times, but have not heard back. He also would not give me his phone number. So I'm pretty much dead in the water... And frustrated.
dajonx · 2 Intern · 294 Posts · June 16th, 2014 10:00
We use stacking modules. Is that what you mean?
dajonx · 2 Intern · 294 Posts · June 19th, 2014 07:00
Thank you. I found out he works from 1:30 to 9:30 PM CST, whereas I'm here from 8 to 5 EST, so our schedules don't exactly match...
Do you think this could resolve itself just by updating the NIC drivers? 7.8.16 to 7.8.53...
dajonx · 2 Intern · 294 Posts · June 26th, 2014 06:00
But that still doesn't explain why I never had this issue when the database was on the PS6000XV, yet I have it now that it's on a supposedly faster PS6100XS...
dajonx · 2 Intern · 294 Posts · June 26th, 2014 06:00
Thank you.
So it would help to separate tempdb into two LUNs... I wish there was a way to test it before implementing, as it would require a good amount of reconfiguration...
dajonx · 2 Intern · 294 Posts · June 26th, 2014 06:00
Thank you, Don.
Their recommendation is to separate the tempdb data files onto another LUN, which is on the same SAN. After thinking about it, wouldn't it still be under I/O pressure, since both LUNs are on the same SAN?
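For anyone weighing that recommendation: moving tempdb data files onto a separate LUN boils down to ALTER DATABASE ... ADD FILE statements pointed at the new drive. A small sketch that generates that T-SQL follows; the drive letter, file count, sizes, and file names are hypothetical placeholders, and the new LUN would need to be mounted and formatted first:

```python
# Sketch: generate T-SQL to add tempdb data files on a separate LUN.
# Drive letter, file names, count, and sizes are hypothetical placeholders.

def tempdb_add_files_sql(drive, n_files, size_mb=4096, growth_mb=512):
    """Return ALTER DATABASE statements adding n_files tempdb data files,
    all sized identically to avoid allocation skew between files."""
    stmts = []
    for i in range(2, 2 + n_files):  # tempdev is file 1, so start at 2
        stmts.append(
            "ALTER DATABASE tempdb ADD FILE (\n"
            f"    NAME = tempdev{i},\n"
            f"    FILENAME = '{drive}:\\tempdb\\tempdev{i}.ndf',\n"
            f"    SIZE = {size_mb}MB, FILEGROWTH = {growth_mb}MB);"
        )
    return "\n".join(stmts)

if __name__ == "__main__":
    # Two extra data files on a hypothetical T: drive backed by the new LUN
    print(tempdb_add_files_sql("T", 2))
```

Keeping all tempdb data files the same size is the usual practice so SQL Server's proportional-fill algorithm spreads allocations evenly across them.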
PFTK · 3 Posts · July 3rd, 2014 09:00
We have been experiencing the EXACT same thing. It has happened on both virtual and physical systems. What I don't understand is that the dedicated LUN for tempdb is hardly ever used (very low IOPS), but when it is, we get sporadic messages about I/O taking longer than 15 seconds.
I have given support SAN HQ logs several times, and they always say there does not appear to be a LUN/SAN issue. So when the problem started on a virtual machine, I called VMware support. They took a look at their logs and concluded that the issue was on the SAN side, at the controllers. Of course EQL disagreed. Then the problem started on an old 2003-based SQL Server. EQL again looked at the SAN HQ logs and verified best practices, but still stated that they did not see anything out of the ordinary; in fact, the engineer said the SAN was performing well. So I have pretty much given up on support with respect to this problem, and I hope someone in the EqualLogic community has seen and solved this issue and wouldn't mind sharing the resolution.
I have pretty much re-engineered my vSphere environment, VMs, and physical servers to try and address the issue. Other SAN solutions are starting to look very appealing.
Thanks.
dajonx · 2 Intern · 294 Posts · July 3rd, 2014 09:00
Right now, I have several people working on this case at Dell. The last thing they (SAN Engineer) recommended was to disable Delayed ACK and Nagle's Algorithm in the registry for the SAN interfaces which I have done and sent them SAN HQ logs since the change. I haven't heard back.
I have not split tempdb into separate volumes, as that was a recommendation from a regular support person, not the SAN Engineer. So I'm just waiting for them to get back to me.
I'm also convinced that the issue lies with the SAN. I'm not sure if it's because we are using a hybrid PS6100XS whose only RAID configuration is RAID 6 Accelerated, or if it's a hardware issue, but I have configured the database and server the same way as I had them before the migration to the new SANs.
Are you also using a hybrid SAN?
PFTK · 3 Posts · July 3rd, 2014 10:00
I had previously disabled delayed ACK on the ESXi hosts when I was using RDMs, and I have just (today) disabled delayed ACK and Nagle's through the registry on one of the troublesome VMs (using in-guest iSCSI). I'm waiting for off hours to commit the changes with a reboot; I'm not sure whether cycling the interfaces would be sufficient to reload the TCP/IP stack with the changes.
I am using a hybrid PS6500. I have two members in this pool, the other being a PS6100 that is not a hybrid. I think that, due to the small size of the tempdb LUN, the PS6500 is the sole owner of the volume. So the other thing I'm trying is pinning the LUN/volume to the PS6100. I also have a SATA-based pool that I may consider trying as well.