3 Posts

May 7th, 2014 09:00

SQL Server recently started experiencing latency issues. Advice?

Background:

I've been investigating a SQL Server instance that has been encountering general slowness and I/O latency issues when running a daily maintenance job. The job, which usually takes around 1-2 hours, has been taking anywhere between 3 and 14 hours to complete since about a week ago. It performs a DBCC CHECKDB and then a backup if all is well at the end.
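For anyone who wants to see the same latency from the SQL side, here's a minimal sketch that reads per-file I/O stall times from sys.dm_io_virtual_file_stats; it assumes Python with pyodbc, and the server name and driver string are placeholders:

```python
import pyodbc

# Placeholder connection details -- adjust the server name and auth
# for your environment.
conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=MYSQLHOST;Trusted_Connection=yes"
)

# sys.dm_io_virtual_file_stats exposes cumulative I/O counts and stall
# (wait) times per database file since the instance last started.
query = """
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       1.0 * vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       1.0 * vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
ORDER BY avg_read_ms DESC;
"""

for row in conn.cursor().execute(query):
    print(row.database_name, row.physical_name,
          row.avg_read_ms, row.avg_write_ms)
```

Running it before and during the maintenance window shows which database files are actually waiting on the storage, which helps tie the slow job back to specific LUNs.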

The environment has been in place for a while and I've only just been brought aboard, so there are a few things about the current configuration that concern me, which I'll mention as I go.

The server is connected to a VPLEX with a VNX backend. The issue occurs most frequently on two LUNs: one contains the system databases and user databases, the other contains tempdb. The VPLEX seems to be working fine, with no issues in the health check or backend connectivity. On the VNX I've checked Analyzer, and the LUNs in question have response times ranging from 8 ms up to 1000+ ms, with multiple spikes in the 64-500 ms range.
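Side note: since Analyzer archives can be exported to CSV, a short script makes it easier to spot the spikes across a long capture. This is only a sketch; the file name and column headers below are assumptions and need to be matched to the actual export:

```python
import csv

THRESHOLD_MS = 64.0  # flag anything at or above this response time

# "analyzer_lun_stats.csv" and the column headers are assumptions --
# rename them to match the headers in your actual Analyzer export.
with open("analyzer_lun_stats.csv", newline="") as f:
    for row in csv.DictReader(f):
        latency = float(row["Response Time (ms)"])
        if latency >= THRESHOLD_MS:
            print(row["Poll Time"], row["Object Name"], f"{latency:.1f} ms")
```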

Upon looking into how the LUNs are configured in the pools, I noticed a few things that concerned me, but I can't pinpoint a root cause yet. All the LUNs for the server are provisioned from the same storage pool (I believe the SQL storage best practices recommend a separate pool for certain databases). The storage pool has FAST Cache enabled and FAST tiering across a combination of EFD, SAS, and NL-SAS. Auto-tiering is set to scheduled, but the schedule itself is not enabled under the relocation schedule.

Moving along, the pool and LUNs are in a RAID 6 (6+2) configuration (best practices mention RAID 5 as a better choice). The LUNs are also sitting 100% in the capacity tier (NL-SAS), and the tiering policy is set to Start High Then Auto-Tier (I think this is either the main cause or a major factor). The storage pool tier details show there is 2550 GB waiting to move up from the capacity tier, so I suspect it has been a while since the auto-tiering schedule was actually enabled. I'm not sure when relocation was last performed and would like to know if there is a way to check that.
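One way this can apparently be checked is via the autotiering info output from Navisphere Secure CLI. The sketch below wraps it in Python; the SP address is a placeholder, and the flag names are assumptions from memory that should be verified against the CLI reference for the array's OE release:

```python
import subprocess

SP = "10.0.0.1"  # placeholder: SP A or SP B management address

# "autotiering -info" reports tiering state, the relocation schedule,
# and per-pool relocation status (including data ready to move up or
# down). Flag names may differ between VNX OE releases -- check
# "naviseccli autotiering -help" on your system.
cmd = ["naviseccli", "-h", SP, "autotiering", "-info",
       "-state", "-schedule", "-opStatus"]

result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout)
```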

I feel like the combination of these issues is what's causing the latency on the server, but it's also odd that this has only just become a problem when the environment has been in this configuration for a year or more. I guess what I'm looking for is any advice or suggestions on anything I may have missed, or tips from someone with solid knowledge of storage practices for SQL Server so I have a better understanding of all this in the future.

Thanks!

195 Posts

May 7th, 2014 09:00

I doubt that auto-tiering has been disabled for over a year. I wonder if there was any recent activity on the array, like a non-disruptive software upgrade. I mention that because disabling auto-tiering is part of the prep for some of those activities, and it may have been disabled on purpose but never re-enabled by omission.

I would also ask whether the backend LUNs are thin or thick. Thin LUNs that grow significantly often become disorganized, and this shows up as decreased performance for sequential access processing, like backups and integrity checks. This can be addressed (among other ways) by migrating the LUN to a new LUN, even if the new one is thin and in the same pool. That process acts something like a re-org/defrag on the data, putting the blocks back down in a more sequential layout.
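If it comes to that, the migration can be started from Navisphere CLI. A rough sketch below, with placeholder SP address and LUN numbers; the flags are from memory, so verify them against the CLI reference for your release:

```python
import subprocess

SP = "10.0.0.1"   # placeholder SP address
SOURCE_LUN = 42   # placeholder: LUN to re-organize
DEST_LUN = 43     # placeholder: new LUN of equal or larger size

# "migrate -start" copies the source LUN to the destination and swaps
# identities when complete, which re-lays the data out sequentially on
# the new LUN. The rate throttles the copy so production I/O isn't
# impacted. Verify flags with "naviseccli migrate -help".
cmd = ["naviseccli", "-h", SP, "migrate", "-start",
       "-source", str(SOURCE_LUN), "-dest", str(DEST_LUN), "-rate", "low"]

subprocess.run(cmd, check=True)
```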

3 Posts

May 13th, 2014 08:00

Thanks for the reply.

So we ended up turning auto-tiering back on and also replaced a switch connected to the VPLEX because it had degraded and wasn't functioning properly. The job times were back in their average run-time range of about 2 hours for a couple of days, but today the job jumped back up to 12 hours. Also, the backend LUNs are thick provisioned.

I think it may be a SAN switch issue, though I'm not totally sure.

I did notice that the monitors in the VPLEX GUI have breaks in them depending on the director/port. Is this normal?

Attached: d-2-1-A-vplex.png and d-2-1-B-vplex.png (VPLEX performance monitor screenshots, directors A and B).
