odurasler

103 Posts

2617

August 23rd, 2011 09:00

High Response Time in SQL Server

I'm a little puzzled by this, and I was hoping that you guys could provide some guidance.

I have two SQL servers at play here. Only one is attached to a CX4-480 (I'll call this Server#2). The other is a standalone (Server#1).

At approximately 20:00, Server#1 reads a SQL DB on Server#2 via a SQL Agent job (i.e. an SSIS program). Server#1 then reports an error indicating long I/O waits. Below is the error description taken from SQL:

Mon 2011-08-22 20:00:30.56 spid3s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file in database [MSCRM] (25). The OS file handle is 0x0000000000000938. The offset of the latest long I/O is: 0x00003997790000

Mon 2011-08-22 20:00:30.57 spid3s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file in database [ODS$Snapshot_MSCRM] (26). The OS file handle is 0x0000000000001408. The offset of the latest long I/O is: 0x000011a7762000

(FYI...The F:\ drive belongs to Server#2)
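The two log entries above share a fixed layout, so the interesting fields (database, occurrence count, byte offset) can be pulled out with a small parser. This is only a sketch: the function name `parse_long_io` is made up for illustration, and the pattern is written against the two lines quoted in this post rather than any official message specification. The file path is treated as optional because it does not appear in the quoted lines.

```python
import re

# Pattern based only on the two log lines quoted in this thread (an assumption,
# not an official format definition). The "[file path]" part is optional.
LONG_IO = re.compile(
    r"encountered (?P<count>\d+) occurrence\(s\) of I/O requests taking longer "
    r"than (?P<seconds>\d+) seconds to complete on file\s*"
    r"(?:\[(?P<file>[^\]]*)\]\s*)?"
    r"in database \[(?P<db>[^\]]+)\] \((?P<dbid>\d+)\)\."
    r".*?offset of the latest long I/O is: (?P<offset>0x[0-9a-fA-F]+)"
)


def parse_long_io(line):
    """Return the fields of a long-I/O warning as a dict, or None if no match."""
    m = LONG_IO.search(line)
    if m is None:
        return None
    return {
        "occurrences": int(m.group("count")),
        "threshold_s": int(m.group("seconds")),
        "file": m.group("file"),  # None when the path is absent
        "database": m.group("db"),
        "database_id": int(m.group("dbid")),
        "offset_bytes": int(m.group("offset"), 16),  # hex offset into the file
    }


line = (
    "Mon 2011-08-22 20:00:30.56 spid3s SQL Server has encountered 1 occurrence(s) "
    "of I/O requests taking longer than 15 seconds to complete on file in database "
    "[MSCRM] (25). The OS file handle is 0x0000000000000938. "
    "The offset of the latest long I/O is: 0x00003997790000"
)
info = parse_long_io(line)
print(info["database"], info["offset_bytes"] / 2**30, "GiB into the file")
```

Converting the offset to a byte position shows roughly where in the data file the slow I/O landed, which can help when comparing against what the array reports for the same LUN.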

So I went into Analyzer for that time frame and noticed high Response Time. It's around 70 ms (very high for a SQL server). Service Time was also high.

I looked at Total Throughput, which was around 70 IOPS...nothing happening here.

Utilization is around the 50-60 range...nothing happening here as well.

SPs are in the normal range, low utilization.

No forced flushing occurring. I have FAST Cache enabled as well.

This LUN sits in a storage pool consisting of 5 EFDs and 15 FC drives. IOPS are not a problem for this LUN.

I also checked other pool LUNs in this storage pool. They were all low utilization, throughput, etc.

Tiering wasn't happening during that time...so no moving parts.

As you can see, I'm a little puzzled. Why would I get high response time when there's really low activity on that LUN and even on that pool? What am I missing? What else haven't I looked at?
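One way to relate the Analyzer figures quoted above is Little's Law, which says the average number of requests in flight equals throughput times response time. The sketch below plugs in the roughly 70 IOPS and 70 ms from this post. It assumes both numbers are averages over the same interval, which may not hold exactly for the Analyzer samples, and the function name is just illustrative.

```python
def outstanding_ios(iops, response_time_ms):
    """Little's Law: average I/Os in flight = throughput * response time."""
    return iops * (response_time_ms / 1000.0)


# Figures quoted in the post: about 70 IOPS at about 70 ms response time.
in_flight = outstanding_ios(70, 70)
print(f"average outstanding I/Os: {in_flight:.1f}")
```

With these inputs the result is about 4.9 outstanding I/Os on average. That does not explain why each I/O is slow, but it confirms the picture described here: a handful of requests in flight, each taking a long time, rather than a deep queue caused by heavy load.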

Message was edited by: jjimenez

Responses(11)

dynamox

1 Rookie

•

20.4K Posts

0

August 23rd, 2011 09:00

So the errors you mentioned appear on Server#1? Is F: the mapped drive letter?

odurasler

103 Posts

0

August 23rd, 2011 09:00

Server#1 is not attached to the SAN. Yes, the F: drive is shared on Server#2, and Server#1 connects to it to read the data.

dynamox

1 Rookie

•

20.4K Posts

0

August 23rd, 2011 09:00

So Server#1 is not SAN attached at all? Are you sharing the F: drive on Server#2, with Server#1 connecting to it via a mapped network drive/UNC? I am confused …

odurasler

103 Posts

0

August 23rd, 2011 11:00

I've edited my first post. I got more clarification from my SQL guy.

Basically, Server#1 runs a SQL job that reads a database on Server#2 that resides on the F: drive of Server#2. While this job is running on Server#1, a SQL error appears on Server#1 indicating that it has to wait for a long time (this is the error that I included in my first post). So no, Server#1 is not using UNC to read the database but uses port 1443 to read the data.

Whether it reads the data via UNC or through other ports...I still can't figure out why there is high response time/service time for that LUN when there is low IO going on.

Hope that clarifies things.

dynamox

1 Rookie

•

20.4K Posts

0

August 23rd, 2011 11:00

So Server#1 is pulling data from Server#2 over the network. Since it's Server#1 complaining about performance and not Server#2, does that point to a network throughput issue between the two servers? Is it not pulling data fast enough?

odurasler

103 Posts

0

August 23rd, 2011 13:00

That may be the case, and I see where you are heading with this. I haven't looked at the network as the culprit yet because I wanted to start with the array and rule it out first. However, after looking at Analyzer, I couldn't understand how the response time was so high when the server had low IOs.

dynamox

1 Rookie

•

20.4K Posts

0

August 23rd, 2011 13:00

It would be interesting to see what Server#2 shows from the OS perspective. Can you pull up perfmon and add these counters:

PhysicalDisk \ Avg. Disk sec/Read (read response time)

PhysicalDisk \ Avg. Disk sec/Write (write response time)

PhysicalDisk \ Disk Reads/sec (read IOPS)
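If it is easier to capture these counters from a script than from the perfmon GUI, the Windows `typeperf` command can sample them and write CSV to stdout. The following is a rough sketch, assuming it runs on Server#2 with `typeperf` on the PATH. The helper names (`typeperf_command`, `average_counters`, `sample_server`) and the 5-second/12-sample defaults are illustrative choices, not anything prescribed in this thread.

```python
import csv
import io
import subprocess

# The three counters listed above, for all physical disks.
COUNTERS = [
    r"\PhysicalDisk(*)\Avg. Disk sec/Read",
    r"\PhysicalDisk(*)\Avg. Disk sec/Write",
    r"\PhysicalDisk(*)\Disk Reads/sec",
]


def typeperf_command(interval_s=5, samples=12):
    """Build the typeperf argument list (-si = interval, -sc = sample count)."""
    return ["typeperf", *COUNTERS, "-si", str(interval_s), "-sc", str(samples)]


def average_counters(csv_text):
    """Average each counter column of typeperf CSV output, ignoring status lines."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    header = next((r for r in rows if r[0].startswith("(PDH-CSV")), None)
    if header is None:
        return {}
    values = {name: [] for name in header[1:]}
    for row in rows:
        if row is header or len(row) != len(header):
            continue
        for name, cell in zip(header[1:], row[1:]):
            try:
                values[name].append(float(cell))
            except ValueError:
                pass  # blank first sample or a non-data line
    return {name: sum(v) / len(v) for name, v in values.items() if v}


def sample_server():
    """Run typeperf locally (Windows only) and return the per-counter averages."""
    result = subprocess.run(
        typeperf_command(), capture_output=True, text=True, check=True
    )
    return average_counters(result.stdout)
```

Calling `sample_server()` around 20:00, when the job runs, would give host-side read and write latencies in seconds to compare against the response time Analyzer reports for the LUN.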

RRR

1 Rookie

•

5.7K Posts

0

August 24th, 2011 05:00

It's always the network

dynamox

1 Rookie

•

20.4K Posts

0

August 24th, 2011 19:00

RRR wrote:
It's always the network

Hahah...network guys were always an easy target, but now that I am doing FCoE I am one of the network guys, so who should I blame now? Ohh...how about DBAs

RRR

1 Rookie

•

5.7K Posts

0

August 26th, 2011 06:00

Hahaha, nope… it’s still rule #1: the network. But more specifically, the LAN (IP, DNS, ARP, routing, firewalls)!

kelleg

4.5K Posts

0

August 29th, 2011 09:00

With FCoE it's always the cables

glen
