dajonx
2 Iron

SQL Server 2008 R2 - Lost Volume Due To I/O Bottleneck

Hi,

Over the weekend, I was updating statistics on a fairly large SQL Server table (66.2 GB data and 4.8 GB index). All of a sudden, the tempDB volume disappeared, which caused chaos for SQL Server.

In the SQL Server logs, there were three entries stating "SQL Server has encountered 2200 occurrences of I/O requests taking longer than 15 seconds to complete on file <TempDB data files>."

Shortly after, SQL Server logs had numerous entries stating "BobMgr::GetBuf: Sort Big Output Buffer write not complete after 60 seconds" and "120 seconds".
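
For anyone triaging a similar incident, here is a rough sketch of how the slow-I/O warnings could be totalled per file from an exported error log. The regular expression is modeled on the message quoted above and is kept loose because the exact ERRORLOG wording may differ. The file path in the sample is a made-up placeholder, not the poster's actual tempDB location.

```python
import re
from collections import defaultdict

# Pattern modeled on the message quoted in this thread. The real ERRORLOG
# wording may differ slightly (e.g. "occurrence(s)"), so it is kept loose.
SLOW_IO = re.compile(
    r"encountered (\d+) occurrence(?:s|\(s\)) of I/O requests "
    r"taking longer than (\d+) seconds to complete on file (.+)"
)

def summarize_slow_io(lines):
    """Return {file description: total slow I/O occurrences}."""
    totals = defaultdict(int)
    for line in lines:
        match = SLOW_IO.search(line)
        if match:
            count, _seconds, target = match.groups()
            # Drop surrounding whitespace and the sentence-ending period.
            totals[target.strip().rstrip(".")] += int(count)
    return dict(totals)

# Hypothetical sample lines; T:\tempdb.mdf is a placeholder path.
sample = [
    "SQL Server has encountered 2200 occurrences of I/O requests taking "
    "longer than 15 seconds to complete on file T:\\tempdb.mdf.",
    "BobMgr::GetBuf: Sort Big Output Buffer write not complete after 60 seconds",
    "SQL Server has encountered 2200 occurrences of I/O requests taking "
    "longer than 15 seconds to complete on file T:\\tempdb.mdf.",
]
print(summarize_slow_io(sample))
```

A per-file total like this only confirms which files saw delayed I/O. It does not say whether the delay came from the host, the network, or the array.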

Then the volume evidently disappeared: SQL Server began logging messages stating that the tempDB files could not be reached, and the SQL Server and SQL Agent services went down.

The volume has more than enough space on it for tempDB to grow automatically. 

I'm wondering why the volume went down because of an I/O bottleneck. What can I do on the SAN side to remedy this (as this is a SAN problem)?

Thank you.

11 Replies

Re: SQL Server 2008 R2 - Lost Volume Due To I/O Bottleneck

Good morning,

There's not enough detail here to make a specific suggestion. Typically such occurrences are network related. What firmware version is on the array? What kind of switches are in use, and are they stacked or trunked?

Social Media and Community Professional
#IWork4Dell
Get Support on Twitter - @dellcarespro

dajonx
2 Iron

Re: SQL Server 2008 R2 - Lost Volume Due To I/O Bottleneck

Good morning Don,

The array firmware is 5.1.2. The switches are two stacked Dell 6224s. We have two arrays (PS6000E and PS6000VX), and tempDB resides on the PS6000VX (RAID 10).

Re: SQL Server 2008 R2 - Lost Volume Due To I/O Bottleneck

As a general principle, the arrays should be upgraded to 5.2.1. The firmware and configuration on the 6224s also need to be checked carefully, since older firmware had some issues that could result in poor network performance. Do you know what firmware version is on the switches? If you are using jumbo frames, make sure the array and server ports are NOT on VLAN 1. Flow control needs to be enabled, and spanning tree portfast also needs to be set on the array and server ports.
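
The three checks above can be written down as a small checklist, sketched here in Python. This is only an illustration: the `PortSettings` type, the port names, and the VLAN numbers are hypothetical, and the values would have to be filled in by hand from the switch configuration, since the code does not query a switch.

```python
from dataclasses import dataclass

@dataclass
class PortSettings:
    """Hand-entered settings for one array- or server-facing switch port."""
    name: str
    vlan: int
    jumbo_frames: bool
    flow_control: bool
    portfast: bool

def check_iscsi_port(port):
    """Return a list of findings for one port (empty list means no findings)."""
    findings = []
    if port.jumbo_frames and port.vlan == 1:
        findings.append("jumbo frames in use on VLAN 1")
    if not port.flow_control:
        findings.append("flow control disabled")
    if not port.portfast:
        findings.append("spanning tree portfast not set")
    return findings

# Hypothetical example ports; names and values are illustrative only.
ports = [
    PortSettings("1/g1", vlan=1, jumbo_frames=True, flow_control=True, portfast=False),
    PortSettings("1/g2", vlan=100, jumbo_frames=True, flow_control=True, portfast=True),
]
for port in ports:
    print(port.name, check_iscsi_port(port) or "ok")
```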

dajonx
2 Iron

Re: SQL Server 2008 R2 - Lost Volume Due To I/O Bottleneck

I will verify the configs of the switches and am planning on upgrading the firmware to 5.2.1 soon.  How does this alleviate the I/O bottleneck as seen in the logs as "I/O requests taking longer than 15 seconds"?

Re: SQL Server 2008 R2 - Lost Volume Due To I/O Bottleneck

Because we haven't established where the bottleneck exists. Making sure that the network is configured correctly is a common first step in finding such problems. The SQL Server logs can only see the SCSI side of the equation: "something" is delaying the I/O, and that's as far as they can determine. If the network is causing the latency, the switch is a common cause. If the array is overloaded, or not at the optimal RAID level for your I/O, only a review of a SANHQ archive and array diags will determine that. I'm trying to provide some common fixes based on your symptoms.

Re: SQL Server 2008 R2 - Lost Volume Due To I/O Bottleneck

Also forgot to add: a "lost" volume typically means that the host lost its connection to the array. The most common cause of that is the network. Diags from the array will also show whether any connections were dropped, and why.

So ultimately, you might have to open a support case to triage this further.

Regards,

dajonx
2 Iron

Re: SQL Server 2008 R2 - Lost Volume Due To I/O Bottleneck

Ok thank you.  I'll open a support case.

dajonx
2 Iron

Re: SQL Server 2008 R2 - Lost Volume Due To I/O Bottleneck

I am receiving an internal server error when trying to open a case...

Re: SQL Server 2008 R2 - Lost Volume Due To I/O Bottleneck

Can you try again?   I just created a test case w/o a problem.  
