Unsolved

2 Intern

 • 

294 Posts

February 13th, 2012 06:00

SQL Server 2008 R2 - Lost Volume Due To I/O Bottleneck

Hi,

Over the weekend, I was updating statistics on SQL Server, and the table happened to be fairly large (66.2 GB data and 4.8 GB index). All of a sudden, the tempDB volume disappeared, which caused chaos for SQL Server.

In the SQL Server logs, there were three entries stating "SQL Server has encountered 2200 occurrences of I/O requests taking longer than 15 seconds to complete on file ."

Shortly after, SQL Server logs had numerous entries stating "BobMgr::GetBuf: Sort Big Output Buffer write not complete after 60 seconds" and "120 seconds".

Then the volume disappeared: SQL Server started logging messages stating that the tempDB files could not be reached, and the SQL Server and SQL Agent services went down.
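For anyone trying to line these messages up against a timeline, here is a minimal Python sketch that tallies the two kinds of entries quoted above. The regular expressions are assumptions based only on the wording quoted in this post, and the sample lines are hypothetical; a real error log may format these entries differently.

```python
import re
from collections import Counter

# Patterns assumed from the messages quoted in this post; the exact wording
# in a real error log may differ.
SLOW_IO = re.compile(
    r"encountered (\d+) occurrence(?:s|\(s\))? of I/O requests taking longer "
    r"than (\d+) seconds"
)
SORT_STALL = re.compile(
    r"BobMgr::GetBuf: Sort Big Output Buffer write not complete after "
    r"(\d+) seconds"
)


def summarize(lines):
    """Return (total slow I/O occurrences, Counter of sort-stall durations)."""
    slow_total = 0
    stalls = Counter()
    for line in lines:
        match = SLOW_IO.search(line)
        if match:
            slow_total += int(match.group(1))
            continue
        match = SORT_STALL.search(line)
        if match:
            stalls[int(match.group(1))] += 1
    return slow_total, stalls


# Hypothetical sample lines, not copied from the actual log.
sample = [
    "SQL Server has encountered 2200 occurrences of I/O requests taking "
    "longer than 15 seconds to complete on file .",
    "BobMgr::GetBuf: Sort Big Output Buffer write not complete after 60 seconds",
    "BobMgr::GetBuf: Sort Big Output Buffer write not complete after 120 seconds",
]
slow_total, stalls = summarize(sample)
print(slow_total, dict(stalls))
```

Feeding it the lines of an exported log gives a quick count of how many slow I/O requests were reported and how often the sort buffer writes stalled at each duration.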

The volume has more than enough space on it for tempDB to grow automatically. 

I'm wondering why the volume just went down due to an I/O bottleneck.  What can I do from the SAN standpoint to remedy this (as this is a SAN problem)? 

Thank you.

2 Intern

 • 

294 Posts

February 13th, 2012 06:00

Good morning Don,

The firmware is 5.1.2.  The Dell switches are two 6224's stacked.  We have two arrays (PS6000E and PS6000VX) and we have tempDB residing on the PS6000VX (RAID 10).  

5 Practitioner

 • 

274.2K Posts

February 13th, 2012 06:00

On general principle, the arrays should be upgraded to 5.2.1. The firmware and configuration on the 6224's need to be carefully checked. Older FW had some issues that could result in poor network performance. Do you know what version of FW is on the switches? If you are using Jumbo Frames, make sure that the array / server ports are NOT on VLAN1. Flow control needs to be enabled, and spanning tree portfast also needs to be set on the array / server ports.
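The three port checks above can be written down as a small checklist. This is only a sketch in Python: `PortConfig`, `check_port`, and the port names are hypothetical stand-ins for whatever you read off the switch configuration by hand, not a real switch API.

```python
from dataclasses import dataclass


@dataclass
class PortConfig:
    # Hypothetical model of one array- or server-facing switch port.
    name: str
    vlan: int
    jumbo_frames: bool
    flow_control: bool
    portfast: bool


def check_port(port):
    """Return a list of problems found on one array- or server-facing port."""
    problems = []
    if port.jumbo_frames and port.vlan == 1:
        problems.append("jumbo frames enabled on VLAN 1")
    if not port.flow_control:
        problems.append("flow control disabled")
    if not port.portfast:
        problems.append("spanning-tree portfast not set")
    return problems


# Hypothetical example values, filled in by hand from the switch config.
ports = [
    PortConfig("1/g1", vlan=1, jumbo_frames=True, flow_control=True, portfast=False),
    PortConfig("1/g2", vlan=100, jumbo_frames=True, flow_control=True, portfast=True),
]
report = {p.name: check_port(p) for p in ports}
for name, problems in report.items():
    print(name, "OK" if not problems else "; ".join(problems))
```

An empty list for a port means it passes all three checks; anything else is worth fixing before looking further at the array.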

5 Practitioner

 • 

274.2K Posts

February 13th, 2012 06:00

Good morning,

There's not enough detail here to make a suggestion. Typically such occurrences are network related. What version of FW is on the array? What kind of switches? Are they stacked or trunked?

2 Intern

 • 

294 Posts

February 13th, 2012 07:00

I will verify the configs of the switches and am planning on upgrading the firmware to 5.2.1 soon.  How does this alleviate the I/O bottleneck as seen in the logs as "I/O requests taking longer than 15 seconds"?

5 Practitioner

 • 

274.2K Posts

February 13th, 2012 07:00

Because we haven't established where the bottleneck is. Making sure that the network is configured correctly is a common first step in tracking down such problems. The logs can only see the SCSI side of the equation: "something" is delaying the I/O, and that's as far as they can determine. If the network is causing the latency, the switch is a common cause. If the array is overloaded or not at the optimal RAID level for your I/O, only a review of a SANHQ archive and array diags will determine that. I'm trying to provide some common fixes based on your symptoms.

5 Practitioner

 • 

274.2K Posts

February 13th, 2012 09:00

I also forgot to add: a "lost" volume typically means that the server lost its connection to the array. The most common cause of that is the network. Diags from the array will also show whether any connections were dropped, and why.

So ultimately, you might have to open a support case to triage this further.

Regards,

2 Intern

 • 

294 Posts

February 13th, 2012 10:00

Ok thank you.  I'll open a support case.

7 Technologist

 • 

729 Posts

February 13th, 2012 10:00

If you are clicking “customer support” from within the Group Administrator GUI “Tools” menu, then you may not have a management network set up, and/or the array doesn’t have access to the internet (since the link is to a URL outside of the SAN network).

To contact support, do one of the following: log in to the support site, or call them directly.

To log in to the support site, please use this URL: support.equallogic.com (valid support username and pw required).

The phone number for support is on the same page.

-joe

2 Intern

 • 

294 Posts

February 13th, 2012 10:00

I am receiving an internal server error when trying to open a case...

5 Practitioner

 • 

274.2K Posts

February 13th, 2012 10:00

Can you try again?   I just created a test case w/o a problem.  

2 Intern

 • 

294 Posts

February 13th, 2012 10:00

I have logged in to support.equallogic.com and clicked on Log A Case. I then filled out the necessary information and clicked on Submit Case. I still receive the message "The page cannot be displayed because an internal server error has occurred."
