
August 4th, 2014 15:00

Compellent performance with iSCSI

Hello all, 

First-time poster. We are in a desperate state trying to figure out the performance (latency) issues we are having with our three new (as of seven months ago) Compellent SANs. They are new hardware running current code. We've been troubleshooting with Dell for over six months and are still having latency issues: typical latency is 25 ms from the hosts to the controllers, with spikes over 150 ms and occasionally even higher. We are running VMware and SQL. Early on, when it was just VMware (and not that many VMs), it ran very fast. As we've added more (about 300 VMs now and not too much SQL), performance has degraded severely, so we've had to stop migrating virtual servers to Compellent. We have about 900 more we want to add.

What we cannot get from Dell is a best-practices guide "throughout the stack" for Compellent with iSCSI; it doesn't appear that such a document exists. It feels like now, six months in (and highly escalated) with Dell, we are just grasping at straws: resending log files, "try this," "now try that," and so on.

As you can see, we are frustrated. However, we did finally talk to one other customer today who has 1,400 VMs and is seeing expected performance: 1-2 ms, with spikes to 5 ms during backup jobs. They use ESXTOP to see the latency, as we do. It did take them four months to get there, though.
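For anyone comparing numbers: esxtop's batch mode will dump the counters to CSV, which is easier to scan for spikes than watching the screen live. A rough sketch (the capture command runs on the ESXi host; the CSV extract below is fabricated sample data to show the idea, not real output):

```shell
# Capture 5-second samples for 15 minutes (180 iterations) on the ESXi host:
#   esxtop -b -d 5 -n 180 > latency.csv

# Toy demonstration with a fabricated two-column extract of guest latency:
cat > latency-extract.csv <<'EOF'
timestamp,gavg_ms
10:00:05,1.8
10:00:10,2.1
10:00:15,152.4
10:00:20,2.0
EOF

# Flag any samples where guest-visible latency exceeded 25 ms:
awk -F, 'NR>1 && $2+0 > 25 {print "spike:", $1, $2 "ms"}' latency-extract.csv
```

In the interactive esxtop disk-device view (press 'u'), DAVG/cmd is device/fabric latency, KAVG/cmd is time spent in the VMkernel, and GAVG/cmd is the guest-visible total; sustained DAVG in the 25-30 ms range points at the fabric or array rather than the host stack.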

This individual agreed to send me their best-practices configuration later in the week, which prompted this post. Surely there are other customers out there running Compellent with iSCSI and 1,000 or so VMs. If that is you, can you reply and share some of your best practices, as well as the process you went through to get your Compellent iSCSI storage environment to perform right?

We've tried all of the normal things (flow control, jumbo frames, delayed ACK, Nagle's algorithm, LUN balancing, etc.), but this guy (the one mentioned above) was going into the weeds on all sorts of other things too, things they came up with on their own through four years of prior experience with EqualLogic and then the first four months with Compellent.
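For anyone comparing notes, these are roughly the ESXi-side commands behind that list. This is only a sketch: adapter and interface names like vSwitch1, vmk1, and vmhba64 are placeholders for your environment, and the Delayed ACK key is exposed through esxcli only on more recent ESXi builds.

```shell
# Jumbo frames on the vSwitch and the iSCSI vmkernel port:
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
esxcli network ip interface set --interface-name=vmk1 --mtu=9000

# Disable Delayed ACK on the software iSCSI adapter (commonly recommended
# for iSCSI arrays; takes effect after a rebind or reboot):
esxcli iscsi adapter param set --adapter=vmhba64 --key=DelayedAck --value=false

# Verify the jumbo MTU end to end with a don't-fragment ping to the array
# portal (8972 = 9000 minus IP/ICMP headers):
vmkping -d -s 8972 -I vmk1 <iSCSI-portal-IP>
```

If the vmkping with -d -s 8972 fails while a normal vmkping succeeds, some device in the path is not actually passing jumbo frames, which can look exactly like random latency.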

Thank you in advance. 

August 5th, 2014 17:00

What's your Xfer latency? If it's high, it's a problem with the LAN, not the Compellent. Usually, flow control is the culprit if you're seeing higher-than-usual latency on iSCSI.

Have you tried FC at all? I'm assuming you're using 10Gb iSCSI. Which switches?

August 5th, 2014 17:00

We have seen sustained latency at 25-30 ms, but we know of others that run Compellent at 2 ms; sometimes our spikes are in the hundreds. We have tried flow control both off and on. We have not tried FC yet. It is 10Gb iSCSI on Cisco Nexus 5Ks.

August 20th, 2014 20:00

We have seen a lot of problems in the field with people getting their Cisco Nexus configurations correct for FCoE and iSCSI. I would suggest you start by trying another switch, either by splitting your front end temporarily or, if possible, by inserting another iSCSI card to test with. The goal is to isolate the Cisco switches as the cause; once that is done, you can get decent support on the issue from the Dell Converged Systems group and Cisco TAC.

October 13th, 2014 20:00

Can you share the configuration recommendations you have run across for Cisco Nexus 5K switches and iSCSI? We have those switches in use as an isolated iSCSI fabric between our Compellent SANs and a large vSphere environment (1,000+ VMs). We average 2-5 ms latency on our iSCSI datastores but periodically see 200 ms to 4,000 ms latency spikes throughout the day when there are lots of overlapping small I/O requests. The huge latency spikes don't correlate with high throughput, such as during our backup windows (2 GB/s); they tend to appear when hundreds of VMs are performing small I/O requests simultaneously.

At first I assumed pause frames could be causing the latency spikes, but the network team confirmed that they did not enable flow control on any of our ports. They aren't willing to enable flow control unless we can show justification. Any suggestions would be greatly appreciated!
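For context, the per-interface pause-frame counters on the 5K would show whether anything on the fabric is asking for flow control, which is exactly the justification (or counter-evidence) the network team wants. A sketch of the NX-OS commands involved; interface numbers are examples and exact syntax may vary by NX-OS release:

```
! Check existing pause-frame counters first; nonzero RxPause/TxPause
! counters are the evidence either way:
show interface ethernet 1/1 flowcontrol

! Enable IEEE 802.3x link-level flow control on the iSCSI-facing ports:
configure terminal
interface ethernet 1/1
  flowcontrol receive on
  flowcontrol send on
end

! Caveat: if the ports also carry FCoE or other no-drop classes, the 5K
! may be using priority flow control (PFC) instead, and 802.3x and PFC
! are mutually exclusive per port:
show interface ethernet 1/1 priority-flow-control
```

Worth noting that even with switch-side flow control off, the array or the host NICs can still be emitting pause frames, so the counters can be nonzero today.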

April 10th, 2015 10:00

Experiencing the exact same issues you mention, at not one but two of our sites, which have Compellent SANs installed with dedicated PowerConnect 8024F iSCSI switches.

Also tried all of the normal things: flow control, jumbo frames, delayed ACK, Nagle's algorithm, LUN balancing, etc.

Getting Nexus 7Ks soon; hopefully that will solve the problem. But yes, we would love a definitive best-practice / interoperability guide that actually works 100% of the time.

June 25th, 2015 10:00

The Nexus 7Ks solved the latency issue at both of our sites, using the same config as on the PowerConnect 8024F switches.
