1 Rookie


85 Posts

January 25th, 2013 01:00

Hi RRR,

In most cases the VNX port queue depth of 1600 is not a bottleneck. If you run ALUA across 4 paths and distribute LUNs across both SPs, you effectively have a maximum queue depth of 6400 for your ESX cluster(s). Personally I don't see this as a bottleneck in most of my large customer configs.
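To make the arithmetic above concrete, here is a minimal sketch. The per-port queue depth of 1600 comes from the post; the path count and the helper name are illustrative assumptions, not anything VNX- or ESX-specific.

```python
# Aggregate queue depth available to an ESX cluster across VNX front-end
# ports, as described above. PORT_QUEUE_DEPTH is the figure from the post.
PORT_QUEUE_DEPTH = 1600  # max outstanding IOs per VNX front-end port

def cluster_queue_depth(ports_in_use: int) -> int:
    """Aggregate queue depth across all front-end ports the cluster uses (sketch)."""
    return PORT_QUEUE_DEPTH * ports_in_use

# ALUA across 4 paths, LUNs balanced over both SPs -> 4 ports in play:
print(cluster_queue_depth(4))  # 6400, matching the figure above
```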

However, there is another issue: the queue depth is also limited per LUN, so a deep queue on a single LUN can cause a QFull to be reported to the host. In my experience, server QFull conditions are mostly caused by LUN queue-full conditions, not by port queue-fulls.

LUN queue-full conditions occur quite quickly. With Flare 31 and R5 4+1 pools, the LUN queue-full is triggered at 88 outstanding IOs. So even with a pool of 100 drives, the maximum queue depth on a LUN in that pool is 88. With Flare 32 this number increases, depending on the number of data drives in the pool: with 20 drives in a pool, for example, it rises to 224 IOs. See also emc204523. Especially with a small number of large LUNs in a VMFS config, you can hit this limit much sooner than the port queue limit.

And finally, what do we do with VMware? I always make customers aware of the queue-full possibilities, but I usually don't recommend lowering the HBA queue depth settings for each environment. What value would you set? In my opinion it's better to monitor QFull conditions and take action on specific servers/environments if needed.

ESX 3.5 introduced adaptive queue depth throttling, which in my opinion is a better approach than lowering HBA queue depths globally. In ESXi 5 it's even possible to configure adaptive queue depth on a per-disk basis.

So to summarize my opinion:

    • Monitor for Queue Full conditions
    • Be aware of the lun queue limits
    • In ESX environments use adaptive queue depth throttling rather than limiting all HBA instances.

4 Operator


5.7K Posts

January 25th, 2013 11:00

Thanks for explaining this. I didn't know these things about VMware.

9 Legend


20.4K Posts

January 28th, 2013 17:00

Can I see the LUN queue in Analyzer?

4 Operator


4.5K Posts

January 29th, 2013 12:00

In Analyzer you can look at the Queue Length or, better, the Average Busy Queue Length (which shows what the queue is while the object is busy). This is the queue referred to above: (14 * disks) + 32. For example, R5 <4+1> gives (14 * 4) + 32 = 88.
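The (14 * data disks) + 32 rule above can be sketched in a few lines. This is a minimal illustration of that Flare 31 rule only; the function name and the second example layout are my own assumptions.

```python
# LUN queue limit per the Flare 31 rule quoted above: (14 * data drives) + 32.
def lun_queue_limit(data_drives: int) -> int:
    """Outstanding IOs at which a LUN reports queue-full (Flare 31 rule, sketch)."""
    return 14 * data_drives + 32

print(lun_queue_limit(4))  # R5 4+1 -> 88, matching the thread
print(lun_queue_limit(8))  # R5 8+1 -> 144 (illustrative layout)
```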

glen

9 Legend


20.4K Posts

January 29th, 2013 13:00

Thank you. I never understood what "optimal" and "non-optimal" next to the counter mean?

4 Operator


4.5K Posts

January 29th, 2013 14:00

That's supposed to show when access to the data goes over the CMI bus, as with an ALUA trespass: optimal path means the direct path, non-optimal path means ALUA using the CMI bus. It's not working yet, though.

glen

4 Operator


5.7K Posts

January 29th, 2013 23:00

Wow, it hardly happens that you don't know some specific thing

4 Operator


5.7K Posts

January 29th, 2013 23:00

Thanks. I knew the queue length for a LUN was 88, but I never knew a LUN queue FULL was an actual status as well.

9 Legend


20.4K Posts

January 30th, 2013 03:00

RRR wrote:

Wow, it hardly happens that you don't know some specific thing

happens all the time

4 Operator


5.7K Posts

February 4th, 2013 13:00

That can only mean 1 thing: you’re not a bot after all

11 Posts

March 29th, 2014 02:00

Hi all! And what about the LUN queue length with MCx?

4 Operator


5.7K Posts

March 31st, 2014 05:00

Hello alkhvo: as far as I know it remains the same with MCx.
