December 9th, 2013 20:00

QoS manager (NQM) limits not enforced

I'm running a VNX 5300 on 05.32.000.5.201 with iSCSI-attached LUNs, and I'm not having much luck setting a limit on a LUN using QoS Manager.

I've created a 'limit' I/O class (limit: 200 IOPS) on a single LUN, covering both reads and writes with the 'Any' I/O range. Pretty simple.

The policy is running and reports 'Attaining Goal', which means it is trying to attain the goal but has not yet done so.

In the 'Monitor' tab I can see the IOPS drop from 2500 to around 1000, but it never gets close to 200 IOPS. In IOMeter, where I'm running my test (32K, 100% read, 100% sequential, 1 outstanding I/O, 8 GB test file), I see either 2500 IOPS or complete interruptions in I/O lasting 10-15 seconds at a time due to the limit.
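As a rough check on these numbers, they can be worked through with Little's law for a closed loop (IOPS = outstanding I/Os / average latency). This is a hedged back-of-the-envelope sketch, not anything QoS Manager reports; the helper names and the 10 s on / 15 s off split are hypothetical, chosen only to show how a stop-go pattern can average out to roughly the 1000 IOPS seen in the Monitor tab.

```python
# Back-of-the-envelope arithmetic for the IOMeter test above
# (32 KiB transfers, 1 outstanding I/O). Little's law for a closed loop:
#   IOPS = outstanding_ios / latency_seconds
# Helper names are hypothetical, for illustration only.

IO_SIZE_KIB = 32
OUTSTANDING = 1


def latency_ms(iops: float, outstanding: int = OUTSTANDING) -> float:
    """Average per-I/O service time implied by a given IOPS rate."""
    return outstanding / iops * 1000.0


def throughput_mib_s(iops: float, io_kib: int = IO_SIZE_KIB) -> float:
    """Bandwidth in MiB/s for a given IOPS rate and transfer size."""
    return iops * io_kib / 1024.0


def averaged_iops(on_iops: float, on_s: float, off_s: float) -> float:
    """Mean IOPS over one on/off cycle: full speed for on_s seconds, then a stall."""
    return on_iops * on_s / (on_s + off_s)


for label, iops in [("unthrottled", 2500), ("limit", 200)]:
    print(f"{label}: {latency_ms(iops):.1f} ms/IO, {throughput_mib_s(iops):.2f} MiB/s")

# Hypothetical duty cycle: 10 s at 2500 IOPS followed by a 15 s stall.
print(f"averaged: {averaged_iops(2500, 10, 15):.0f} IOPS")
```

With one outstanding I/O, a smooth 200 IOPS limit would mean about 5 ms per I/O instead of about 0.4 ms, rather than 10-15 second stalls. A monitor that averages over a long enough window can still report an intermediate figure for a stop-go pattern.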

If I stop the policy, the IOPS on the host return to 2500 and stay there consistently.

See attached.

Any ideas on this? It seems like the limits are not working properly.

Ben

February 10th, 2014 06:00

This is a bug and has been sent to the Unisphere development team (SR: 59665782).

Ben

September 10th, 2014 02:00

Hi,

I have seen the same issue on a VNX5800 running 5.32.0.5.209.

I see complete interruptions in I/O lasting 30 seconds at a time due to the limit.

Could you tell me how to fix the issue?

September 12th, 2014 06:00

The response I got was:

2. Why didn't QoS meet the goal in 'emc qos manager.jpg'?

Most of the I/O was coming from the target I/O class and only a little from the background class. In this situation there are not many competing I/O requests from the background class, so QoS cannot effectively reduce IOPS for the target I/O class. Furthermore, the goal of dropping from 2500 IOPS to 200 IOPS is too aggressive, so QoS could not effectively meet it.

Basically, as you have seen: we say "this doesn't work" and Engineering says "that's expected behavior".

Ben

January 29th, 2015 07:00

I'm back on this again, now with a VNX 7600 (MCx) running almost the latest code (.074).

Last time, EE said I didn't have enough "background" class I/O for QoS to enforce a limit. They also stated that going from 2000 IOPS to 200 IOPS is too aggressive (that's just plain silly).

Fast forward 2 years: now I'm on a new VNX and the background class is 30,000-40,000 IOPS, which I think is busy enough. I have some noisy-neighbor LUNs that are using a disproportionate share of bandwidth and CPU on our VNX. I need to limit those few LUNs; everything else is doing just fine.

You can see that when QoS is applied, the application (IOMeter; 1 outstanding I/O; 4K; 100% read; sequential; 300-second run time) swings between 100% and 0%. It stays at 0% for 10-20 seconds at a time (maybe more). Needless to say, nobody in a production environment would accept this as a valid QoS implementation. I've opened another SR and will see where I get. Not looking good...
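For contrast, here is a minimal sketch of how a generic token-bucket limiter would treat the workload above (one outstanding I/O). This is a swapped-in textbook technique, not NQM's implementation; the function name, the 1/3000 s device service time (roughly the 3,000 IOPS IOMeter was generating) and the burst size are assumptions for illustration only.

```python
# Generic token-bucket limiter driven by a closed loop with one outstanding
# I/O. NOT how NQM works internally; all names and parameters are hypothetical.


def simulate_closed_loop(limit_iops, service_s, duration_s, burst=1):
    """Return the number of I/Os completed in each one-second interval."""
    tokens = float(burst)          # bucket starts full
    now = 0.0
    per_second = [0] * int(duration_s)
    while True:
        # Delay the next I/O just long enough for one token to accumulate.
        if tokens < 1.0:
            wait = (1.0 - tokens) / limit_iops
            now += wait
            tokens = 1.0
        tokens -= 1.0              # spend one token to issue the I/O
        # The device services the I/O; the bucket keeps refilling meanwhile.
        now += service_s
        tokens = min(float(burst), tokens + service_s * limit_iops)
        if now >= duration_s:
            break
        per_second[int(now)] += 1  # count the completion in its 1 s bucket
    return per_second


counts = simulate_closed_loop(limit_iops=200, service_s=1 / 3000, duration_s=10)
print(counts)
```

Because the limiter delays each I/O by roughly 1/limit seconds instead of stalling the queue, every one-second interval in this sketch completes close to 200 I/Os and none drops to zero, which is the behavior the thread expected from a limit.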

QOS-IOMeter-Perfmon.png

Qos-Background-class.png

January 29th, 2015 09:00

Here is the I/O before and after implementing QoS limits:

QOS-before and after IOPS.png

January 29th, 2015 09:00

And to top things off, when I had the policy enabled with a single LUN in the I/O class, my entire VNX 7600 slowed to a crawl. See the attached latency graph. We saw this across all our ESX hosts and physical LUNs.

I was testing at 11 AM and from 11:25 to 11:50 AM this morning. I'm pretty sure the 3,000 IOPS that IOMeter was generating did not cause this by itself; the array normally does 30-40K IOPS.

2015-01-29 12_26_39-Qos-causing-latencyonSAN.png

February 11th, 2015 11:00

The latest on this is that there will be a patch for MCx in Q1/Q2 2015. Word has it that both the drop-outs and the array-wide latency will be fixed.

Ben

April 16th, 2015 15:00

Ben,

Could you share an SR# or breakfix identifier so that I (and others) can track when this fix becomes available?

Thanks,

Eric

April 17th, 2015 09:00

SR 68909542. 

I saw some QoS/NQM fixes in the latest VNX1 code but have not seen them yet in the MCx code on VNX2.

"Hi Ben,

Engineering have advised that they are aiming to include the fix in the next release of R32, due out mid-March; however, this may be pushed out to the following release. Please let me know if you have any further questions on this."

Ben

July 27th, 2015 04:00

Did anything further come of this? We're in the same situation: we'd like to use QoS on our VNX7600 but see the same periods of complete I/O drop-out.

Thanks

July 27th, 2015 10:00

I just checked the .102 (cumulative) release notes, and they list no fixes or caveats related to this issue. I've heard from my SE that "nobody uses the QoS feature"... probably because it breaks the entire array...

One option could be to look at a third-party array where this actually works. Something like a 3PAR 7400 would be a good start.

Ben

July 27th, 2015 12:00

If you're seeing issues with QoS, I'd recommend you open a case with EMC; you should be able to get the latest information about any updates or fixes for what you're seeing. If you open a case, include the current SPcollects, the NAR files (for checking performance on the array) and the NQM logs, and make sure you specify that this is a QoS issue.

glen

July 27th, 2015 13:00

Hi Glen,

I opened SR 68909542 on this exact issue several months back. The status was that it "might be included in a future release and don't use QoS in the meantime".

Ben

July 27th, 2015 13:00

Thanks Ben - I have been following that case.

I was asking ed.tanner to open a case for his issue as well, just to let engineering know that others have this same issue.

glen

July 28th, 2015 01:00

I've got a case open with EMC, so we'll see what they say, but I'm not expecting much. It's a real shame, as QoS is very important to us and it looks like EMC just can't provide it.
