afp92Tq1w012558
May 24th, 2012 06:00
Hi,
Just out of curiosity, what code were you running earlier?
Vanitha
AndyWOhio
May 24th, 2012 07:00
We were running 5875.198.148
Quincy561
May 24th, 2012 08:00
There was a change to the IVTOC destage behavior. I would have the CS folks check for IVTOC tracks.
If this is the cause, it will go away once the tracks are destaged.
John
sauravrohilla
May 25th, 2012 02:00
Did you notice the EFD tier drain to the lower tiers immediately after the upgrade?
johncampbell1
June 19th, 2012 04:00
Andy
We had a similar experience on our two VMAXs here... after going up to 5875.231.172 we noticed data draining off the EFD pool and very high back-end activity.
We've just recently gone to 5875.249.188 (plus sundry patches) in the expectation that our single-policy FAST VP approach will return to its pre-5875.231.172 behaviour, but the new microcode has not yet been in place for 24 hours.
The back-end disk activity is in the red across the whole array (via SPA).
I'm also not convinced that our current SPA 2.2.1.4 is the most appropriate tool for monitoring FAST VP usage, and would be interested to hear how other users go about this.
Thanks
John
rkaeser
June 19th, 2012 05:00
We had the same issues: our EFD pool drained all the way to SATA. The lab dialed in and edited a few Optimizer files to get the data moving out of SATA back to EFD. After that we found that nothing was moving into the Fibre Channel pool. We are waiting on a special build to correct that issue as well.
AndyWOhio
June 19th, 2012 07:00
FYI, SPA 2.2.1.4 combined with Enginuity 5875.249.188 has some bugs with FAST VP reporting; basically, any number where SPA breaks out IOs by tier is suspect.
Take a look at your total host IOs on your VMAX via the "trend" view (click on the "FE Directors" folder in the stats). On my system, I average in the 20-23K range on a typical day. When I go to the "pools / TP Data pools" folder in trend view and click on the top-level icon to show the stats for all thin pools combined, I see total host IOs of 40-50K.
SPA is seriously double-counting the data.
Another area in trend where the stats are bogus is under masking views. Click on the folder for a masking view and note the number on the total "IOs per sec - storage group" graph. Then expand the masking view tree and click on the storage group itself, so that SPA shows a graph of IOs per tier. Those per-tier numbers don't come close to adding up to the total reported in the first "IOs per sec - storage group" graph.
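If you'd rather script this cross-check than eyeball the graphs, here's a rough sketch in Python. It assumes you've exported the two trend views to CSV; the file names and the "IOs per sec" column header are just placeholders, not real SPA export names.

# Rough sanity check for the double counting described above: compare the
# FE director host IO total against the combined thin pool total.
# Assumes both SPA trend views were exported to CSV; the file names and
# the "IOs per sec" column header are hypothetical placeholders.
import csv

def average_iops(csv_path, column="IOs per sec"):
    """Average a numeric column from an exported SPA trend CSV."""
    with open(csv_path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return sum(values) / len(values)

fe_total = average_iops("fe_directors.csv")
pool_total = average_iops("tp_data_pools.csv")

print(f"FE directors: {fe_total:,.0f} IOPS; thin pools: {pool_total:,.0f} IOPS")
# On a trustworthy report these should roughly agree; a pool figure near
# double the FE figure is the signature of the per-tier duplication.
if pool_total > 1.5 * fe_total:
    print("Warning: pool IOPS far exceed FE IOPS - per-tier stats look double counted")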
sauravrohilla
June 24th, 2012 00:00
Hi Andy,
Yes, you are correct; that's the behaviour in SPA. It does show duplicate IOPS when you compare the overall figures with the per-tier figures. The problem is that the microcode does not provide very granular stats for a TDEV under FAST VP: if a TDEV is spread across three tiers and doing X IOPS to one tier, the microcode reports the TDEV as doing the same X IOPS in all three tiers, which doubles or sometimes triples the total.
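As a rough illustration of the arithmetic (the IOPS figure below is made up):

# Made-up illustration of the inflation: the microcode reports a TDEV's
# full IOPS figure against every tier the TDEV has extents in, so summing
# the per-tier numbers multiplies the true figure by the tier count.
tdev_true_iops = 5000
tiers_occupied = ["EFD", "FC", "SATA"]  # TDEV spread across three tiers

per_tier_reported = {tier: tdev_true_iops for tier in tiers_occupied}
reported_sum = sum(per_tier_reported.values())

print(f"True host IOPS:   {tdev_true_iops}")   # 5000
print(f"Sum across tiers: {reported_sum}")     # 15000 - tripled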
We have already identified this problem and work is being done to fix it in future releases...
Regards,
Saurabh
johncampbell1
June 25th, 2012 02:00
An update on our specific issues with FAST VP decline following the updates to 5875.231.172 (in March) and, recently, to 5875.249.188: we are still not back to our pre-5875.231.172 performance. This is because our site-specific set of epacks - mostly consisting of FAST VP patches - was erroneously not applied when 5875.249.188 was installed. We were given a generic epack set instead which, although it contained some FAST VP related material, was not site-specific.
Our site-specific patches are due to be applied on our first VMAX this evening.
This has highlighted our vulnerability in relying on FAST VP. We use a simple, single policy of 100/100/100 across the three tiers for all our storage groups, and that approach had appeared to be vindicated until the advent of 5875.231.172.
We had asked for a list of numbered epack fix references as part of the enquiry around this issue, and it has raised in my mind the question of how far other VMAX administrators get involved in the details of patch management in advance of any planned change - i.e. knowing exactly what is being applied and why. Does SYMCLI afford this?
dynamox
June 25th, 2012 04:00
Honestly, I don't keep up with epacks as closely as maybe I should. Did you want to list all the patches?
symcfg list -sid 123 -upatches
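If you want a baseline to compare against after an upgrade, something like this sketch would capture the output to a dated file. It assumes SYMCLI is installed and on the PATH, and that the -upatches flag behaves as quoted above; "123" is the placeholder SID from the command.

# Sketch: snapshot the patch list to a dated file so there is a baseline
# to diff against after an upgrade. Assumes SYMCLI is on the PATH; "123"
# is the placeholder Symmetrix ID from the command above.
import subprocess
import datetime

sid = "123"
stamp = datetime.date.today().isoformat()

result = subprocess.run(
    ["symcfg", "list", "-sid", sid, "-upatches"],
    capture_output=True, text=True, check=True,
)

with open(f"patches_{sid}_{stamp}.txt", "w") as f:
    f.write(result.stdout)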
dynamox
June 25th, 2012 06:00
I agree. Any time we upgrade our boxes and we know we had some "custom" epacks, or had the lab "zap" something, we always double-check that the new, approved epack does not negate or break what was done previously. This should not be my job but EMC's!!
johncampbell1
June 25th, 2012 06:00
Dynamox,
I probably wouldn't have been interested in the past... the issue here is being able to validate the actions actually taken against the actions prescribed for remote O/S patches and upgrades.
Put simply: having access to a list of numbered patches prior to an upgrade, and being able to check that they're in place once the upgrade has taken place... à la pkginfo / showrev in the Solaris UNIX world.
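For example, if the patch list is captured before and after the upgrade (e.g. with the symcfg command dynamox posted), a trivial diff in the pkginfo / showrev spirit would do. The file names here are hypothetical:

# Minimal pre/post check: diff two saved patch lists. The file names are
# hypothetical; the lists could come from the symcfg command posted
# earlier in the thread.
with open("patches_123_pre_upgrade.txt") as f:
    pre = set(f.read().splitlines())
with open("patches_123_post_upgrade.txt") as f:
    post = set(f.read().splitlines())

print("Patches missing after the upgrade:")
for line in sorted(pre - post):
    print("  ", line)

print("Patches added by the upgrade:")
for line in sorted(post - pre):
    print("  ", line)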
More generally, up to 5875.231.172 we may have regarded the minutiae of remote upgrades as not "our business" - i.e. a process that was wholly owned by EMC. The upgrade to 5875.231.172 appeared to cause significant upset to FAST VP - and still is causing it - and has necessarily altered our level of scrutiny of the upgrade process.