
High latencies since we've activated FAST II

October 10th, 2011 12:00

Hi there,

Since we activated FAST II on our CX4-960 we have been seeing very high latencies (above 1200 ms) for a few hours a day. It is always at the same time of day that latencies for all connected hosts get pushed above 1200 ms. We can't find the root cause.

It all started after migrating the first LUNs from traditional RAID groups to pools, and it affects every host connected to the array.

Did anyone experience a similar behaviour?

Any help is appreciated,

Regards,

daniel

2 Intern • 20.4K Posts

October 10th, 2011 12:00

Have you looked in Analyzer to see what's going on around that time? It could be completely unrelated to FAST.
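If data logging is on, you can pull the NAR archives off the SPs and open them in Analyzer to line them up against that noon window. Below is a rough sketch of how I script that with naviseccli from a management host - the SP address, credentials and the exact analyzer flags are assumptions from my own setup, so verify them against your FLARE revision.

```python
import subprocess
from pathlib import Path

# Assumed values - replace with your own SP address and credentials.
SP = "spa.example.local"
CREDS = ["-user", "admin", "-password", "secret", "-scope", "0"]
OUT = Path("nar_archives")


def navi(*args):
    """Run one naviseccli command against the SP and return its stdout."""
    cmd = ["naviseccli", "-h", SP, *CREDS, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    OUT.mkdir(exist_ok=True)
    # List the archive files currently held on the SP
    # (analyzer -archive -list is the syntax I use on CX4 / FLARE 30).
    listing = navi("analyzer", "-archive", "-list")
    print(listing)
    # Retrieve every .nar file so it can be opened in Analyzer offline
    # and matched against the hours where latency spikes.
    for token in listing.split():
        if token.endswith(".nar"):
            navi("analyzer", "-archiveretrieve", "-file", token,
                 "-location", str(OUT), "-o")
    print("Archives saved to", OUT.resolve())
```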

50 Posts

October 10th, 2011 12:00

No.

Relocation runs during the night and the high latencies are reported around noon.

2 Intern • 20.4K Posts

October 10th, 2011 12:00

It's not around the time your relocation runs? I always see service times and CPU utilization go up when relocation runs.

50 Posts

October 10th, 2011 13:00

Yes, we did.

We created a heat map with support. For one day it shows one traditional RAID group being over-utilized by one LUN, but no high load on either SP and ~20% free cache on both SPs. On the other day there was no load on this LUN, yet latencies rose at exactly the same time again.

How can we determine when the internal rollup jobs that collect the statistics for auto-tiering are running?

4.5K Posts

October 18th, 2011 14:00

Check the version of FLARE you run - if it is release 30 patch 512 or lower, please upgrade to the latest FLARE patch, 522. There are some fixes for FAST Cache enabled on auto-tiered pools that will help. Contact service to schedule a FLARE upgrade for this and mention the 1200 ms response times.
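A quick way to read the current revision from the command line is sketched below; the SP hostname and credentials are placeholders for whatever you use.

```python
import subprocess

# Placeholder SP address and credentials - substitute your own.
cmd = ["naviseccli", "-h", "spa.example.local",
       "-user", "admin", "-password", "secret", "-scope", "0",
       "getagent"]

# getagent reports agent and array details, including the FLARE revision;
# the "Revision" line is the one to check against patch 522.
for line in subprocess.run(cmd, capture_output=True, text=True,
                           check=True).stdout.splitlines():
    if "Revision" in line:
        print(line.strip())
```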

glen

50 Posts

October 18th, 2011 22:00

Hi Glen,

FLARE was upgraded yesterday to .522.

Let's see if this will solve our problem.

Regards,

Daniel

4.5K Posts

October 19th, 2011 14:00

Just an FYI - I worked a case where the workload was in constant flux for about two months - adding, removing, changing, etc. During this time performance was not very good. Once the configuration was stable, it took about 3-4 weeks for performance to improve until it hit a stable point, and it remained constant from then on. So it may take a while to settle down. Watch the amount of data relocation each day in the Tiering section - you should see a lot to move at first, then as time passes the amount of data to move up and down starts to decrease. If you're seeing TBs to move up and down, you're changing too much.
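To make that trend easy to watch, here's a rough sketch of a once-a-day script that appends the "data to move up/down" figures to a CSV - the autotiering -info -opStatus syntax and the exact field labels are from memory, so treat them as assumptions and check them on your FLARE revision.

```python
import csv
import subprocess
from datetime import date

# Placeholder SP address and credentials - replace with your own.
NAVI = ["naviseccli", "-h", "spa.example.local",
        "-user", "admin", "-password", "secret", "-scope", "0"]

# Field labels as I remember them from 'autotiering -info -opStatus';
# verify the exact wording on your array. With several pools, the last
# pool listed wins in this simple sketch.
FIELDS = ("Data to Move Up", "Data to Move Down")


def main():
    out = subprocess.run(NAVI + ["autotiering", "-info", "-opStatus"],
                         capture_output=True, text=True, check=True).stdout
    row = {"date": date.today().isoformat()}
    for line in out.splitlines():
        for field in FIELDS:
            if line.strip().startswith(field) and ":" in line:
                # Lines look roughly like "Data to Move Up (GBs): 1234.56"
                row[field] = line.split(":", 1)[1].strip()
    with open("tiering_trend.csv", "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["date", *FIELDS])
        if fh.tell() == 0:
            writer.writeheader()
        writer.writerow(row)


if __name__ == "__main__":
    main()
```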

Also, be careful with backups running at the same time you have tiering scheduled.

glen

196 Posts

October 19th, 2011 17:00

Thanks

I never even thought about auto tiering and backups running at the same time.

136 Posts

October 19th, 2011 21:00

For me, the troubleshooting steps would be:

(1) Enable data logging to monitor the box, since it will record any possible perf issue. You can then look at the NAR files to match the exact time frame in which the issue happened, in case the issue is not related to FAST VP.

(2) Check the FAST relocation status to determine whether it was running at the time of the issue. (A quick scripted check of both points is sketched below.)
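A minimal sketch of both checks, assuming the naviseccli analyzer/autotiering syntax from FLARE 30 and placeholder credentials:

```python
import subprocess

# Placeholder SP address and credentials - replace with your own.
NAVI = ["naviseccli", "-h", "spa.example.local",
        "-user", "admin", "-password", "secret", "-scope", "0"]


def run(*args):
    """Run one naviseccli command and return its output text."""
    return subprocess.run(NAVI + list(args), capture_output=True,
                          text=True, check=True).stdout


# (1) Is Analyzer data logging running, so NAR files will cover the issue window?
print(run("analyzer", "-status"))

# (2) Is a FAST VP relocation active right now, and what is the schedule?
print(run("autotiering", "-info", "-state", "-schedule"))
```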

Regards

-Steve

25 Posts

October 21st, 2011 07:00

Hi, any response after the FLARE upgrade? I'm in the same situation. Thank you.

50 Posts

October 23rd, 2011 08:00

Status so far:

- backup and auto-tiering do not (and did not) run at the same time

- FLARE was upgraded last Tuesday to patch level 522

- that brought only very small improvements, since the problems seem to be caused by high load on mirrored volumes

- the high latencies are not observed in the timeframe where relocation runs

Are there any known MirrorView issues we have to be aware of?

We started moving to a pool concept within one array. That brought us to the situation where we needed to extend the pools with disks that had been freed by migrating their LUNs to the pools, and so on. Without a relocation or restriping process we are stuck with some LUNs located only on the very small number of fresh disks that were added to the pool shortly before the LUN was migrated to it...

In our opinion the only way out would be to migrate all data off the array, build the pools at their final capacities, and then move all the data back. Would anyone like to share their thoughts on that problem?

Best regards,

daniel

136 Posts

October 23rd, 2011 23:00

There is a known perf issue for MV/S with FAST CACHE enabled. You may take a look at emc266584.

-Steve

50 Posts

October 23rd, 2011 23:00

We do not have FAST Cache enabled. How can I access emc266584?

25 Posts

October 24th, 2011 00:00

Search "emc266584" on "Search Support" on Powerlink.
