Unsolved

5 Practitioner • 274.2K Posts

September 16th, 2014 03:00

VMAX20K - Clone operation causing 80% backend utilization and latency

Hello,

In the past month, every time we run a clone operation on one of our devices, the VMAX back-end climbs to about 80% utilization.

When clones are not running, back-end utilization averages 17-22%.

Some info:

VMAX20K - 4 engines.

The production thin devices are divided almost equally in performance (IOPS) across two FC 15K pools, each with 144 x 600GB disks (78TB per pool).

The clone devices are spread across an FC 10K pool with 144 x 600GB disks (78TB).

The FAs are not perfectly balanced, but even the highest utilization peak doesn't cross 38%, and FA utilization doesn't change while the clones are running.

While running the clone, some of the FAs show average read latency above 200ms, while write latency is between 30-80ms.

This symptom is not the same on all the FAs: 7E, 8E, 9E and 10E seem to "suffer" the most, while the others show an impact on read latency that is not as extreme.

It seems like there is a direct effect on our production systems.

Each time we run a clone operation, both FC 15K pools suffer from higher latency than usual, no matter which pool the source LUN is bound to, even when running 1-2 insignificant clones whose targets are not mapped to any FA (regardless of the meta configuration of these devices).

Although we have an open case with EMC, I wanted to hear more opinions and suggestions.

Sincerely,

Guy

1.3K Posts

September 16th, 2014 04:00

What Enginuity level are you running?

Do you know about clone copy QoS?

5 Practitioner • 274.2K Posts

September 16th, 2014 05:00

Hi Quincy,

I've stated the Enginuity level in the tags: 5876.

Yes, we tried using it, but we were told clone copy QoS is only honored during precopy; once we activate the device it is no longer considered, which means I can't activate my clones immediately.

While using QoS we can barely "dodge the bullet"; we still see some impact on production applications once we activate.
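For reference, this is roughly what we tried with the clone copy pace. The device group name below is just a placeholder and the symqos syntax is from my notes, so verify it against the Solutions Enabler QoS documentation for 5876:

# Slow the background clone copy rate for devices in the group
# (pace values range roughly 0-16; higher means a slower copy)
symqos -g prod_dg set CLONE pace 8

As I said, that only seems to help while the session is still in precopy.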

Now, let's set your second question aside for a moment and ask: is it normal to jump from ~20% to ~80% back-end utilization when running a single 1TB full clone?

1.3K Posts

September 16th, 2014 05:00

What are you using the clones for?  If you activate immediately, maybe you shouldn't copy at all.

Once you activate, any write to the source forces the original track to be copied to the target first. So it isn't the background copy that is causing all the workload; it is probably the copy-on-write workload. That is why we offer pre-copy, so that most of the copy-on-write impact after activation can be avoided.
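As a rough sketch (the group and device names here are placeholders, not from your configuration), a pre-copy session looks something like this:

# Create the clone session and start copying tracks in the background
symclone -g prod_dg create -precopy DEV001 sym ld DEV005

# Activate later, once most tracks have already been copied; only the
# remaining tracks and new source writes then need copy-on-write handling
symclone -g prod_dg activate DEV001 sym ld DEV005

# Check how many tracks are still left to copy
symclone -g prod_dg query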

1.3K Posts

September 16th, 2014 07:00

The copy on write will need to perform reads on the source disks, and writes to the target disks.  This will create load on DAs and other resources in the system as well.

I didn't see an answer to my question: what are you using the target devices for? In most cases, if you want an immediately usable copy, a no-copy target (with no copy even on read from target) may be the best option. The default behavior is to copy a track on a write to the source or target, or on a read from the target, and the session will also try to copy the tracks you never access.

You can't avoid the copy-on-write impact without pre-copy, but you can eliminate the impact of copying on reads from the target, and you can avoid the extra work of copying tracks that are never accessed.
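As an illustration (again with placeholder names), a session created without -copy or -precopy would look roughly like this:

# No -copy / -precopy: no full background copy is scheduled
symclone -g prod_dg create DEV001 sym ld DEV005

# The target becomes a usable point-in-time image immediately;
# tracks are then copied only as they are accessed, not all 1TB up front
symclone -g prod_dg activate DEV001 sym ld DEV005

Whether a read from the target also triggers a copy is governed by a separate session option; I don't have the exact option name handy, so check the TimeFinder/Clone product guide for your Solutions Enabler version.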

5 Practitioner • 274.2K Posts

September 16th, 2014 07:00

OK, so the copy-on-write workload is slowing down the source LUN, right?

Why do all of my other production applications and storage get slow too?

Why do the other devices suffer from higher latency even when I clone a 1TB device that isn't mapped to any FA? And why does my DA utilization climb to 80% from cloning a single device?

