1 Rookie • 5.7K Posts

January 4th, 2008 00:00

What's the impact on existing SRDF/S when new adaptive copy starts running

I was under the assumption that resuming an acp_disk synchronisation was a low priority SRDF "task" and that existing SRDF/S data streams were not impacted. I thought that the adaptive copy process would only use the "free" bandwidth so the "production bandwidth" would not be impacted.

Every day we copy 300 to 500 GB of data using acp_disk and let it run for almost two hours. After that we switch to SRDF/A to get the data consistent, and after two or three minutes we do a suspend.
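
For reference, the sequence looks roughly like this ("daily_copy" is just a placeholder device group name, not our real one):

# symrdf -g daily_copy set mode acp_disk
# symrdf -g daily_copy establish
(let the bulk copy run for roughly two hours)
# symrdf -g daily_copy set mode async
(wait two or three minutes for a consistent point)
# symrdf -g daily_copy suspend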

Everything worked just fine, but yesterday an SRDF/S customer complained that during this two-hour period he experienced extreme slowness on his SAN-attached disks. Considering the history of this customer and previous incidents, I'm pretty sure it has to do with our acp_disk sync, which runs every day.

What I would like to know is whether we can change the priority of this acp_disk sync, or at least have someone explain to me why EMC says that adaptive copy is a low-priority thing, while in fact other synchronous links ARE still impacted.

March 1st, 2008 19:00

You should definitely only run it when in ACP_DISK.

1 Rookie • 5.7K Posts

January 4th, 2008 08:00

Hey! If that is the case, we have an issue here!
I was always told that for an initial sync you should always use adaptive copy. But if you're using a somewhat larger device or meta, the number of tracks that need to be copied exceeds the maximum number of invalid tracks (the skew value?), and thus everything switches to async, sync or semi-sync? I can't believe that is actually what happens. That would mean every initial pair create would start running as (a/semi-)sync.

But you've got me thinking. I'll start digging deeper now.

Anyone else have any ideas?

1 Rookie • 5.7K Posts

January 4th, 2008 08:00

I will.... Monday.
Thank you so far :-)

1 Rookie • 20.4K Posts

January 4th, 2008 08:00

Have you tried increasing the skew on acp_disk?
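
If I remember the CLI correctly, the skew is set per device group with something like the one-liner below - the syntax is from memory, so please verify it against the SRDF CLI guide first ("prod_dg" is just an example group name):

# symrdf -g prod_dg set mode acp_disk skew 30000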

1 Rookie • 20.4K Posts

January 4th, 2008 08:00

I might be interpreting this incorrectly... check out page 90 in the "Solutions Enabler Symmetrix SRDF Family CLI Product Guide 6.4".

1 Rookie • 5.7K Posts

January 4th, 2008 08:00

It's set to the maximum, 65535...
Do you know what the impact on existing SRDF/S is when adaptive copy is started or running? It should be low impact, but I have my doubts.

1 Rookie • 20.4K Posts

January 4th, 2008 08:00

Unfortunately I don't have a lot of experience with adaptive copy modes. Looking at the doc, it says that when the skew number is exceeded it will switch to either sync or semi-sync until the number of invalid tracks is back below the skew number. Could that be the case in your situation?
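
One way to check while the copy is running would be something like the following ("daily_copy" is just a placeholder group name):

# symrdf -g daily_copy query

The query output shows the RDF mode and the invalid track counts per device, so you could see whether the pairs are actually sitting in adaptive copy or have flipped to sync/semi-sync while the invalids are above the skew.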

131 Posts

January 4th, 2008 09:00

An interesting question! Adaptive copy mode does have a lower priority than synchronous SRDF, but there are other factors to consider too. I'm not a Symmetrix performance guru (and I do suggest you open an SR with your local support to get a performance specialist engaged).

Here are a few things to think about in the meantime:

The max ACp skew (65535) is interpreted as infinity. In other words, if it's set at this (default) value, the RDF mode stays in adaptive copy.

Are your SRDF links heavily utilised with SRDF/S traffic, such that the additional ACp traffic now completely saturates the link? This will cause everything to slow down.

The additional backend (DA + disk) activity imposed by the sync could be causing increased service times for some hosts - especially if the volumes they're accessing reside on the same spindles as the volumes being synched (remember there's only one set of drive heads!).

RAID mode can have an impact on the above - with mirrored volumes there are obviously 2 copies of the data - either of which can service a read. With RAID5 there is only one copy, so read misses have to come from that one device - potentially causing hot spindles.

Cache Pressure - If the cache is very full of data waiting to be destaged (dirty), RDF writes to volumes in ACp_disk mode can be discarded once the local write(s) have completed. The remote track gets marked as invalid. This allows a cache slot to be reused quickly without having to wait for RDF. However, the track will eventually need to be copied over the RDF link, and the penalty for the quick win is having to do another local read from disk into cache in order to then send the track to the remote box. This results in higher backend I/O activity.

RDF group utilisation. Are there multiple RDF groups per RA? How is the workload balanced between them?

I believe the RA will "round-robin" if there are multiple RDF groups on it. Therefore it may be possible that a low priority ACp sync in an otherwise idle group could be disproportionately taking cycles from busy SRDF/S sessions in other groups on the same RA.
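
To get a quick picture of how the RDF groups are spread across the RAs, something like the following is a starting point (exact output and options vary with the Enginuity and Solutions Enabler versions, so treat it as a sketch):

# symcfg list -RA all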

The answer? Lots of analysis to determine where the perceived performance problem lies. For a quick fix, I would start by investigating the symqos (Quality of Service) Solutions Enabler commands, with which you can set the SRDF copy pace. For example:
# symqos -g drdf set RDF pace 10

...sets the RDF copy pace of all the devices in group "drdf" to 10, which is the slowest. You'll need to tune the pace setting, but this may allow you to slow down the ACp sync such that it does not impact the SRDF/S.
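
So a rough approach for your daily window might be to wrap the ACp sync with pace changes, along these lines (a sketch - the group name and pace values need tuning for your environment, and I'm assuming 0 is your normal full-speed setting, which you should verify first):

# symqos -g drdf set RDF pace 10
(run the ACp_disk sync window)
# symqos -g drdf set RDF pace 0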

Hope that's of some help!
Best Regards, Marc

2.8K Posts

February 18th, 2008 02:00

Rob... can we help further on this topic? :D

2.8K Posts

February 18th, 2008 02:00

Rob, it's fair to say that you worked around the problem, but you still have a problem (even if it's worked around). I think MarcT's suggestion to open an SR is the only serious option here, since digging into performance problems via the forum can be risky. It's better to reproduce the issue, if possible, and dig into it by opening an SR.

1 Rookie • 5.7K Posts

February 18th, 2008 02:00

I quote: "The skew value is configured at the device level and may be set to a value between 0 and 65,534 tracks. For devices larger than a 2 GB capacity drive, a value of 65,535 can be specified to target all the tracks of any given drive."
So with the skew set to the maximum value, it should never switch to SRDF/S or /A. But still, when my 200 LUNs (single symdevs as well as metas) start "adaptive copying", our SRDF/S is impacted, and we received incidents from customers saying that their performance was severely impacted.

I regret to say that my question still isn't answered. I must also add that we created a workaround for this problem: we started doing these ACP_Disk syncs more than once a day instead of once a day, and (for now) nobody has complained.
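
In practice that just means the sync script gets kicked off several times a day instead of once, e.g. something along these lines in cron (the times and script name are made up for illustration):

0 6,12,18,22 * * * /usr/local/scripts/acp_sync.sh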

Although the question hasn't really been answered, I'd like to close this question / topic, since it no longer applies to a live problem we were facing.

1 Rookie • 5.7K Posts

February 18th, 2008 03:00

The thing is that at the moment we have other priorities and don't have the time to look too closely at this issue anymore. Of course you are right, and we will certainly open a case when the problem reveals itself again.

February 28th, 2008 14:00

We have a similar issue and solution. We are in a bandwidth-limited condition, so if we bring up a new SRDF pair, we can't afford to have the new devices establish at full throttle, or else our existing SRDF/S traffic will take a you-know-what. To get around this, we use symqos -g DEVICE_GROUP set RDF pace 4.
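
In other words, something like this when bringing up a new pair (a sketch; DEVICE_GROUP is whatever group holds the new devices):

# symqos -g DEVICE_GROUP set RDF pace 4
# symrdf -g DEVICE_GROUP establish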

February 29th, 2008 12:00

Not sure why they would say that. Check with your SE/SA - I'm sure they will agree that it is fine.

1 Rookie • 5.7K Posts

February 29th, 2008 12:00

I was told by several EMC instructors to stay away from QoS...