
1 Rookie • 82 Posts

June 17th, 2009 14:00

estimated SRDF/A completion time

I've never paid much attention to the completion time displayed when R1 and R2 are trying to sync up. How accurate is it?

Does it make sense to try to estimate how long it would take to replicate a certain amount of data via SRDF/A? If so, what factors would be needed for a rough estimate? Off the top of my head I'm thinking available bandwidth and the Symm's cache size.

Thanks!

2 Intern • 448 Posts

June 18th, 2009 05:00

The time given is just simple math: the number of tracks left to send divided by the rate of tracks being transferred at that moment. It's a decent ballpark shot at the time, but it doesn't tend to be accurate. What it doesn't take into account (as it cannot) is the number of changing tracks that still have to go over, as well as any deviations in the data transfer rate.
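The naive math described above can be sketched in a few lines. This is not the actual Solutions Enabler algorithm, just an illustration of the "remaining divided by current rate" idea, with made-up numbers:

```python
def naive_eta_seconds(tracks_remaining, tracks_per_second):
    """ETA = tracks left to send / current transfer rate.

    Ignores new writes arriving during the sync and any change in
    link throughput, which is exactly why the real number drifts.
    """
    if tracks_per_second <= 0:
        raise ValueError("transfer rate must be positive")
    return tracks_remaining / tracks_per_second

# Hypothetical example: 1.2 million invalid tracks draining at
# 800 tracks/second.
eta = naive_eta_seconds(1_200_000, 800)  # 1500 seconds, i.e. 25 minutes
```

As the post notes, the estimate is only as good as the instantaneous rate it was computed from.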

4 Operator • 2.8K Posts

June 18th, 2009 02:00

My 1.5 cents... Use a recent S.E. version (6.5.3 or 7.0).
Older versions may give inaccurate output; 6.5.3 and 7.0 offer accuracy enhancements. But don't hold your breath! :D S.E. offers ESTIMATES, just like the ETA of flights at the airport ;-)

4 Operator • 2.1K Posts

June 18th, 2009 07:00

I generally use this as a starting point and assume it may take up to 50% more. Often it is reasonably close, but it can't account for the many factors that can affect your throughput.

I usually check the throughput at the same time as the estimate. I know what my configuration is capable of and what it averages, so if I'm getting an estimate of 1 hour to complete, but the throughput is pushing the absolute limits of the system capability I know it will likely be anywhere from 1.5 to 2 hours. If the throughput is showing very low, there is a good chance that whatever is affecting it is only temporary and I may be anywhere from 30 minutes to the estimated 1 hour.
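That rule of thumb could be sketched roughly like this. The utilization thresholds and pad factors here are my own guesses at the heuristic being described, not anything Solutions Enabler computes:

```python
def adjusted_eta(raw_eta_hours, current_throughput, max_throughput):
    """Pad the raw SE estimate based on link utilization.

    If throughput is already pushing the system's limits there is no
    headroom to absorb new writes, so pad heavily (~1.5x-2x). If
    throughput is unusually low, whatever is slowing it may be
    temporary, so the raw estimate may hold. Otherwise assume up to
    ~50% more, per the starting-point rule above.
    """
    utilization = current_throughput / max_throughput
    if utilization > 0.9:            # pushing absolute limits
        return raw_eta_hours * 1.75  # midpoint of the 1.5x-2x range
    if utilization < 0.3:            # likely a temporary slowdown
        return raw_eta_hours
    return raw_eta_hours * 1.25

# Hypothetical example: SE says 1 hour, link running at 95% of its
# known ceiling -> plan for about 1.75 hours.
plan = adjusted_eta(1.0, 95, 100)
```

The point is that the adjustment depends on knowing what your configuration normally averages, which only comes from watching it over time.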

Not very scientific, but it gets the job done. As always, if the system says 1 hour and you think it might be 1.5, if anyone asks you should tell them 3... then when it is done in 2 you will still be a hero even though it was 30 minutes more than you really thought it would be :-)

1 Rookie • 82 Posts

June 18th, 2009 12:00

Thanks for all the suggestions, guys. I especially like Allen's idea of padding in some time, but I do that with everything anyway so I always look good in front of the customer LOL

1 Rookie • 82 Posts

June 19th, 2009 09:00

Ok guys, sorry to bring this one back up, but I'd like your opinions on the result of our test. The test was to copy 400GB of data to our NAS mountpoint, which resides on the SRDF/A pool, and see how long it would take to replicate.

Before we ran the test I did a symstat to determine available cache: the cache slots available to all SRDF/A sessions came to around 45GB, and the system write pending limit was 48GB on both systems.

The initial state of the SRDF pair was Consistent. Here's a brief log of what occurred (R2INV = invalid tracks on R2; TC = total amount of change in MB):


7:17 started copy process, state: Consistent, R2INV 0, TC 0
7:15 state: Partitioned, R2INV 0, TC 0
7:25 state: Consistent, R2INV 0, TC 0
7:42 still copying data, state: Synchronized, R2INV 0, TC 0, R1 cache in use 93%, R2 cache in use 14%
7:44 still copying data, state: Suspended, R2INV 1.3 million, TC 41669.9

After it reached the Suspended state we stopped copying data to the mountpoint; at that point the R2INV was well over 7 million. I put the pair in adaptive copy disk mode and tried to re-establish. It seems to be working, as I can see the R2INV going down, but the pair state says Synchronized instead of SyncInProg.

Basically, what this test tells me is that an imbalance occurred between the incoming write I/O workload and the outgoing SRDF/A bandwidth (or R2 was unable to destage quickly enough), such that the global memory on the R1 Symm became full, since it is consumed by all the active and inactive cycles.
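That imbalance can be sanity-checked with back-of-the-envelope math: cache fills at roughly (inbound write rate minus outbound drain rate). The rates below are hypothetical, chosen only to show how quickly 45GB of headroom can disappear:

```python
def seconds_until_cache_full(cache_free_gb, inbound_mb_s, outbound_mb_s):
    """Time until SRDF/A cache exhausts when host writes outpace the
    link (and R2 destage).

    Returns None if the outbound side keeps up, i.e. cache never fills.
    """
    net_fill = inbound_mb_s - outbound_mb_s  # MB/s accumulating in cache
    if net_fill <= 0:
        return None
    return cache_free_gb * 1024 / net_fill

# Hypothetical example: 45GB free, host writing 300 MB/s, link
# draining 150 MB/s -> cache fills in about 5 minutes, after which
# the session drops (Suspended), much like the log above.
t = seconds_until_cache_full(45, 300, 150)
```

With host throttling disallowed (as the ECC alert in the next post indicates), hitting the write pending limit leaves the session nowhere to go but Suspended.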

What I don't get is why the pair remains in synchronized state during acp_disk mode.

4 Operator • 2.1K Posts

June 19th, 2009 10:00

I'm wondering if there is something more going on as well. I'm a bit confused by the synchronized state during acp_disk mode, but I'm also not sure why you were showing the state at 7:15 as partitioned. Is your link stable?

Also, you were showing Synchronized at 7:42. At that point it should have still been showing Consistent. Unless I'm completely out to lunch (and it is Friday afternoon here, so it's entirely possible I'm missing something), Synchronized isn't even a valid state for SRDF/A in async mode.

Anyone else have any thoughts?

1 Rookie • 82 Posts

June 19th, 2009 11:00

Sorry, mistake on my part. I just double-checked the logs: there were over 450 devs in the RDF pool and I only looked at the bottom of the list. The ones that show as Synchronized either finished copying the delta or were never used during the initial copy; the devices that are trying to catch up do say SyncInProg. But why did all devs show as Synchronized at 7:42, before suspending?

Yeah, I'm still not sure why it went into the Partitioned state for a bit; according to operations there were no network hiccups. The weird thing is, it also happened yesterday when I tried replicating about 1GB of data: it went into the Partitioned state for a few seconds and then became Consistent again.

I sort of confirmed my own suspicion when I looked at the ECC logs: at the time the pair suspended, R1 reported an alert saying the write pending cache limit had been reached and we're not allowed to throttle the host.

I don't know, all of this just seems off. According to operations, this is the first time more than 1GB of data has been tested over this SRDF link. I think they said the initial SRDF qualification was done on a temporary OC48 pipe. They did a full establish on the entire 10TB pool (no actual FS/data on it at the time) and it only took 4 hours.
