Unsolved

November 10th, 2015 08:00

MTree replication / R/O access?

Hello,

We are considering changing our replication method from collection replication to MTree replication in the near future, because we will be doing a data center split and both DDs must be active backup targets.

Originally CCR (clone-controlled replication) was planned, but I really don't expect that it could work for us: we have 50-100 TiB of pre-compression data daily in 20,000+ savesets (7 datazones), and we have to clone all of it to the remote site.

There are many restrictions for CCR which are not very clear to me. For example, the admin guide says I should not start more than 30 parallel clone sessions because they may time out, but I don't know what happens when the clones are "immediate clones" started by NetWorker as soon as a saveset completes. We also have manual backups, so I need to run scheduled cloning too (clone everything from pool A to pool B with a limit of max copies = 2, or something like that). May I run such a schedule for a pool where immediate cloning is also configured? Or do I have to define many new pools, which goes against the best practice of having as few pools as possible?
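Something like this is what I have in mind for the scheduled part; just an untested sketch from memory, the pool names are placeholders and the exact mminfo/nsrclone syntax should be double-checked against the man pages:

    # find savesets in "PoolA" that have fewer than 2 copies and write the ssid list to a file
    mminfo -avot -q "pool=PoolA,copies<2" -r ssid | sort -u > /tmp/ssids_to_clone
    # feed the ssid list to nsrclone; -S = clone by saveset id, -f = read the ids from a file
    nsrclone -v -b PoolB -S -f /tmp/ssids_to_clone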

And what if we accumulate some replication lag: won't source/target device contention and the cloning parallelism limits prevent us from finishing the job?

The cloning results should be checked daily, so despite the benefits of CCR I think it would be hard work to keep the clones in sync.

I've recently discovered that the new DDOS 5.5 finally supports MTree replication of DD Boost devices. This could help us a lot.
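As far as I understand, the MTree replication context itself would be set up on the DDs, completely outside NetWorker; roughly like this (DDOS CLI from memory, hostnames and MTree names are made up):

    # on the source DD: pair the MTree with its replica on the target DD
    replication add source mtree://dd-src.example.com/data/col1/nwdz1 destination mtree://dd-dst.example.com/data/col1/nwdz1
    # seed the destination, then watch the context
    replication initialize mtree://dd-dst.example.com/data/col1/nwdz1
    replication status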

But we are staging long-retention backups to PTLs, so the data must be available at both sites. And there is no better method to replicate data between DCs than deduplicated data transfer (by the way, does MTree replication benefit from the dedup at all?).

There is a huge limitation: no CCR is possible if an MTree is a replication source. Since MTree replication is invisible to NetWorker, I thought we could use two MTrees: one replicated by the DD and the other one by CCR (holding the data which will be staged). But this won't work, because all NetWorker cloning operations are CCR operations locally on the DD as well, which leads back to the original problem: that is not allowed in this setup.

...and here comes the idea (and the real question):

The MTree replica on the target DD is accessible read-only. So what if I mount it on a remote storage node and try to recover from it (locally)?

I think the media DB has information about saveset placement on the volumes, but not about volume locations. May I try to mount it? Should it work?

I can't test it, because we still use collection replication and I can't break it. But once it is broken we will have to set up the new method ASAP, so there will be no time for basic design testing.

Please share your opinions with me!

Best regards,

Istvan

2 Intern • 14.3K Posts

November 10th, 2015 16:00

Using immediate cloning (upon ssid completion) is better since you spread the replication streams. 30 sessions is the current default best-practice replication setting in NW, which you can easily change/control via nsrcloneconfig. nsrclone outside of group management (manual or scheduled) will also work just fine. You don't need a new pool for those unless you wish to separate manual saves into their own pool (or the people running manual saves have no clue which pool to use, so you create a pool which captures saves with level manual). I have a similar (well, bigger) environment and I do not have any issues with CCR, so I would not worry about any sync issue.

I don't understand why you would have two MTrees. You have an MTree per datazone, and you can use either CCR or DD-level replication, which remains unknown to the backup application. The media DB also has information about the location of the volume (as of 8.1.x, I believe, the location field for disk volumes reports the DD name where it resides). CCR is quite nice and it makes life much easier.
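For example, something like this should show it (query from memory, exact report columns may differ):

    # list disk volumes and the location the media DB has for them
    mminfo -a -r "volume,location" | sort -u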

116 Posts

November 11th, 2015 00:00

Are you sure that 30 sessions is sufficient for cloning?

I have, for example, a savegroup which does one full per week and daily incremental Linux filesystem backups; the result is 556 savesets because of the many small file systems.

And this is only one savegroup within one DZ, and I have 6 more NW servers.

The other thing is that we also have some huge VADP backups, and when I tested cloning that data _within_ the DD (between volumes) it took a long time (20+ minutes for a 152 GB fullvm saveset).
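Rough math on that clone, just to put a number on it (treating the 152 GB as binary GB):

    # 152 GB in ~20 minutes is only about 130 MiB/s
    awk 'BEGIN { printf "%.0f MiB/s\n", 152*1024/(20*60) }'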

2 Intern • 14.3K Posts

November 11th, 2015 02:00

Yes, it is. I size my groups so I do not have more than 20 clients in each, but with 20 clients that is around 100-200 save sets. Replication itself goes fast, so all of these complete rather nicely and quickly. I have 12 backup servers. As for VADP, I do not use it, so I can't comment on that one.

116 Posts

November 11th, 2015 03:00

OK, thanks. Is your daily backup amount as large as ours (up to 100 TB)?

What is the network RTT to your remote DD?

2 Intern • 14.3K Posts

November 11th, 2015 10:00

My RTT is small: 0.2 ms to the local DC and nearly 0.5 ms to the remote one. However, in both DCs the DDs are attached to the core, and I use a flat, non-routed VLAN to access them for data traffic. In your case, if this is a WAN, I believe DD had a tested round-trip limit for which they certified DD Boost, but I do not know what the value was (they increased it last summer). Yes, daily traffic is around 100 TB (mostly DBs).

116 Posts

November 12th, 2015 09:00

As I remember, the maximum is 20 ms. We currently have 14 ms on the line we use for collection replication (~1000 km).
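Just as a sanity check on that number (light in fiber travels at roughly 200,000 km/s, so distance alone sets a floor):

    # theoretical round-trip floor for ~1000 km of fiber, before any switching/routing delay
    awk 'BEGIN { printf "%.1f ms\n", 2*1000/200000*1000 }'   # ~10 ms, so 14 ms measured is plausible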

In the new setup it should not be more, and luckily they'll be on the same VLAN.

I know that you're operating a fully green NSR, but what are your experiences with savegroup failure handling? Does the result of a savegroup depend on the cloning results? Does the savegroup run until the cloning is finished? This is important, because we have some rare cases when it takes a lot of time to finish a group, so I'm afraid that the group won't be started the next day if the save+clone can't finish within 24 hours.

By the way, I made a small test with adv_file devices:

I created a new pool and device and made a small manual backup to it.

Then I copied the directory of the adv_file device to another SN (with scp), and I was able to mount it and use it for restore (while the original device was also still available).
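For reference, roughly the steps I ran (hostnames and paths are made up, and the device itself was created via NMC with media type adv_file):

    # on sn1: small manual backup into the adv_file device under /backup/aftd_test
    save -b TestPool -q /etc/hosts
    # copy the whole device directory to the second storage node
    scp -r /backup/aftd_test sn2:/backup/aftd_test
    # on the NW server: define a second adv_file device on "rd=sn2:/backup/aftd_test" and mount it
    nsrmm -m -f "rd=sn2:/backup/aftd_test"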

Maybe this is some kind of unexpected usage of NetWorker (I've asked support to check this).

When I tried to recover the data the first time, I had to specify the recover SN to force the recover command to use the copied data, but on the next run recover automatically turned to this copied volume. That's what I don't really like; maybe this trick affects the server in a bad way.

(The original idea came from the fact that we use DD Boost volumes which are mounted on multiple SNs (R/W), plus I also knew that it's possible to relocate a tape volume to another PTL and the contents are still available. So I thought that copies made outside NetWorker should be usable without scanning them; it's clear that they aren't recognized as clones, but NSR knows the contents of these volumes.)

2 Intern • 14.3K Posts

November 13th, 2015 15:00

oldhercules wrote:

I know that you're operating a fully green NSR, but what are your experiences with savegroup failure handling? Does the result of a savegroup depend on the cloning results? Does the savegroup run until the cloning is finished? This is important, because we have some rare cases when it takes a lot of time to finish a group, so I'm afraid that the group won't be started the next day if the save+clone can't finish within 24 hours.

I don't have issues like this. However, if I did, I would isolate the backup jobs of the client(s) that run so long into their own dedicated group. In other words, if I know that something runs long, I simply isolate it from the rest and do not mix it with faster jobs in the same group.

As far as your test goes, you can have device sharing in NW, but manually scp-ing files is silly and can only lead to issues sooner or later.

116 Posts

November 15th, 2015 22:00

I don't want to use scp for NW data cloning at all! I wanted to see what happens when I make a disk volume that was created outside of NW available to NetWorker in a new location, which is the "same" thing that MTree replication does.

116 Posts

November 16th, 2015 01:00

What are your cloning parallelism settings?

nsr says here:

Number of clone threads per client set to 30. Max number of savesets per thread is set to 0. Max client threads is set to 20.

I've tested savegroup-initiated cloning within the DD to a test clone pool, and I got this:

5170963:84620:nsrclone: nw_ddcl_connect failed on host xxxxxx: Connecting to 'xxxxxxx' failed [5002] ([ 3335] [140157325903616] Mon Nov 16 09:35:39 2015       ddpi_connect_with_user_pwd() failed, Hostname: xxxxxxxx, Err: 5002-max allowed connections exceeded 
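I still have to check on the DD how many Boost connections it actually sees while this runs; I assume something like this on the DDOS CLI (commands from memory):

    ddboost status
    ddboost show connections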

2 Intern • 14.3K Posts

November 16th, 2015 05:00

Which DDs and DDOS are you using?

2 Intern • 14.3K Posts

November 16th, 2015 05:00

oldhercules wrote:

I don't want to use scp for NW data cloning at all! I wanted to see what happens when I make a disk volume that was created outside of NW available to NetWorker in a new location, which is the "same" thing that MTree replication does.

You must unmount the original volume and then mount the replica (you may have to adjust the path depending on what it is now). I'm not sure whether you need to break off the replication pair or not; I guess not for read-only operations.
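So roughly like this on the NW server (flags from memory, device names are placeholders):

    # unmount the volume on the original device, then mount the replica read-only
    nsrmm -u -f "dd-src.example.com:/nwdz1/nwserver.001"
    nsrmm -m -r -f "dd-dst.example.com:/nwdz1/nwserver.001"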

116 Posts

November 16th, 2015 23:00

We have two DD990s; the current DDOS is 5.4.4.2, but it will be upgraded to 5.5.x soon (once I finish upgrading all NW servers/clients to NW 8.1 SP3).

I don't know how it works with MTrees, but with adv_file devices it was possible to mount the "same" volume (the original and the "scp replica") at the same time on two different SNs.

Now it seems to me that we should select CCR, for the following reasons:

- You told me that CCR works well for an environment as big as yours.

- We would have to use many more MTrees in the case of MTree replication: 2 per datazone (one active on each site, plus their replicas on both sites). We will have 8 datazones, plus we need 4+4 MTrees for the VDP appliances (I hope they will be implemented soon). (Maybe we could use a mixed setup, CCR for datazones with PTL and MTree replication for the other ones, but then we have a more complicated setup.)

- I plan to use "automatic" saveset migration to PTL for selected DZs, and this could lead to an unexpected state if I use the MTree replica for reading while the cloning deletes the source saveset when the migration is finished. Plus, my idea of using an R/O MTree replica seems a bit weird anyway.

- I know that I can access any MTree with NetWorker, but currently I have no idea how to create "subdirs" without the device configuration wizard, which assumes that the MTree name equals the NW server name. (MTree replication creates an MTree on the target DD with the same name as the source, but I also have to use an active MTree on the target, because we have to make backups on both sites within each DZ.) See the sketch after this list.

- I was informed a little bit late that each DZ expects to restore backups on both sites regardless of where the backup was made (as part of daily operation, not only in a DR case), i.e. restoring backups of prod systems to test systems located in different DCs. So if the replica is to be used in daily operation, and not only for PTL migration, I think it's better to do it in a fully supported way, and I should create saveset clones instead of R/O replica mounts.
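Regarding the "subdirs" point above, the only workaround I can think of is to look at a wizard-created Boost device in nsradmin and clone its attributes by hand with a different folder; an untested sketch, all names are made up:

    nsradmin -s nwserver
    # dump an existing wizard-created Boost device to see its exact attributes
    nsradmin> print type: NSR device; media type: Data Domain
    # then create a new NSR device resource with the same attributes, only pointing
    # the name / device access information at another folder, e.g. dd-dst:/nwdz1/subdir1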
