vucko
2 Iron

NetWorker optimization for the proper use of LTO tapes

Hello Sirs,

How do I properly configure group parallelism and drive sessions so that I can control LTO tape usage over the weekend?

Or, to put it better: how do I tell NetWorker to use only one tape (until it is full) for a particular Level/Pool?

Here is the background to this question:


Current NW settings:

- Backup schedule over the weekend: Full, Incr. and Levels 2, 3, 5, …
- Jukebox: TS3500
- LTO5 drives: 7 in use / Max/Target Sessions = 40
- SaveGroup: SG01 / 60 clients / savegroup parallelism = 25
- Backup schedule: daily incrementals, Saturday's backup is Level 2
- Media pool: Level 2 / all LTO5 drives are selected as target devices for this pool
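If the goal is to keep the whole group on one drive (and therefore one tape), the per-device session settings are the first knob to check. A minimal nsradmin sketch under assumptions: the server name "nwserver01" is a placeholder, "rmt01" is one of the drive names mentioned later in the thread, and nsradmin may prompt you to confirm each update:

```shell
# Inspect and adjust the device's session settings via nsradmin
# (run against your own server/device names; values are examples).
nsradmin -s nwserver01 <<'EOF'
. type: NSR device; name: rmt01
show name; target sessions; max sessions
update target sessions: 25; max sessions: 40
EOF
```

With target sessions at or above the savegroup parallelism, all 25 streams should be eligible for a single mounted volume instead of being spread across several drives.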

When SG01 was started, 5 tape drives were free, and NetWorker mounted/labeled 5 LTO tapes
but wrote only a few save sets to 4 of them. So I now have 5 Level 2 tapes:
one of them is 95% used and the other 4 only 0.1%. The same thing happens with Level 3 and during the daily incremental backup.

According to our policy, the level tapes must be exported and kept out of the jukebox for the retention period.
This means I have to use at least 10 LTO tapes per week for level backups instead of 2.
On the other hand, there are currently around 50 "appendable tapes" in the jukebox,
and at least 3 LTO tapes are labeled for the various media pools with different usage percentages like 20%, 45%, 63%, etc.

Thanks & Regards

Vucko

5 Replies
ble1
6 Indium

Re: NetWorker optimization for the proper use of LTO tapes

The backup approach in use is somewhat strange, part of an old paradigm that is no longer honored. You see, you want more (if not all) streams to go onto fewer tapes (one or two). You can achieve that, but in doing so you multiplex more streams onto the tape. That increases restore times, and those are very important. Backup, and I mean real backup here, is usually kept nearline for restores. There are usually two reasons to take data out of the library: one, to protect the physical media at a second location in a safe; two, not having enough capacity in the library, so you export full tapes and eventually import them back once they are recalled. I'm not sure which reason is yours, or whether either of the above matches your case.

In the case of limited resources, you would label a fixed range of tapes for a certain pool and import/export as desired. How NetWorker fills the tapes does not matter there. If you have to export them to feel safer, then filling up each tape does matter. However, you are then hurting your restore times, which is even worse. To address that, people started using a disk cache and staging (and thereby demultiplexing) streams from it to tape, which achieves exactly what you want. Sooner (or in some cases later), the disk cache took over the role of primary backup storage (PBBA), and in cases where an extra copy was needed (cloning before), mirroring took over. Nowadays we see integration between the backup application and the PBBA, where many tasks are done without actual data movement (synthetic fulls, cloning, etc.). So in your case I would rethink what I'm doing and, most importantly, why. If you have to use what you use, then it is clear what you need to do, but also the price you have to pay for it.

Thierry101
3 Argentium

Re: NetWorker optimization for the proper use of LTO tapes

LTO5 drives: 7 in use / Max/Target Sessions = 40??

That will definitely hurt your restores... have you tried one?

vucko
2 Iron

Re: NetWorker optimization for the proper use of LTO tapes

I've just done one restore of 4 GB from last week; it took around 15 min. (a Windows folder with 7156 files), including the import of the Level 3 tape from the weekend...
(level tapes are stored outside the machine room that houses the jukebox)

Start time  6/20/2012 1:32 PM

Recovering 7156 files within K:\xxxxxxx\ into D:\Restore\xxxxxxxx

Total estimated disk space needed for recover is 4462 MB.

Requesting 7156 file(s), this may take a while...

Requesting 1 recover session(s) from server.

.........................................

Received 7156 file(s) from NSR server `xxxxxxxxxxxxxxx
Recover completion time: 6/20/2012 1:46:12 PM

------------------------

What would you suggest for organizing/configuring the backup of around 300 clients / 14 TB of data within 12-14 hours (night time only)?

vucko
2 Iron

Re: NetWorker optimization for the proper use of LTO tapes

Hello Hrvoje,

Here are some facts, maybe for a better understanding...


Actually, our restores are OK, maybe not perfect but OK: for an Oracle RMAN full restore (around 800 GB) we need around 6-8 hours, and that is acceptable to the company. Level and full tapes are stored outside the machine (jukebox) room for disaster protection. Recent/incremental backups and backups of less important data (Windows OS system files, test servers, etc.) are saved to separate media pools with retentions of 1, 2 or 4 weeks. Company-critical data (database backups) is saved to other media pools with a longer retention period...


Keeping important data "nearline" only for "fast restore" is not the main backup goal in my company. Management is more interested in having all "production data" saved in case of disaster (war, earthquake, alien attack, or whatever :-)), because of legal obligations, and of course for restore purposes too.
On the other hand, with the use of various system features like limited write permissions, file server replication, etc., requests for file restores are really not that frequent (one or two per month).

Now back to my question:

The tape drive (rmt01) target sessions is set to 40 and the savegroup parallelism is set to 25. That means when this
savegroup starts backing up to a certain pool, only 25 streams are sent to tape drive rmt01, and yet NetWorker mounts and labels 3 tapes for that pool???
And this is what definitely hurts my restores: saving data from one client onto 3 tapes at the same time...

ble1
6 Indium

Re: NetWorker optimization for the proper use of LTO tapes

I think it would be wiser to invest in disk backup + mirroring; you would have a much better setup. I still use a VTL (a 4-year-old solution) and a 1 TB RMAN restore takes around 50 minutes to an hour. While disaster recovery is important, that is not the common type of restore. Backup is an operational category and management should look at it as such. For DR, long-term backup or archiving is more important. There are many companies who thought long backups would help them in a disaster, only to learn, once disaster happened, that they didn't have any machines to restore to. But OK, that's not my problem.

Savegroup parallelism will limit the group to 25 sessions (identified by savefs). That means you should not see more than 25 sessions from that specific group at any time. That number is below the drive's target session count, meaning all sessions should end up on the same volume. Are you sure 40 is the target sessions and not the max sessions? That would explain why the 25 sessions are spread out. Another explanation would be either a bug in the version you run, or a "feature" (I believe last year someone mentioned that target sessions is hardcoded to a maximum of 8 for tape devices, but I don't remember seeing that in writing).
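The arithmetic behind that spread is easy to sketch (the cap of 8 is hearsay from this thread, not a documented fact):

```shell
#!/bin/sh
# How 25 savegroup streams distribute across drives, given a
# per-drive target session count (drives = ceil(streams/target)).
STREAMS=25
D40=$(( (STREAMS + 40 - 1) / 40 ))   # target sessions honored at 40
D8=$(( (STREAMS + 8 - 1) / 8 ))      # target sessions silently capped at 8
echo "target 40 -> ${D40} tape(s); capped at 8 -> ${D8} tapes"
```

A true target of 40 would keep all 25 streams on 1 tape, while a hidden cap of 8 would spread them over 4, which is close to the 3-5 tapes the original poster is seeing.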

Perhaps another way to force it, and I'm not a fan of this approach at all, is through pool restrictions, where you assign just 1 or 2 drives to the pool. Then at any given time you will not use more than 1 or 2 tapes (given that you keep everything on one tape with 40 sessions multiplexed).
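That pool restriction can be sketched with nsradmin as well. Assumptions here: "nwserver01" is a placeholder server name, "rmt01" is the drive name from the thread, and "Level 2" is the pool from the original post; nsradmin may ask you to confirm the update:

```shell
# Restrict the "Level 2" pool to a single drive so NetWorker only
# mounts one tape for it at a time (names are from the thread).
nsradmin -s nwserver01 <<'EOF'
. type: NSR pool; name: Level 2
show name; devices
update devices: rmt01
EOF
```

The trade-off stands as described above: one mounted tape per pool means all of the group's streams are multiplexed onto it, which is exactly what slows restores down.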

I do not think more tapes hurt your restores at all. You see, with RMAN, the number of channels you allocate is the parallelism you get from the client. So let's say you run 8 channels and all 8 channels end up on the same tape. When you do the restore, you can't get 8 channels, simply because RMAN can't read 8 streams at the same time from the same tape; multiplexed reads are not handled by RMAN. With disk you can (well, let's be fair and say that current NetWorker is awful here and serves a single read request per disk device, which is finally about to change in NW8). If the 8 streams are spread across 3 tapes, then at least you can pull out 3 streams in parallel and shorten your restore time.
