31 Posts

9012

March 21st, 2014 05:00

Questions on DataDomain and DD Boost

Hello,

Our backup environment is tape-only at the moment. We are strongly encouraged to get rid of D2T backups and designate DataDomain as the primary backup target for all backups regardless of their nature. Tape would be used for cloning savesets from DataDomain for long-term retention. On top of that, we are strongly advised to get rid of incremental backups and use only full backups. Finally, they insist on using Client Direct whenever possible.

I have a lot of criticism of this approach, but first I'd like to know your thoughts on this "solution" to see whether your counter-arguments match mine.

6 Operator

14.4K Posts

56.2K Points

March 21st, 2014 05:00

I still use incremental backups - nothing wrong there, and I still see no point in replacing incrementals with fulls when it comes to file system backup. Client direct is really nice - it comes with a price tag, but I can surely see the benefits.

10 Posts

March 21st, 2014 13:00

Client direct really sped up our backups, by eliminating the bottleneck thru the storage nodes.

We run FULLs on Friday, INCR the rest of the week....all to Data Domain.  Keep 30 days of it all on DD and clone only the FULLs to tape, for long term retention.

31 Posts

March 25th, 2014 22:00

My concerns about this solution are below:

  1. Lack of flexibility: a single backup route is used for all backups regardless of whether they are dedupe-friendly or not. That severely hits the global deduplication factor. This approach turns a "smart" DD appliance into a "dumb" disk staging area. In this scenario we may very well replace DD with a bunch of disk staging areas that would be larger in size than the current volume of undeduped backups on the DD appliance.
  2. I suggest keeping disk-to-tape backups for three reasons:
  a) For block-based backup and restore of huge file servers. In my experience, if the I/O block size is large, say 1 MB, nothing can beat block-based backup and restore in terms of speed. I have seen backup speeds of up to 160 MB/s per backup stream.
  b) Full backup and restore of multi-terabyte databases, especially volatile OLTP databases, which are not good for deduplication. Putting such a database on a dedupe appliance would hurt the dedupe ratio and might even fill up the free space on the appliance. If the I/O block size is large enough, one can attain backup speeds of up to 240-250 MB/s per backup stream.
  c) Finally, I'd like to fall back to tape backups if backup to DD doesn't work as expected for some reason. We need to keep taking backups of the hosts in question until any issue we might face is resolved. In short, we need a "plan B" if something goes wrong with DD backups and restores.
  3. Taking only full backups to DataDomain keeps the global dedupe ratio high. From the DD perspective, repetitive full backups are good because they keep up the dedupe factor. The more identical data DD sees in new backups, the higher the global dedupe factor on the DD device.

The downsides of full backups are:

a) Long backup window. The backup window of a full backup is much longer than that of an incremental backup.

b) Negative performance impact on applications, because full backups may not complete before business hours begin.

c) LAN bandwidth is not unlimited: backups might take longer than expected due to network bottlenecks.

  4. As we copy/move data from DD to tape or restore data from DD, the negative effect of rehydration hits cloning/restoration performance. Thus data restoration from DD takes longer than restoration from tape or disk due to this overhead.
  5. A lot of new and potentially faulty code in the backup environment. We will have up to two "plug-ins" per backup client (one for DD BOOST and another for snapshots) and DD OS on the DD side. The DD BOOST plug-in might very well be faulty for some platforms and applications. Errors during backups would be the better case, since we would notice them. The worst case is being unable to restore data from DD due to faulty code in the DD BOOST plug-in. We should perform test restores for clients with DD BOOST agents after the agents are deployed or updated.
  6. Having disk as a backup destination helps to cut down on tape drives in the backup environment. Industry estimates vary from 25 to 30%, but you can't get rid of them completely, as tape is cheaper than disk for long-term retention.
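
Point 3 above (repeated fulls keep the dedupe factor high) can be illustrated with a toy model. This is only a sketch under loud assumptions: every full backup has the same size, a fixed fraction of it (`change_rate`, a hypothetical parameter) is new data compared with the previous full, and local compression is ignored. It is not how DD OS computes its figures.

```python
def global_dedupe_ratio(num_fulls: int, change_rate: float) -> float:
    """Toy model: logical data written divided by unique data stored."""
    logical = num_fulls                           # measured in "full backup" units
    # First full is stored whole; each later full adds only its changed share.
    stored = 1 + (num_fulls - 1) * change_rate
    return logical / stored


# Assumed change rates: a quiet file server vs a volatile OLTP database.
for rate in (0.02, 0.5):
    ratio = global_dedupe_ratio(30, rate)
    print(f"change rate {rate:.0%}: ratio after 30 fulls = {ratio:.1f}")
```

In this model the ratio grows with the number of retained fulls when little changes between them, and stays close to 1/change_rate for volatile data, which is the concern raised in point 2 b).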

6 Operator

14.4K Posts

56.2K Points

March 26th, 2014 04:00

  • Did you do data classification?  Before I went to DD, I had one already and I had no backups that would be so unfriendly (the most unfriendly backups in respect to de-dupe are archive logs, but even there I managed to get de-dupe ratios which I see acceptable given the volume).   I also have small i500 which I use for long term backups (which are always exports).  If I would ever hit an issue with non-dedupe friendly content, it would be rather specific case which I would then either direct to tape library or dedicated solution (eg. if content is to be kept for compliance and it is video recording or similar).  Bottom line is, I could not see any reason not to embrace DD for operational backups as it delivers exactly what is meant to do.  Of course, if you have specific data sets, this may require specific solutions too, but in general I would not worry.
  • Are you using BBB? Because with DD there are some enhancements if using BBB (which you don't get with tape). I see BBB as rather specific solution and not something I would use generally - therefore again I do not see this as reason to discard any de-dupe product. It certainly does not fit well de-dupe due to its nature, same applies to compression too so nature of such stream will have consequences for both disk and tape.
  • I have multiTB databases as described and I see no issues.  I also see way better speeds than author of that comment stated.
  • It seems to me that the author copied and pasted from some old white paper where de-dupe is not seen as a good guy. Those downsides can be listed regardless of backup target, but in combination with disk (and de-dupe) you can make things easier. Unless you are running machines and a setup from the last century, most of them should not be an issue nowadays at all.
  • Restoration argument is silly.  I did test this and I did test in combinations with DD cleanup and NW nsrim to measure effects and I could not see much of effect (there is some, but ignorable). Restore itself, had no effect on performance at all.  Perhaps if I run 120 restore sessions then perhaps yes, but I didn't have 120 restore sessions from tape before so it is useless to think about it in the first place.
  • DD Boost plugin story is silly (who wrote that? who is so against DD hehe) DD Boost plugin comes with NW.  Let's assume you do not use plugin.  Do you test every NW version you roll out on client?  Yes?  Then this would be no different.  No?  Then you run same risk that some nsr lib or binary might be broken for you.  It is simple as that.
  • I want my operational backups to be smarter.  That means integrated with disk arrays, replication and so on.  I can't do this with tape at this stage.  Tape has its own place for sure, but not for operational backups.  Of course, you can use it, but the guy next door using a disk solution will feel better and will have a better setup.


By reading what you said, well - what you copied and pasted, as I see you have messed up the ordered list - I assume you have 2 sides shouting in your ear "pick me, pick me!" and you do not know what to do at this point.  Do a test.  Get a DD box, test it and see how it works for you.  If you like it, buy it.  If not, don't.  I can't say what you have in your environment, so any statement like "this is best for you" would be wrong, but generally you can't go wrong with a de-dupe appliance and DD is really the best in that area.  And being integrated with NW comes as a really nice bonus.

31 Posts

March 26th, 2014 08:00

Actually, I'm the author of the document you criticised so vigorously. And yes, it is full of fears about designating DD as the primary backup target. Please don't get me wrong, I'm not against DD, but it is unreasonable to use it as a backup target for all clients. I'd rather put archive logs on some AFTDs than hamper the dedupe ratio.

Turning to your questions,

1. No, data classification is not implemented yet. I'm in the process of an in-house audit, and we have asked EMC for an external audit.

Our environment is rather modest: we do about 250 TB a month and 2000 backup sessions a day, and all backups are D2T. The volume of incremental backups is about a tenth of the volume of full backups. Most of the storage nodes are Oracle servers. Nothing peculiar.
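
To put the full-only proposal next to these figures, here is a rough sketch of the data moved per week under two schedules. The 10 TB full size is a made-up number for illustration, and reading "about a tenth" as "each incremental is one tenth of a full" is an assumption, not something measured.

```python
def weekly_volume_tb(full_tb: float, incr_fraction: float, fulls_per_week: int,
                     days: int = 7) -> float:
    """Data written per week: fulls on some days, incrementals on the rest."""
    incr_days = days - fulls_per_week
    return fulls_per_week * full_tb + incr_days * incr_fraction * full_tb


FULL_TB = 10.0        # hypothetical size of one full backup
INCR_FRACTION = 0.1   # assumption: one incremental is a tenth of a full

mixed = weekly_volume_tb(FULL_TB, INCR_FRACTION, fulls_per_week=1)
fulls_only = weekly_volume_tb(FULL_TB, INCR_FRACTION, fulls_per_week=7)
print(f"weekly full + daily incrementals: {mixed:.1f} TB")
print(f"daily fulls only:                 {fulls_only:.1f} TB")
```

With Client Direct much of that extra volume may be deduplicated before it crosses the LAN, so the sketch describes data read on the client rather than data sent over the wire.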

I'm just anxious about putting all backups on a smart appliance I don't completely trust.

2. I don't know what BBB means in this context. Apparently, we don't use that.

3. Ok, we'll direct one of our "fat cats" to DD and see.

4. So you mean that rehydration overhead may be ignored when restoring/cloning from DD?

5. OK, I know that DD Boost is built into the NW client code. But from a functional standpoint it is a plug-in that may be activated or deactivated. I wonder if you have used the Client Direct option extensively. EMC's ads on YouTube ("EMC NetWorker 8.0 Client Direct Demo") are promising, but I wonder how long it would take to dedupe 100000 files on the client side. Plus, EMC states that the Client Direct option requires both the NW client and DD to be on the same network. Is that the same IP network (VLAN)? If so, and my environment is chopped up by firewalls, I'll have to install dedicated backup interfaces, use 802.1q, etc.

6 Operator

14.4K Posts

56.2K Points

March 26th, 2014 13:00

I really don't see why.  I have around 30000 sessions of archive logs daily (SQL, SAP and Oracle).  When I compare the ratio I previously had with VTL without de-dupe and what I have now (using DDBoost), I see a benefit there.  And this is without the latest DDOS, which brought some enhancements to Oracle multiplexing (which I also tested during the POC), and I have not seen any issues to be concerned about.  I was also skeptical, but the best way to address your questions is to actually test it yourself.

  1. What I did was the following: back in the VTL days I had pools for DB, ARCH and FS.  I could easily see compression rates on tapes based on pools.  One might argue this is too general, but on 800 clients (which includes some 900 databases) I could get fairly consistent data on compression rates.  We do not use Oracle compression except on the SAP side.  We started using that during the VTL days and I was happy with the results as I could not see any visible impact.  When I turned to DD Boost, I originally tested VTL first to have 1:1 kind of tests.  They were not exactly 1:1 of course, but the ratio improved thanks to de-dupe. As expected, archive logs were the worst candidates, but worst is perhaps not the right word - it is still better by a factor of 2 than with VTL, which is enough for me.  After switching to DD Boost, the ratio (usage) remained the same. In my organisation, we do not rely on dumps for operational backups, which can be heavy on de-dupe, nor do we have media file backups from video recordings (that has a dedicated solution), so in my case - which I think is general - there is nothing bad I could say about this solution. Your case might be different of course.  Again, try to make the setup so that you can compare as much as possible, and test it.
  2. BBB is block based backups.  In your text you used block based backups as an example.
  3. Yes, please do so.  I (still) use PowerSnap for fat cats, but I also did tests with direct LAN backup using DD Boost and I was impressed with the numbers (11TB SAP went to 7TB after Oracle compression, as an example). I'm quite sure my setup is still far from perfect, as restore is a little bit faster than backup, which came as a pleasant surprise.
  4. Yes, I could not see that.  Please bear in mind that in my case cloning does not cause any impact as I use CCR (clone-controlled replication).  I never stage nor clone from DD to tape.  If something is a candidate for tape, it goes there directly.
  5. I use client direct all the time.  I started to use it with NW7.  For some reason people believed you need NW8 for that, but that's nonsense (at least for database backups - for fs backup that is correct).  So I used NW7 clients with NMDA 1.2 and NMSAP 4.2 and NSR_DIRECT enabled and it worked just fine. When I went to NW8, fs backup started to use DD Boost too.  I haven't seen any issues so far.  Network-wise, if your client can communicate with DD then that's it.  If not, the client will fail over to the legacy method via the storage node (and your storage node will use DD Boost to write data). I have two physical network pipes and a bunch of VLANs, and you can use ifgroups on the DD side to make things smoother (in theory - in practice I still have to test this - ifgroups are on my todo list this year).

10 Posts

March 26th, 2014 14:00

I haven't read any discussion from the "restore" perspective.

Comparing restores from DD vs tape....you get to eliminate the time and hassle of waiting for the tape to come back from offsite storage, as well as faulty tape and tape drive issues.

We're using approx 200TB of the total capacity of our DD.  We're getting 10X deduplication. That equates to approx 2PB of backed up data.

Or....2,500 LTO4 tapes!
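
The arithmetic behind those two figures can be checked with a short sketch. It uses the 200TB and 10X quoted above and assumes decimal units plus the LTO-4 native (uncompressed) capacity of 800 GB per cartridge; the function name is only illustrative.

```python
import math


def equivalent_tapes(used_tb: int, dedupe_ratio: int, tape_capacity_gb: int = 800):
    """Return (logical TB protected, tapes needed to hold it uncompressed)."""
    logical_tb = used_tb * dedupe_ratio
    logical_gb = logical_tb * 1000            # decimal units: 1 TB = 1000 GB
    return logical_tb, math.ceil(logical_gb / tape_capacity_gb)


logical_tb, tapes = equivalent_tapes(used_tb=200, dedupe_ratio=10)
print(f"{logical_tb} TB logical, about {tapes} LTO4 tapes")
```

Tape drive compression would lower the tape count, so 2,500 is the uncompressed upper bound for this data.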

Another consideration....

Using the DD via a CIFS/NFS mount point to store SQL/RMAN generated copies of databases.  We all know the DBAs will keep multiple copies of their DBs.  Why store those copies on primary storage?  DD space is cheaper PLUS you get to take advantage of dedupe*.

*Important note - make sure the DBA's don't compress their backups before writing to DD.

31 Posts

March 26th, 2014 23:00

1. Actually, I had IBM ProtecTIER with the dedupe option at another company where I used to work. Basically we put on the VTL something like /, /opt, /var and drive C:\ backups, that is, the sort of data that dedupes well. As for putting archive logs on DD, we'll give it a try, but for me the best destination for logs is AFTDs.

2. Oh, I see. We don't use block-based backups here, but I'm quite familiar with this technique (NetBackup FlashBackup).

3. OK, so you use clone-controlled replication, aka optimized duplication, where the NW server controls and DD does the real work. Now we are strongly encouraged to clone from DD to tape for long-term retention and to perform that cloning in the daytime. That is why I'm so concerned about the negative effect of such cloning. Another issue I see now is how Client Direct backups and CIFS/NFS backups are cloned to tape. I know that DD can emulate a VTL, and I see no problem with cloning from virtual tape to physical, but how are Client Direct backups cloned to tape?

4. So having DD and the Client Direct-enabled client in different VLANs is not an issue. I suppose I will have to open some TCP port for Client Direct to work?

6 Operator

14.4K Posts

56.2K Points

March 27th, 2014 02:00

  • I can't understand your love for AFTD (even more so since AFTD is a rather old architecture from a modern PoV).  If I had to choose between a regular disk cache and DD for archive logs, I would go with DD.
  • Cloning is reading from A and writing to B.  If A!=B in terms of DD, then location A will read data normally (as it would for a restore) and write it to B (tape). I can see where writing to DD first and then tape would be beneficial (where you create a clone with different retention).  Depending on the speed you get, using disk first might be the way to go if you can't feed the tapes quickly enough.  Personally, as I said, I push data for tape directly to tape (and this is only long-term backup), but I have seen a couple of folks cloning from DD to tape and it works just fine.
  • There is an article about ports, NW, DD and DD Boost in KB.

31 Posts

April 8th, 2014 09:00

Hello,

To those guys who clone from DD to tape. I have a volume of backups that should be cloned from DD to tape daily, and a cloning window (in the daytime). Hence, I have calculated the required cloning throughput. Now I want to know how many tape drives I need to clone all the data. It would be nice if someone shared their cloning speed (per session) from DD to tape.
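
The sizing described here comes down to one division and a round-up. A minimal sketch, with every input a placeholder: the 20 TB daily volume, the 8 hour window and the 150 MB/s per-drive rate are assumptions for illustration, and the per-drive rate is exactly the number being asked for.

```python
import math


def drives_needed(volume_tb: float, window_hours: float, drive_mb_per_s: float):
    """Return (required aggregate MB/s, number of tape drives, rounded up)."""
    volume_mb = volume_tb * 1_000_000                  # decimal units: 1 TB = 1,000,000 MB
    required_mb_per_s = volume_mb / (window_hours * 3600)
    return required_mb_per_s, math.ceil(required_mb_per_s / drive_mb_per_s)


required, drives = drives_needed(volume_tb=20, window_hours=8, drive_mb_per_s=150)
print(f"required throughput: {required:.0f} MB/s, tape drives: {drives}")
```

The result assumes each drive can be kept streaming at the given rate for the whole window and that DD read throughput is not the limit; headroom for tape mounts and failed sessions has to be added on top.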

6 Operator

14.4K Posts

56.2K Points

April 8th, 2014 10:00

This will depend on the DD model, which disks you use and what throughput DD can give you.  In some instances it may depend on the number of disk shelves you have, as there is sometimes a sweet spot there too.  So, there is no straightforward answer unless you can provide those details plus what volume you wish to move over.  For DD, cloning is no different than restoring in respect to speed (when writing to tape), so your question can be answered by anyone who has ever done such a measurement.

2.4K Posts

April 8th, 2014 14:00

To give you an idea how we do it ...

We actually clone fulls which were created during the first week of each month.

We have sorted this data to go to appropriate DDBoost devices. We have

  2 SNs, each running

      4 DDBoost devices (for the no-clone data pool)

      6 DDBoost devices (for the clone data pool)

      6 LTO5 tape drives (one for each DDBoost device)

Since we implemented DDBoost it seems that we can achieve the max data rate for the tape drives (about 150..160MB/s).

And we still run backups during the day.

May 6th, 2014 05:00

LaBounty

You said "DD space is cheaper PLUS you get to take advantage of dedupe*". In my experience, ES30 shelf is twice as expensive as a VNX shelf of the same disks. And now they appear to be the same hardware. I know this because I just priced one for my DD670 and the cost was insanely high, so much so that we are looking at another solution.
