
October 16th, 2012 09:00

Encryption and the Deduplication penalty

We're running NetWorker 7.62 with a DataDomain 860 running ddboost, although my question would likely apply to Avamar as well.

I've seen several places that say if you turn on software encryption through NetWorker then your deduplication will suffer.  This makes sense to me initially because your encrypted blocks will obviously not dedupe with the unencrypted blocks.  But as soon as all the unencrypted data expires and gets cleaned off the DataDomain then our deduplication should be as good as it was before (assuming we don't change the passphrase, which we don't intend to do).  If I encrypt the same block twice won't I get 2 identical encrypted blocks?  Am I missing something here that would explain why I keep hearing the deduplication is going to suffer?

As a bonus question, will my encrypted VM image backups dedupe with my encrypted filebased backups?

40 Posts

October 16th, 2012 11:00

Hello,

I can’t tell you for sure why the dedupe ratios will plummet, but trust me, they will. This happened to us when we initially put our DD boxes in and didn’t turn off NetWorker encryption but had DD encryption on as well.

14.3K Posts

October 16th, 2012 12:00

Sam_Powell wrote:

If I encrypt the same block twice won't I get 2 identical encrypted blocks? 

Only if you are XORing, but that is not seen as encryption. Encryption can produce different outputs for the same blocks; it all depends on the algorithm used.

I don't see much point in encrypting data on DD as it is not going outside DD anyway (unless you are afraid someone may take your DD from the computer room, but then you have another issue). If you wish to encrypt network traffic, then using hardware which is designed for that is much better.
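To make the "it depends on the algorithm" point concrete, here is a minimal sketch. It uses a toy hash-based stream cipher built from the standard library as a stand-in; it is not AES and not NetWorker's actual scheme, and the names `keystream`, `toy_encrypt` and the passphrase are made up for illustration. The only thing it shows is that the same key and block give identical output when the nonce (IV) is fixed, and different output when the nonce is random per encryption.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter), concatenated."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, nonce: bytes, block: bytes) -> bytes:
    """XOR the block with the keystream (the same call also decrypts)."""
    ks = keystream(key, nonce, len(block))
    return bytes(a ^ b for a, b in zip(block, ks))


# Hypothetical passphrase, hashed down to a 32-byte key.
key = hashlib.sha256(b"datazone passphrase").digest()
block = b"A" * 4096

# Fixed nonce: encrypting the same block twice is deterministic.
fixed_nonce = b"\x00" * 16
same_1 = toy_encrypt(key, fixed_nonce, block)
same_2 = toy_encrypt(key, fixed_nonce, block)

# Fresh random nonce per encryption: same key, same block, different output.
rand_1 = toy_encrypt(key, os.urandom(16), block)
rand_2 = toy_encrypt(key, os.urandom(16), block)

print(same_1 == same_2)  # True
print(rand_1 == rand_2)  # False (with overwhelming probability)
```

Only the fixed-nonce case produces blocks a deduplicating target could match; which case applies to a given backup product depends on how it chooses its IVs.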

17 Posts

October 16th, 2012 19:00

We're not encrypting on the DD side, just through NetWorker so I'm hopeful that we won't see the same results as you cmartinjr.  I'll know in a couple weeks and I'll update this then!

Hrvoje,  I'm not sure, but I think it only uses the 1 encryption algorithm (aes-256) so as long as it's encrypted with the same passphrase it should encrypt to an identical block.  Now if the encryption phrase is salted with something (maybe the ssid or some other networker tag) then I can see how the output would be different, but I can't find anyone that knows the details well enough.  I'll reach out to my EMC sales rep and see if he can find me someone on the inside that knows exactly what's going on.  I'll let you guys know if I find anything useful!
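As a rough illustration of the salting idea, the sketch below derives a key from the same passphrase with two different salts using the standard library's PBKDF2. The salt values (`ssid-1001`, `ssid-1002`), the passphrase and the iteration count are hypothetical, and nothing here is confirmed to be what NetWorker does; it only shows why a per-saveset salt would lead to different keys, and therefore different ciphertext, for identical data.

```python
import hashlib

# Hypothetical passphrase and per-saveset salts, for illustration only.
passphrase = b"datazone passphrase"
salt_a = b"ssid-1001"
salt_b = b"ssid-1002"

# Same passphrase, different salt -> different derived keys.
key_a = hashlib.pbkdf2_hmac("sha256", passphrase, salt_a, 10_000)
key_b = hashlib.pbkdf2_hmac("sha256", passphrase, salt_b, 10_000)

# Same passphrase, same salt -> the same key every time.
key_a_again = hashlib.pbkdf2_hmac("sha256", passphrase, salt_a, 10_000)

print(key_a == key_b)        # False
print(key_a == key_a_again)  # True
```

If the key differs per saveset, identical blocks in two savesets encrypt differently even with an otherwise deterministic cipher, so they would not dedupe against each other.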

Thanks for the replies!

Sam

17 Posts

October 16th, 2012 20:00

Oh, and I wanted to reply to your suggestion to use hardware encryption too.  I understand and have actually priced out using hardware to do the encryption, but it seems to add an order of magnitude to the complexity of the system.  In a disaster scenario with hardware encryption we would first need to rebuild the encryption servers.  We would need unencrypted backups of the keys, as well as of the NetWorker server, transported and stored in a different location from the encrypted tapes (what good are encrypted tapes when the key is right next to them?).

If we do the encryption through software then all we need is the datazone passphrase.  I can install networker onto any server and rebuild the indexes by loading the tapes if needed.

The reason I'm so curious about understanding how the deduplication suffers is because I want to know exactly how much extra space I'm going to use by encrypting through software.  If it's not an unreasonable amount then I would much prefer to do it through software.  If it's too much then I'll bite the bullet and do hardware encryption.

Sam

14.3K Posts

October 17th, 2012 02:00

Sam_Powell wrote:

If we do the encryption through software then all we need is the datazone passphrase.  I can install networker onto any server and rebuild the indexes by loading the tapes if needed.

Usually when disaster strikes, you won't be there, nor will anyone else who knows the passphrase - that's what makes a disaster a disaster.

Sam_Powell wrote:

The reason I'm so curious about understanding how the deduplication suffers is because I want to know exactly how much extra space I'm going to use by encrypting through software.  If it's not an unreasonable amount then I would much prefer to do it through software.  If it's too much then I'll bite the bullet and do hardware encryption.

Initially I thought it would be impossible to say how much it is, but on second thought encryption should give an equal de-dupe ratio for all data types, since the output would be equally random. If that assumption is correct, a simple test should address your query, but at the moment I do not have de-dupe at my site so I can't verify it. I'm not sure how many other users on the forum are using this combo and what their experience might be.

20 Posts

October 17th, 2012 07:00

I can tell you from experience that it will be roughly the same size. The same is true for compressed data: both effectively put a wrapper around the data, and the wrapper is the only thing Data Domain sees, thereby negating de-dupe functionality. A much better alternative, if you can't use hardware encryption, is to encrypt the data after it has been de-duped (aka Data Encryption at Rest or in-line encryption). You need a Data Domain encryption license, but the functionality is already built into Data Domain.

October 17th, 2012 20:00

If you want encryption, enable it on the Data Domain; that way encryption happens after dedupe. Otherwise, if a single byte changes in a file, the entire encryption result changes in NetWorker and you get 0% dedupe.

The same goes for Avamar: enabling encryption on Avamar reduces the ingestion rate and increases overhead on the Avamar system; however, dedupe is still quite high.
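The single-byte effect can be sketched with a toy model. The assumptions are loud ones: fixed 4 KiB dedupe segments (real appliances segment differently), a made-up CFB-style chained cipher built from SHA-256 rather than real AES, and a fixed IV. The helper names (`sample_data`, `chained_encrypt`, `unique_chunks`) are hypothetical. It compares how many unique segments two versions of a file produce, with and without encryption before dedupe.

```python
import hashlib

CHUNK = 4096  # fixed-size dedupe segment (assumption; real appliances differ)
BLOCK = 32    # toy cipher block size = SHA-256 digest size


def sample_data(n_bytes: int) -> bytes:
    """Deterministic, non-repeating filler standing in for a file."""
    out = bytearray()
    i = 0
    while len(out) < n_bytes:
        out += hashlib.sha256(b"filler" + i.to_bytes(8, "big")).digest()
        i += 1
    return bytes(out[:n_bytes])


def chained_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy CFB-style chaining: each block's keystream depends on the
    previous ciphertext block, so a change propagates forward."""
    prev = iv
    out = bytearray()
    for off in range(0, len(data), BLOCK):
        ks = hashlib.sha256(key + prev).digest()
        block = bytes(p ^ k for p, k in zip(data[off:off + BLOCK], ks))
        out += block
        prev = block
    return bytes(out)


def unique_chunks(*streams: bytes) -> int:
    """Count distinct CHUNK-sized segments across all streams by fingerprint."""
    seen = set()
    for s in streams:
        for off in range(0, len(s), CHUNK):
            seen.add(hashlib.sha256(s[off:off + CHUNK]).digest())
    return len(seen)


key = hashlib.sha256(b"datazone passphrase").digest()  # hypothetical passphrase
iv = b"\x00" * BLOCK

v1 = sample_data(16 * CHUNK)         # first backup: 16 segments
v2 = bytes([v1[0] ^ 0xFF]) + v1[1:]  # second backup: only the first byte differs

plain_unique = unique_chunks(v1, v2)
enc_unique = unique_chunks(chained_encrypt(key, iv, v1),
                           chained_encrypt(key, iv, v2))

print(plain_unique, "unique segments out of 32 (plaintext)")
print(enc_unique, "unique segments out of 32 (encrypted before dedupe)")
```

In this toy model the plaintext pair shares every segment except the first, while the encrypted pair shares none. Note that the change was placed at the very start of the file; with this kind of chaining, segments before the changed byte would still match, so how close a real system gets to 0% depends on where changes land and on how the product actually chains and keys its encryption.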
