January 30th, 2014 11:00

deduplication across image and file level backups

I have a question that I cannot find a definite answer to:

I have talked to different EMC support people and received different answers.

Is there a white paper on it?

We have VMs that are backed up in two ways:

vm1 does a full image-level backup once a week.

vm1 also does a file-level backup every night.

Is this data deduplicated across image-level and file-level backups?

I know image-level backups are deduplicated against each other, and of course file-level backups are too, but can a file-level backup be deduplicated against an image-level backup of the same VM?

2K Posts

January 30th, 2014 12:00

I haven't seen an official white paper on this, but I was surprised to learn that some customers have seen high de-dupe rates (70% or so) between image-level and agent backups of the same virtual machines.

If the files are written to the VMDK file sequentially (so the logical file and its physical representation inside the VMDK are the same), the de-dupe rate will be high because the two sequences of bytes will be chunked up the same way by the de-dupe algorithm.

For example, if the file has 6 blocks that belong in the order 1->2->3->4->5->6 and these blocks are written to the VMDK as 1->2->3->4->5->6, this sequence of bytes will be de-duped between the two backup types. On the other hand, if the blocks are written to the VMDK as 2->1->6->3->4->5, the logical file and its physical representation on disk will differ. Image data will still be de-duped against image data, and agent data against agent data, but the image data and agent data will not de-duplicate well against each other.
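To make the block-order example concrete, here is a minimal Python sketch. It is a toy model only, not the actual de-dupe algorithm: it assumes a 4 KiB filesystem block, uses fixed 8 KiB chunks as a stand-in for the product's real chunking (which is not described in this thread), and looks only at the region of the VMDK that holds the file. All names and sizes are hypothetical.

```python
import hashlib

BLOCK = 4096  # assumed filesystem block size (illustrative)
CHUNK = 8192  # assumed de-dupe chunk size; deliberately spans two blocks


def make_blocks(n=6):
    # Build n distinct blocks; block i is filled with byte value i.
    return {i: bytes([i]) * BLOCK for i in range(1, n + 1)}


def layout(blocks, order):
    # Concatenate blocks in the given order (logical file or on-disk layout).
    return b"".join(blocks[i] for i in order)


def chunk_hashes(data, size=CHUNK):
    # Split data into fixed-size chunks and return the set of chunk hashes.
    return {
        hashlib.sha1(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    }


def shared_fraction(file_bytes, image_bytes):
    # Fraction of the file's chunks that also appear in the image data.
    file_chunks = chunk_hashes(file_bytes)
    return len(file_chunks & chunk_hashes(image_bytes)) / len(file_chunks)


blocks = make_blocks()
logical = layout(blocks, [1, 2, 3, 4, 5, 6])     # what the agent backup reads
sequential = layout(blocks, [1, 2, 3, 4, 5, 6])  # VMDK written in order
scattered = layout(blocks, [2, 1, 6, 3, 4, 5])   # VMDK written out of order

print(shared_fraction(logical, sequential))  # 1.0
print(shared_fraction(logical, scattered))   # 0.0
```

In this toy model the sequential layout shares every chunk with the agent backup, while the scattered layout shares none, because each chunk now straddles a different pair of blocks. Real-world results would fall somewhere in between, depending on how the chunking works and how fragmented the file is.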

Also, in my colleague's testing, he found that file system defragmentation at a guest level would destroy the commonality.

1 Rookie • 39 Posts

February 3rd, 2014 11:00

OK, thanks for the answer. That makes sense. Too bad there are no papers on this.

Is there any difference on Linux systems? Would the same apply after the filesystem has run an fsck (file system check)?

2K Posts

February 3rd, 2014 11:00

When you run fsck on a Linux filesystem, it's a consistency check, so I don't believe it will rearrange the data on disk. The Linux ext3 and ext4 filesystems aren't particularly prone to fragmentation, so this would be less of a concern on Linux VMs. I suspect that the ext4 defragmentation tool "e4defrag" would reduce the commonality between image-level and file-level backups, but I don't have any numbers on how much commonality loss there would be.
