January 25th, 2017 11:00

DD Cloud Tier chunk size - what is it?

Hello,

When using Cloud Tier, into what size chunks does DD break up files for upload to the cloud provider? I'm trying to estimate AWS S3 cost, and while I can do that easily for capacity, I also need to know how many PUT requests I'm going to see for a given amount of data.

If I can find out the chunk size Data Domain breaks local files into, then I can just divide my total Cloud Tier capacity by that size to get the number of PUT requests and factor in that cost.
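To illustrate what I mean, here's a rough back-of-the-envelope sketch of the estimate I have in mind. The chunk size is a placeholder (it's exactly what I'm asking about) and the S3 PUT price is just an example rate, so treat the numbers as assumptions:

import math

# Placeholder assumptions: the real chunk size is what I'm asking about,
# and S3 PUT pricing varies by region and storage class.
ASSUMED_CHUNK_SIZE_BYTES = 64 * 1024     # e.g. 64 KB (assumption)
S3_PUT_COST_PER_1000 = 0.005             # USD per 1,000 PUT requests (example rate)

def estimate_put_cost(cloud_tier_bytes):
    """Naive estimate: total Cloud Tier capacity divided by chunk size."""
    puts = math.ceil(cloud_tier_bytes / ASSUMED_CHUNK_SIZE_BYTES)
    cost = puts / 1000.0 * S3_PUT_COST_PER_1000
    return puts, cost

# Example: 50 TB destined for the cloud tier
puts, cost = estimate_put_cost(50 * 1024**4)
print("~{:,} PUT requests, ~${:,.2f} in PUT charges".format(puts, cost))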

Thanks!

-Dave

PS: Can this size be modified? A larger size could give faster uploads/downloads, albeit at the cost of fewer "checkpoints".


January 27th, 2017 06:00

Hi David,

So in general, most of the traffic going from a DDR to the cloud will be file data. This is stored in ~64KB 'compression regions' (chunks), and this size is not configurable. Other types of data (e.g. metadata) are also written to the cloud, but these are likely to be a much smaller proportion of what is uploaded.

Note, however, that it's not possible to take a file on the DDR and divide its size by 64KB to work out how many PUT requests you are likely to see, because all data sent to the cloud is de-duplicated and compressed first.

For example, let's say you have a 10MB file on the active tier of your DDR which you are going to migrate to the cloud. You might think you can simply do 10MB / 64KB = 160 PUT requests, but that wouldn't be correct, for the following reasons (a rough worked sketch follows the list):

- The data being written to the cloud will be de-duplicated against data already in the cloud. For example, if 95% of the data in your 10MB file already exists within the cloud unit you are migrating to (because it's referenced by other files which have already been migrated), the DDR only needs to upload the 5% that is unique (i.e. 512KB, or 8 PUT requests). Working out how much of a file on the active tier is 'unique' compared with the existing data in a cloud unit is very complex and not something customers can do themselves, so you cannot tell how much data a file will upload during migration without actually uploading it.

- The data being written to the cloud will be compressed prior to upload. Again, let's say 95% of the file's data already exists in the cloud unit, so only 512KB needs to be physically uploaded. If this is compressed via lz before upload and gets 2x compression, only 256KB of physical data actually needs to be uploaded (i.e. 4 PUT requests). Compression ratios depend on a number of factors, and it's pretty much impossible to say how 'compressible' some data is without actually compressing it during migration.
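To make the two points above concrete, here's a rough illustrative sketch of how de-duplication and compression shrink the naive 160-PUT estimate. The 5% unique figure and the 2x lz ratio are just the example numbers used above - in practice neither can be known before the migration actually runs:

import math

CHUNK_SIZE = 64 * 1024   # ~64KB compression regions (not configurable)

def estimate_puts(logical_bytes, unique_fraction, compression_ratio):
    """Rough sketch: only the unique fraction of the data is uploaded,
    and it is compressed before upload. Both factors are illustrative
    guesses - the DDR gives no way to know them ahead of migration."""
    unique_bytes = logical_bytes * unique_fraction
    physical_bytes = unique_bytes / compression_ratio
    return math.ceil(physical_bytes / CHUNK_SIZE)

naive = math.ceil(10 * 1024**2 / CHUNK_SIZE)        # 10MB / 64KB = 160 PUTs
adjusted = estimate_puts(10 * 1024**2, 0.05, 2.0)   # 5% unique, 2x lz -> 4 PUTs
print(naive, adjusted)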

Basically, customers don't get any insight into this process, which can make it hard to estimate exact costs. That being said, DD LTR (long-term retention to the cloud) has been designed to write to and read from the cloud as little as possible in order to minimise costs.

Sorry I don't have a better answer but I hope this helps to some extent.

Thanks, James


May 28th, 2018 13:00

Hello David,

Sorry for the post necromancy after all this time, but I am facing the same issue you were facing in 2017 and was wondering whether you ever found your answer. How did you estimate the number of PUTs for your cloud tier storage on AWS?

(PS: Thanks very much to James_Ford for his long answer, it explained a lot!)


May 29th, 2018 04:00

Not to screw up your math, but the Dragonheart release of code will increase the block size to the MB range. It's slated to be released in the next few days.
