mvvasa
1 Copper

Physical Capacity Measurement on Data Domain

In DD OS 5.7, we introduced a new Data Domain feature called Physical Capacity Measurement that allows you to measure the post-dedupe capacity consumed for one or more files or directories or even tenants. There's a whitepaper that discusses the use cases we've heard from customers and the best practices around using the feature.

The paper can be found at: http://www.emc.com/collateral/white-papers/h14487-data-domain-physical-capacity-measurement-wp.pdf

If you are already using this feature, I would love to hear how it's working out, including any challenges you may be facing.

Thanks!

Mayank Vasa,

Data Domain Product Management

5 Replies
dynamox
6 Thallium

Re: Physical Capacity Measurement on Data Domain

Mayank,

Thank you for the link. We are very excited about this functionality, as we have struggled over the years to bill our internal Data Domain customers correctly. A couple of questions:

1)  Help me interpret this statement.

  • What is the unit of measure here, is it Tenant, Tenant Unit, Mtree ? 
  • If we are saying that we are going to treat these files as if they were the only files, then how is this physical number different from the logical capacity number? Let's say I have a folder underneath an MTree for one of my SQL servers. If I use Windows Explorer to look at the properties of that folder, it reports 1 TB, so that's my logical utilization. If we are assuming this data does not exist anywhere else, then that should be my physical utilization as well (based on the paragraph below). What am I missing?

[Screenshot of the whitepaper paragraph in question: 2016-04-10_08-16-25.jpg]

2) The paper mentions that we have to schedule some of these measurements. Are we talking about scheduling them for Tenants, Tenant Units, or MTrees? If I have a customer who has a folder underneath an MTree and he/she says, "tell me how much my SQL server is using right now," what would I do to get that number?

Thank you

mvvasa
1 Copper

Re: Physical Capacity Measurement on Data Domain

Happy to hear you found it exciting - I am too on this particular feature as many customers have been asking for it!

I'll try answering your questions:

What is the unit of measure here, is it Tenant, Tenant Unit, Mtree ?

[MV] On a given DD, it can be a file, set of files, an MTree, or a tenant unit.


If we are saying that we are going to treat these files as if they were the only files, then how is this physical number different from the logical capacity number ?

[MV] It takes into account the amount of deduplication happening within a given unit of measure (as defined in the above response). Say you want to do chargeback for tenant1, which has MTree1. When you calculate the physical capacity for MTree1, it will consider the deduplication within that tenant's workload and charge them accordingly.


Expanding on that, suppose tenant2 has their data in MTree2 and you want to do chargeback for them. It's possible that MTree1 and MTree2 share common segments, but when you calculate the physical capacity for MTree2, it will do the equivalent of treating MTree2 as the only data on the DD and report the physical capacity consumed by MTree2 after it has deduplicated within itself.


This allows for a fair, consumption-based chargeback model, instead of the first tenant landing on the DD being penalized the most and the following tenants being charged less.
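To make the difference concrete, here is a minimal sketch in Python. It is only a toy model, not how DD OS is implemented: it assumes each MTree can be represented as a list of segment fingerprints, that every segment has the same size (1 unit), and it ignores compression. The function names `physical_capacity` and `first_come_charges` are hypothetical and exist only for this illustration.

```python
# Toy model (assumption): an MTree is a list of segment fingerprints,
# and every segment occupies 1 unit of space. Compression is ignored.

def physical_capacity(segments):
    """Measure a unit as if it were the only data on the system:
    count each distinct segment within the unit exactly once."""
    return len(set(segments))

def first_come_charges(mtrees_in_arrival_order):
    """Alternative model for comparison: each MTree is charged only for
    segments that were not already stored by an earlier MTree."""
    seen = set()
    charges = {}
    for name, segments in mtrees_in_arrival_order:
        new_segments = set(segments) - seen
        charges[name] = len(new_segments)
        seen |= new_segments
    return charges

mtree1 = ["a", "b", "c", "a"]  # 4 logical segments, 3 unique
mtree2 = ["b", "c", "d"]       # 3 logical segments, shares "b" and "c" with mtree1

pcm_charges = {
    "MTree1": physical_capacity(mtree1),
    "MTree2": physical_capacity(mtree2),
}
arrival_charges = first_come_charges([("MTree1", mtree1), ("MTree2", mtree2)])

print(pcm_charges)      # {'MTree1': 3, 'MTree2': 3}
print(arrival_charges)  # {'MTree1': 3, 'MTree2': 1}
```

In this toy model both tenants hold three unique segments and are charged the same under the per-unit measurement, whereas the arrival-order model charges MTree2 far less simply because MTree1 was written first.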


This concept has resonated very well with our customer base. I would certainly like to know whether it fits your needs.


2) The paper mentions that we have to schedule some of these measurements. Are we talking about scheduling them for Tenants, Tenant Units, or MTrees? If I have a customer who has a folder underneath an MTree and he/she says, "tell me how much my SQL server is using right now," what would I do to get that number?

[MV] All measurements are schedule-based, and the schedule for MTrees is created automatically. We don't provide real-time physical capacity measurements, as that can be a resource-intensive activity. We've heard that customers want to do chargeback once a month, so you can create schedules for such measurements either via the CLI or via our multi-system management console, DDMC.


Does that help?


Regards,

+ Mayank



dynamox
6 Thallium

Re: Physical Capacity Measurement on Data Domain

Thank you, Mayank. A couple more questions:

1) To expand on your reply, I want to make sure I understand it correctly. Let's say I have this setup:

Tenant ABC

     \mtree-apples

     \mtree-oranges

Customer "apples" writes to mtree-apples; let's say they write 1 TB of medical images with absolutely no dedupe.

Customer "oranges" writes to mtree-oranges and copies exactly the same data as customer "apples".

So now when I run my report, will it report that each customer is using 1 TB?

2) You said that my unit of measure can be a file or a set of files. Can you explain how that works? Let's assume this scenario:

Tenant ABC

     \mtree-apples

          \applicationXYZ

                    \masterdb

                    \applicationdb

                    \tempdb


How can I find out how much physical space the masterdb folder is consuming for applicationXYZ? Is it going to determine how much deduplication happened within the masterdb folder itself?


3) If I need measurements at the folder level, can those be scheduled, or only higher-level measurements for MTrees, tenant units, and tenants?


4) How does compression play into all of these scenarios?



Thank you very much for your time.

mvvasa
1 Copper

Re: Re: Physical Capacity Measurement on Data Domain

Happy Friday, dynamox! My apologies for the delayed response. My inline responses:

1) To expand on your reply, I want to make sure I understand it correctly. Let's say I have this setup:

Tenant ABC

     \mtree-apples

     \mtree-oranges

Customer "apples" writes to mtree-apples; let's say they write 1 TB of medical images with absolutely no dedupe.

Customer "oranges" writes to mtree-oranges and copies exactly the same data as customer "apples".

So now when I run my report, will it report that each customer is using 1 TB?

[MV] If you run a PCM report on mtree-apples, it will report 1 TB. If you run a similar report on mtree-oranges, it will also show 1 TB. If it's exactly the same data and dedupes perfectly, in reality mtree-oranges will not consume any additional space. From a PCM reporting perspective, though, it will show that mtree-oranges has consumed 1 TB of physical space on the DD, as that's what you will want to charge the tenant for that independent MTree.


It's important to note that you can also run a report at the "Tenant ABC" level, which will report 1 TB as well, given that the two MTrees underneath have identical data. If mtree-apples and mtree-oranges held different data with only some commonality (deduplication), then Tenant ABC would show more than 1 TB.
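The apples/oranges numbers above can be worked through with a small Python sketch. This is an illustration under stated assumptions, not the actual PCM algorithm: segments are modeled as integer IDs, each segment is assumed to be 1 GB, 1 TB is taken as 1000 GB, compression is ignored, and the helper `pcm_gb` is a hypothetical name.

```python
# Toy model (assumption): a segment is an integer ID and occupies 1 GB.

def pcm_gb(*units):
    """Physical capacity of one or more units measured together:
    the number of distinct segments across all of them."""
    return len(set().union(*units))

# Identical data in both MTrees (the scenario from the question).
apples = set(range(1000))    # 1000 unique segments, about 1 TB
oranges = set(range(1000))   # exact copy of the apples data

print(pcm_gb(apples))            # 1000 -> mtree-apples reports ~1 TB
print(pcm_gb(oranges))           # 1000 -> mtree-oranges reports ~1 TB
print(pcm_gb(apples, oranges))   # 1000 -> Tenant ABC reports ~1 TB

# Partial overlap: half of the oranges data is shared with apples.
oranges_partial = set(range(500, 1500))

print(pcm_gb(oranges_partial))          # 1000 -> still ~1 TB on its own
print(pcm_gb(apples, oranges_partial))  # 1500 -> Tenant ABC reports ~1.5 TB
```

Note that in this model the tenant-level figure is never larger than the sum of the per-MTree figures, because segments shared between the MTrees are counted once at the tenant level.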

2) You said that my unit of measure can be a file or a set of files. Can you explain how that works? Let's assume this scenario:

Tenant ABC

     \mtree-apples

          \applicationXYZ

                    \masterdb

                    \applicationdb

                    \tempdb


How can I find out how much physical space the masterdb folder is consuming for applicationXYZ? Is it going to determine how much deduplication happened within the masterdb folder itself?


[MV] You can do that using the CLIs shown on page 8 of the whitepaper.


3) If I need measurements at the folder level, can those be scheduled, or only higher-level measurements for MTrees, tenant units, and tenants?

[MV] Yes, you can schedule them.


4) How does compression play into all of these scenarios?

[MV] It's no different from a PCM reporting perspective. Data Domain uses both deduplication and compression algorithms to give the best storage optimization.


I hope this helps.

+ Mayank

FebinK
1 Copper

Re: Physical Capacity Measurement on Data Domain

Hi mvvasa

Is there any way I could have PCM reports sent to my email address?

I have created a PCM schedule for my MTree, but I couldn't figure out how to get the report sent to my email.

I also tried the "compression physical-capacity-measurement" commands to see if there was an option for this, but couldn't find one that meets my requirement.

Thanks,

Febin @
