March 20th, 2015 12:00

Can you measure dedupe by LUN or folder?

We are evaluating an XtremIO appliance and want to do some testing around possible de-duplication savings.

There does not appear to be an obvious way to get the de-duplication reduction/savings at any level other than the whole cluster. Does anyone know if there is a way to pull statistics at a folder or LUN level to see how de-duplication is doing for a particular subset of the data?

We are looking at possible mixed environments/workloads and want to be able to detect a "badly de-duping" environment.

Thanks.

727 Posts

March 21st, 2015 22:00

XtremIO does deduplication and compression at the global array level and not at the volume level. We therefore report the dedupe/compression ratio at the cluster level only.

Think about it for a moment. Suppose you have two volumes, A and B, and you write four blocks. Block 1 is unique and goes into Volume A. Block 2 is unique and goes into Volume B. Blocks 3 and 4 are duplicates and are written into both Volume A and Volume B.

So now:

Volume A has 3 logical blocks and 1 physical block of its own.

Volume B has 3 logical blocks and 1 physical block of its own.

Two physical blocks are shared between Volume A and Volume B.

How would you account for the physical space of blocks 3 and 4? You could assign it to one of the volumes, or divide it between them; both methods are inaccurate. And what if you deleted blocks 3 and 4 from one of the volumes? Where would you then assign the capacity? The other volume's consumption could suddenly jump.

This is like a “divide by zero” math problem.  The answer is undefined.
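To make the accounting problem concrete, here is a minimal sketch (my own illustration, not XtremIO internals) of a global, content-addressed block store shared by two volumes. The `DedupStore` class, the volume names, and the block contents are all invented for the example; the point is that each volume sees 3 logical blocks but only 1 physical block that is unambiguously its own.

```python
# Minimal sketch of global, content-addressed dedupe across two volumes.
# Names (DedupStore, write) and block contents are invented for illustration.
import hashlib
from collections import defaultdict


class DedupStore:
    """Global store: one physical copy per unique block content."""

    def __init__(self):
        self.physical = {}                # fingerprint -> block content (stored once)
        self.volumes = defaultdict(list)  # volume name -> list of fingerprints (logical view)

    def write(self, volume, block):
        fp = hashlib.sha256(block).hexdigest()
        self.physical.setdefault(fp, block)  # content is stored only once, array-wide
        self.volumes[volume].append(fp)      # the volume just holds a reference
        return fp


store = DedupStore()
store.write("A", b"block-1")            # unique, lands in Volume A
store.write("B", b"block-2")            # unique, lands in Volume B
for block in (b"block-3", b"block-4"):  # duplicates, written into both volumes
    store.write("A", block)
    store.write("B", block)

for vol in ("A", "B"):
    fps = store.volumes[vol]
    others = [o for o in store.volumes if o != vol]
    owned = [fp for fp in fps if all(fp not in store.volumes[o] for o in others)]
    print(f"Volume {vol}: {len(fps)} logical blocks, {len(owned)} physical block(s) of its own")

print(f"Physical blocks stored on the array: {len(store.physical)}")
# Each volume reports 3 logical blocks but only 1 physical block it can call its
# own; the 2 shared physical blocks cannot be charged to either volume without
# an arbitrary policy.
```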

727 Posts

March 21st, 2015 22:00

Optionally, you can use a tool to scan the content of a particular volume and see what the dedupe and compression ratio would be if that specific volume were stored on XtremIO. Reach out to your EMC account team for access to this tool.

385 Posts

March 23rd, 2015 07:00

Correct, but keep in mind I was asking for the dedupe rate, not allocation. Your description is exactly what it would be nice to see per LUN: Volume A has 3 logical blocks and 1 is shared, meaning it has a 33% reduction rate.

I understand tracking that per volume is an interesting problem, but it is certainly something that could be done, or perhaps run via a utility after the fact.

As for the other utility, that is useful for sizing, but not really for analysis after the fact or for new data. It really would be nice to have a way to do this on the XtremIO, not the host side, for troubleshooting purposes.

727 Posts

March 23rd, 2015 12:00

In my previous example, if Volume B is deleted, then the dedupe ratio for Volume A would drop to 0% (an action on one volume changes the reported space efficiency of another volume in this model). That is misleading and misrepresents the data efficiency at the volume level.
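As an illustration of that cross-volume effect, here is a hedged sketch in which the per-volume dedupe ratio is naively defined as the share of a volume's logical blocks whose content is also referenced elsewhere on the array. The fingerprints and helper function are invented; this is not XtremIO's actual accounting.

```python
# Naive per-volume dedupe ratio: fraction of a volume's logical blocks whose
# content is referenced more than once across the array. Fingerprints and the
# helper below are invented for illustration only.
from collections import Counter


def per_volume_dedupe_ratio(volumes, name):
    refs = Counter(fp for fps in volumes.values() for fp in fps)  # global reference counts
    fps = volumes[name]
    shared = sum(1 for fp in fps if refs[fp] > 1)  # blocks whose content exists elsewhere too
    return shared / len(fps) if fps else 0.0


volumes = {
    "A": ["u1", "d3", "d4"],  # u1 unique to A, d3/d4 also written to B
    "B": ["u2", "d3", "d4"],
}
print("Volume A before deleting B:", per_volume_dedupe_ratio(volumes, "A"))  # ~0.67
del volumes["B"]                                                             # act only on B...
print("Volume A after deleting B: ", per_volume_dedupe_ratio(volumes, "A"))  # 0.0 ...and A's metric changes
```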

We are thinking of some other ways to help resolve this and convey this information to you. Stay tuned.

385 Posts

March 24th, 2015 05:00

Actually, I'd argue that is correct, because after that deletion the dedupe rate for the other volume really is no longer very good.

I agree and understand that something like a dedupe rate is always going to fluctuate, but having some way to see how well a particular volume is deduping and compressing would be helpful when viewed over time. For instance, if you have a 6TB volume that gets essentially no dedupe and does not require a lot of IO, then you are wasting some very expensive storage.

That is the kind of situation it would be nice to detect, especially when you have volumes from virtual servers or other sources that may not map directly to a single application, and when you potentially have a lot of these devices.

The context of my question is that we are trying to see whether we can cost-justify some of these units based on compression, so we have an interest in understanding what compression rates we are getting from different environments.

727 Posts

March 24th, 2015 06:00

Understood.

I will take this requirement to the Engineering team, but in the meantime you can estimate the dedupe and compression rates for a specific volume by running the scanning tool on that specific volume only. Have your EMC account team run this for you.

January 5th, 2016 02:00

Hi All,

Also in the latest version (firmware 4.0) there is no way to get the dedupe/compression ratio for a single LUN... That's not good, because other vendors (Pure, NetApp) have released an appropriate commandlet to get this info :-(

Monardo G.A.

727 Posts

January 5th, 2016 09:00

If dedupe and compression are done globally across all the data on the array (as is the case with XtremIO), you cannot determine a correct (or useful) dedupe/compression ratio at the LUN level.

For example, if the same data exists in two different volumes on the array, which volume do you give the dedupe benefit to? The volume where the data was written later? As you can see, it becomes arbitrary at that point.

What happens if that data is deleted from one of the volumes – do you suddenly remove the dedupe benefit from the second volume? You would start seeing data efficiency metrics change on one volume because of a change on some other volume.

Any volume-level dedupe/compression ratio a vendor shows (when they claim to implement these data efficiency features globally at the array level) is at best a guess.
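To show why any such per-volume number is a policy choice rather than a measurement, here is a small sketch (an invented model, not any vendor's implementation) comparing two equally defensible ways of charging shared blocks to volumes; they give different answers for the same data.

```python
# Two attribution policies for shared blocks; fingerprints are invented and
# dict insertion order stands in for "who wrote the block first".
volumes = {"A": ["u1", "s1", "s2"], "B": ["u2", "s1", "s2"]}  # s1/s2 shared by A and B


def physical_charge(volumes, name, policy):
    charge = 0.0
    for fp in set(volumes[name]):
        holders = [v for v, fps in volumes.items() if fp in fps]
        if policy == "first_writer":
            charge += 1.0 if holders[0] == name else 0.0  # whole block goes to the first writer
        elif policy == "split":
            charge += 1.0 / len(holders)                  # block is split across all referencing volumes
    return charge


for policy in ("first_writer", "split"):
    print(policy, {v: physical_charge(volumes, v, policy) for v in volumes})
# first_writer: Volume A is charged 3.0 blocks, Volume B only 1.0
# split:        Volume A and Volume B are each charged 2.0 blocks
```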

January 7th, 2016 00:00

Dear Avi,

In some other post you talked about a "Mitrend tool", or said to "ask EMC support" to get this ratio... So, is it possible or not? Does it make sense or not?

About the other vendors: HP, NetApp, and Pure Storage can show this ratio and EMC can't... yet the other vendors are only "guessing" about it... or maybe only EMC can't go so deeply into detail?

bye

Monardo G.A.

727 Posts

January 7th, 2016 14:00

The Mitrend tool will give you an idea of what the data efficiency ratios would be if a particular set of data were stored on XtremIO. If you scan only a particular volume, then you get the data efficiency for the data in that volume, but that is not going to account for the dedupe you would see across multiple volumes in the real world. Your EMC account team should be able to help with more details on this tool.
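As a rough illustration of why a single-volume scan can understate what a global-dedupe array would achieve, here is a hedged sketch with invented fingerprints: duplicates that exist only across volumes are invisible when one volume is scanned in isolation.

```python
# Estimated dedupe savings from a set of block fingerprints (invented data):
# fraction of logical blocks that vanish once each unique block is stored once.
def dedupe_savings(fingerprints):
    logical = len(fingerprints)
    physical = len(set(fingerprints))
    return 1 - physical / logical if logical else 0.0


vol_a = ["a1", "a2", "x1", "x2"]  # x1/x2 also live in volume B
vol_b = ["b1", "b2", "x1", "x2"]

print("Volume A scanned alone:       ", dedupe_savings(vol_a))          # 0.0  - no duplicates inside A
print("Both volumes scanned together:", dedupe_savings(vol_a + vol_b))  # 0.25 - cross-volume duplicates appear
```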

As mentioned earlier, we don't report per-volume dedupe and compression numbers in the XtremIO GUI. This is something we are evaluating internally.

274.2K Posts

February 24th, 2016 02:00

Hi!

You say there is a tool that can be downloaded....

What is the name of the tool you mention and where can I download it?

//Kjell

February 24th, 2016 03:00

Sulan wrote:

Hi!

You say there is a tool that can be downloaded....

What is the name of the tool you mention and where can I download it?

//Kjell

Mitrend For EMC XtremIO | Itzikr's Blog
