We are evaluating an XtremIO appliance and want to measure the de-duplication savings we could expect.
There does not appear to be an obvious way to get the de-duplication reduction / savings at any layer other than the whole cluster. Does anyone know if there is a way to pull statistics at a folder or LUN level to see how de-duplication is doing at a certain level?
We are looking at possible mixed environments/workloads and want to be able to detect a "badly de-duping" environment.
XtremIO does deduplication and compression at the global array level and not at the volume level. We therefore report the dedupe/compression ratio at the cluster level only.
Think a little about this. Suppose you have two volumes, A and B, and write 4 blocks into them. Block 1 is unique and goes into Volume A. Block 2 is unique and goes into Volume B. Blocks 3 and 4 are duplicates and are written into both Volume A and Volume B.
Volume A has 3 logical blocks and 1 unique physical block.
Volume B has 3 logical blocks and 1 unique physical block.
Two blocks are shared between Volume A and Volume B.
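To make the counting concrete, here is a minimal sketch of the example as a toy model. It assumes blocks 3 and 4 are two distinct blocks that are each stored in both volumes, and the fingerprint names (b1 to b4) and the summarize helper are made up for illustration; this is not how the array itself tracks blocks.

```python
from collections import Counter

# Toy model of the example: each volume is a list of block fingerprints.
# The names b1..b4 are hypothetical labels, not real array metadata.
volumes = {
    "A": ["b1", "b3", "b4"],
    "B": ["b2", "b3", "b4"],
}

def summarize(volumes):
    """Return total logical blocks, total physical blocks, and per-volume counts."""
    # For each distinct fingerprint, count how many volumes reference it.
    refs = Counter(fp for blocks in volumes.values() for fp in set(blocks))
    logical = sum(len(blocks) for blocks in volumes.values())
    physical = len(refs)
    per_volume = {}
    for name, blocks in volumes.items():
        distinct = set(blocks)
        per_volume[name] = {
            "logical": len(blocks),
            "unique": sum(1 for fp in distinct if refs[fp] == 1),
            "shared": sum(1 for fp in distinct if refs[fp] > 1),
        }
    return logical, physical, per_volume

logical, physical, per_volume = summarize(volumes)
print("cluster-wide ratio:", logical / physical)  # 6 logical / 4 physical
print(per_volume)
```

The cluster-wide ratio is well defined (6 logical blocks over 4 physical blocks), while each volume only has a clear claim on its single unique block.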
How would you account for the physical space of blocks 3 and 4? You can’t assign it to either volume, or divide it between them. Both methods are inaccurate. What if you deleted blocks 3 and 4? Where would you then assign the capacity? It could suddenly cause a volume’s consumption to jump.
This is like a “divide by zero” math problem. The answer is undefined.
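A small sketch of why neither accounting method is satisfying, using the same two-volume example. It assumes blocks 3 and 4 are two distinct blocks stored in both volumes; the function names first_owner and even_split are hypothetical and only illustrate two possible charging rules.

```python
from collections import Counter
from fractions import Fraction

# Same toy example: fingerprints b1..b4 are hypothetical labels.
volumes = {
    "A": ["b1", "b3", "b4"],
    "B": ["b2", "b3", "b4"],
}

def first_owner(volumes):
    """Charge each physical block to the first volume (in dict order) that references it."""
    charged = {name: 0 for name in volumes}
    seen = set()
    for name, blocks in volumes.items():
        for fp in set(blocks):
            if fp not in seen:
                seen.add(fp)
                charged[name] += 1
    return charged

def even_split(volumes):
    """Split each physical block evenly across the volumes that reference it."""
    refs = Counter(fp for blocks in volumes.values() for fp in set(blocks))
    return {
        name: sum(Fraction(1, refs[fp]) for fp in set(blocks))
        for name, blocks in volumes.items()
    }

print(first_owner(volumes))  # A is charged 3 blocks, B only 1
print(even_split(volumes))   # A and B are charged 2 blocks each
```

Both rules add up to the same 4 physical blocks, but they disagree on what Volume A "consumes" (3 versus 2), and the first rule depends purely on which volume happens to be listed first.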
Optionally, you can use a tool to scan the content of a particular volume and see what the dedupe and compression ratios would be if that specific volume were stored on XtremIO. Reach out to your EMC account team for access to this tool.
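For anyone who wants a rough idea of what such a scan involves, here is a minimal host-side sketch. It is not the EMC tool and says nothing about how that tool works. The function name estimate_ratios, the 8 KiB block size, SHA-256 fingerprints, and zlib compression are all assumptions chosen for illustration, so the numbers will only approximate what an array would report.

```python
import hashlib
import zlib

def estimate_ratios(path, block_size=8192):
    """Estimate dedupe and compression ratios for one file or block device.

    Assumptions (not taken from any vendor tool): fixed-size blocks of
    block_size bytes, SHA-256 fingerprints, zlib for compression.
    Returns (dedupe_ratio, compression_ratio), or None for empty input.
    """
    seen = set()            # fingerprints of blocks already counted
    logical_blocks = 0      # every block read, duplicates included
    unique_bytes = 0        # bytes in first-seen blocks only
    compressed_bytes = 0    # compressed size of first-seen blocks only
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            logical_blocks += 1
            digest = hashlib.sha256(block).digest()
            if digest in seen:
                continue  # duplicate block: no new physical space needed
            seen.add(digest)
            unique_bytes += len(block)
            compressed_bytes += len(zlib.compress(block))
    if not seen:
        return None
    return logical_blocks / len(seen), unique_bytes / compressed_bytes
```

Usage would be something like estimate_ratios("/path/to/volume-or-image"). Note that the set of fingerprints is held in memory, so a multi-terabyte volume would need sampling or an on-disk index, and this only measures duplicates within the one volume, not against data already on the array.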
Correct, but keep in mind I was asking for the dedupe rate, not allocation. Your description is exactly what it would be nice to see per LUN. Volume A has 3 logical blocks and 1 is shared, meaning it has a 33% reduction rate.
I understand that tracking this per volume is an interesting problem, but it is certainly something that could be done, or perhaps run via a utility after the fact.
As for the other utility, it is useful for sizing, but not really for analysis after the fact or for new data. It really would be nice to have a way to do this on the XtremIO, not the host side, for troubleshooting purposes.
In my previous example, if Volume B is deleted, then the dedupe ratio for Volume A would drop down to 0% (action on one volume affects the space efficiency of another volume in this model). This is not correct and misrepresents the data efficiency at the volume level.
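A quick sketch of that side effect, using a toy model of the earlier example. It assumes blocks 3 and 4 are two distinct blocks stored in both volumes, ignores duplicates inside a single volume, and uses a hypothetical shared_fraction helper as the per-volume "dedupe rate".

```python
from collections import Counter

def shared_fraction(volumes, name):
    """Fraction of a volume's logical blocks that are also stored in another volume.

    Simplification: duplicates within the same volume are not counted.
    """
    refs = Counter(fp for blocks in volumes.values() for fp in set(blocks))
    blocks = volumes[name]
    shared = sum(1 for fp in blocks if refs[fp] > 1)
    return shared / len(blocks)

# Fingerprints b1..b4 are hypothetical labels for the four blocks.
volumes = {
    "A": ["b1", "b3", "b4"],
    "B": ["b2", "b3", "b4"],
}

before = shared_fraction(volumes, "A")
del volumes["B"]  # delete Volume B; nothing is written to or removed from A
after = shared_fraction(volumes, "A")
print(before, after)
```

Volume A's own data never changes, yet its reported figure falls to zero the moment Volume B is removed.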
We are thinking of some other ways to help resolve this and convey this information to you. Stay tuned.
Actually, I'd argue that is correct, because after that deletion the dedupe rate for the other volume is no longer very good.
I agree and understand that something like a dedupe rate is always going to fluctuate, but having some way to see how well a particular volume is compressing would be helpful in the context of time. For instance, if you have a 6TB volume with a 0x dedupe rate that does not require a lot of IO, then you are wasting some very expensive storage.
That is the kind of situation it would be nice to detect, especially when you have volumes from virtual servers or other sources that may not be easy to map directly to a single application, and potentially a lot of these devices.
The context of my question is that we are trying to see if we can cost-justify some of these units based on compression, so we have an interest in understanding what compression rates we are getting from different environments.
I will take this requirement to the Engineering team, but in the meantime you can estimate the dedupe and compression rates for a specific volume by running the scanning tool on that specific volume only. Have your EMC account team run this for you.
Also, in the latest version (fw 4.0) there is still no way to get the dedup/compression ratio for a single LUN ... that's not good, because other vendors (Pure, NetApp) have released an appropriate commandlet to get this info 😞