Let’s talk about All Flash Array Deduplication!
I wanted to start a discussion here on the EMC Oracle Community, to better understand what happens with different Deduplication implementations in All Flash Arrays when used with Oracle databases.
A lot has been written about the merits (or otherwise) of Always-On In-line Data Reduction, so I am not going to go into that here. This story needs to start with introducing Data Reduction – with EMC XtremIO this is much more than just Deduplication and Compression, but those are the features receiving the initial attention here. Snapshots and efficient Metadata Management will follow.
NOTE: I would be very pleased if this discussion broadens out to inform us all on how these features are implemented on other All Flash Arrays, particularly as there is occasional disagreement about the different implementations.
Let’s get started…
To my knowledge, while approaches to deduplication may differ, the main question is whether deduplication is attempted using 512-byte (512B), 4-kilobyte (4KB) or 8-kilobyte (8KB) blocks.
At EMC, we clearly state that XtremIO does not deduplicate Oracle data blocks for a single instance of a database; an immediate and substantial deduplication benefit occurs only for subsequent copies or snapshots. However, significant benefit can be achieved with compression, particularly for empty data blocks.
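To illustrate why empty blocks compress so well even when they won't deduplicate, here is a minimal sketch. It assumes an 8KB Oracle data block with a fabricated 20-byte header and 4-byte tail (stand-ins for the real cache header/tail, which would be unique per block) and a zero-filled body, then runs it through zlib:

```python
import zlib

# Assumed 8KB Oracle data block size; header/tail contents are fabricated
# stand-ins for the real (unique) cache header and tail.
BLOCK_SIZE = 8192
header = b"\x01" * 20   # hypothetical 20-byte cache header
tail = b"\x02" * 4      # hypothetical 4-byte tail
empty_block = header + b"\x00" * (BLOCK_SIZE - len(header) - len(tail)) + tail

compressed = zlib.compress(empty_block)
ratio = len(empty_block) / len(compressed)
print(f"{len(empty_block)} bytes -> {len(compressed)} bytes "
      f"(~{ratio:.0f}:1 compression)")
```

The unique header and tail make the block a poor deduplication candidate, but since almost all of the block is zeros, the compression ratio is enormous – which is the point about compression picking up where deduplication leaves off.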
As I have heard conflicting opinions, before engaging in this discussion I felt that I needed to inform myself on the structure of the Oracle data block. Recently, I was pleased to attend a session by Martin Bach at the UKOUG Tech14 Conference in Liverpool, where he mentioned that an Oracle data block contains a “Cache Header and Tail”. Following up with Martin, he kindly provided this link (Oracle Internals Notes - Cache Header and Tail).
While that Note is from 2007, the Oracle data block has not fundamentally changed since that time. Given the following comment – “The cache layer reads and maintains a 20-byte header and 4-byte tail on each data block” – it would be reasonable to conclude zero benefit for 4KB/8KB deduplication with similarly sized Oracle data blocks, primarily due to the presence of the unique “Cache Header and Tail” information. Similarly, it would be reasonable to conclude that 512B deduplication would benefit from empty-block deduplication within 4KB/8KB Oracle data blocks.
Deduplication of 512B blocks implies much more metadata (8-16x) to be managed, and more CPU cycles are necessary for efficient performance. Is this a real issue, or just an example of competing-vendor FUD?
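To put rough numbers on that 8-16x claim, here is a back-of-envelope sketch. The 64-byte per-fingerprint entry size is an assumption picked purely for illustration, not a figure from any particular array:

```python
# Assumed: 1 TiB of logical capacity, 64 bytes of metadata per fingerprint
# entry. Both numbers are illustrative, not vendor figures.
TIB = 1024 ** 4
BYTES_PER_ENTRY = 64

for granularity in (8192, 4096, 512):
    entries = TIB // granularity
    metadata_gib = entries * BYTES_PER_ENTRY / 1024 ** 3
    print(f"{granularity:>5}B blocks: {entries:>13,} entries, "
          f"~{metadata_gib:.0f} GiB of metadata per TiB")
```

At 512B granularity you track 16x the entries of 8KB granularity (and 8x that of 4KB) – hence the question of whether the extra metadata can realistically all be held in DRAM.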
If we are using 4KB/8KB deduplication, could we be missing out on more significant deduplication for single, new and empty databases? Or is this adequately compensated for by having less deduplication work to do, and better done by exploiting compression algorithms?
What do you think? Have I answered your questions, or just given rise to more questions that need to be answered?
Please feel free to make a considered comment or ask further questions below by replying to this message - I am hoping for a lively and positive discussion.
Another thing to think about is the impact of not having all of the metadata in DRAM. If you have to go to disk to read metadata, you introduce additional latency by going to disk twice to perform one operation.
You also mentioned inline... this is an important factor as well. If you run some deduplication inline and then different deduplication during a post process... maybe you need some deduplication of your deduplication process.