VNX5400 Block DeDup experiences? performance issues?

Question

Hey, anybody else doing large scale DeDuplications with VNX5400? We enabled DeDup on most our LUNs in single pool config when we started using the system in December. This seems to cause a lot metadata to be generated and with that causing lot of extra IO.. which is problem, especially on a pool without any SSD - we are considering that now, SSD should help with metadata processing ;D DeDup itself works really good, space savings are huge on VMware environment! EMC has now published some new best practices&whitepapers regarding dedup, I recommend that all who want to use Dedup reads those through, quite a few limitations have been discovered. sk

benconrad1 · Answer

We'll be deduping on some 7600's in a few weeks. My bet is that it's not the metadata rather the crawling of all the new blocks that have been written to try to find duplicate blocks. That eats up disk IO. In the background is always crawling as long as the dedup container has something like >64GB of new changes to process.

My experience with VNX Snapshot consolidation has been very poor on the VNX1, I'm hoping that VNX Snaps and Dedup on the VNX 2 will impress. When it comes down to it I don't think the EMC sizing tools take into account the overhead of dedup, vnx snaps and thin overhead.

Ben

sk_ · Answer

Yep, the crawling is another big influence here, hard to separate those workloads.

We have also noticed that the pool balancing is not working properly, this may or may not relate to deduplication being used.

eg. one LUN was having lot of write IO coming to this morning, I looked at the last Hour disk heatmaps from M&R and it shows that only 5 disks (one RG) of the pool took all the IO, others were cold.

I assume it _should_ spread out to all disks in the same tier?

sk_ · Answer

Follow up..

after trying to solve this with support, we got a statement saying that the dedup processing isn't quite optimal in the current code and that we should cut down the usage of dedup, limit to archives and such LUNs with lower I/O.

Adding SSD to the pool could help, but not sure how much, and as u know, that really can't be tested as u can't get the disks out from the pool anymore..

benconrad1 · Answer

Do you mind pasting in some of the info regarding dedup that you got back from support?  Or send to me directly via the forums message feature? Ben

benconrad1 · Answer

'eg. one LUN was having lot of write IO coming to this morning, I looked at the last Hour disk heatmaps from M&R and it shows that only 5 disks (one RG) of the pool took all the IO, others were cold.' We had this problem on a VNX 5300 (R32, patch 201) in summer '13.  Support said it was odd, that they could not figure out why it was happening and the only fix was for me to destroy the pools and recreate.  You can run a MLU displayslice script that support can look at to see if the slices are distributed properly.

sk_ · Answer

No details were given.

But our problems escalated yesterday, our dedup pool luns went OFFLINE. It took 17 hours for EMC to run the lun recovery jobs and bring them back to online. RootCause still unknown.

Really can't recommend dedup usage to anyone at this point!

sk_ · Answer

Apparently our LUNs OFFLINE issue was caused by DeDup bug in previous OE release. We had upgraded to the latest OE but it doesn't help if the data was messed up by earlier OE version :/

benconrad1 · Answer

OMG.  Hey EMC, get your act together EMC and get your developers on the same page.

benconrad1 · Answer

There were no dedup fixes in most recent release from Feb 28th:

VNX Operating Environment for Block 05.33.000.5.51,

VNX Operating Environment for File 8.1.2.51,

EMC Unisphere 1.3.2.1.0051

Fixed problems

VNX Operating Environment for Block 5.33.000.5.051, VNX Operating

Environment for File 8.1.2.51, EMC Unisphere 1.3.2.1.0051

VNX Operating Environment (OE) for Block related

There are no fixed problems in this release.

Block Compression

There are no fixed problems in this release.

Block Deduplication

There are no fixed problems in this release.

JonasGustavsson · Answer

Hi, what was the version that had fixes for dedup ? regards, Jonas

sk_ · Answer

Those fixes were already in .38 or even .35, but the OE fix doesn't correct actual problems in the data already created by the defected code - for that you need to migrate the affected LUNs to another pool or run LUN recovery.

If you have been running block dedup with OE before .35 code, open case and ask them to verify your dedup data.

sk

nathaniel_fagun · Answer

I have a customer seeing a significant load on the SPs and disks which are using dedupe. I know there is more overhead for dedupe but they are seeing a lot more overhead then what I would consider to be "normal". They started on FLARE 05.33.000.5.035 and have recently upgrade to 05.33.000.5.51. Though the pools/LUNs using dedupe were enabled before the code upgrade to 05.33.000.5.51. I have been looking through the release notes for 05.33.000.5.51 but I am unable to find any mention of a fix or issue with block level dedupe in it.

Do you have anymore information where I can read more about this "fix" or documentation on an issue with block level dedupe on the new VNX series?

sk_ · Answer

Those fixes were in earlier code already and fixed the datacorruption issues - performance issues are still there :/

Do you have SSD in the pool? If not, adding SSD will help as dedup metadata handling will be promoted to highest tier.

sk

VNX

Was this post helpful?