benconrad1
January 23rd, 2014 06:00
We'll be deduping on some 7600's in a few weeks. My bet is that it's not the metadata but rather the crawling of all the new blocks that have been written, looking for duplicate blocks. That eats up disk IO. In the background it is always crawling as long as the dedup container has something like >64GB of new changes to process.
My experience with VNX Snapshot consolidation has been very poor on the VNX1; I'm hoping that VNX Snaps and Dedup on the VNX2 will impress. When it comes down to it, I don't think the EMC sizing tools take into account the overhead of dedup, VNX Snaps, and thin provisioning.
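For what it's worth, here's a rough sketch (Python, purely illustrative; the 64 GB trigger and the "container" abstraction are my assumptions, not documented VNX internals) of why a threshold-triggered background crawl shows up as disk IO: host writes just queue work, and once enough new data has accumulated the crawler re-reads it all to hash and compare blocks.

```python
# Illustrative model of a threshold-triggered dedup crawl.
# NOT actual VNX internals; the 64 GB trigger is an assumption.

PENDING_THRESHOLD_GB = 64  # assumed amount of new changes before a crawl pass starts

class DedupContainer:
    def __init__(self):
        self.pending_gb = 0.0   # new/changed data not yet crawled
        self.crawl_io_gb = 0.0  # extra read IO generated by crawling

    def write(self, gb):
        """Host writes land first; they only queue work for the crawler."""
        self.pending_gb += gb

    def background_pass(self):
        """Runs whenever enough changes have accumulated."""
        if self.pending_gb >= PENDING_THRESHOLD_GB:
            # The crawler re-reads the new blocks to hash and compare them,
            # which is why dedup cost shows up as disk IO, not just CPU.
            self.crawl_io_gb += self.pending_gb
            self.pending_gb = 0.0

pool = DedupContainer()
for hour, written_gb in enumerate([20, 30, 40, 10, 80]):
    pool.write(written_gb)
    pool.background_pass()
    print(f"hour {hour}: pending={pool.pending_gb} GB, crawl IO so far={pool.crawl_io_gb} GB")
```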
Ben
sk_
January 30th, 2014 00:00
Yep, the crawling is another big influence here; it's hard to separate those workloads.
We have also noticed that the pool balancing is not working properly; this may or may not be related to deduplication being used.
E.g. one LUN was receiving a lot of write IO this morning; I looked at the last-hour disk heatmaps from M&R and they showed that only 5 disks (one RAID group) in the pool took all the IO, while the others were cold.
I assume it _should_ spread out to all disks in the same tier?
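In case it helps anyone reproduce the check: this is roughly the sanity test I run against an exported per-disk stats file to flag when one RAID group is taking nearly all of the pool's IO. The CSV column names here ("disk", "raid_group", "iops") are hypothetical, so map them to whatever your M&R / Analyzer export actually contains.

```python
# Sketch: flag skewed per-disk IO from an exported stats file.
# The CSV layout ("disk", "raid_group", "iops") is an assumption;
# adapt it to the columns your M&R / Unisphere Analyzer export uses.
import csv
from collections import defaultdict

def io_by_raid_group(path):
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["raid_group"]] += float(row["iops"])
    return totals

def report_skew(path):
    totals = io_by_raid_group(path)
    pool_total = sum(totals.values()) or 1.0
    for rg, iops in sorted(totals.items(), key=lambda kv: -kv[1]):
        share = 100.0 * iops / pool_total
        flag = "  <-- hot" if share > 50 else ""
        print(f"RG {rg}: {iops:.0f} IOPS ({share:.1f}% of pool){flag}")

# report_skew("pool_disk_stats.csv")
```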
sk_
February 16th, 2014 22:00
Follow-up:
After trying to solve this with support, we got a statement saying that the dedup processing isn't quite optimal in the current code and that we should cut down our usage of dedup, limiting it to archives and similar LUNs with lower I/O.
Adding SSD to the pool could help, but I'm not sure how much, and as you know, that really can't be tested since you can't get the disks out of the pool anymore.
benconrad1
February 18th, 2014 06:00
Do you mind pasting in some of the info regarding dedup that you got back from support, or sending it to me directly via the forum's message feature?
Ben
benconrad1
February 18th, 2014 06:00
"eg. one LUN was having lot of write IO coming to this morning, I looked at the last Hour disk heatmaps from M&R and it shows that only 5 disks (one RG) of the pool took all the IO, others were cold."
We had this problem on a VNX 5300 (R32, patch 201) in summer '13. Support said it was odd, that they could not figure out why it was happening, and that the only fix was for me to destroy the pools and recreate them. You can run an MLU displayslice script that support can look at to see if the slices are distributed properly.
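If support sends you the raw slice listing, a tiny counter like the sketch below makes any imbalance obvious. The input format shown is made up (one "LUN <id> slice <n> -> RG <id>" line per slice), so adapt the parsing to whatever the real displayslice output looks like.

```python
# Sketch: count how many pool slices land on each private RAID group.
# The input format is hypothetical; adjust the regex to the real output.
import re
from collections import Counter

SLICE_LINE = re.compile(r"RG\s+(\d+)")

def slices_per_rg(lines):
    counts = Counter()
    for line in lines:
        m = SLICE_LINE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    "LUN 12 slice 0 -> RG 0",
    "LUN 12 slice 1 -> RG 0",
    "LUN 12 slice 2 -> RG 1",
]
for rg, n in slices_per_rg(sample).most_common():
    print(f"RG {rg}: {n} slices")
```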
sk_
February 21st, 2014 00:00
No details were given.
But our problems escalated yesterday: our dedup pool LUNs went OFFLINE. It took 17 hours for EMC to run the LUN recovery jobs and bring them back online. Root cause still unknown.
Really can't recommend dedup usage to anyone at this point!
sk_
February 26th, 2014 01:00
Apparently our LUNs-OFFLINE issue was caused by a dedup bug in a previous OE release. We had upgraded to the latest OE, but that doesn't help if the data was already messed up by the earlier OE version :/
benconrad1
February 26th, 2014 06:00
OMG.
Hey EMC, get your act together and get your developers on the same page.
benconrad1
March 14th, 2014 07:00
There were no dedup fixes in the most recent release from Feb 28th:
VNX Operating Environment for Block 05.33.000.5.51,
VNX Operating Environment for File 8.1.2.51,
EMC Unisphere 1.3.2.1.0051
Fixed problems
VNX Operating Environment for Block 5.33.000.5.051, VNX Operating
Environment for File 8.1.2.51, EMC Unisphere 1.3.2.1.0051
VNX Operating Environment (OE) for Block related
There are no fixed problems in this release.
Block Compression
There are no fixed problems in this release.
Block Deduplication
There are no fixed problems in this release.
JonasGustavsson
March 14th, 2014 07:00
Hi, what was the version that had fixes for dedup?
Regards, Jonas
sk_
March 14th, 2014 11:00
Those fixes were already in .38 or even .35, but the OE fix doesn't correct the actual problems in data already created by the defective code; for that you need to migrate the affected LUNs to another pool or run LUN recovery.
If you have been running block dedup with OE code earlier than .35, open a case and ask them to verify your dedup data.
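A quick way to check where you stand is to pull the Block OE revision from each SP before calling it in. A minimal sketch, assuming naviseccli is installed and security has already been set up (e.g. via -AddUserSecurity); the parsing of the "Revision:" line from getagent output is a best guess, and the .035 cutoff is simply the version discussed in this thread.

```python
# Sketch: check whether an SP is running Block OE older than 05.33.000.5.035.
# Assumes naviseccli is on the path and CLI security is already configured;
# the "Revision:" parsing is a best guess at the getagent output format.
import re
import subprocess

MIN_OK = (5, 33, 0, 5, 35)  # first code with the dedup data fixes, per this thread

def oe_revision(sp_ip):
    out = subprocess.run(["naviseccli", "-h", sp_ip, "getagent"],
                         capture_output=True, text=True, check=True).stdout
    m = re.search(r"Revision:\s*([\d.]+)", out)
    return tuple(int(x) for x in m.group(1).split(".")) if m else None

def needs_dedup_verification(sp_ip):
    rev = oe_revision(sp_ip)
    return rev is not None and rev < MIN_OK

# if needs_dedup_verification("10.0.0.1"):
#     print("Open a case and ask EMC to verify the dedup data.")
```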
sk
nathaniel_fagun
April 18th, 2014 08:00
I have a customer seeing a significant load on the SPs and disks which are using dedupe. I know there is more overhead for dedupe, but they are seeing a lot more overhead than what I would consider "normal". They started on FLARE 05.33.000.5.035 and have recently upgraded to 05.33.000.5.51, though the pools/LUNs using dedupe were enabled before the code upgrade to 05.33.000.5.51. I have been looking through the release notes for 05.33.000.5.51, but I am unable to find any mention of a fix or issue with block-level dedupe in them.
Do you have any more information where I can read about this "fix", or documentation on an issue with block-level dedupe on the new VNX series?
sk_
April 19th, 2014 02:00
Those fixes were in earlier code already and fixed the data corruption issues; the performance issues are still there :/
Do you have SSD in the pool? If not, adding SSD will help, as dedup metadata handling will be promoted to the highest tier.
sk