June 8th, 2011 06:00

Deduplication gone wild.

We have enabled Deduplication on a CIFS Shared Folder and are a bit perplexed by the deduplication statistics it reports. Can anyone help explain these numbers?


On the Shared Folder Capacity Tab:
Size: 3 TB

Current Allocation: 788.825 GB
Used: 549.123 GB
Maximum Size: 5.796 TB
Thin is Enabled

On the Deduplication Tab:
State: Enabled
Status: Idle
Last Scan: 2011-06-08 09:08
Files Deduplicated: 678 (1%)
Saved 0% (64.9 GB before, 274.1 GB after)


Huh?  Did turning Deduplication on increase the amount of storage required?  Are the Before and After numbers somehow reversed?


We have 1 shared folder on this share, and Windows reports its size as 547 GB.

Does anyone have any insight that can help us understand what is going on?

June 8th, 2011 08:00

If you look at it from the Windows side, you should see two values: Size and Size on Disk.

What are they?

June 8th, 2011 08:00

Also, look at the dedupe stats after it has fully run at least once.

Looking at the stats when it's only 1% done doesn't give you good data.

June 8th, 2011 09:00

Sure thing:

Size:  547 GB (587,616,381,115 bytes)
Size on disk: 547 GB (588,232,499,200 bytes)

So Size on Disk is currently about 587 MB larger than Size.
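Windows Explorer reports these values in binary units (it labels GiB and MiB as "GB" and "MB"), so the arithmetic is easy to check. A minimal Python sketch using only the byte counts quoted above; the 4 KiB allocation unit in the last check is an assumption, not something reported in this thread:

```python
# Byte counts copied from the Windows Properties dialog quoted above.
SIZE_BYTES = 587_616_381_115
SIZE_ON_DISK_BYTES = 588_232_499_200

MIB = 1024 ** 2
GIB = 1024 ** 3
ASSUMED_CLUSTER = 4096  # assumption: a typical 4 KiB allocation unit


def on_disk_overhead(size_bytes: int, size_on_disk_bytes: int) -> tuple[int, float]:
    """Return how much larger 'Size on disk' is than 'Size', in bytes and in MiB."""
    extra = size_on_disk_bytes - size_bytes
    return extra, extra / MIB


extra_bytes, extra_mib = on_disk_overhead(SIZE_BYTES, SIZE_ON_DISK_BYTES)
print(f"Size: {SIZE_BYTES / GIB:.2f} GiB")
print(f"Size on disk exceeds Size by {extra_bytes:,} bytes ({extra_mib:.1f} MiB)")
print("Size on disk is a whole number of 4 KiB clusters:",
      SIZE_ON_DISK_BYTES % ASSUMED_CLUSTER == 0)
```

With these inputs the difference is 616,118,085 bytes, about 587.6 MiB (roughly 616 MB in decimal units), which matches the "about 587 MB" figure.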


I didn't know that "Files Deduplicated: 678 (1%)" indicated the progress of the deduplication scan. I perhaps wrongly assumed the Idle status meant the scan had completed.


Deduplication has been enabled on the share for a couple of weeks now, although we only started copying data onto it over the last couple of days.

I do see log entries from the DART_DEDUPE source, timestamped yesterday afternoon, indicating a couple of issues:

13:10940002 - Deduplication scan 42 failed due to lack of space or other system resource issue.

and

13:10940066 - Deduplication scan /some/file/path/here.jpg failed due to a lack of protection space.

Even though protection space is set to increase automatically, and there is plenty of space in the pool, I increased it manually and forced a rescan. Perhaps when that completes, my numbers will start making sense.


In addition to increasing the protection space, I checked the SP performance chart: the load held steady, peaking around 35%, during most of our recent file additions to the shared folder, with a couple of peaks near 70% at 5pm yesterday. These peaks don't line up with the log messages indicating a lack of system resources, so I'm hopeful the protection space size was the culprit.

June 8th, 2011 12:00

An Update:

Forced rescans stopped within about 5 minutes of starting, logging the same errors mentioned above, although the specific files named would change.


We are, however, making some progress. Even though the Deduplication Status still shows Idle, the number of files deduplicated has started to increase: it is now reporting 13%, with a much higher file count of 8,493.


The setting change that most likely made the difference was unchecking Auto-Adjust Protection Size on the Protection Size tab of the Shared Folder details.

Two other minor changes that could potentially have had a delayed impact were:

- Increasing the Protection Size (although the initial size of 128GB was already well above the current usage of 26GB)

- Adding an Excluded Path on the Deduplication tab (I suppose a configuration change there could trigger a refresh of all the deduplication settings). It's also worth noting that the Apply Changes button is enabled every time I view this particular tab, even without making any changes, which seems like a bug. Other Apply Changes buttons only become enabled when there is a change to apply.


I'll post another update when the scan reports 100%. Hopefully the numbers will start making sense then.


If anyone finds online support resources for the log entries mentioned, please let me know; I couldn't find anything:
Source: DART_DEDUPE 
Event ID:  13:10940002 and 13:10940066

June 9th, 2011 06:00

If you happen to know which files/directories you want compressed, you could also force it through the Windows “compress files” property.

That compresses them immediately, outside of a dedupe scan.

June 9th, 2011 13:00

Another update:

Still not fully resolved. However, I've learned a few more things. Most importantly, the System Logs view in Unisphere tends to return you to your previous position in the log, which is not always the start of the log. Be sure to use the navigation buttons when searching for log entries.


Yesterday's scan eventually met the same fate as the previous attempts: system resource issues.

I have, however, kicked off another scan after making some further adjustments, and for the first time I've seen the Status entry on the Deduplication tab show In Progress. It's currently at 41%.

The only configuration change was reducing the settings on the Storage folder to get the Subscription rate under the alert threshold, or perhaps more importantly, under 105%, which is where it started the day. Now that it's down to 78% and the alert threshold has been raised to 80%, I forced another rescan. So far, so good.
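With thin provisioning, a pool can be oversubscribed: the sizes promised to thin storage can add up to more than the pool physically holds, which is how a subscription rate of 105% is possible. A minimal sketch of the ratio and the threshold check described above; the function names and the formula (total thin sizes divided by pool capacity) are assumptions for illustration, not Unisphere's documented calculation:

```python
def subscription_pct(thin_sizes_gb: list[float], pool_capacity_gb: float) -> float:
    """Assumed formula: total provisioned (thin) size as a % of pool capacity."""
    if pool_capacity_gb <= 0:
        raise ValueError("pool capacity must be positive")
    return 100.0 * sum(thin_sizes_gb) / pool_capacity_gb


def over_alert_threshold(rate_pct: float, threshold_pct: float) -> bool:
    """True when the rate exceeds the threshold (whether Unisphere uses > or >= is assumed)."""
    return rate_pct > threshold_pct


# Rates quoted in the post above, checked against the raised 80% threshold.
for rate in (105.0, 78.0):
    status = "over" if over_alert_threshold(rate, 80.0) else "under"
    print(f"{rate:.0f}% subscribed: {status} the 80% alert threshold")
```

Shrinking the configured size of thin storage lowers the numerator without touching the pool, which is why reducing the folder settings can pull the rate back under the threshold.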

It's also worth noting that we are still steadily loading more data into the shared folder. It's entirely plausible that too many large data changes could negatively affect the deduplication scan, although I have nothing but conjecture to back up that theory.


For those interested, here are the current stats on the Deduplication tab:


Status: In Progress (43% Complete)
Last Scan: 2011-06-09 16:54
Files Deduplicated: 8493 (13%)
Saved 0% (64.9 GB before, 1.0 TB after)

I'm still looking forward to these numbers making sense once a full error-free scan has been completed.

June 9th, 2011 13:00

Thanks. At the moment, I'm not too worried about setting the compressed file property; I'm more just trying to get my head around the Deduplication functionality.

July 12th, 2011 02:00

I've got a similar problem. I've got around 500 GB in a CIFS share of 700 GB total available. I know for a fact there are duplicate directories, and in fact the VNXe (3300) reports "Files Deduplicated" as 313, but with (0%) next to it, and underneath it says Saved 0% (546.9 GB before, 557.2 GB after). There are no snapshots on this share, so I'm a bit puzzled why it says it has found duplicates but hasn't done anything about them.

I'm on a VNXe3300 with SP3 applied.

Cheers

Andy

July 12th, 2011 10:00

I have the same problem.  This is what I have discovered, but not resolved.

According to support, whether a file gets deduplicated depends on the following properties of each file in the share:

"Date Accessed" must be more than 30 days old

"Date Modified" must be more than 30 days old

Scan interval is 7 days.  (I am assuming this is aside from a "Forced Scan")
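A minimal Python sketch of the eligibility rules as relayed by support, useful for estimating how much of a share could qualify. The thresholds come from this post, not from EMC documentation, and the function names are hypothetical:

```python
import os
import time
from typing import Iterator, Optional

# Policy as relayed by support in this thread (assumed; not verified against EMC docs).
MIN_AGE_DAYS = 30
SCAN_INTERVAL_DAYS = 7
SECONDS_PER_DAY = 86_400


def meets_age_policy(atime: float, mtime: float, now: float,
                     min_age_days: int = MIN_AGE_DAYS) -> bool:
    """True if both access and modification times are more than min_age_days old."""
    cutoff = now - min_age_days * SECONDS_PER_DAY
    return atime < cutoff and mtime < cutoff


def scan_due(last_scan: float, now: float,
             interval_days: int = SCAN_INTERVAL_DAYS) -> bool:
    """True once a full scheduled interval has elapsed since the last scan."""
    return now - last_scan >= interval_days * SECONDS_PER_DAY


def likely_candidates(root: str, now: Optional[float] = None) -> Iterator[str]:
    """Yield files under root (e.g. a mapped share) whose timestamps pass the policy."""
    now = time.time() if now is None else now
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if meets_age_policy(st.st_atime, st.st_mtime, now):
                yield path
```

If the policy is as described, any file whose access or modification time falls within the last 30 days (for example, data that was recently copied or opened) would not yet qualify, regardless of how many forced scans are run.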

I have 3 shares with deduplication enabled, and only 1 of them is working properly.

All 3 shares have the auto adjust protection size enabled with plenty of space available

All 3 shares have snapshots enabled with the same snapshot schedule

All 3 shares show a date for last scan on the deduplication tab and show idle

Share 1 with ~160GB of Data

Files Deduplicated: 10957 (53%)

Saved 0% (63GB before, 177GB after)

-Thin Disk

Share 2 with ~112GB of Data (This one is working properly)

Files Deduplicated: 121864 (24%)

Saved 13% (118GB before, 101GB after)

-Thin Disk

-Have replication set up to another VNXe box...

Share 3 with ~162GB of Data

Files Deduplicated: 2806 (2%)

Saved 0% (163.9GB before, 164.3GB after)

-Thick Disk (Was testing to see if this would have an effect)

Looking through the logs is cumbersome because there is no search function. I'm going to find another way to grab the logs so I can search them for deduplication errors.
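One way to do that, sketched under assumptions: export the system log to a plain-text file (how to export it from the VNXe is not covered here and depends on your setup), then filter it for the two DART_DEDUPE event IDs quoted earlier in this thread. The one-entry-per-line format and the function names are assumptions:

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Event IDs reported from the DART_DEDUPE source earlier in this thread.
DEDUPE_EVENTS = {
    "13:10940002": "scan failed: lack of space or other system resource",
    "13:10940066": "scan failed: lack of protection space",
}
# Assumption: each exported log entry is a single line of plain text.
EVENT_RE = re.compile(r"\b13:109400(?:02|66)\b")


def dedupe_errors(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_id, line) for every line that mentions a dedupe event ID."""
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            yield match.group(0), line.rstrip("\n")


def summarize(lines: Iterable[str]) -> Counter:
    """Count how often each dedupe event ID appears."""
    return Counter(event_id for event_id, _ in dedupe_errors(lines))


def scan_file(path: str) -> Counter:
    """Print matching entries from an exported log file and return per-ID counts."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        lines = fh.readlines()
    for event_id, text in dedupe_errors(lines):
        print(f"{event_id} ({DEDUPE_EVENTS[event_id]}): {text}")
    return summarize(lines)
```

Calling `scan_file("exported_log.txt")` (a hypothetical file name) prints each matching entry and returns how many times each error occurred, which makes it easier to see whether failures cluster around large data loads.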

October 28th, 2011 12:00

Has anyone found a resolution to this problem yet? I'm relatively new to EMC, and we are encountering exactly these issues with the deduplication process. If left to its own devices (for more than a day), nothing ever changes. Forcing the scan seems only to generate more errors of the same kind. Could someone explain how this is supposed to work? Have I set something up incorrectly, or have we just not waited long enough for these processes to finish?

Out of 5 shares, three are reporting deduplication errors: "failed due to lack of protection space".

1) 100GB ~ 78.312GB allocated

     Files Deduplicated: 45648 (79%)

     Saved 5% (54.2GB before, 51.4GB after)

     Status:  Working Properly

2) 80GB ~ 70.500GB allocated

     Files Deduplicated: 753 (75%)

     Saved 0% (516MB before, 51.8GB after)   ???

     Status:  Not Working

3) 6GB ~ 2.7GB used

     Files Deduplicated: 7236 (28%)

     Saved 24% (3.6GB before, 2.7GB after)

     Status:  Working Properly

4) 32GB ~ 24.392GB used

     Files Deduplicated: 0 (0%)

     Saved 0% (123MB before, 24.3GB after)   ???

     Status:  Not Working

5) 400GB ~ 101.750GB allocated

     Files Deduplicated: 0 (0%)

     Saved 0% (0 before, 0 after)   ???

     Status:  Not Working /  Force Rescan Has No Effect
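The "Saved" percentages on the working shares above (1 and 3) are roughly reproduced by (before − after) / before, with the figure apparently floored at 0% whenever "after" exceeds "before". That formula is an inference from these numbers, not a documented calculation, and other figures in this thread don't match it exactly. A small Python sketch that applies it to the five shares and flags the pairs that look inverted:

```python
# (share, before_gb, after_gb) as reported on the Deduplication tab above.
SHARES = [
    (1, 54.2, 51.4),
    (2, 0.516, 51.8),  # reported as 516MB before
    (3, 3.6, 2.7),
    (4, 0.123, 24.3),  # reported as 123MB before
    (5, 0.0, 0.0),
]


def saved_pct(before_gb: float, after_gb: float) -> float:
    """Assumed formula: share of 'before' that was saved, never shown below 0%."""
    if before_gb <= 0:
        return 0.0
    return max(0.0, 100.0 * (before_gb - after_gb) / before_gb)


def looks_inverted(before_gb: float, after_gb: float) -> bool:
    """Flag pairs where 'after' exceeds 'before', as in the '???' shares."""
    return after_gb > before_gb


for share, before, after in SHARES:
    note = "  <- after > before" if looks_inverted(before, after) else ""
    print(f"share {share}: saved {saved_pct(before, after):.0f}%{note}")
```

Under this reading, shares 2 and 4 show 0% because the "after" figure is far larger than "before", which fits the idea of a reporting problem rather than deduplication actually consuming space; share 5 shows 0% simply because nothing has been scanned.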

October 28th, 2011 13:00

We still see similar errors a few times a month. 

However, overall we have more reasonable numbers being reported - and the graph actually works now.

Deduplicated (633,385) 64% - and a space savings of 22% good for about 400MB.


As for why this happens, I'm still a bit unsure. It looks very similar to an issue where the deduplication scan doesn't auto-adjust the Protection Size properly (even though it could; there's plenty of room), so it stops deduplicating because it doesn't think it can do so safely.

November 2nd, 2011 07:00

It is a health/reporting issue which is being assessed for inclusion in a future release.
