
2 Intern

 • 

259 Posts

1979

July 21st, 2009 10:00

data dedupe question

I understand that if I dedupe a filesystem, the deduped data gets written to savvol. If I then delete the deduped filesystem without re-hydrating it, will the deduped data that exists in savvol get deleted as well?

Thanks.

Jim

2 Intern

 • 

366 Posts

July 22nd, 2009 05:00

Hi,

Actually, the deduplication engine does not use the SavVol.
It uses a hidden area within the filesystem to copy the candidates and deduplicate/compress them.

What the manual states is:

If you have checkpoints for this specific filesystem, the deduplication process will copy the changed blocks to the SavVol as if they were user writes.

From "Using Celerra Data Deduplication", in the "Planning application integration" section:

Point-in-time views of the file system
The deduplication process releases space in the production file system immediately. However, it may cause blocks to be copied to the SnapSure save volume (SavVol) in the process. Deduplicating data associated with a file involves copying the data within the file system so it can be compressed as well as single instanced. Since SnapSure checkpoints copy changed blocks to the SavVol on first write, the blocks that are deduplicated may need to be copied to the SavVol in order to preserve a previous checkpoint point-in-time view of the file system.
These blocks are freed when the corresponding checkpoint gets deleted or refreshed and are then available for re-use by other checkpoints. How many blocks will need to be copied to the SavVol during the deduplication process is a function of how full the file system is, the rate of change in it, and so on, and therefore is difficult to predict. By default the system is configured to abort deduplication operations on a file system before it causes the SavVol to extend. This avoids the SavVol expanding due to deduplication activity. If the deduplication process is aborted in this way, an alert is generated that explains what happened. The Celerra administrator can choose to extend the SavVol or simply let the deduplication process execute again on its next scheduled run.
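To make the copy-on-first-write behavior described above a bit more concrete, here is a toy Python model. It is only an illustration and not Celerra code: the class `ToyFileSystem`, its methods, and the block contents are all made up for this sketch. It assumes that a checkpoint protects the blocks that exist when it is created, and that the first overwrite of a protected block copies the old contents to the SavVol.

```python
class ToyFileSystem:
    """Toy copy-on-first-write model (hypothetical, not Celerra code)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # block number -> contents
        self.protected = set()      # blocks covered by the checkpoint
        self.savvol = {}            # block number -> preserved old contents

    def create_checkpoint(self):
        # The checkpoint protects every block that exists right now.
        self.protected = set(self.blocks)
        self.savvol = {}

    def write(self, block_no, data):
        # Only the first write to a protected block copies it to the SavVol.
        if block_no in self.protected and block_no not in self.savvol:
            self.savvol[block_no] = self.blocks[block_no]
        self.blocks[block_no] = data

    def delete_checkpoint(self):
        # Deleting the checkpoint frees the blocks it held in the SavVol.
        self.protected = set()
        self.savvol = {}


fs = ToyFileSystem({0: "AAAA", 1: "AAAA", 2: "BBBB"})
fs.create_checkpoint()
# In this model, deduplication rewrites blocks just like a user write would.
fs.write(0, "ref-to-hidden-store")
fs.write(1, "ref-to-hidden-store")
print("blocks in SavVol after dedupe:", len(fs.savvol))
fs.delete_checkpoint()
print("blocks in SavVol after checkpoint delete:", len(fs.savvol))
```

In this model the SavVol only fills up while a checkpoint exists, which matches the point of the manual excerpt: without checkpoints, deduplication has nothing to copy there.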




So, to answer your question, if you want to delete a deduped filesystem with checkpoints, you need to delete its checkpoints first, so you won't have a SavVol for this filesystem when you are deleting it.



Gustavo Barreto.

674 Posts

July 21st, 2009 22:00

The deduped data gets written into the filesystem it is deduping. So you cannot delete a dedupe SavVol ...

2 Intern

 • 

259 Posts

July 22nd, 2009 06:00

Thanks for the responses. So this comes with things to be careful about, such as NDMP backups and checkpoints.

If you used Avamar or Datadomain to dedupe the backup process, what would happen?

6 Operator

 • 

8.6K Posts

July 23rd, 2009 02:00

Celerra deduplication is transparent for any file-based backup, like the NDMP backup used by Avamar or Data Domain, so it will work the same way as without deduplication.

674 Posts

July 23rd, 2009 03:00

Celerra deduplication is transparent for NDMP (type tar and dump) but not VBB

11 Legend

 • 

20.4K Posts

 • 

87.4K Points

July 23rd, 2009 05:00

Peter,

Can you please elaborate on how dedupe changes the backup process with the VBB option?

Thank you

674 Posts

July 23rd, 2009 06:00

PAX-based (tar, dump) NDMP backups, which must reduplicate the files during the backup process, will be slowed when backing up deduplicated files. This will be particularly noticeable for small files.

As a result, files are written to the backup medium (tape) in non-deduplicated form.

VBB:
Celerra deduplication-enabled file systems can be backed up using Celerra Volume Based
Backup (VBB) and restored in full by using the FDR method. However, a single file
restore or a file-by-file restore of deduplicated files from VBB backups is not supported
and will be rejected by the Celerra.

As a result, the file system is written to the backup medium (tape) in deduplicated form.

11 Legend

 • 

20.4K Posts

 • 

87.4K Points

July 23rd, 2009 08:00

PAX-based (tar, dump) NDMP backups, which must
reduplicate the files during the backup process, will
be slowed when backing up deduplicated files. This
will be particularly noticeable for small files.

As a result, files are written to the backup
medium (tape) in non-deduplicated form.


So every time I run an NDMP backup, the whole file system gets re-duplicated and the dedupe process has to start from scratch?

2 Intern

 • 

366 Posts

July 23rd, 2009 11:00

Hi,

No. The files are reduplicated on the tape.

Gustavo Barreto.

11 Legend

 • 

20.4K Posts

 • 

87.4K Points

July 23rd, 2009 13:00

Gustavo,

Can you please explain what you mean by "files are reduplicated on the tape"?

6 Operator

 • 

8.6K Posts

July 24th, 2009 00:00

basically right now files will be uncompressed and "de-single-instanced" in memory before they get put on tape by NDMP PAX

so an NDMP PAX backup of a deduplicated file system takes as much space on tape as it would without deduplication

that is planned to change in the future - but you need to talk to your local EMC technical contact for roadmap or beta information

11 Legend

 • 

20.4K Posts

 • 

87.4K Points

July 24th, 2009 04:00

basically right now files will be uncompressed and
"de-single-instanced" in memory before they get put
on tape by NDMP PAX


in "memory", backup performance will probably suffer but as long as the whole file system does not get uncompressed/un-deduped it's a fair trade off i guess.

2 Intern

 • 

259 Posts

July 28th, 2009 05:00

Is there any way to tell a deduped file from an un-deduped file on a filesystem that has been configured for dedupe?
I'm familiar with the Centera and it placing a 'clock' symbol in the corner for those files aged to the Centera but I don't see anything similar to that with the Celerra dedupe feature.

2 Intern

 • 

259 Posts

July 28th, 2009 05:00

Thanks Peter.

674 Posts

July 28th, 2009 05:00

I think you are speaking of Windows Explorer.
When a file is archived to the Centera, Explorer recognizes the stub and marks the file with this clock icon.
But Celerra deduplication is transparent to the client, so Explorer is not able to recognize it.

The only way I know of is comparing the properties of the file, e.g. the "size" against the "size on disk".
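For anyone who wants to script that comparison, here is a rough Python sketch for a POSIX client (for example an NFS mount). It makes several assumptions that are not confirmed in this thread: that the server reports a reduced allocated-block count to the client for deduplicated files, that `st_blocks` is in 512-byte units as on most POSIX systems, and that the 0.9 threshold is a sensible cutoff (it is an arbitrary value chosen for illustration). Sparse files will show the same pattern, so treat a match as a hint only.

```python
import os


def allocated_bytes(path):
    """Bytes allocated on disk, assuming st_blocks counts 512-byte units."""
    return os.stat(path).st_blocks * 512


def looks_space_reduced(logical_size, allocated, ratio=0.9):
    """True when the allocated space is clearly below the logical size.

    The ratio is a hypothetical threshold, not a documented value.
    """
    return logical_size > 0 and allocated < logical_size * ratio


def check(path):
    """Compare "size" against "size on disk" for a single file."""
    logical_size = os.stat(path).st_size
    return looks_space_reduced(logical_size, allocated_bytes(path))
```

Calling `check()` on a file path from the mounted file system returns `True` when the file occupies noticeably less space than its logical size.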