Skip to main content
  • Place orders quickly and easily
  • View orders and track your shipping status
  • Enjoy members-only rewards and discounts
  • Create and access a list of your products
  • Manage your Dell EMC sites, products, and product-level contacts using Company Administration.

PowerScale OneFS Web Administration Guide

Deduplication overview

SmartDedupe enables you to save storage space on your cluster by reducing redundant data. Deduplication maximizes the efficiency of your cluster by decreasing the amount of storage that is required to store multiple files with identical blocks.

The SmartDedupe software module deduplicates data by scanning a PowerScale cluster for identical data blocks. Each block is 8 KB. If SmartDedupe finds duplicate blocks, SmartDedupe moves a single copy of the blocks to a hidden file called a shadow store. SmartDedupe then deletes the duplicate blocks from the original files and replaces the blocks with pointers to the shadow store.

Deduplication is applied at the directory level, targeting all files and directories underneath one or more root directories. SmartDedupe not only deduplicates identical blocks in different files, it also deduplicates identical blocks within a single file.

Before you deduplicate a directory, you can get an estimate of the amount of space you can expect to save. After you begin deduplicating a directory, you can monitor the amount of space that deduplication is saving in real time.

To enable deduplicating two or more files, the files must have the same disk pool policy ID and protection policy. If either or both of these attributes differ between two or more identical files, or files with identical 8 K blocks, the files are not deduplicated.

Because it is possible to specify protection policies on a per-file or per-directory basis, deduplication can be further affected. Consider the example of two files, /ifs/data/projects/alpha/logo.jpg and /ifs/data/projects/beta/logo.jpg. Even if the logo.jpg files in both directories are identical, they would not be deduplicated if they have different protection policies.

If you have activated a SmartPools license on your cluster, you can also specify custom file pool policies. These file pool policies might result in identical files or files with identical 8 K blocks being stored in different node pools. Those files would have different disk pool policy IDs and would not be deduplicated.

SmartDedupe also does not deduplicate files that are 32 KB or smaller, because doing so would consume more cluster resources than the storage savings are worth. The default size of a shadow store is 2 GB. Each shadow store can contain up to 256,000 blocks. Each block in a shadow store can be referenced up to 32,000 times.


Rate this content

Accurate
Useful
Easy to understand
Was this article helpful?
0/3000 characters
  Please provide ratings (1-5 stars).
  Please provide ratings (1-5 stars).
  Please provide ratings (1-5 stars).
  Please select whether the article was helpful or not.
  Comments cannot contain these special characters: <>()\