MartinDuble

51 Posts

1302

January 5th, 2010 11:00

We have just started Archiving our Celerra onto the Centera

It looks like we have archived about 10 TB of data, and the Centera shows 10 TB of used space. My understanding was that the Centera does single instancing by default. Is there something we need to turn on for it? I can't believe that out of 10 TB of data nothing is the same.

Responses(4)

EMCDennis

124 Posts

0

January 5th, 2010 11:00

Besides SIS, your 10 TB of used capacity also accounts for protection. The amount of overhead consumed by protecting the data depends on the protection scheme (mirroring = 100% overhead, Parity = 1/6th overhead)

Regarding SIS - below is from some old notes I have

There are 2 settings that determine if/when SIS is done.

On the Centera – Storage Strategy:

– Capacity – Allow SIS to occur (not entirely true)

– Performance full - Enables fast addressing for blobs (larger than 256KB by default) and all C-Clips. Clips will be 53 characters

– Performance partial - Enables fast addressing for blobs only. Clips will be 27 characters

In the SDK – collision avoidance.

– Clip collision avoidance – method to ensure clipIDs are unique (GM naming)

– Blob collision avoidance – method to ensure blobIDs are unique (GM naming)

There are 3 naming schemes currently used by Centera

M

– MD5 hash

– 27 characters long

– Allows for SIS

– No timestamp in name

M++

– MD5 hash plus a 120 bit truncated SHA-256 hash of the data

– 53 characters long

– Allows for SIS

– No timestamp in name

– Differs from SDK format. Platform name starts with a G5

GM (w/ discriminator)

– MD5 hash plus a time stamp plus a header plus Unique Identifier (TS+Hdr+GUID)

– 53 characters long

– Does NOT allow for SIS

– Differs from SDK format. Platform name starts with a G4

There are two others but they are no longer used by current CentraStar code versions

– MG (53 characters)

– GM w/o discriminator (53 characters, Differs from SDK format. Platform name starts with a G6)

CLIPS

Storage Strategy	SDK Collision Avoidance ENABLED	SDK Collision Avoidance DISABLED
Capacity	GM (53 char)	M (27 char)
Performance Full	GM (53 char)	GM (53 char)
Performance Partial	GM (53 char)	M (27 char)

BLOBS (small – less than 256KB in size for each fragment. That means the object must be smaller than 256KB for CPM or 1.5MB for CPP)

Storage Strategy	SDK Collision Avoidance ENABLED	SDK Collision Avoidance DISABLED
Capacity	GM (53 char)	M++ (53 char)
Performance Full	GM (53 char)	GM (53 char)
Performance Partial	GM (53 char)	GM (53 char)

BLOBS (large – more than 256KB in size for each fragment. That means the object must be larger than 256KB for CPM or 1.5MB for CPP)

Storage Strategy	SDK Collision Avoidance ENABLED	SDK Collision Avoidance DISABLED
Capacity	GM (53 char)	M++ (53 char)
Performance Full	GM (53 char)	M++ (53 char)
Performance Partial	GM (53 char)	M++ (53 char)

Remember:

M and M++ allow for SIS

GM does NOT allow for SIS
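
For anyone who wants to check a configuration quickly, the three tables above can be written down as a small lookup. This is only a sketch transcribed from the notes in this post; the function and dictionary names are made up for illustration and are not part of the Centera SDK or CLI.

```python
# Naming scheme used when SDK collision avoidance is DISABLED,
# transcribed from the CLIPS / small BLOBS / large BLOBS tables above.
# With collision avoidance ENABLED every table cell is GM.
DISABLED = {
    "clip": {
        "Capacity": "M",
        "Performance Full": "GM",
        "Performance Partial": "M",
    },
    "small blob": {
        "Capacity": "M++",
        "Performance Full": "GM",
        "Performance Partial": "GM",
    },
    "large blob": {
        "Capacity": "M++",
        "Performance Full": "M++",
        "Performance Partial": "M++",
    },
}

# Per the notes: M and M++ allow SIS, GM does not.
SIS_CAPABLE = {"M", "M++"}


def naming_scheme(kind, strategy, collision_avoidance):
    """Return the naming scheme for an object kind and storage strategy."""
    if collision_avoidance:
        return "GM"
    return DISABLED[kind][strategy]


def allows_sis(kind, strategy, collision_avoidance):
    """True if the resulting naming scheme permits single instancing."""
    return naming_scheme(kind, strategy, collision_avoidance) in SIS_CAPABLE


if __name__ == "__main__":
    for kind in DISABLED:
        for strategy in DISABLED[kind]:
            for ca in (True, False):
                print(kind, strategy, ca,
                      naming_scheme(kind, strategy, ca),
                      allows_sis(kind, strategy, ca))
```

Read this way, SIS is only possible with SDK collision avoidance disabled, and even then only for the table cells that land on M or M++.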

holgerjakob_c0722c

337 Posts

0

January 5th, 2010 23:00

Hi Martin

Dennis, thanks for the detailed explanation. It's great to see such a summary.

Martin, we manage 35 Centera systems and I hear your question quite often. From an admin point of view, there are two views of capacity on a system: the application point of view and the system point of view.

If you have archived 10TB of data, you would want your application point of view to show 10TB as you have archived that amount.

show pool capacity is the CenteraViewer CLI command that shows the application point of view of archived data.

How much capacity the data written to Centera really uses is shown by show capacity total, listed as protected user data.

If your Centera is in mirrored mode (show protection will tell you), you would expect protected user data to list 20TB, as everything is stored twice. If you see less than 20TB, the difference is what SIS saved.

If your Centera is in parity mode you would expect about 11.7 TB (10*(7/6) for the 6+1 parity scheme) to be shown under protected user data. This is only true if 100% of your data is larger than the configured threshold for parity.

Assuming that 20% of the data is files smaller than the default threshold of 256KB, then 2TB would be stored mirrored and 8TB in parity: 2*2 + 8*(7/6) = 13.3TB of protected user data. If you see less, the difference is what SIS saved you.
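
The arithmetic above can be put into a short script if you want to track it over time. This is just a sketch of the calculation described in this post, assuming small files are always mirrored and large files use the 6+1 parity scheme when the cluster is in parity mode. The function names are made up, and it ignores anything else that might consume space on the cluster.

```python
def expected_protected_tb(archived_tb, small_fraction=0.0, parity=True,
                          data_fragments=6, parity_fragments=1):
    """Expected 'protected user data' in TB if SIS saved nothing.

    archived_tb    -- application view of archived data (show pool capacity)
    small_fraction -- share of the data below the parity threshold (0..1)
    parity         -- True for parity mode, False for mirrored mode
    """
    if not 0.0 <= small_fraction <= 1.0:
        raise ValueError("small_fraction must be between 0 and 1")

    small_tb = archived_tb * small_fraction
    large_tb = archived_tb - small_tb

    mirror_factor = 2.0  # mirroring stores everything twice
    if parity:
        # 6+1 parity: 7 fragments stored for every 6 fragments of data
        large_factor = (data_fragments + parity_fragments) / data_fragments
    else:
        large_factor = mirror_factor

    # Small objects are mirrored regardless of the protection mode.
    return small_tb * mirror_factor + large_tb * large_factor


def sis_savings_tb(expected_tb, reported_tb):
    """Difference between the expected and the reported protected user data."""
    return expected_tb - reported_tb


if __name__ == "__main__":
    expected = expected_protected_tb(10, small_fraction=0.2, parity=True)
    print(f"expected protected user data: {expected:.1f} TB")
```

Feed the protected user data figure from show capacity total into sis_savings_tb as reported_tb; a result near zero means SIS is saving you very little.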

I add these explanations as I guess you want to verify your single instancing over time to see if it changes and how much Centera saved you.

CU, Holger

ICI

9 Posts

0

April 12th, 2010 09:00

Hi, just a follow-up question on this topic. According to the chart, even a Storage Strategy of Performance Full will result in M++ naming for large blobs. I would have thought that GM naming would be used for this due to its significant performance increase. I see no difference between Full and Partial in this chart. I would have expected Performance Partial to use M++ on large blobs and Performance Full to use GM.

Thanks

EMCDennis

124 Posts

0

April 12th, 2010 09:00

Performance partial and performance full are equal when writing small objects (by default, under 256 KB)

The difference comes into play when dealing with large objects over the performance threshold (256 KB by default)
