January 5th, 2010 11:00

We have just started Archiving our Celerra onto the Centera

It looks like we have archived about 10 TB of data, and the Centera shows 10 TB of used space. My understanding was that the Centera does single instancing by default. Is there something we need to turn on for it? I can't believe that out of 10 TB of data nothing is the same.

January 5th, 2010 11:00

Besides SIS, your 10 TB of used capacity also accounts for protection. The overhead consumed by protecting the data depends on the protection scheme (mirroring = 100% overhead, parity = 1/6th overhead).

Regarding SIS - below is from some old notes I have

There are 2 settings that determine if/when SIS is done.

On the Centera – Storage Strategy:

        Capacity – Allow SIS to occur (not entirely true)

        Performance full - Enables fast addressing for blobs (larger than 256KB by default) and all C-Clips.  Clips will be 53 characters

        Performance partial - Enables fast addressing for blobs only.  Clips will be 27 characters

In the SDK – collision avoidance.

        Clip collision avoidance – method to ensure clipIDs are unique (GM naming)

        Blob collision avoidance – method to ensure blobIDs are unique (GM naming)

There are 3 naming schemes currently used by Centera

M

        MD5 hash

        27 characters long

        Allows for SIS

        No timestamp in name

M++

        MD5 hash plus a 120 bit truncated SHA-256 hash of the data

        53 characters long

        Allows for SIS

        No timestamp in name

        Differs from SDK format.  Platform name starts with a G5

GM (w/ discriminator)

        MD5 hash plus a time stamp plus a header plus Unique Identifier (TS+Hdr+GUID)

        53 characters long

        Does NOT allow for SIS

        Differs from SDK format.  Platform name starts with a G4

There are two others but they are no longer used by current CentraStar code versions

        MG (53 characters)

        GM w/o discriminator (53 characters, Differs from SDK format.  Platform name starts with a G6)
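To see why the content-only schemes (M, M++) allow SIS while GM does not, here is a minimal sketch in Python. It is only an illustration of the idea, not the real Centera address format: the function names are hypothetical, and the hex strings it produces are not actual 27- or 53-character Centera names.

```python
import hashlib
import time
import uuid


def content_only_name(data: bytes) -> str:
    """Name derived purely from the content (stand-in for M-style naming)."""
    return hashlib.md5(data).hexdigest()


def content_plus_unique_name(data: bytes) -> str:
    """Content hash plus timestamp and GUID (stand-in for GM-style naming)."""
    digest = hashlib.md5(data).hexdigest()
    return f"{digest}-{time.time_ns()}-{uuid.uuid4().hex}"


def store(objects, namer):
    """Store each object under namer(object) and return the name -> data map."""
    stored = {}
    for data in objects:
        stored[namer(data)] = data
    return stored


# Two identical objects and one different object.
objects = [b"quarterly report", b"quarterly report", b"meeting notes"]

sis_store = store(objects, content_only_name)
unique_store = store(objects, content_plus_unique_name)

print(len(sis_store))     # 2: the duplicate maps to the same name
print(len(unique_store))  # 3: every write gets its own name
```

With a content-only name, the second copy resolves to an address that already exists, so only one instance has to be kept. Once a timestamp and unique identifier are part of the name, identical content never collides, which is exactly what collision avoidance is for, and SIS is no longer possible.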


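CLIPS

| Storage Strategy    | SDK Collision Avoidance ENABLED | SDK Collision Avoidance DISABLED |
|---------------------|---------------------------------|----------------------------------|
| Capacity            | GM (53 char)                    | M (27 char)                      |
| Performance Full    | GM (53 char)                    | GM (53 char)                     |
| Performance Partial | GM (53 char)                    | M (27 char)                      |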

BLOBS (small – less than 256KB in size for each fragment.  That means the object must be smaller than 256KB for CPM or 1.5MB for CPP)

| Storage Strategy    | SDK Collision Avoidance ENABLED | SDK Collision Avoidance DISABLED |
|---------------------|---------------------------------|----------------------------------|
| Capacity            | GM (53 char)                    | M++ (53 char)                    |
| Performance Full    | GM (53 char)                    | GM (53 char)                     |
| Performance Partial | GM (53 char)                    | GM (53 char)                     |

BLOBS (large – more than 256KB in size for each fragment.  That means the object must be larger than 256KB for CPM or 1.5MB for CPP)

| Storage Strategy    | SDK Collision Avoidance ENABLED | SDK Collision Avoidance DISABLED |
|---------------------|---------------------------------|----------------------------------|
| Capacity            | GM (53 char)                    | M++ (53 char)                    |
| Performance Full    | GM (53 char)                    | M++ (53 char)                    |
| Performance Partial | GM (53 char)                    | M++ (53 char)                    |

Remember:

M and M++ allow for SIS

GM does NOT allow for SIS
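The chart above can also be written down as a small lookup, which makes it easy to check whether a given combination of settings can single-instance at all. This is just a sketch that transcribes the chart from my notes; the key names ("clip", "small_blob", "capacity", and so on) and function names are made up for the example and are not Centera or SDK identifiers.

```python
# Which naming schemes allow single instancing, per the notes above.
SIS_CAPABLE = {"M": True, "M++": True, "GM": False}

# Naming scheme used when SDK collision avoidance is DISABLED,
# keyed by (object type, storage strategy). Transcribed from the chart.
NAMING_WITHOUT_COLLISION_AVOIDANCE = {
    ("clip", "capacity"): "M",
    ("clip", "performance_full"): "GM",
    ("clip", "performance_partial"): "M",
    ("small_blob", "capacity"): "M++",
    ("small_blob", "performance_full"): "GM",
    ("small_blob", "performance_partial"): "GM",
    ("large_blob", "capacity"): "M++",
    ("large_blob", "performance_full"): "M++",
    ("large_blob", "performance_partial"): "M++",
}


def naming_scheme(object_type: str, strategy: str, collision_avoidance: bool) -> str:
    """Return the naming scheme the chart lists for this combination."""
    if collision_avoidance:
        # Every row of the chart shows GM when collision avoidance is enabled.
        return "GM"
    return NAMING_WITHOUT_COLLISION_AVOIDANCE[(object_type, strategy)]


def allows_sis(object_type: str, strategy: str, collision_avoidance: bool) -> bool:
    """True if the resulting naming scheme allows single instancing."""
    return SIS_CAPABLE[naming_scheme(object_type, strategy, collision_avoidance)]


print(naming_scheme("large_blob", "performance_full", False))  # M++
print(allows_sis("large_blob", "performance_full", False))     # True
print(allows_sis("clip", "capacity", True))                    # False
```

Read this way, the first thing to check is the SDK collision avoidance setting in the application: if it is enabled, nothing will be single-instanced regardless of the storage strategy on the cluster.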

January 5th, 2010 23:00

Hi Martin

Dennis, thanks for the detailed explanation. It's great to see such a summary.

Martin, we manage 35 Centera systems and I hear your question quite often. From an admin point of view, there are two views of capacity on a system: the application point of view and the system point of view.

If you have archived 10 TB of data, you would want the application point of view to show 10 TB, since that is the amount you archived.

show pool capacity is the command in the CenteraViewer CLI that shows the application point of view of archived data.

How much capacity the data written to Centera really uses on the system is shown by show capacity total, listed as protected user data.

If your Centera is in mirrored mode (show protection will tell you), you would expect protected user data to list 20 TB, as everything is stored twice. If you see less than 20 TB, the difference is what SIS saved.

If your Centera is in parity mode, you would expect about 11.7 TB (10*(7/6) for the 6+1 parity scheme) to be shown under protected user data. This is only true if 100% of your data is larger than the configured threshold for parity.

Assuming that 20% of the data is files smaller than the default 256 KB threshold, 2 TB would be stored mirrored and 8 TB in parity: 2*2 + 8*(7/6) = 13.3 TB of protected user data. If you see less than that, the difference is what SIS saved you.
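If you want to redo this arithmetic for your own numbers, a small Python sketch of the calculation looks like this. The function and parameter names are mine, and the observed value in the last lines is a made-up example, not a figure from any real system; plug in what show capacity total reports as protected user data.

```python
def expected_protected_tb(archived_tb: float,
                          small_fraction: float = 0.0,
                          parity: bool = True,
                          data_fragments: int = 6,
                          parity_fragments: int = 1) -> float:
    """Expected protected user data (TB) before any SIS savings.

    small_fraction is the share of archived data below the parity
    threshold; that data is mirrored even on a parity cluster.
    """
    small_tb = archived_tb * small_fraction
    large_tb = archived_tb - small_tb
    mirror_factor = 2.0  # everything stored twice
    if parity:
        large_factor = (data_fragments + parity_fragments) / data_fragments
    else:
        large_factor = mirror_factor
    return small_tb * mirror_factor + large_tb * large_factor


def sis_savings_tb(expected_tb: float, observed_tb: float) -> float:
    """Difference between expected and observed protected user data."""
    return max(0.0, expected_tb - observed_tb)


print(round(expected_protected_tb(10, parity=False), 1))        # 20.0
print(round(expected_protected_tb(10), 1))                      # 11.7
print(round(expected_protected_tb(10, small_fraction=0.2), 1))  # 13.3

# Hypothetical observed value, for illustration only.
observed = 12.5
expected = expected_protected_tb(10, small_fraction=0.2)
print(round(sis_savings_tb(expected, observed), 1))             # 0.8
```

The three expected values match the mirrored, pure parity, and 20% small-file cases above. The small-file share is the one number you have to estimate yourself, so treat the savings figure as approximate.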

I add these explanations because I guess you will want to check your single instancing over time, to see if it changes and how much capacity Centera saved you.

CU, Holger

April 12th, 2010 09:00

Hi, just a follow-up question on this topic.  According to the chart, even a Storage Strategy of Performance Full will result in M++ naming for large blobs.  I would have thought that GM naming would be used here because of its significant performance increase.  I see no difference between Full and Partial for large blobs in this chart. I would have expected Performance Partial to use M++ on large blobs and Performance Full to use GM.

Thanks

April 12th, 2010 09:00

Performance partial and performance full are equal when writing small objects (by default, under 256 KB).

The difference comes into play when dealing with large objects over the performance threshold (256 KB by default).
