October 13th, 2010 10:00

ClipID Generation

Hello,

Just had a question about clipIDs.

Are clipIDs generated randomly/sequentially, or is there an algorithm that generates them based on the file itself?

I need to know whether two inserts of the same file at different times would generate the same ClipID. Because they are all the same length, I would assume that part of it is a hash of the file.

Thanks,

Andy

417 Posts

October 20th, 2010 07:00

A clip is essentially just metadata. This metadata contains tags that hold references to blobs (pieces of content).

So if two clips reference the same blob(s) and one clip gets deleted, the content remains protected on the cluster as it is still referenced by the other clip. When that other clip is deleted then the blob(s) will be unreferenced and eligible for Garbage Collection.
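To make the reference behaviour above concrete, here is a minimal toy model in Python. It is only a sketch of the idea (clips hold references to blobs, and a blob becomes eligible for Garbage Collection once no clip references it). The class and method names are made up for illustration and are not part of the Centera SDK or how the cluster is actually implemented.

```python
from collections import defaultdict


class ToyStore:
    """Toy model only: clips are metadata records that reference blobs."""

    def __init__(self):
        self.clips = {}                    # clip_id -> set of blob addresses
        self.refcount = defaultdict(int)   # blob address -> number of referencing clips

    def write_clip(self, clip_id, blob_addresses):
        # Record the clip and count one reference per blob it points at.
        self.clips[clip_id] = set(blob_addresses)
        for addr in self.clips[clip_id]:
            self.refcount[addr] += 1

    def delete_clip(self, clip_id):
        # Deleting a clip only drops its references; blobs are not touched here.
        for addr in self.clips.pop(clip_id):
            self.refcount[addr] -= 1

    def unreferenced_blobs(self):
        # Blobs that no clip references any more (candidates for garbage collection).
        return {addr for addr, count in self.refcount.items() if count == 0}


store = ToyStore()
store.write_clip("clip-A", ["blob-1"])
store.write_clip("clip-B", ["blob-1"])   # second clip referencing the same blob

store.delete_clip("clip-A")
print(store.unreferenced_blobs())   # set(): blob-1 is still referenced by clip-B

store.delete_clip("clip-B")
print(store.unreferenced_blobs())   # {'blob-1'}: now unreferenced
```

In this model, deleting one clip never affects the other clip; the shared blob only becomes a cleanup candidate after the last referencing clip is gone.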

A clipID cannot be duplicated because it is (mostly) an MD5 (or hybrid MD5 / SHA-256) hash of the metadata. Much of this metadata is time based and thus unique to a point in time when the clip is created.

Have you verified that the time on the cluster is correct?

409 Posts

October 13th, 2010 10:00

Yes, something must have changed in the naming scheme being used. The file length should be 57 if it's not 27.

409 Posts

October 13th, 2010 10:00

The Content Addresses are generated by running an algorithm on the binary content of the CDF. The algorithm used depends on the naming scheme in use, but typically it is a combination of MD5 and SHA-256. The other current naming scheme uses MD5 plus timestamp and cluster ID information.

11 Posts

October 13th, 2010 10:00

Cool. By the way, was there a recent change in this? I ask because the clipIDs I was receiving a few months ago from the public test devices had, I think, only 26 characters, and now they are near 40. Was there a recent upgrade that modified this?

11 Posts

October 19th, 2010 14:00

Hello EMC Friends, I was able to get our customer to contact EMC support, and it turns out that the ClipID we had been unable to find in Centera had actually been deleted. The kicker is that EMC states it had been deleted 2 months before our system said that it loaded it.

If I put the same file on Centera twice will I receive the same ClipID?

11 Posts

October 19th, 2010 14:00

How can I verify what naming scheme is being used?

417 Posts

October 20th, 2010 06:00

You can check this on the cluster using the CLI - the "show features" command will tell you if the cluster is set up to use Capacity or Performance storage strategy.

However, this should not be relevant to your application.

417 Posts

October 20th, 2010 06:00

What do you mean by "file"? Individual pieces of content within a clip will be single instanced - in which case the answer is "yes" (the same blob content address results when the same content is stored twice). Clip addresses will always be different because the metadata contains timestamp and other similar information that will not be the same for repeated operations.

So underlying blob content addresses would be the same, but the c-clip content address would always be different.
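As a rough illustration of the difference between the two kinds of address, here is a small Python sketch. It is not the actual Centera algorithm or address format: it just uses plain MD5 from `hashlib`, and the metadata fields (`blobs`, `created_ns`, `filename`) and function names are assumptions made up for the example.

```python
import hashlib
import json


def blob_address(content: bytes) -> str:
    # Illustrative only: the address depends on nothing but the content bytes.
    return hashlib.md5(content).hexdigest()


def clip_address(blob_addresses, created_ns, filename):
    # Illustrative only: the address is a hash of the clip's metadata,
    # and that metadata includes a creation timestamp.
    metadata = json.dumps(
        {
            "blobs": sorted(blob_addresses),
            "created_ns": created_ns,
            "filename": filename,
        },
        sort_keys=True,
    )
    return hashlib.md5(metadata.encode("utf-8")).hexdigest()


content = b"same file contents"

# Storing the same content twice gives the same blob address.
blob_1 = blob_address(content)
blob_2 = blob_address(content)

# The two clips are written at different times, so their metadata differs.
clip_1 = clip_address([blob_1], created_ns=1287561600000000000, filename="report.pdf")
clip_2 = clip_address([blob_2], created_ns=1287561600000000001, filename="report.pdf")

print(blob_1 == blob_2)   # True
print(clip_1 == clip_2)   # False
```

The point of the sketch is only that the blob address depends on the content alone, while the clip address also depends on when the clip was created.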

11 Posts

October 20th, 2010 07:00

Ahh yes the clock on the device itself .. I have not yet asked them as I just received this info last night... We will see what they say.

Thanks for your help on this!

11 Posts

October 20th, 2010 07:00

So does that mean that if I put the same blob content in the Centera twice, receive 2 different clips, and then delete one of them, the other clip will be nulled out as well? Or is the clip just a reference to the blob, and the blob will not be deleted until all clips have been marked for deletion?

I'm sorry I don't understand Centera terminology very well, we just write to the disk as a one off.

We have an FPClip with an FPTag that has an attribute called filename, we generate an FPStream and use the FPTag's BlobWrite function to write the FPStream.

Then we Write the Clip and get the ClipID back.

The problem we are seeing is that EMC said the file was deleted almost 3 months before our system said it wrote it. That is why I am wondering if there is any way the ClipID could have been reused. Paul had mentioned the following:

"The Content Addresses are generated by running an algorithm on the binary content of the CDF. The algorithm used depends on the naming scheme in use, but typically it is a combination of MD5 and SHA-256. The other current naming scheme uses MD5 plus timestamp and cluster ID information."

I feel that in the first scheme there is potential for a ClipID to be duplicated, which is why I am asking.

Thanks,
