November 25th, 2014 02:00

Algorithm used by EMC Centera to calculate the CA

Hi,

To verify the integrity of the documents we archive in Centera, we need to know the algorithm EMC Centera uses to generate the CA.

The algorithm can be GM or M++.

We also need to generate the CA of an archive before we send it to EMC Centera.

NB: We program in Java and use SDK 3.3.719.

Thanks for your help.

208 Posts

November 25th, 2014 03:00

The validation algorithm for blobs and CDFs is MD5 but this only forms a portion of a CA. I don't believe it is possible to correctly generate a full CA before committing the content.

A possible solution would be to run your own hash on the blobs of your content and store the result as metadata on the tags that reference those blobs. Upon retrieval you could read back your hash value and validate the blob as it is being read back from Centera.
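As a rough illustration of that idea, the sketch below hashes content as a stream and keeps the result next to the metadata, then recomputes it on read-back. It is only a sketch under assumptions: SHA-256 is an arbitrary choice of hash, the `Map` stands in for the tag attributes, and the attribute name `app.sha256` is hypothetical. No Centera SDK calls are shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class BlobHashing {
    // Hypothetical attribute name; pick whatever fits your own tag naming.
    static final String HASH_ATTRIBUTE = "app.sha256";

    /** Streams the content through SHA-256 and returns the digest as lowercase hex. */
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true when the content read back matches the hash kept in the metadata. */
    static boolean matchesStoredHash(Map<String, String> tagAttributes, InputStream readBack) {
        String expected = tagAttributes.get(HASH_ATTRIBUTE);
        return expected != null && expected.equals(sha256Hex(readBack));
    }

    public static void main(String[] args) {
        byte[] content = "example document".getBytes(StandardCharsets.UTF_8);

        // Before writing: hash the content and keep the value with the tag metadata.
        Map<String, String> tagAttributes = new HashMap<>();
        tagAttributes.put(HASH_ATTRIBUTE, sha256Hex(new ByteArrayInputStream(content)));

        // After reading back: recompute the hash and compare it with the stored value.
        boolean ok = matchesStoredHash(tagAttributes, new ByteArrayInputStream(content));
        System.out.println("content intact: " + ok);
    }
}
```

In a real application the two `ByteArrayInputStream` instances would be replaced by the file being archived and the stream returned when the blob is read back.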

Good Luck,

Mike Horgan

409 Posts

November 25th, 2014 03:00

Hi, as Mike says, the MD5 algorithm is the core of generating the Content Address, although depending on the naming scheme we will also use part of SHA and add in a timestamp and other details. We do not, however, publish the format of the Content Address. Also, the Centera API does not directly expose the Content Address of the blob you have written.

Once you have written an object to Centera and received the Content Address of the CDF back, your object is on the Centera. The IO transaction has been authenticated end to end and your object is stored in a protected manner. If you want to check this, you can just read it back. When you read it back, the Centera SDK reruns the particular algorithm used to store it, and if the read is successful you know the object is authenticated. If there is something wrong with it, you will get an error.

With this in mind, could you explain in a bit more detail what you want to achieve?

edit:typos

15 Posts

November 25th, 2014 10:00

Hi,

Thanks for your replies.

I want to check integrity. That means my file may have been archived in Centera but be corrupt.

To check that, I want to precalculate the CA of my file and compare it to the CA generated by Centera.

If the two CAs differ, the file stored in Centera is corrupt (for example, because a problem occurred during storage).

To do that, I need to know how Centera calculates the CA.

208 Posts

November 25th, 2014 11:00

Since the CA that is returned as a result of your storage operation with Centera represents the metadata file (the CDF, not the blob data) then there isn't really an efficient way to accomplish your goal.

If your data is so critical that you need to protect against one-in-a-billion hardware hiccups then you could:

  1. Calculate the hash of your file
  2. Create a clip/blob on Centera containing your data, returning the clip CA
  3. Close and re-open the clip using the CA
  4. Read back your blob, calculating a hash as it streams back
  5. Compare the 'read hash' against your value from step 1

If you do this, you can be certain that when you retrieve the data in the future Centera's SDK will ensure the data you read is correct.  My company does migrations of very large amounts of Centera data (PBs, many billions of clips over the last 5 years) and the read validation mechanism in the latest SDKs has not failed us in the field.  If you want to be double-extra sure about authenticity upon retrieval then you can store the 'read hash' in the blob's tag metadata and use it to validate the contents upon read.
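The five steps above can be sketched roughly as follows. This is an illustration under assumptions, not SDK code: `ArchiveStore` and `InMemoryStore` are hypothetical stand-ins for the write and read calls of the archive, and SHA-256 is an arbitrary choice of hash.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class RoundTripCheck {

    /** Hypothetical stand-in for the archive; these are not Centera SDK calls. */
    interface ArchiveStore {
        String write(byte[] content);      // stores the content, returns the clip address
        InputStream read(String address);  // re-opens the clip and streams the blob back
    }

    /** In-memory store used only to make the sketch runnable. */
    static class InMemoryStore implements ArchiveStore {
        private final Map<String, byte[]> clips = new HashMap<>();

        public String write(byte[] content) {
            String address = "clip-" + clips.size();
            clips.put(address, content.clone());
            return address;
        }

        public InputStream read(String address) {
            return new ByteArrayInputStream(clips.get(address));
        }
    }

    /** Hashes a stream chunk by chunk, so the read-back blob is never buffered whole. */
    static byte[] sha256(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs steps 1 to 5 and reports whether the read hash matches the local hash. */
    static boolean writeAndVerify(ArchiveStore store, byte[] fileBytes) {
        byte[] localHash = sha256(new ByteArrayInputStream(fileBytes)); // step 1
        String address = store.write(fileBytes);                        // step 2
        try (InputStream readBack = store.read(address)) {              // step 3
            byte[] readHash = sha256(readBack);                         // step 4
            return MessageDigest.isEqual(localHash, readHash);          // step 5
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = "example document".getBytes(StandardCharsets.UTF_8);
        System.out.println("verified: " + writeAndVerify(new InMemoryStore(), file));
    }
}
```

In practice the implementation of `ArchiveStore` would wrap whatever write, close, re-open and read calls your SDK version provides, and the file would normally be streamed from disk rather than held in a byte array.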

Good Luck,

Mike Horgan

Interlock Technology Inc

15 Posts

November 26th, 2014 01:00

Hi,

Thanks for your reply.

I will do as you suggest.

One question, please: do you mean that the md5 value here in the CDF represents the hash of the CDF and not of the blob data?

409 Posts

November 26th, 2014 02:00

No, that is the CA of the blob. To get it, though, you have to do a bit of work. There is no API method call that says "return me the CA of the blob in this tag". You need to rawread the CDF and then traverse the CDF XML structure to find the CA for the blob you want.

208 Posts

November 26th, 2014 03:00

No, what I actually meant was that the CA returned by the ClipWrite() call is the CDF CA. The raw CDF text you are viewing does have the address for each blob, as you point out. Notice the 'x' in the 13th character; CDF CAs (clips) have an 'e' in this position.
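As a small illustration of that observation, the check below looks only at the 13th character of an address. It assumes the rule holds exactly as described in this thread ('x' for a blob, 'e' for a clip); the class name and the sample strings are made-up placeholders, not real Content Addresses, and the CA format itself is not published.

```java
class AddressKind {
    enum Kind { BLOB, CLIP, UNKNOWN }

    /** Classifies an address by its 13th character, following the rule quoted in this thread. */
    static Kind classify(String contentAddress) {
        if (contentAddress == null || contentAddress.length() < 13) {
            return Kind.UNKNOWN;
        }
        char marker = contentAddress.charAt(12); // 13th character, zero-based index 12
        switch (marker) {
            case 'x': return Kind.BLOB;
            case 'e': return Kind.CLIP;
            default:  return Kind.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        // Placeholder strings, not real addresses.
        System.out.println(classify("0123456789ABxREST"));
        System.out.println(classify("0123456789ABeREST"));
    }
}
```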

Best Regards,

Mike Horgan

15 Posts

November 28th, 2014 01:00

Hi,

Thanks for all your replies.

Because there is no way to know the algorithm used by Centera, I will do as Mike suggests:

  1. Calculate the hash of your file
  2. Create a clip/blob on Centera containing your data, returning the clip CA
  3. Close and re-open the clip using the CA
  4. Read back your blob, calculating a hash as it streams back
  5. Compare the 'read hash' against your value from step 1

Best Regards,

208 Posts

November 28th, 2014 06:00

Good luck and please let us know how it turns out!

Best Regards,

Mike Horgan

15 Posts

January 16th, 2015 02:00

Hi,

The algorithm works fine.

Thanks for your help.

208 Posts

January 16th, 2015 05:00

Excellent, glad to hear it.  Thanks for reporting back!

Best Regards,

Mike Horgan
