20 Posts
May 15th, 2010 06:00
How to obtain BLOB id programmatically (using Centera SDK)?
What I need is a small utility that will display the BLOB IDs
(content addresses) for all BLOBs in a given Clip ID. The utility will help
during the QA testing phase to verify data deduplication for documents
larger than the embedding threshold we are using in our application.
What I am currently doing is writing the content of the clip RAW open stream to
a file (using JCASScript) and then comparing the md5 attribute of the eclipblob XML tag
across the RAW files for different document clips.
Example of the eclipblob entry in the clip RAW file:
As far as I can see, the Centera SDK does not provide a direct way to get the BLOB content addresses.
Is there any better way of getting this information programmatically short of parsing the clip RAW file/stream?


gstuartemc
2 Intern
•
417 Posts
May 15th, 2010 07:00
What you are doing is correct. This is the only way of getting the underlying content blob IDs.
Why are you trying to verify single instancing? Centera has been in the field for over 7 years, and as this is one of the primary value props, I think we would have heard by now if it didn't work!
Joking aside, it most definitely does work as I am sure you have seen in your tests. Only in environments where it is disabled (by the cluster settings or specifically by the application using options on the BlobWrite command) would you see it "failing".
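For anyone who wants to automate the RAW-parsing approach described above, here is a minimal Java sketch rather than a definitive implementation. It uses only the standard JDK XML parser, no Centera SDK calls, and assumes the clip RAW content has already been written to a file as in the original workflow. It also assumes that each eclipblob element carries its content address in an md5 attribute, as the question describes; the class name BlobIdLister and the method blobIds are hypothetical names chosen for illustration.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the BLOB IDs found in a clip RAW (XML) stream.
public class BlobIdLister {

    // Returns the md5 attribute of every <eclipblob> element, in document order.
    // Assumption: the content address lives in an attribute named "md5".
    public static List<String> blobIds(InputStream rawClip) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(rawClip);
        NodeList blobs = doc.getElementsByTagName("eclipblob");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < blobs.getLength(); i++) {
            Element blob = (Element) blobs.item(i);
            if (blob.hasAttribute("md5")) {
                ids.add(blob.getAttribute("md5"));
            }
        }
        return ids;
    }

    // Usage: java BlobIdLister clip1.raw clip2.raw ...
    // Prints the BLOB IDs found in each previously saved clip RAW file.
    public static void main(String[] args) throws Exception {
        for (String path : args) {
            try (InputStream in = new FileInputStream(path)) {
                System.out.println(path + ": " + blobIds(in));
            }
        }
    }
}
```

If two clips that store the same document print the same ID, the content was single-instanced. Differing IDs for identical content would point to one of the disabled-deduplication cases mentioned above.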
andrej3
20 Posts
May 15th, 2010 20:00
Thanks Graham,
I never really doubted Centera's claim, and so far I haven't found a single problem with deduplication (but I'll keep trying).