We're looking at starting a project where we will be reading and writing PDF files to a Centera. Files will be around the 100 MB mark and contain about 10,000 to 20,000 pages.
My question is how these objects are presented to the application: are they presented as a file bitstream that you can manipulate as you would a local file, or as some other object type that would first have to be copied to a host machine and then manipulated?
The object is stored as a single logical blob within the Centera (or transparently as multiple 100 MB sections for files over that size). There are Partial Read API calls which allow you to read fragments of a given size from a given offset, so I expect these would meet your needs.
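As a rough illustration of what reading "fragments of a given size from a given offset" looks like from the application side, here is a minimal Python sketch. The name `read_partial(offset, size)` is a hypothetical stand-in for whatever partial-read call your SDK binding exposes (it is not a real Centera API name), and the stored object is simulated with an in-memory byte string.

```python
from typing import Callable, Iterator


def iter_fragments(read_partial: Callable[[int, int], bytes],
                   total_size: int,
                   fragment_size: int) -> Iterator[bytes]:
    """Yield consecutive fragments that together cover the whole object."""
    offset = 0
    while offset < total_size:
        # The last fragment may be shorter than fragment_size.
        size = min(fragment_size, total_size - offset)
        yield read_partial(offset, size)
        offset += size


# Stand-in for the stored blob (hypothetical content, not a valid PDF).
blob = b"%PDF-1.4\n" + bytes(range(256)) * 4 + b"\n%%EOF\n"


def fake_read_partial(offset: int, size: int) -> bytes:
    """Simulates a partial read: `size` bytes starting at `offset`."""
    return blob[offset:offset + size]


# Reading every fragment in order and joining them gives back the original bytes.
reassembled = b"".join(iter_fragments(fake_read_partial, len(blob), 100))
```

The point of the sketch is only that a fragment is a plain slice of the stored bytes: nothing is added or reinterpreted, so any fragment on its own is just raw data.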
"There are Partial Read API calls which allow you to read fragments of a given size from a given offset"
I guess this is the bit that I expected, and that worried me. Would this mean that if I read a fragment, it would just be a pure binary object, so something like a PDF library would not be able to read it, since a PDF library expects a properly formed PDF file?
Oh and ridiculously fast reply!!
Not sure what you are looking for / getting at here. You can treat it as a single binary object and read the bit you need. I know people have used zip files and their associated libraries on the objects with no problem, so I am not sure why PDF would be different.
If you are using Java, there is a Partial File Input Stream that you can treat just like a regular Java stream to read the data back.
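To make the "treat it like a regular stream" idea concrete, here is a minimal Python sketch of the same pattern: a seekable, read-only stream whose reads are served by offset/size calls, handed to a standard zip library. `PartialReadStream` and `fake_read_partial` are hypothetical names invented for this example (they are not part of the Centera SDK), and the stored object is simulated in memory.

```python
import io
import zipfile
from typing import Callable


class PartialReadStream(io.RawIOBase):
    """Read-only, seekable stream backed by a read_partial(offset, size) callable."""

    def __init__(self, read_partial: Callable[[int, int], bytes], total_size: int):
        super().__init__()
        self._read_partial = read_partial
        self._size = total_size
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError("unsupported whence value")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buffer) -> int:
        # Fetch only the bytes the caller asked for, starting at the current position.
        wanted = min(len(buffer), max(0, self._size - self._pos))
        if wanted == 0:
            return 0  # end of object
        data = self._read_partial(self._pos, wanted)
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)


# Simulated stored object: a small zip archive built in memory.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hello.txt", "hello from a stored object")
stored = archive.getvalue()

calls = []  # records each (offset, size) request, for inspection


def fake_read_partial(offset: int, size: int) -> bytes:
    calls.append((offset, size))
    return stored[offset:offset + size]


# BufferedReader batches small reads; zipfile then seeks and reads as it would on a local file.
stream = io.BufferedReader(PartialReadStream(fake_read_partial, len(stored)))
with zipfile.ZipFile(stream) as zf:
    text = zf.read("hello.txt").decode("ascii")
```

The library never receives an isolated fragment. It receives a stream it can seek in, and the wrapper turns each seek-and-read into a partial read behind the scenes.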
Sorry for the slowness. I've been looking into this a bit more.
I've found the FPTag.BlobRead and BlobRead partial calls (this is using the .NET SDK wrapper). Do you know where I can see some examples of how the zip files were accessed?
I've found I can read the blob into a memory stream, but the PDF library does not see it as a complete PDF, so it complains about it.
David - the zip file applications I talked about were written by partners / customers so I do not have access to their code.
As the Centera purely returns what was written / what you asked for, I suggest this is an issue with the PDF libraries (or how you are using them). In the past I have certainly heard of people using PDF libraries to read individual pages from a single PDF document, but I am not sure exactly how these work. It may be that you have to read an "index" structure from the document first and then read the required pages via it.
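For what it is worth, the "index" idea matches how the PDF format is laid out: the end of a PDF file carries a `startxref` keyword followed by the byte offset of the cross-reference section and then `%%EOF`, so a reader normally starts from the tail rather than the beginning. A minimal Python sketch of locating that offset with one partial read follows. `read_partial` and `find_startxref` are hypothetical names for illustration, the sample bytes are a toy layout rather than a complete valid PDF, and a real PDF library does considerably more (for example, newer files may use cross-reference streams).

```python
import re
from typing import Callable


def find_startxref(read_partial: Callable[[int, int], bytes],
                   total_size: int,
                   tail_size: int = 1024) -> int:
    """Return the byte offset recorded after the last 'startxref' in the file tail."""
    tail_size = min(tail_size, total_size)
    # One partial read of the last tail_size bytes of the object.
    tail = read_partial(total_size - tail_size, tail_size)
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref marker found in the tail")
    return int(matches[-1])


# Toy object with a PDF-like layout (assumption: not a complete, valid PDF).
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_offset = len(body)
pdf = (body
       + b"xref\n0 1\n0000000000 65535 f \n"
       + b"trailer\n<< /Size 1 >>\n"
       + b"startxref\n" + str(xref_offset).encode("ascii") + b"\n%%EOF\n")


def fake_read_partial(offset: int, size: int) -> bytes:
    return pdf[offset:offset + size]


offset = find_startxref(fake_read_partial, len(pdf))
# The bytes at that offset are where the cross-reference section begins.
marker = fake_read_partial(offset, 4)
```

This is also a plausible reason a fragment taken from the start of the object fails to parse: the part the library needs first sits at the end, so the library has to be given something it can seek in rather than a single slice.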
As this is a PDF library usage question (and thus outside the scope of the Centera SDK) I am afraid I cannot help out any further with it. I suggest you look for a PDF library forum as I am sure this is a common enough requirement.