FPTag_BlobWritePartial memory usage

July 1st, 2010 01:00

Hi,

For a certain type of data, we have to store lots (millions) of very small files.

We use FPTag_BlobWritePartial() to containerize these files into bigger blobs.

We noticed that the memory usage of the program keeps growing until either it gets an "out of memory" error or FPClip_Close() is called, at which point the memory seems to be released by the SDK.

To confirm this, we wrote a test program that stores 9-byte files. With this test program, we see memory usage increase by about 200 MBytes for 100,000 files, i.e. roughly 2 KBytes per file. That seems like a lot.
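For reference, here is roughly the pattern of the test program (a minimal sketch, not our actual code; the function signatures, the 0 passed as the options argument, and the clip ID buffer size are assumptions that should be checked against FPAPI.h, and all error handling is omitted):

```c
#include "FPAPI.h"   /* Centera SDK header */

/* Sketch: write many tiny "files" as partial blob segments into one clip. */
void write_small_files(const char *poolAddress, long fileCount)
{
    FPPoolRef pool = FPPool_Open(poolAddress);
    FPClipRef clip = FPClip_Create(pool, "small-file-container");
    FPTagRef  top  = FPClip_GetTopTag(clip);
    FPTagRef  tag  = FPTag_Create(top, "files");

    char payload[] = "123456789";            /* 9-byte test payload */

    for (long i = 0; i < fileCount; i++) {
        FPStreamRef stream = FPStream_CreateBufferForInput(payload, 9);
        /* 0 stands in for the default write options; the sequence ID is
         * assumed to be 1-based. Memory grows with every call here ...   */
        FPTag_BlobWritePartial(tag, stream, 0, (FPLong)(i + 1));
        FPStream_Close(stream);
    }

    char clipId[64];                         /* buffer for the clip's content address */
    FPClip_Write(clip, clipId);
    FPTag_Close(tag);
    FPTag_Close(top);
    FPClip_Close(clip);                      /* ... and only seems to be released here */
    FPPool_Close(pool);
}
```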

Has anyone encountered this issue? What possible workarounds are there?

We are using the 3.2p5 SDK and CentraStar 4 (tests performed on the online clusters).

Best regards,

Didier

July 1st, 2010 03:00

I would not recommend using FPTag_BlobWritePartial for a use case where you write millions of small files into one blob. If you write, say, a million 2 KB files, you will actually create a million objects on the Centera: each 2 KB file becomes a 2 KB object rather than part of one large blob (use JCASScript to look at the CDF created for one of these clips and you'll see a million individual blob Content Addresses in the blob tag). Ingest performance will also be poor when writing small files compared to writing large files.

The FPTag_BlobWritePartial() method is intended for applications that write very large files: it lets them divide a file into, say, 8 segments and write the segments in parallel threads, which gives higher ingest throughput than writing the whole file from a single thread.
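To make that concrete, here is a small sketch of the segment planning such a parallel write implies. Only standard C and the offset arithmetic are shown; the per-thread FPTag_BlobWritePartial() call itself is left out, and the 1-based sequence numbering is an assumption:

```c
/* One segment of a large file, to be written by its own thread with
 * FPTag_BlobWritePartial() using the matching sequence ID. */
typedef struct {
    long long offset;       /* first byte of the segment in the source file */
    long long size;         /* number of bytes in the segment */
    int       sequenceId;   /* sequence ID for the partial write (assumed 1-based) */
} Segment;

/* Split fileSize bytes into segmentCount contiguous segments. */
void plan_segments(long long fileSize, int segmentCount, Segment *out)
{
    long long base = fileSize / segmentCount;
    long long rest = fileSize % segmentCount;   /* spread the remainder over the first segments */
    long long offset = 0;

    for (int i = 0; i < segmentCount; i++) {
        out[i].offset     = offset;
        out[i].size       = base + (i < rest ? 1 : 0);
        out[i].sequenceId = i + 1;
        offset += out[i].size;
    }
}
```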

Yes, you can do what you are doing, but it is a better fit for larger files (say 1 MB or more) than for the very small files you are writing.

I haven't done the arithmetic, but I suspect that when you write millions of small files like this, the in-memory CDF becomes very large, and that's why you see the high memory consumption, i.e. it's not a memory leak, the CDF is just getting huge! Sometimes you are also actually hitting a limit, and the SDK gives up and returns the out-of-memory error.

Instead, I would recommend you do one of two things:

1. Aggregate the small files into one big container file and then write that container to the Centera using FPTag_BlobWrite().

or

2. Use generic streams to write the individual small files to the Centera. With generic streams, the small files are aggregated into one large blob on the Centera.

Option 1 is the simplest from a Centera SDK point of view; option 2 has to be coded with care because the SDK relies on your callback functions if the stream has to be rewound. Option 1 will also perform better, since you are writing one large blob instead of many small files.
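For option 1, the aggregation step itself doesn't involve the SDK at all. Something along these lines would do (a minimal sketch with an illustrative record layout, not a Centera format; the finished container file is then stored with a single FPTag_BlobWrite() call, and you keep enough of an index, in the container or in CDF attributes, to find individual files again later):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Append one small file to the container as a length-prefixed record:
 * [name length][name bytes][data length][data bytes]. Returns 0 on success. */
int append_record(FILE *container, const char *name, const void *data, uint32_t len)
{
    uint32_t nameLen = (uint32_t)strlen(name);

    if (fwrite(&nameLen, sizeof nameLen, 1, container) != 1) return -1;
    if (fwrite(name, 1, nameLen, container) != nameLen)      return -1;
    if (fwrite(&len, sizeof len, 1, container) != 1)          return -1;
    if (fwrite(data, 1, len, container) != len)               return -1;
    return 0;
}
```

On read-back you would fetch the container blob (e.g. with FPTag_BlobRead()) and walk the records to extract the file you need.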

July 2nd, 2010 02:00

Hi,

Thanks, Paul.

It seems we didn't read the documentation closely enough; we had assumed that a single object would be created.

We will use your suggestions.

Regards,

Didier
