October 29th, 2009 07:00

Centera performance

I wrote a simple Java program to test Centera performance.

It takes 1.4 seconds to write a 71,680-byte PDF file to Centera.

Is this performance normal?

C:\Temp>java -jar CenteraTest.jar
About to write blob
Blob written
About to write clip
Wrote clip
Got clip id: E608EO4F0EL0Ve7Q3MH99BI5KHD
Date Difference
Time in milliseconds: 1469 milliseconds.
Time in seconds: 1 seconds.
Time in minutes: 0 minutes.
Time in hours: 0 hours.
Time in days: 0 days.

1 Attachment

409 Posts

October 29th, 2009 07:00

Possibly, it depends

October 30th, 2009 00:00

Hi

This is quite possible. Opening the pool connection alone takes a substantial amount of time before any data IO is performed, especially if the system is replicated (as the SDK would log in to both systems).

Bear in mind that a production application should use multiple threads and a shared pool object. The overhead to open the pool will only occur once and multiple objects will be sent in parallel.
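The shared-pool pattern described above can be sketched roughly as follows. This is only an illustration and does not use the Centera SDK: `FakePool` is a hypothetical stand-in, and the 200 ms open cost and 50 ms write latency are made-up values chosen to show that the open overhead is paid once while writes run in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class SharedPoolSketch {

    /** Hypothetical stand-in for a pool connection. This is NOT the Centera SDK. */
    static final class FakePool {
        static final AtomicInteger OPEN_COUNT = new AtomicInteger();

        FakePool() {
            pause(200); // assumed one-time connect/login cost
            OPEN_COUNT.incrementAndGet();
        }

        String write(byte[] data) {
            pause(50); // assumed per-object write latency
            return "clip-" + Integer.toHexString(Arrays.hashCode(data));
        }
    }

    static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Opens one pool, then writes all objects through it from a fixed set of worker threads. */
    static int writeAll(int objects, int threads) {
        FakePool pool = new FakePool(); // opened once, shared by every worker
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (int i = 0; i < objects; i++) {
                byte[] payload = {(byte) i};
                pending.add(CompletableFuture.supplyAsync(() -> pool.write(payload), workers));
            }
            int written = 0;
            for (CompletableFuture<String> f : pending) {
                f.join(); // wait for each write to finish
                written++;
            }
            return written;
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int written = writeAll(40, 10);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Wrote " + written + " objects in " + elapsedMs
                + " ms; pool opened " + FakePool.OPEN_COUNT.get() + " time(s)");
    }
}
```

A single-shot test like the one in the original post pays the open cost on every run, so it measures connection setup as much as the write itself.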

Holger

15 Posts

November 4th, 2009 03:00

I am asking this in relation to the original issue described in: https://community.emc.com/thread/5211?tstart=0

I created these two threads to get an estimate of write performance to Centera (through the Documentum Centera interface).

We tried to isolate whether the performance problem lies in the Documentum/Centera interface or in the Centera box itself, and for that reason I wrote this simple Java sample.

We are trying to find the optimal (best-performing) parameters for the Documentum/Centera interface. Writing to Centera with multiple threads (the thread count parameter of the migrate_content job) significantly influences performance.

So far I have not received any performance numbers from the community.

We are currently seeing 3.82 typical documents (60 KB of content each) per second.

417 Posts

November 4th, 2009 03:00

A cluster will typically make available approximately 20 useable threads per Access Node. Multi-threading (or multiple processes) is essential to make the optimum use of the Centera architecture.

Assuming a cluster with 4 Access Nodes in a dedicated ingest environment, you should be using 80 threads (4 * 20) for writing your content. This should be decreased if the cluster is shared amongst multiple applications or if you need to retrieve content concurrently while ingesting.
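As a rough sketch of that sizing rule: the 20-threads-per-Access-Node figure comes from this post, while the `shareOfCluster` parameter is a hypothetical way of expressing "decrease it if the cluster is shared" and is not an official formula.

```java
class ThreadBudget {
    // Approximate usable threads per Access Node, as quoted in this thread.
    static final int THREADS_PER_ACCESS_NODE = 20;

    /**
     * Estimates how many writer threads to use.
     * shareOfCluster is a hypothetical knob: 1.0 for a dedicated ingest cluster,
     * lower when other applications or concurrent reads compete for threads.
     */
    static int ingestThreads(int accessNodes, double shareOfCluster) {
        if (accessNodes < 1 || shareOfCluster <= 0.0 || shareOfCluster > 1.0) {
            throw new IllegalArgumentException("accessNodes >= 1 and 0 < shareOfCluster <= 1 required");
        }
        int total = accessNodes * THREADS_PER_ACCESS_NODE;
        return Math.max(1, (int) Math.floor(total * shareOfCluster));
    }

    public static void main(String[] args) {
        System.out.println("Dedicated, 4 nodes: " + ingestThreads(4, 1.0));
        System.out.println("Half share, 4 nodes: " + ingestThreads(4, 0.5));
    }
}
```

Treat the result as an upper bound to test against, since the application may bottleneck elsewhere before reaching it.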

60KB files are most definitely small from a Centera perspective. You should enable the use of Embedded Blobs in such cases - this is described in the API Reference and Programmer's Guide. If the application is not "homegrown" (Documentum in your case) then it can still be enabled via environment variables if the application does not already make use of it.

I see that your other query has been raised in the Documentum forum so hopefully you will receive more application specific tuning details via that avenue.

417 Posts

November 4th, 2009 03:00

Centera is an IP device and as such offers "internet" levels of performance. Typically this means sub-second performance for transactions. Obviously the size of the content has a huge bearing on this, but for small files (<5MB) this should be typical.

The "locality" and load of the cluster will also obviously play a big part, especially if you are using the public EMC clusters.

409 Posts

November 4th, 2009 04:00

Getting the best performance from a Centera is dependent on a number of things.

Network

The Application Server and the Centera should be connected via a LAN and not a WAN, preferably Gigabit Ethernet.

Object Size

This will play a large part in throughput. The larger the object, the better the throughput you will achieve, due to the overhead of writing content to Centera. To illustrate, a single thread writing a 10 KB file will achieve 4 IO/sec, giving you a bandwidth of 40 KB/s, whereas writing a 100 MB file will give you 15-20 MB/s (these are rough figures and do require the application to be working optimally). If you are mainly ingesting small files (<100 KB), you can use a feature called embedded blobs to improve IO.
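The arithmetic behind these figures is simply operations per second multiplied by object size. A small sketch, using only numbers quoted in this thread (the class and method names are made up for illustration):

```java
import java.util.Locale;

class ThroughputEstimate {
    /** Bandwidth in KB/s for a given operation rate and object size in KB. */
    static double kbPerSecond(double objectsPerSecond, double objectSizeKb) {
        return objectsPerSecond * objectSizeKb;
    }

    /** Operation rate implied by one operation taking elapsedMs milliseconds. */
    static double objectsPerSecond(long elapsedMs) {
        return 1000.0 / elapsedMs;
    }

    public static void main(String[] args) {
        // Illustration from this post: one thread, 10 KB objects, 4 IO/sec.
        System.out.printf(Locale.ROOT, "10 KB at 4 IO/sec: %.1f KB/s%n", kbPerSecond(4, 10));

        // Figure reported earlier in the thread: 3.82 documents/sec of about 60 KB each.
        System.out.printf(Locale.ROOT, "60 KB at 3.82 docs/sec: %.1f KB/s%n", kbPerSecond(3.82, 60));

        // Original single-shot test: 71,680 bytes (70 KB) in 1469 ms, pool open included.
        double rate = objectsPerSecond(1469);
        System.out.printf(Locale.ROOT, "70 KB in 1469 ms: %.2f IO/sec, %.1f KB/s%n",
                rate, kbPerSecond(rate, 71680 / 1024.0));
    }
}
```

For small objects the operation rate, not the network bandwidth, is what limits throughput, which is why adding threads helps.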

Number of Threads

Another major factor in overall throughput. Having many threads of IO will allow you to get the most throughput from a Centera. Our best practice is that you can have up to 20 IO threads per Access Node, shared between all applications using the Centera, although most applications bottleneck elsewhere before they can achieve this.

So, back to the original question: the result you are seeing with your test is probably correct, but it depends on a number of factors, as you can see in these answers.

If you feel that you are not achieving the performance levels you expected from the Centera, I would recommend you have a discussion with your Documentum account team.

417 Posts

November 4th, 2009 04:00

Using Embedded Blobs will not necessarily improve performance; it relates to minimizing the object count.

You should speak with your Documentum contact before enabling embedded blobs but FP_OPTION_EMBEDDED_DATA_THRESHOLD is the environment variable in question. It appears to be zero (i.e. not used) in the log file you attached.
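To check what the application's process would actually see for this variable, something like the following could be used. It is a minimal sketch: only the variable name and the "zero means not used" reading come from this thread; the units and valid range of the threshold are assumptions to confirm against the SDK documentation, and `describe` is a made-up helper.

```java
class EmbeddedBlobSetting {
    // Environment variable name as given in this thread.
    static final String VAR = "FP_OPTION_EMBEDDED_DATA_THRESHOLD";

    /** Turns the raw environment value into a short human-readable status. */
    static String describe(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return "not set";
        }
        try {
            long value = Long.parseLong(raw.trim());
            // Per this thread, zero means embedded blobs are not used.
            return value <= 0 ? "0 (embedded blobs not used)" : "threshold " + value;
        } catch (NumberFormatException e) {
            return "unparseable value: " + raw;
        }
    }

    public static void main(String[] args) {
        System.out.println(VAR + ": " + describe(System.getenv(VAR)));
    }
}
```

Run it under the same user and environment as the Documentum content server process, otherwise the result may not reflect what the plugin sees.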

15 Posts

November 4th, 2009 04:00

The only thing I can think of now is that the authors of the Centera/Documentum interface didn't use the "embedded blob" feature.

I attach the csec plugin log file (the first part, where initialization happens), from which you can perhaps see whether this option is turned on.

What is the variable to turn on the "embedded blob" feature?

1 Attachment

15 Posts

November 4th, 2009 06:00

Network

The Application Server and the Centera should be connected via a LAN and not a WAN, preferably Gigabit Ethernet.

I can confirm we have Centera connected to Documentum via LAN.

Object Size

This will play a large part in throughput. The larger the object, the better the throughput you will achieve, due to the overhead of writing content to Centera. To illustrate, a single thread writing a 10 KB file will achieve 4 IO/sec, giving you a bandwidth of 40 KB/s, whereas writing a 100 MB file will give you 15-20 MB/s (these are rough figures and do require the application to be working optimally). If you are mainly ingesting small files (<100 KB), you can use a feature called embedded blobs to improve IO.

I will check whether the "embedded blob" feature is used within the "csec plugin" of the Centera/Documentum interface.

Number of Threads

Another major factor in overall throughput. Having many threads of IO will allow you to get the most throughput from a Centera. Our best practice is that you can have up to 20 IO threads per Access Node, shared between all applications using the Centera, although most applications bottleneck elsewhere before they can achieve this.

Currently we are using 10 threads. We found through experimentation that this parameter significantly affects performance.

15 Posts

May 20th, 2010 22:00

Multithreading is key here.
