The probe limit time was reached error in multi-threaded application

Question

Hi

I am trying to migrate moderately sized (500 kB) files to Centera using multiple threads, and I am receiving the following error:

com.filepool.fplibrary.FPLibraryException: The probe limit time was reached

at com.filepool.fplibrary.FPTag.BlobWrite(Unknown Source)

Error code: -10212

Some of the writes succeed, some fail.

The import application uses code based on the Centera SDK sample (StoreContent), with the addition of threads to write multiple files at the same time. The problem does not occur with 3 threads, but it appears when I use 10 concurrent threads (10 files imported at the same time).

The options are:

FPPool.setGlobalOption(FPLibraryConstants.FP_OPTION_OPENSTRATEGY, FPLibraryConstants.FP_NORMAL_OPEN);

thePool.setOption(FPLibraryConstants.FP_OPTION_BUFFERSIZE, 1024 * 5000);

I have turned on Centera SDK logging, and the only interesting lines I see there are the following:

1359726523649 2013-02-01 13:48:43.649 [debug] 807158.258 [EXCEPTION] In 'Connection.cpp' at line 994: Exception error=-10101 syserror=0 message=receive: waitForReadingData(1000) returned zero trace=No trace available

1359726523649 2013-02-01 13:48:43.649 [debug] 807158.258 [RETRY] retry (0) Probe because Exception error=-10101 syserror=0 message=receive: waitForReadingData(1000) returned zero trace=FPDatagramSocket.receive (timeout=1000) transid=ourServerName/1/PROBE

Application host system: AIX

Centera SDK: Centera_SDK_AIX-5_3

Has anyone encountered this issue?

Do you know the cause of the issue or how to fix it?

Thank you

Michal

mfh2 · Answer

Hello Michal -

One simple question: are you testing against a local Centera or one of the online clusters? Could it be a simple issue of needing to increase the UDP probe timeout to compensate for a long network delay?

Best Regards,

Mike Horgan

miso1 · Answer

I am testing against a Centera within our datacenter. Latency might be an issue. By increasing the UDP probe limit, do you mean setting FP_OPTION_PROBE_LIMIT?

In the meantime, I have mitigated the problem by increasing the buffer size to 5MB (it used to be 5kB), increasing the number of retries (FP_OPTION_RETRYCOUNT), and adding retries at the application level if I encounter this exception.

After increasing the buffer size, the timeouts became rarer (maybe 1 per 100 files), and when I retry at the application level, the document gets through. However, it worries me that I have not removed the real cause of the issue.

Do you have any idea how to pinpoint/resolve the actual issue?
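For readers who want to try the application-level retry described above, here is a minimal sketch in Java. It is not code from the original application: the class name RetryingWriter, the method withRetries, and the backoff parameters are hypothetical, and the Centera write is represented by a generic Callable so that no SDK calls are assumed. This sketch retries on any exception; a real version would probably retry only on the probe-limit failure.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the Centera SDK.
final class RetryingWriter {
    private RetryingWriter() {}

    /**
     * Runs the operation up to maxAttempts times, sleeping between failed
     * attempts and doubling the delay each time. Rethrows the last failure
     * if every attempt fails.
     */
    static <T> T withRetries(Callable<T> operation, int maxAttempts,
                             long initialBackoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread has been asked to stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2;
                }
            }
        }
        throw last;
    }
}
```

Each worker thread would wrap its own write in such a call, for example withRetries(() -> storeOneFile(path), 3, 500L), where storeOneFile stands in for the application's existing StoreContent-based method.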

mfh2 · Answer

Yes, I would probably increase the probe limit to 10000 ms as a test, although this is a rarely-altered SDK parameter and shouldn't need changing unless you were doing something like talking over a very-high-latency WAN link.

The 5KB CDF buffer was far too small, so it's good that this value was increased. If you only have 1 blob per clip in your application, then a value of 200KB for that buffer should be sufficient unless you are writing a tremendous amount of app-specific metadata. Specifying a 5MB buffer will cause that amount to be allocated in a static buffer for each thread; not a big deal but kind of wasteful, especially at high thread counts.

In terms of pinpointing, the only way I know of to track this down is to record an SDK log showing the problem and then decompose it into individual logs for each thread (using the process.thread value in the third column in your table). Look for commonalities in the thread traces that exhibit the problem. The SDK's UDP probe protocol is a very simple call/response and it should not encounter issues in network environments that are functioning correctly. If you see a lot of retry activity, even on threads that show ultimately successful transactions, that also points to possible network issues.
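The per-thread decomposition described above could be sketched roughly as follows in Java. This is an illustration under assumptions, not an SDK tool: the class SdkLogSplitter and its methods are hypothetical, and the parsing assumes each log record is a single line in which the process.thread value (such as 807158.258) directly follows the bracketed level, as in the excerpt posted in the question. Lines that do not match that layout are skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for splitting an SDK log by thread.
final class SdkLogSplitter {
    // Assumed layout: "... [level] <process>.<thread> [TAG] ..."
    private static final Pattern THREAD_ID =
            Pattern.compile("\\[\\w+\\]\\s+(\\d+\\.\\d+)\\s");

    private SdkLogSplitter() {}

    /** Groups log lines by their process.thread value, keeping file order. */
    static Map<String, List<String>> splitByThread(List<String> lines) {
        Map<String, List<String>> byThread = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = THREAD_ID.matcher(line);
            if (m.find()) {
                byThread.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                        .add(line);
            }
        }
        return byThread;
    }

    /** Counts [RETRY] records per thread to highlight retry-heavy threads. */
    static Map<String, Integer> retryCounts(Map<String, List<String>> byThread) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : byThread.entrySet()) {
            int retries = 0;
            for (String line : entry.getValue()) {
                if (line.contains("[RETRY]")) {
                    retries++;
                }
            }
            counts.put(entry.getKey(), retries);
        }
        return counts;
    }
}
```

Feeding it the lines of a recorded log (for example via java.nio.file.Files.readAllLines) gives one list per thread to compare, and the retry counts show whether retries are concentrated in a few threads or spread across all of them.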

Best Regards,

Mike Horgan

Centera
