208 Posts

February 4th, 2013 06:00

Hello Michal -

One simple question: are you testing against a local Centera or one of the online clusters?  Could it be a simple issue of needing to increase the UDP probe timeout to compensate for a long network delay?

Best Regards,

Mike Horgan

2 Posts

February 4th, 2013 06:00

I am testing against a Centera within our datacenter. Latency might be an issue. By increasing the UDP probe limit, do you mean setting FP_OPTION_PROBE_LIMIT?

In the meantime I have mitigated the problem by increasing the buffer size to 5MB (it used to be 5KB), increasing the number of retries (FP_OPTION_RETRYCOUNT), and adding retries at the application level when I encounter this exception.

After increasing the buffer size the timeouts became rarer (maybe 1 per 100 files), and when I retry at the application level the document gets through. However, it worries me that I have not removed the real cause of the issue.
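For illustration, the application-level retry described above might look roughly like the following minimal sketch. It assumes the store call can be wrapped in a plain function and that the timeout surfaces as an exception. The names `with_retries` and `store_document`, and the use of `TimeoutError` as a stand-in for the SDK exception, are hypothetical and not part of the Centera SDK.

```python
import time

def with_retries(operation, attempts=3, base_delay=0.5,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call operation(); on a listed exception, wait and try again.

    The delay doubles after each failed attempt. The last failure is
    re-raised so the caller still sees the original exception.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as exc:
            if attempt == attempts:
                raise
            # Report every failure so the timeout rate stays visible
            # even when a later attempt succeeds.
            print(f"attempt {attempt} failed: {exc!r}; retrying")
            sleep(base_delay * 2 ** (attempt - 1))
```

A call such as `with_retries(lambda: store_document(path), retry_on=(TimeoutError,))` would wrap a hypothetical `store_document` function. Reporting each failed attempt matters here because a retry that succeeds otherwise hides how often the underlying timeout occurs.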

Do you have any idea how to pinpoint/resolve the actual issue?

208 Posts

February 4th, 2013 08:00

Yes, I would probably increase the probe limit to 10000 ms as a test, although this is a rarely-altered SDK parameter and shouldn't need changing unless you were doing something like talking over a very-high-latency WAN link.

The 5KB CDF buffer was far too small, so it's good that this value was increased. If you only have 1 blob per clip in your application then a value of 200KB for that buffer should be sufficient unless you are writing a tremendous amount of app-specific metadata. Specifying a 5MB buffer will cause that amount to be allocated in a static buffer for each thread; not a big deal but kind of wasteful, esp. at high thread counts.
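To put rough numbers on the per-thread cost, here is a small back-of-the-envelope sketch. The thread count of 100 is an arbitrary example rather than a figure from this thread, and the helper name is made up for illustration.

```python
KB = 1024
MB = 1024 * KB

def total_buffer_bytes(buffer_bytes, threads):
    """Memory held in per-thread buffers, assuming one buffer per thread."""
    return buffer_bytes * threads

threads = 100  # hypothetical thread count
for label, size in (("5MB", 5 * MB), ("200KB", 200 * KB)):
    total = total_buffer_bytes(size, threads)
    print(f"{label} buffer x {threads} threads = {total / MB:.1f} MB")
```

At that thread count the 5MB setting holds about 500 MB in buffers, against roughly 20 MB for the 200KB setting.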

In terms of pinpointing, the only way I know of to track this down is to record an SDK log showing the problem and then decompose it into individual logs for each thread (using the process.thread value in the third column in your table). Look for commonalities in the thread traces that exhibit the problem. The SDK's UDP probe protocol is a very simple call/response and it should not encounter issues in network environments that are functioning correctly. If you see a lot of retry activity, even on threads that show ultimately successful transactions, that also points to possible network issues.
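As a rough sketch of that decomposition: the snippet below assumes the log is plain text with whitespace-separated columns and the process.thread value in the third one. The actual SDK log layout may differ, so the column index and the "retry" keyword are assumptions to adjust, and the function names are made up for illustration.

```python
from collections import defaultdict

def split_by_thread(lines, column=2):
    """Group log lines by the value in a whitespace-separated column.

    column=2 is the third column (zero-based index). Lines with too
    few columns are skipped.
    """
    per_thread = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) > column:
            per_thread[fields[column]].append(line)
    return dict(per_thread)

def count_matches(per_thread, keyword="retry"):
    """Count lines mentioning the keyword (case-insensitive) per thread."""
    return {
        thread_id: sum(keyword in line.lower() for line in thread_lines)
        for thread_id, thread_lines in per_thread.items()
    }
```

Reading the log with `open(path).read().splitlines()` and passing the result to `split_by_thread` yields one list of lines per thread, which can then be written to separate files or compared directly. Threads with high counts from `count_matches` are the first places to look for the retry activity mentioned above.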

Best Regards,

Mike Horgan
