David_Oswill
16 Posts
0
May 29th, 2009 11:00
Navneeth,
This is a great question, and one that needs a little background on the Atmos architecture. The Atmos onLine service is built on a massively scalable, parallel architecture that adds some overhead per transaction but can also be leveraged for concurrent read and write transactions. The service consists of a load-balanced front end and numerous back-end access nodes that can handle multiple transactions in parallel.
As you have pointed out, there are essentially three ways to write objects to the system: 1) writing a large object in its entirety, 2) writing a large object using multiple PUTs, and 3) breaking a large object up into smaller objects. All of these methods are valid, with the following caveats.
1) Writing a single large object to Atmos or Atmos onLine minimizes per-transaction overhead but does not take advantage of the parallelism or concurrency of the system. You may also be exposed to network-induced connection failures because of the longer transaction time of writing a single large object; a broken connection and subsequent retry can look like very long latency.
2) Writing a single large object using multiple PUTs reduces the chance of network-induced retries (such as the broken-connection example above), but the PUTs still target a single object sequentially and again do not leverage the concurrency of the system.
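As a rough illustration of 2), the helper below computes the byte ranges a client would send, one 4 MB chunk per PUT, for a stream of unknown length. The use of a `Range`-style header on PUT is an assumption for the sketch, not confirmed Atmos behavior; the object URL and request plumbing are left out.

```python
import io

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, per the recommendation below

def iter_put_ranges(stream, chunk_size=CHUNK_SIZE):
    """Yield (range_header, chunk) pairs for appending a stream of
    unknown length as a sequence of fixed-size PUTs."""
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        end = offset + len(chunk) - 1
        yield "bytes=%d-%d" % (offset, end), chunk
        offset = end + 1

# Example against an in-memory stream (10 MB of zeros). Each pair
# would become one PUT to the object's URL with a Range header.
data = io.BytesIO(b"\0" * (10 * 1024 * 1024))
for range_header, chunk in iter_put_ranges(data):
    print(range_header, len(chunk))
# -> bytes=0-4194303 4194304
#    bytes=4194304-8388607 4194304
#    bytes=8388608-10485759 2097152
```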
3) Breaking the large object up into multiple small objects not only provides increased protection against network-induced issues; by writing multiple objects in parallel, accesses can be spread across multiple access nodes, greatly speeding up the effective creation of the large object on the Atmos system. Because the objects are spread across the system, they can also be retrieved in parallel to improve read times. The trade-off is some added complexity in your application to create and then reassemble the object chunks.
For either 2) or 3), the recommended chunk size is roughly 4 MB, so as to minimize the overhead associated with each object's creation.
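A rough sketch of approach 3): split the payload into 4 MB pieces, upload them concurrently, and keep an ordered manifest of the resulting object IDs so the large object can be reassembled on read. The `upload_chunk` stub here stands in for a real POST of each chunk as its own object; the IDs it returns are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB chunks, per the recommendation above

def split_into_chunks(payload, chunk_size=CHUNK_SIZE):
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]

def upload_chunk(index, chunk):
    # Placeholder: a real client would POST this chunk as its own object
    # and return the object ID assigned by the service.
    return ("oid-%d" % index, len(chunk))

payload = b"x" * (9 * 1024 * 1024)  # a 9 MB example payload
chunks = split_into_chunks(payload)
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order, so the manifest stays in sequence
    results = list(pool.map(lambda a: upload_chunk(*a), enumerate(chunks)))
manifest = [oid for oid, _ in results]
print(manifest)  # -> ['oid-0', 'oid-1', 'oid-2']
```

The manifest is what the application later uses to issue parallel reads and reassemble the chunks in order.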
As for network-level parameters, these will depend largely on your local network and internet characteristics, and we would welcome any empirical data you can provide during your testing of the service.
Regards,
Dave
Navneeth1
29 Posts
0
June 3rd, 2009 12:00
Hello David,
Thanks for a detailed reply; it's very helpful.
I understand the benefits of concurrency, but our application is itself receiving data from another source as a stream of unknown size, so concurrent writing (as of today, at least) is not an option for us. We are going with approach #2 (one POST followed by multiple PUTs). We'll certainly explore reading concurrently from different parts of the object, though: it would be a waste of all that powerful back-end infrastructure if we didn't! :-)
The network question is really important for us. Your point about the dependency on the local network and the internet backbone is well taken; however, I was wondering whether there are any recommendations from the Atmos server side on the buffer sizes to set, the maximum TCP window size it supports, and so on. We can certainly experiment, but a ballpark figure to start us off would be good.
Nonetheless, thank you for your detailed response; it helps us a lot.
Navneeth
Navneeth1
29 Posts
0
June 7th, 2009 10:00
Great. This is the kind of information we were looking for; thanks, Jason. Please do keep us posted on any such details that might help us tailor our application to Atmos.
Have a couple of follow-up questions on this:
1. Can I assume that the Atmos server accepts chunked encoding on uploads as well, since it transfers in chunks on a download? Right now, since the size of our data is unknown at the beginning, we are doing one POST followed by multiple PUTs. With chunked encoding, we could just continue the one POST and end the transmission whenever our data ends. Do you have any thoughts on either approach?
2. David indicated that 4 MB is a good size to reduce overhead on writes; apart from network-transfer considerations, are there any other sizes that affect a read?
thanks,
Navneeth
JasonCwik
281 Posts
0
June 7th, 2009 10:00
Hi Navneeth,
1. I'm not sure; I haven't tested it out myself. We use the same technique as you do though: POST followed by 4MB PUTs.
2. I'll ask around. I still think Atmos uses an internal 4 MB buffer for web-service requests, so there might be extra work on the server side for requests > 4 MB.
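For reference on question 1 (whether Atmos accepts it is exactly the open question), this is the standard HTTP/1.1 wire framing a chunked-encoded POST body would use: each chunk is prefixed with its size in hex, and a zero-length chunk ends the stream in place of a Content-Length header.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for chunk in chunks:
        if chunk:
            # hex size, CRLF, chunk data, CRLF
            out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # zero-length chunk terminates the stream
    return bytes(out)

body = encode_chunked([b"hello", b" ", b"world"])
print(body)  # -> b'5\r\nhello\r\n1\r\n \r\n5\r\nworld\r\n0\r\n\r\n'
```

This is what lets a sender keep writing without knowing the total size up front, which is the appeal over the POST-plus-PUTs approach.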
JasonCwik
281 Posts
1
June 7th, 2009 10:00
Hi Navneeth,
One thing to check when testing read performance (at least for REST) is request sizes above vs. below 1MB. For requests over 1MB, Atmos switches to Transfer-Encoding: chunked and streams the response in 1MB chunks. For requests of 1MB or less, it sends the whole response without chunked encoding.
http://en.wikipedia.org/wiki/Chunked_transfer_encoding
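To make the >1MB behavior concrete, here is a minimal decoder for a chunked response body, the same reassembly an HTTP client library performs internally when Atmos streams a large read (chunk extensions and trailers are ignored for brevity):

```python
import io

def decode_chunked(raw):
    """Reassemble an HTTP/1.1 chunked body into the original bytes."""
    buf, out = io.BytesIO(raw), bytearray()
    while True:
        size = int(buf.readline().strip(), 16)  # chunk size in hex
        if size == 0:
            break  # zero-length chunk marks the end of the body
        out += buf.read(size)
        buf.read(2)  # consume the CRLF that follows each chunk
    return bytes(out)

print(decode_chunked(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"))
# -> b'hello world'
```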