David_Oswill
16 Posts
0
May 29th, 2009 11:00
Navneeth,
This is a great question, and one that needs a little background on the Atmos architecture. The Atmos onLine service is built on a massively scalable, parallel architecture that adds some overhead per transaction but can also be leveraged for concurrent read and write transactions. The service consists of a load-balanced front end and numerous back-end access nodes that can handle multiple transactions in parallel.
As you have pointed out, there are essentially three ways to write objects to the system: 1) writing a large object in its entirety, 2) writing a large object using multiple PUTs, and 3) breaking a large object up into smaller objects. All of these methods are valid, with the following caveats.
1) Writing a single large object to Atmos or Atmos onLine minimizes per-transaction overhead but does not take advantage of the parallelism or concurrency of the system. You may also be exposed to network-induced connection failures because of the longer transaction time of writing a single large object; a broken connection and subsequent retry can look like very long latency.
2) Writing a single large object using multiple PUTs reduces the chance of network-induced retries (such as the broken-connection example above), but the PUTs still target a single object sequentially and again do not leverage the concurrency of the system.
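As a rough illustration of 2), the helper below computes the byte ranges a client would send, one 4 MB chunk per PUT, for a stream of unknown length. The use of a `Range`-style header on PUT is an assumption for the sketch, not confirmed Atmos behavior; the object URL and request plumbing are left out.

```python
import io

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, per the recommendation below

def iter_put_ranges(stream, chunk_size=CHUNK_SIZE):
    """Yield (range_header, chunk) pairs for appending a stream of
    unknown length as a sequence of fixed-size PUTs."""
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        end = offset + len(chunk) - 1
        yield "bytes=%d-%d" % (offset, end), chunk
        offset = end + 1

# Example against an in-memory stream (10 MB of zeros). Each pair
# would become one PUT to the object's URL with a Range header.
data = io.BytesIO(b"\0" * (10 * 1024 * 1024))
for range_header, chunk in iter_put_ranges(data):
    print(range_header, len(chunk))
# -> bytes=0-4194303 4194304
#    bytes=4194304-8388607 4194304
#    bytes=8388608-10485759 2097152
```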
3) Breaking the large object up into multiple small objects not only provides increased protection against network-induced issues; by writing multiple objects in parallel, accesses can be spread across multiple access nodes, greatly speeding up the effective creation of the large object on the Atmos system. Because the objects are spread across the system, they can also be retrieved in parallel to improve read times. The trade-off is some added complexity in your application to create and then reassemble the object chunks.
For either 2) or 3), the recommended chunk size is roughly 4 MB, so as to minimize the overhead associated with each object's creation.
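A rough sketch of approach 3): split the payload into 4 MB pieces, upload them concurrently, and keep an ordered manifest of the resulting object IDs so the large object can be reassembled on read. The `upload_chunk` stub here stands in for a real POST of each chunk as its own object; the IDs it returns are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB chunks, per the recommendation above

def split_into_chunks(payload, chunk_size=CHUNK_SIZE):
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]

def upload_chunk(index, chunk):
    # Placeholder: a real client would POST this chunk as its own object
    # and return the object ID assigned by the service.
    return ("oid-%d" % index, len(chunk))

payload = b"x" * (9 * 1024 * 1024)  # a 9 MB example payload
chunks = split_into_chunks(payload)
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order, so the manifest stays in sequence
    results = list(pool.map(lambda a: upload_chunk(*a), enumerate(chunks)))
manifest = [oid for oid, _ in results]
print(manifest)  # -> ['oid-0', 'oid-1', 'oid-2']
```

The manifest is what the application later uses to issue parallel reads and reassemble the chunks in order.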
As for network-level parameters, these will depend largely on your local network and internet characteristics, and we would welcome any empirical data you can provide during your testing of the service.
Regards,
Dave
Navneeth1
29 Posts
0
June 3rd, 2009 12:00
Hello David,
Thanks for a detailed reply; it's very helpful.
I understand the benefits of concurrency, but our application is itself receiving data from another source as a stream of unknown size, so concurrent writing (as of today, at least) is not an option for us. We are going with approach #2 (one POST followed by multiple PUTs). We'll certainly explore reading concurrently from different parts of the object, though: it would be a waste of all that powerful back-end infrastructure if we didn't! :-)
The network question is really important for us. Your point about the dependency on the local network and the internet backbone is well taken; however, I was wondering whether there are any recommendations from the Atmos server side on the buffer sizes to set, the maximum TCP window size it supports, and so on. We can certainly experiment, but a ballpark figure to start us off would be good.
Nonetheless, thank you for your detailed response; it helps us a lot.
Navneeth
Navneeth1
29 Posts
0
June 7th, 2009 10:00
Great. This is the kind of information we were looking for; thanks, Jason. Please do keep us posted on any such details that might help us tailor our application to Atmos.
Have a couple of follow-up questions on this:
1. Can I assume that the Atmos server accepts chunked encoding on uploads as well, since it transfers in chunks on a download? Right now, since the size of our data is unknown at the beginning, we are doing one POST followed by multiple PUTs. With chunked encoding, we could just continue the one POST and end the transmission whenever our data ends. Do you have any thoughts on either approach?
2. David indicated that 4 MB is a good size to reduce overhead on writes; apart from network-transfer considerations, are there any other sizes that affect a read?
thanks,
Navneeth
JasonCwik
281 Posts
0
June 7th, 2009 10:00
Hi Navneeth,
1. I'm not sure; I haven't tested it out myself. We use the same technique as you do though: POST followed by 4MB PUTs.
2. I'll ask around. I still think Atmos uses an internal 4 MB buffer for web-service requests, so there might be extra work on the server side for requests > 4 MB.
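For reference on question 1 (whether Atmos accepts it is exactly the open question), this is the standard HTTP/1.1 wire framing a chunked-encoded POST body would use: each chunk is prefixed with its size in hex, and a zero-length chunk ends the stream in place of a Content-Length header.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for chunk in chunks:
        if chunk:
            # hex size, CRLF, chunk data, CRLF
            out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # zero-length chunk terminates the stream
    return bytes(out)

body = encode_chunked([b"hello", b" ", b"world"])
print(body)  # -> b'5\r\nhello\r\n1\r\n \r\n5\r\nworld\r\n0\r\n\r\n'
```

This is what lets a sender keep writing without knowing the total size up front, which is the appeal over the POST-plus-PUTs approach.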
JasonCwik
281 Posts
1
June 7th, 2009 10:00
Hi Navneeth,
One thing to check when testing read performance (at least for REST) is request sizes above vs. below 1MB. For requests over 1MB, Atmos switches to Transfer-Encoding: chunked and streams the response in 1MB chunks. For requests of 1MB or less, it sends the whole response without chunked encoding.
http://en.wikipedia.org/wiki/Chunked_transfer_encoding
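To make the >1MB behavior concrete, here is a minimal decoder for a chunked response body, the same reassembly an HTTP client library performs internally when Atmos streams a large read (chunk extensions and trailers are ignored for brevity):

```python
import io

def decode_chunked(raw):
    """Reassemble an HTTP/1.1 chunked body into the original bytes."""
    buf, out = io.BytesIO(raw), bytearray()
    while True:
        size = int(buf.readline().strip(), 16)  # chunk size in hex
        if size == 0:
            break  # zero-length chunk marks the end of the body
        out += buf.read(size)
        buf.read(2)  # consume the CRLF that follows each chunk
    return bytes(out)

print(decode_chunked(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"))
# -> b'hello world'
```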