January 2nd, 2014 06:00

EsuApi's ReadObject() in Atmos storage cloud .NET API

May I ask what would be the most efficient way of determining how many bytes ReadObject() had read into the buffer?

I observed that when a non-null buffer is passed to ReadObject(), the buffer length remains unchanged regardless of the amount of data read. If I pass a null buffer to ReadObject(), the returned buffer is shorter when ReadObject() reads less.

Thanks.

"

byte[] ReadObject(

    Identifier id,

    Extent extent,

    byte[] buffer

)

Parameters

id (Identifier)

the identifier of the object whose content to read.

extent (Extent)

the portion of the object data to read. Optional. If null, the entire object will be read.

buffer (byte[])

the buffer to use to read the extent. Must be large enough to read the response or an error will be thrown. If null, a buffer will be allocated to hold the response data. If you pass a buffer that is larger than the extent, only extent.getSize() bytes will be valid.

"

May I also ask what is the best way of determining that one has reached the end of an Atmos storage cloud object when reading?

Doing it like this

"

catch (EsuException e)

{

    if (e.Message.ToLowerInvariant().Contains("specified range cannot be satisfied"))

    {

        // handling code

        return true;

    }

    else

    {

        Console.Error.WriteLine("{0}", e.Message);

        return false;

    }

}

"

looks a bit unclean to me, as the phrase "specified range cannot be satisfied" may change in the future. Is there a way of catching the error number 1004 instead?

Thanks.

13 Posts

January 2nd, 2014 07:00

P.S.

Please note that in the meantime I have found the answer to the second question.

catch (EsuException e)

{

    if (e.Code == 1004)

    {

        // handling code

        return true;

    }

    else

    {

        Console.Error.WriteLine("{0}", e.Message);

        return false;

    }

}

110 Posts

January 6th, 2014 10:00

If you are not providing an Extent (i.e. you are reading the entire object) and you wish to use your own buffer, you should use the ReadObjectStream method instead:

ReadObjectStreamResponse response = esu.ReadObjectStream(oid, null);

int count = response.Content.Read(buffer, 0, buffer.Length);

response.close();

Using this method, you can also get the object length from response.Length before streaming anything.
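One detail worth noting about the snippet above: a single call to Stream.Read in .NET is allowed to return fewer bytes than requested even when more data is still to come, so the returned count is only the size of that one read. A minimal sketch of a loop that keeps reading until the buffer is full or the stream ends is shown here. The class and method names (StreamHelpers, ReadUpTo) are hypothetical and not part of the Atmos SDK; only System.IO.Stream is used.

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
    // Reads from the stream until the buffer is full or the stream ends.
    // Returns the number of bytes actually placed in the buffer.
    public static int ReadUpTo(Stream source, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            // Stream.Read may return fewer bytes than requested;
            // a return value of 0 signals the end of the stream.
            int n = source.Read(buffer, total, buffer.Length - total);
            if (n == 0)
            {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

With the response object from the reply above, this would presumably be called as StreamHelpers.ReadUpTo(response.Content, buffer), and the return value is then the number of valid bytes in the buffer.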

13 Posts

January 7th, 2014 03:00

Thanks for your reply Chris.

If cloud is an instance of EsuRestApi type can I have something like,

ObjectPath path = new ObjectPath("/rest/namespace/test/" + fileName);

ReadObjectStreamResponse response = cloud.ReadObjectStream(path, null);

int count = response.Content.Read(data, 0, data.Length);

response.close();

if (count < data.Length)

{

    byte[] temp = new byte[count];

    Contract.Assert(temp.Length == count);

    Array.Copy(data, 0, temp, 0, temp.Length);

    data = temp;

}

and is it cheaper than constructing a buffer in ReadObject and copying d into data in

ObjectPath path = new ObjectPath("/rest/namespace/test/" + fileName);

Extent ext = new Extent(off, data.Length);

byte[] d = cloud.ReadObject(path, ext, null); // read

Contract.Assert(d.Length <= ext.Size);

if (d.Length < data.Length)

{

    byte[] temp = new byte[d.Length];

    Contract.Assert(temp.Length == d.Length);

    data = temp;

}

Array.Copy(d, 0, data, 0, data.Length);

May I ask if there is any chance of having a new version of ReadObject, say ReadObjectExt, returning the number of bytes read so that one can have something like

ObjectPath path = new ObjectPath("/rest/namespace/test/" + fileName);

Extent ext = new Extent(off, data.Length);

int count = cloud.ReadObjectExt(path, ext, data); // read

Contract.Assert(count <= ext.Size);

if (count < data.Length)

{

    byte[] temp = new byte[count];

    Contract.Assert(temp.Length == count);

    Array.Copy(data, 0, temp, 0, temp.Length);

    data = temp;

}

May I also take this opportunity and ask if there is any chance of having a new version of ReadObject where one can specify more than one range. This may be implemented so that one can pass say an array of objects of Extent type. ReadObject should issue only one REST command specifying multiple ranges in it. There can be a limit to the maximum number of ranges one can specify.

These changes to the .NET API would be useful to us. We would not wish to make the changes ourselves, firstly because you should be better qualified to do so, secondly we would wish to be able to use new versions of the .NET API without a need to re-implement any custom changes, and that is why we would wish to use the officially released .NET API only.

Thanks.

13 Posts

January 7th, 2014 10:00

I can't imagine why you would ask for a range if you don't already know how large the object is.

Normally we know the Atmos object sizes and often need to restore only parts of them. You may think of the data stored in our Atmos objects as a sort of Zip archive with a structure known to us.


public byte[] ReadObjectMultiRange(EsuApi esu, Identifier id, params Extent[] extents) {

    List<byte> data = new List<byte>();

    foreach (Extent extent in extents) {

        byte[] buffer = esu.ReadObject(id, extent, null);

        data.AddRange(buffer);

    }

    return data.ToArray();

}


The idea is to avoid calling ReadObject multiple times (as far as I understand, each call issues a REST request) and to be able to get multiple object ranges in one go (issuing a single REST request).

If you're reading the entire object and you don't know how large it is, then you will have to deal with that (i.e. maybe call GetSystemMetadata to get the size before you stream).

I prefer a paradigm in which one learns on read that the end of the Atmos object (or data container) has been reached, rather than one in which the object size has to be acquired before the transfer and the ranges handled by the caller.

I used the following code to detect the end of an Atmos object on read. My assumption was that my ranges are otherwise correct, so ReadObject will throw the 1004 exception only if I try to read past the end of an object whose size is unknown to me.

catch (EsuException e)

{

    if (e.Code == 1004)

    {

        // handling code

        return true;

    }

    else

    {

        Console.Error.WriteLine("{0}", e.Message);

        return false;

    }

}

Thanks.

110 Posts

January 7th, 2014 10:00

I'm not sure what you're trying to accomplish, so it's difficult to give a straight answer.

In general, if the data you're reading will fit into memory, the simplest solution is to call ReadObject(path, extent, null).  You will always get byte[extent.Length] or an exception will be thrown.  If you want the entire object, extent should be null and you will get byte[objectSize].  If your goal is to have a sized byte array with all of the data, use this method; there is no need to initialize any other buffers.

If the data size is large, you can call ReadObjectStream and handle the streaming in your code.  This method will also give you a ReadObjectStreamResponse, which reveals the size of the data in the Length property.

May I ask if there is any chance of having a new version of ReadObject, say ReadObjectExt, returning the number of bytes read so that one can have something like

Given the above, there doesn't seem to be a need for this.

May I also take this opportunity and ask if there is any chance of having a new version of ReadObject where one can specify more than one range. This may be implemented so that one can pass say an array of objects of Extent type. ReadObject should issue only one REST command specifying multiple ranges in it. There can be a limit to the maximum number of ranges one can specify.

This is on our list.  It is already available in the Java SDK.  You can use a method similar to the following to accomplish this:

public byte[] ReadObjectMultiRange(EsuApi esu, Identifier id, params Extent[] extents) {

    List<byte> data = new List<byte>();

    foreach (Extent extent in extents) {

        byte[] buffer = esu.ReadObject(id, extent, null);

        data.AddRange(buffer);

    }

    return data.ToArray();

}

You would call this like so:

byte[] data = ReadObjectMultiRange(cloud, path, new Extent(100, 5), new Extent(256, 4), new Extent(123, 10));

110 Posts

January 7th, 2014 12:00

Normally we know the Atmos object sizes and often need to restore only parts of them. You may think of the data stored in our Atmos objects as a sort of Zip archive with a structure known to us.


In this case, you must already store an index of your archives, which means you know the correct ranges.  Therefore, you should never read past the end of an object and if you do, you have much bigger problems.


The idea is to avoid calling ReadObject multiple times (as far as I understand, each call issues a REST request) and to be able to get multiple object ranges in one go (issuing a single REST request).


Yup, this is on our list, but you can always implement it yourself through extension.  That way, when it is included in a future update, you can gracefully refactor your code or simply continue to use your implementation.


I prefer a paradigm in which one learns on read that the end of the Atmos object (or data container) has been reached, rather than one in which the object size has to be acquired before the transfer and the ranges handled by the caller.


This is provided via ReadObjectStream().  If you request a range, then you must know ahead of time that the range is valid; otherwise, how do you know what you're reading?  Do you keep indexes that say "from byte 172 to the end of the object"?  I'm confused.

13 Posts

January 7th, 2014 16:00

OK, thanks Chris.

Do you keep indexes that say "from byte 172 to the end of the object"?  I'm confused.

Yes I may need to support such a case (but I can handle it as described earlier).

110 Posts

January 9th, 2014 20:00

FYI, we just released v2.1.4.31 of the Atmos .NET SDK. This includes a new method to retrieve multiple ranges with a single call. Below is some sample code to illustrate this new method.

ObjectPath path = new ObjectPath("/my/archive.zip");

// simply pass in an identifier and some (valid) extents

// note: you cannot call this method with a checksum and you cannot stream it

MultipartEntity entity = esu.ReadObjectExtents(path, new Extent(10, 5), new Extent(128,32), new Extent(256, 32));

// MultipartEntity extends List

byte[] part1Data = entity[0].Data;

// and has a convenience method to aggregate all of the data into one array

byte[] aggregateData = entity.AggregateBytes();

As always, your ranges must be valid or you will receive an error and no data.  Let me know if you have any questions.

13 Posts

June 11th, 2014 07:00

Thanks a lot for your (prompt) help.

281 Posts

June 11th, 2014 07:00

Yes, there is a limit of 8 kB for all of the HTTP headers in a request. So, you can probably squeeze 50 or so in one request. Running your requests through a debugging proxy like Fiddler can be helpful for seeing the requests. If you exceed the request limit, you will see an HTTP 400 from Atmos.

13 Posts

June 11th, 2014 07:00

Thank you very much for implementing the public ReadObjectExtents method of the EsuRestApi C# .NET class, which, it would seem, uses REST multiple-range headers for efficiency:

public MultipartEntity ReadObjectExtents(Identifier id, params Extent[] extents)

May I ask whether the extents array (of type Extent[]) passed to EsuRestApi's ReadObjectExtents method has a maximum size, say due to limits on REST header sizes, and if so, what a recommended maximum would be?

Thanks

13 Posts

June 11th, 2014 08:00

P.S.

Am I right in thinking that a range header is of the form below and that its size has to be within the 8KB limit, i.e., that only this one line is the header?

range: Bytes=4-8,41-44

If so, knowing our typical Atmos Cloud storage object sizes, I may assess a maximum number of extents to be passed to EsuRestApi's ReadObjectExtents public function to be on the safe side.

Thanks.

P.P.S.

I haven't mentioned it, but I am, of course, taking care of the maximum size of the data to be read in the data buffers by EsuRestApi's ReadObjectExtents public function, and that can be easily done by knowing the extent sizes.

281 Posts

June 11th, 2014 13:00

Yes, that is the header format; however, the 8 kB limit covers all of the headers, so you'll have to allow a few hundred bytes for the others too (e.g. Accept, Date, User-Agent, x-emc-uid, x-emc-signature, etc.).
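To see how many bytes a given set of ranges would occupy in that header, the value can be built and measured locally. The following is a minimal sketch, not the SDK's own code: RangeHeader, Build and LineLength are hypothetical names, the (Offset, Size) tuples merely stand in for the SDK's Extent type, and the lowercase "bytes" unit follows the HTTP specification (the exact casing the SDK sends is not confirmed here). HTTP byte positions are inclusive, so an extent of a given offset and size is assumed to map to offset through offset + size - 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class RangeHeader
{
    // Builds the header value "bytes=first-last,first-last,..." from
    // (offset, size) pairs. Both byte positions in a range are inclusive.
    public static string Build(IEnumerable<(long Offset, long Size)> extents)
    {
        var parts = extents.Select(e => $"{e.Offset}-{e.Offset + e.Size - 1}");
        return "bytes=" + string.Join(",", parts);
    }

    // Number of bytes the complete header line takes on the wire,
    // including the header name, the ": " separator and the trailing CRLF.
    public static int LineLength(string value)
    {
        return Encoding.ASCII.GetByteCount("Range: " + value + "\r\n");
    }
}
```

For the two ranges in the question, Build yields the value bytes=4-8,41-44 when given the pairs (4, 5) and (41, 4). Comparing LineLength against whatever budget remains after the other headers gives a quick local check before a request is sent.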

13 Posts

June 13th, 2014 03:00

Thank you very much.

Just for the record, assuming that the Atmos objects are less than 1 TB in size and that 2 KB is needed for the other headers, it would appear that one can safely assume that the range HTTP header will not exceed 6 KB in size for up to 200 ranges.
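The estimate above can be checked with a little arithmetic. The sketch below assumes the header value has the form bytes=first-last,first-last,... and that every byte position needs at most a fixed number of decimal digits: 13 digits cover any position below 1 TiB (2^40 bytes), while 12 digits cover positions below 10^12 bytes. RangeBudget and its methods are hypothetical helpers, and 6 KB is taken as 6144 bytes.

```csharp
using System;

public static class RangeBudget
{
    // Worst-case length of the Range header value for `count` ranges when
    // every byte position has at most `maxDigits` decimal digits.
    public static long WorstCaseValueLength(int count, int maxDigits)
    {
        if (count <= 0)
        {
            return 0;
        }
        long perRange = 2L * maxDigits + 1;          // "first-last"
        return 6 + count * perRange + (count - 1);   // "bytes=" + ranges + commas
    }

    // Largest number of ranges whose worst-case value fits in `budgetBytes`.
    // Solves 6 + n * (2d + 1) + (n - 1) <= budget for n.
    public static int MaxRanges(long budgetBytes, int maxDigits)
    {
        long perRangeWithComma = 2L * maxDigits + 2;
        long n = (budgetBytes - 5) / perRangeWithComma;
        return (int)Math.Max(0, n);
    }
}
```

Under these assumptions, 200 ranges with 13-digit positions need at most 5605 bytes, which is below 6144, and a 6144-byte budget allows at most 219 such ranges (236 if positions never exceed 12 digits). Real requests usually contain shorter numbers, so these are upper bounds on the header size rather than predictions.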

110 Posts

June 13th, 2014 07:00

I would say that is a safe assumption (7K might be safe too; look at your requests in Fiddler to find out). 6K should give you somewhere around 230-250 ranges, so 200 is even safer. However (as always) be sure to handle error cases appropriately.

If you have more ranges to read, you can issue multiple requests and perhaps combine sequential ranges intelligently.
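Combining sequential ranges could look roughly like the following. This is only a sketch of one possible approach: ExtentMerger and Coalesce are hypothetical names, and the (Offset, Size) tuples stand in for the SDK's Extent type. Ranges that overlap or directly follow one another are merged into a single range, which shortens the header.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExtentMerger
{
    // Merges ranges that overlap or touch and returns them sorted by offset.
    public static List<(long Offset, long Size)> Coalesce(
        IEnumerable<(long Offset, long Size)> extents)
    {
        var result = new List<(long Offset, long Size)>();
        foreach (var e in extents.OrderBy(x => x.Offset))
        {
            if (result.Count > 0)
            {
                var last = result[result.Count - 1];
                long lastEnd = last.Offset + last.Size; // exclusive end of previous range
                if (e.Offset <= lastEnd)
                {
                    // Extend the previous range instead of adding a new one.
                    long newEnd = Math.Max(lastEnd, e.Offset + e.Size);
                    result[result.Count - 1] = (last.Offset, newEnd - last.Offset);
                    continue;
                }
            }
            result.Add(e);
        }
        return result;
    }
}
```

For example, the ranges (100, 5), (105, 10) and (256, 4) collapse to (100, 15) and (256, 4). Merging changes both the order and the boundaries of the returned parts, so the caller has to keep track of where each originally requested range sits inside the merged data.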
