August 10th, 2016 04:00

Centera writing times

Hello,

I would like to know the usual write times in a Centera cluster (average and maximum times per 1000 files).

Assume the files are small (between 20 KB and 200 KB), with only 2 or 3 metadata attributes.

I'm thinking of a 5-node cluster under light load (20 file writes per minute or fewer).

What other information would be relevant?

Can anyone give me some data on this?

Best regards,

Nuno Feliciano

208 Posts

August 10th, 2016 07:00

Hello Nuno -

In a small cluster like that I would expect to see write times of under 1 second per file; you might see considerably better write latencies, but I am trying to set the bar low. If your write latencies are frequently greater than a second, I would open a case with EMC support.

Regards,

Mike Horgan

August 10th, 2016 08:00

Hi Mike,

I understand your reply is intended as a first approximation, so it is a bit superficial.

I would like to get more specific and detailed.

An average of 1 s seems really high to me. Is that usual?

I want to know what to expect for average times, but also for maximum times (assuming a set of 100 or 1000 file writes).

I understand EMC support should be the place for these questions, but EMC support is being very evasive.

I suppose many people in this forum should have this kind of experience.

If anyone is kind enough to help us on this, we would be thankful.

Kind regards,

Nuno Feliciano

August 10th, 2016 10:00

Hi Nuno -

I don't work for EMC, so I do not have access to detailed performance information. I was just trying to give guidelines based on past experience.

In your original question you stated a load of 20 writes per minute, which would obviously be satisfied by a 1s average write latency. Are you asking what is the maximum expected throughput for writes?

Regards,

Mike Horgan

August 10th, 2016 11:00

Hi Mike,

Although I just mentioned the 20 writes per minute to make it simple and leave stress issues out, our concern is not about the throughput. Our concern is about the online operations. We don't want to make the user wait for several seconds for the upload of a document.

So, do you have experience measuring these times? Can you give us estimated numbers for average and maximum times in a set of 1000 file writes?

Kind regards,

Nuno

August 10th, 2016 12:00

Hi Nuno

We've got a Centera Availability & Performance Service that runs on a number of systems. Typically a bit larger ones than just four nodes but I would expect the performance for a single write not to be too much different as it's a small write that will be performed in one thread.

The important thing to pay attention to is that you keep a pool connection open and have a configurable number of read and write threads within your application. That ensures a write only needs to tell an already existing thread within the process to pick up a file and write it to the Centera. 20 threads are a good starting point. If that is not enough, check the number of open connections (or contact me) to find out if you can further increase the number.
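As a rough illustration of the pattern described above, here is a minimal Python sketch of a long-lived pool of 20 writer threads that is created once and reused for every upload. The function `write_to_archive` is a hypothetical stand-in that only sleeps; it is not a Centera SDK call, and the names `WRITE_POOL`, `timed_write`, and `submit_write` are made up for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def write_to_archive(payload):
    """Hypothetical placeholder for the real archive write (NOT a Centera API call)."""
    time.sleep(0.01)  # simulate network and storage latency
    return "clip-%d" % len(payload)

def timed_write(payload):
    """Run one write and return (clip_id, elapsed_seconds)."""
    start = time.perf_counter()
    clip_id = write_to_archive(payload)
    return clip_id, time.perf_counter() - start

# Created once at application start-up; 20 workers is the starting point
# suggested in the post above, not a tuned value.
WRITE_POOL = ThreadPoolExecutor(max_workers=20)

def submit_write(payload):
    """Hand the payload to an already running worker thread."""
    return WRITE_POOL.submit(timed_write, payload)

# Usage: queue 40 small writes and collect the per-write timings.
futures = [submit_write(b"x" * 1024) for _ in range(40)]
results = [f.result() for f in futures]
WRITE_POOL.shutdown()
```

In a real application the connection to the cluster would be opened once alongside the pool, so that each upload pays only for the write itself and not for connection setup.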

The typical time it takes our client to write a small file and get the ClipID back is somewhere around 400 to 600 milliseconds. You can expect that for all the parallel writes happening. Our client runs on systems with another 20+ environments connected to the same system. Writes that take more than a second are rare, although a given write does sometimes take five or more seconds. But a single slow write like that has not led to any complaints about performance from the application side.

We'd detect if the response time increases over a statistically abnormal level over a longer period of say two hours. Single writes taking long are nothing that applications have complained about (except the ones that use a single thread to archive all journaled emails continually :-).

Reads are normally much faster.

Bear in mind that a time to first byte of one second is the aim. That is why parallel writes are so important.

Do not hesitate to get in touch in case of further questions. holger.jakob@informatio.ch

Best regards, Holger

August 10th, 2016 12:00

I completely agree with Holger's numbers based on my experience.

I'm not sure it makes a lot of sense to be designing a new software system in 2016 that writes directly from a user to the Centera API, but we'll leave that up to your discretion.

Good luck,

Mike Horgan

August 11th, 2016 03:00

Thanks for your feedback, Holger.

What you describe may be similar to what we're experiencing. But when you say "It sometimes even takes five or more seconds for a given write", can you put a number on it? How many writes take more than five seconds? 0.1%, 1% or 5%?

Best Regards,

Nuno

August 11th, 2016 03:00

Thanks for your suggestion, Mike.

If Centera were consistently fast, it would seem better to us to write the file to Centera in one step rather than in a two-step process, which makes it possible to leave files out of Centera if we're not careful enough.

But if the Centera design doesn't make it possible to get consistently good times, we may think about that.

Regards,

Nuno

August 12th, 2016 04:00

Centera nodes are intended to process multiple threads. If applications use only one thread, you can expect approximately 2-4 files (20-200 kB) per second to be written to Centera. Reads are up to tenfold faster.
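For reference, the single-thread rate quoted above can be converted into a per-write latency with simple arithmetic. The short Python sketch below assumes the single thread is busy the whole time, so each write takes 1 / rate seconds; the function name is made up for this example.

```python
def latency_ms_from_rate(files_per_second):
    """Per-write latency in milliseconds for one fully busy writer thread."""
    return 1000.0 / files_per_second

fastest_ms = latency_ms_from_rate(4)  # 4 files per second
slowest_ms = latency_ms_from_rate(2)  # 2 files per second
print(fastest_ms, slowest_ms)  # prints: 250.0 500.0
```

So 2-4 files per second on one thread corresponds to roughly 250 to 500 ms per write.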

Each ingest to Centera consists of one write of the data (= blob) and one write of the metadata (system + user). For small files (<140 kB) we suggest using "embedding", where the first blob write is redirected to a metadata entry and only the final write of the metadata (including the data) is required.

Each read likewise needs first a read of the metadata and then a read of the data. With embedding, the second (blob) read request is redirected to the metadata that has already been read, avoiding another I/O.

PS. Centera clusters usually come with an even number of nodes.

August 12th, 2016 06:00

Thank you for your feedback, Helmut.

So, in terms of average times, we're getting consistent feedback.

Holger says 400 ms to 600 ms.

Helmut says 250 ms to 500 ms.

Mike agrees with Holger.

We are getting around 500 ms (but it is really variable: sometimes the standard deviation can reach 1 s or even higher).

Although we would like it to be faster, for background operations we have a multi-threaded application. Our main concern is the online operations. Every day we have several operations that take more than 10 s. Besides making the user wait too long, we end up giving up with a timeout error.

If this is normal, we may consider no longer using Centera for online operations. But if it is not normal, we want to fix it.

Can anyone give us a rough number for how many slow writes (longer than 5 s or so) we should expect? 0.1%, 1% or 5%?
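To make numbers from different sites comparable, a minimal Python sketch like the one below could be used to summarize a set of measured write times: mean, standard deviation, maximum, and the fraction above a threshold. The sample data is made up purely for illustration and is not a measurement from any Centera cluster; the function name `summarize` and the 5 s default threshold are assumptions for this example.

```python
import statistics

def summarize(latencies_s, threshold_s=5.0):
    """Summarize write latencies (in seconds), including the share above threshold_s."""
    count = len(latencies_s)
    slow = sum(1 for t in latencies_s if t > threshold_s)
    return {
        "count": count,
        "mean_s": statistics.mean(latencies_s),
        "stdev_s": statistics.stdev(latencies_s) if count > 1 else 0.0,
        "max_s": max(latencies_s),
        "slow_fraction": slow / count,
    }

# Made-up sample: 990 writes at 0.4 s and 10 writes at 6 s.
sample = [0.4] * 990 + [6.0] * 10
summary = summarize(sample)
print("%.1f%% of writes took longer than 5 s" % (100 * summary["slow_fraction"]))
```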

Kind regards,

Nuno
