Anonymous
Not applicable

RTO for Oracle DB - with crash consistent replication

Hi,

We need to tell our prospect the expected RTO for their Oracle DBs.

The main production DB is 16TB.

DR will be based on storage replication (RP or SRDF/A).

Is there a rule of thumb to calculate the RTO with crash consistent images?

What are the most important variables to get a reliable estimate?

Thanks for any help.

Regards

Jo

Accepted Solutions
3 Argentum

Re: RTO for Oracle DB - with crash consistent replication

Jo:

The best source for this that I have been able to find is here:

Configuring Instance Recovery Performance

This Oracle document describes how to set a target so that crash (instance) recovery completes within a fixed period of time, using the initialization parameter FAST_START_MTTR_TARGET. MTTR stands for mean time to recover, which in Oracle parlance is close to what you are referring to as RTO, at least for the crash recovery part. Note the term "mean", though: the actual MTTR will vary somewhat depending on the exact situation at the moment the crash occurs. FAST_START_MTTR_TARGET essentially sets a target (the document uses 90 seconds in its example) that Oracle tries to keep the estimated MTTR below.

Oracle does this by controlling checkpointing to limit the amount of redo that must be applied to the datafiles during crash recovery. There is a performance overhead to doing this, though, as more aggressive checkpointing generates additional write I/O. There is even a formula (in the section Calculating Performance Overhead) which allows you to estimate the performance hit for a particular setting of FAST_START_MTTR_TARGET.

The method to estimate MTTR is contained in the section called Monitoring Estimated MTTR: Example Scenario.
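
As a rough illustration of that monitoring step, the sketch below compares the estimated MTTR with the target. The query against V$INSTANCE_RECOVERY (columns TARGET_MTTR and ESTIMATED_MTTR, both in seconds) is the one the Oracle documentation uses. The Python helper `mttr_headroom` and the sample values are hypothetical and only stand in for a real query result.

```python
# Query to run against the instance; returns one row, both values in seconds.
MTTR_QUERY = "SELECT TARGET_MTTR, ESTIMATED_MTTR FROM V$INSTANCE_RECOVERY"


def mttr_headroom(target_mttr: int, estimated_mttr: int) -> int:
    """Return seconds of headroom (negative if the estimate exceeds the target)."""
    return target_mttr - estimated_mttr


# Hypothetical sample row, not measured on a real system.
sample_row = {"TARGET_MTTR": 90, "ESTIMATED_MTTR": 72}
headroom = mttr_headroom(sample_row["TARGET_MTTR"], sample_row["ESTIMATED_MTTR"])
status = "within target" if headroom >= 0 else "over target"
print(f"estimated MTTR is {status} ({headroom:+d} s headroom)")
```

Sampling this during peak write load, rather than on an idle system, should give a more realistic picture of the crash recovery part of the RTO.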

You can also create a parallel recovery configuration if you want to further tune MTTR.

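Finally, there is a feature in Oracle called MTTR Advisor (the V$MTTR_TARGET_ADVICE view), which estimates the additional I/O that different FAST_START_MTTR_TARGET settings would cause. The current estimate of how long crash recovery would take is available as ESTIMATED_MTTR in V$INSTANCE_RECOVERY.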

Let me know if this helps.

Regards,

Jeff

3 Argentum

Re: RTO for Oracle DB - with crash consistent replication

Jo:

Annoyingly, I have to give you the classic, DBA-style, "it depends" answer. Bottom line: The RTO is a tunable, depending on how asynchronous you are.

Bear in mind that with a crash consistent image there is a risk that media recovery cannot be run, whereas media recovery is always supported when you start from a backup. There is an MOS article on this which I am not able to find at the moment.

The push/pull is that the RTO depends on how long it takes for you to run a media recovery (plus the crash startup time but that's fairly constant). The media recovery is the point where you apply archived logs. That time depends upon the number of archived logs which must be applied, which is dependent upon:

  • The data change rate.
  • The point at which your snap image was taken (i.e. for a full recovery, a snap taken 24 hours ago will take longer to recover than one taken 5 minutes ago).

Thus, RTO can change dramatically depending upon how it is set up. Having an extremely short RTO would require a very frequent schedule for taking images, which can lead to performance, capacity, etc., issues as well. So it is a trade-off.
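
The dependency described above can be put into a back-of-the-envelope calculation. This is a minimal sketch, not an Oracle formula: the function name, the constant startup time, and the redo apply rate are all assumptions, and the apply rate in particular has to be measured on the DR host.

```python
def estimate_rto_seconds(
    change_rate_mb_s: float,   # average redo generation rate on production
    image_age_s: float,        # time between the image and the recovery target
    apply_rate_mb_s: float,    # assumed redo apply throughput at the DR site
    startup_s: float = 120.0,  # assumed, roughly constant crash startup time
) -> float:
    """Startup time plus the time to apply the redo generated since the image."""
    if apply_rate_mb_s <= 0:
        raise ValueError("apply_rate_mb_s must be positive")
    redo_to_apply_mb = change_rate_mb_s * image_age_s
    return startup_s + redo_to_apply_mb / apply_rate_mb_s


# Hypothetical inputs: 20 MB/s of redo, applied at 200 MB/s.
for label, age_s in [("5 minutes", 5 * 60), ("24 hours", 24 * 3600)]:
    rto = estimate_rto_seconds(20.0, age_s, 200.0)
    print(f"image taken {label} ago: about {rto / 60:.1f} min")
```

In this simple model the database size does not appear at all. The estimate is driven by the change rate and the age of the image, which matches the two variables listed above.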

Regards,

Jeff

Anonymous
Not applicable

Re: RTO for Oracle DB - with crash consistent replication

Hey Jeff,

Thanks for your valuable input.

With SRDF/A the DR image is crash consistent and has an RPO of, say, 30 seconds.

So we will have crash startup time + media recovery time, which should be fairly quick for such a low RPO.

Assume a change rate of 100 MB/s and a DB size of 10TB.

Can we calculate an estimated RTO?

Thanks

Jo

Re: RTO for Oracle DB - with crash consistent replication

I think that with SRDF/A you need to build scripts that separate the database from the archive logs and define a consistent copy cycle.

As you will know, any copy that contains the control files and redo logs lets the system recover when the database is started, but it does not let you consistently identify your RPO.

The RTO depends on applying the archive logs relative to your asynchrony window. The process is fairly simple, and there is a very good document about it, h2603-oracle-db-emc-symmetrix-stor-sys-wp-ldv, which I hope will be useful to you; it is very interesting.

I think it would be useful to set up at least two separate SRDF groups, one for the database and one for the archive logs.

Best of luck
