Re: Ask the Expert: SAP HANA TDI with EMC - Implementation, Rules and Details

Jason,

There is definitely a space for ScaleIO for SAP HANA.

ScaleIO as software is absolutely viable, but you need to understand how it fits in the HANA stack and in each organization's datacenter strategy.

First, ScaleIO, being a software-only solution, will always need hardware to run on. In that sense, it is not itself required to be certified by SAP under the TDI program. What needs to be certified is the hardware the SAP HANA software will run on.

So, as you can see from SAP Note 800326, SAP looks at ScaleIO as an OS-level component that can be used in conjunction with the SAP HANA software. Here I must highlight that the critical factor in terms of performance is the hardware configuration that SAP HANA and ScaleIO will be running on: our engineering lab testing proved that ScaleIO can deliver against the most demanding performance KPIs as long as the underlying infrastructure is properly designed. We have a couple of whitepapers (this one being one example) documenting the conclusions of this engineering testing.

Second, the ScaleIO discussion is really about a different datacenter architecture strategy. The idea behind ScaleIO is to run on top of low-cost servers, providing a virtual SAN with elastic capabilities, and the more nodes you have in the ScaleIO cluster, the better it will perform. We see this being requested mainly by Service Providers, where such architectures are attracting a lot of interest. Using ScaleIO for a single SAP HANA system might not be the best choice, since ScaleIO needs a minimum of 3 servers in the cluster to work, and its sweet spot is from 10 to 20 servers onwards.
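
To make that sizing guidance concrete, here is a minimal, purely illustrative Python sketch (not an EMC or SAP tool; the thresholds simply encode the node counts mentioned above):

```python
# Minimal sketch (not an EMC or SAP tool): sanity-check a planned ScaleIO
# cluster size against the figures mentioned above -- 3 servers minimum,
# with the sweet spot starting around 10-20 nodes onwards.

def check_scaleio_cluster_size(node_count: int) -> str:
    """Return a rough assessment of a planned ScaleIO node count."""
    MIN_NODES = 3          # ScaleIO needs at least 3 servers in the cluster
    SWEET_SPOT_START = 10  # benefits really grow from roughly 10-20 nodes onwards

    if node_count < MIN_NODES:
        return f"{node_count} nodes: not viable, ScaleIO requires at least {MIN_NODES} servers."
    if node_count < SWEET_SPOT_START:
        return (f"{node_count} nodes: works, but a single small HANA system may be "
                "better served by a classic array.")
    return f"{node_count} nodes: in the range where ScaleIO's scale-out design shines."


if __name__ == "__main__":
    for n in (2, 4, 16):
        print(check_scaleio_cluster_size(n))
```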

Does this answer your question?

Astone422

Re: Ask the Expert: SAP HANA TDI with EMC - Implementation, Rules and Details

Jason,

You ask whether there is customer interest. As you know, the SAP market is mostly running on SAN storage today, so the SAP databases are running on that SAN storage. I see the ScaleIO need coming mainly in greenfield situations, especially when a net-new environment is being implemented, as with a new Service Provider or cloud offer. We have it supported for HANA after resolving some early issues with the SCSI-3 reservations required for HA. I will be interested to see how this develops as the market matures; for now I expect only a few. Glad we are ahead of the curve here.

VMARK11

Re: Ask the Expert: SAP HANA TDI with EMC - Implementation, Rules and Details

Hi folks.  Two-part question.  First, in TDI scenarios, are both Cisco and Brocade SAN switches viable choices?  The specific scenario is SGI scale-up with VNX.  Second, as long as EMC TDI recommendations are adhered to, are there any major objections to hosting PRD, QAS, and DEV on the same VNX array?

VMARK11

Re: Ask the Expert: SAP HANA TDI with EMC - Implementation, Rules and Details

In HANA Failover HA scenarios (not HANA System Replication), is there a rule of thumb for failover time per GB or TB of HANA resident in memory?

Astone422

Re: Ask the Expert: SAP HANA TDI with EMC - Implementation, Rules and Details

Let me take a stab.

For supported switches, there is no SAP site that certifies and lists them.  Per the SAP HANA TDI FAQ:

"Not all combinations of network interface cards and switches work as expected. Therefore, before making the decision for a certain combination, customers must contact the server vendor and inquire which switches are supported for the network interface cards of the given server."

So Cisco and Brocade have hardware that does work and is supported on certain servers.  Best to check with the server vendor.  To make a server sale, support is something that can be negotiated with the server vendor.

As far as sharing test and production on the same VNX, there is no issue as long as the recommendations are followed.

As far as a rule of thumb for normal HANA HA failover to a local standby node, that is a great question.  Since the standby node is up and running, there is only the short SCSI reservation time plus the "lazy" load of the required tables into memory.

I have my own rule of thumb, but like any rule of thumb it is probably wrong 99% of the time.  Based on what I have seen, a HANA restart for a 512 GB node is around 10-15 minutes, so I would divide the total HANA memory per node in GB by 512 GB and multiply by 10-15 minutes.  An SGI 12 TB node would therefore be 24 * 10 minutes to 24 * 15 minutes.  Add a 10% fudge factor to err on the high side and account for any other delays.

This assumes low-end 10K drives and only using so many HBAs, FAs, etc.  With 15K or SSD drives, or more paths, I would expect better times, but I have no rule of thumb for those yet.
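
If it helps, here is a small Python sketch of that rule of thumb (illustrative only; the 512 GB baseline, the 10-15 minute range, and the 10% fudge factor are just the assumptions stated above):

```python
# Illustrative only: encodes the rule of thumb above. Assumes low-end 10K
# drives and a modest number of HBAs/FAs, and, like any rule of thumb, it
# will be wrong for plenty of configurations.

def estimate_failover_minutes(node_memory_gb: float,
                              minutes_per_512gb: float = 15.0,
                              fudge_factor: float = 1.10) -> float:
    """Estimate the table-reload time after failover to a local standby node."""
    blocks = node_memory_gb / 512.0            # how many 512 GB "units" of memory
    return blocks * minutes_per_512gb * fudge_factor


if __name__ == "__main__":
    # Example from the post: a 12 TB SGI node at 10 and 15 minutes per 512 GB
    for per_block in (10.0, 15.0):
        est = estimate_failover_minutes(12 * 1024, per_block)
        print(f"12 TB node @ {per_block:.0f} min per 512 GB: ~{est:.0f} minutes")
```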

Hope this helps

Allan

VMARK11

Re: Ask the Expert: SAP HANA TDI with EMC - Implementation, Rules and Details

Thank you Allan!  Sounds like a good place to start on the failover estimation.


Re: Ask the Expert: SAP HANA TDI with EMC - Implementation, Rules and Details

Mark,

Regarding the reload time: with the current version of SAP HANA, which is SPS10, some things have changed.

As Allan mentioned, there is always the lazy load of the column store, which does not affect the RTO, as the system will be available (although with degraded performance until the key objects are reloaded into memory).

So, when I see questions related to fail-over time, what is really being asked is the expected RTO for each HA scenario.

And here there are quite a few different options.

[Image: HA-DR-DP options.jpg - overview of the SAP HANA high availability, disaster recovery, and data protection options]

The picture above gives you an overview of all possibilities.

So, we can approach this discussion in two ways: discuss what is the optimal RTO we can expect for each of these scenarios, or discuss what is the right solution depending on the customer's specific requirements.

Defining the expected RTO for each of the high availability solutions above is always dangerous, because it depends on quite a few variables.

So, in the hope of answering your question, let me take one example.

If you have one SAP HANA system consisting of a single node of 6 TB, possible RTOs for the different applicable solutions could be something like:

  • SAP HANA System Replication across 2 VMs, using clustering software like the SLES Failover Cluster:
    • SAP HANA System Replication has multiple configuration options, but for HA purposes I would assume the following as usable:
      • Synchronous: the write is successful once it is acknowledged on the secondary system, but data is not "pre-loaded in memory";
      • Synchronous in-memory: same as before, but all data is pre-loaded in memory on the secondary node;
      • Full Sync: an additional parameter on top of the "sync" options, where the write is only acknowledged once it is written to the LOG volume of the secondary system. It has no impact on RTO, but ensures a zero RPO.
    • For sync: you have to account for about 30 seconds to 1 minute of failover time (as SAP HANA has a default time-out of 30 seconds to declare a failure), and then you need to load the "minimum needed tables" for the instance to be accessible again. I'll discuss further ahead what the "minimum tables needed" are.
    • For sync in-memory, the RTO is just the 30-second time-out of HANA plus the DNS redirect, so it could be under 1 minute.
  • SAP HANA Host Auto-failover: no need for clustering software, as HANA does the work of orchestrating the failover itself.
    • In this scenario, the secondary system would have the HANA services started, but no data and no activity.
    • The failover mechanism would be managed by HANA and would imply the "failover of the persistency" plus the load of the "minimum tables necessary" for the system to become available.
    • So, the times would be:
      • The time for the SAP HANA secondary NameServer to take control (a few seconds) + the SCSI persistency reservations clean-up, rewrite, and mount of the LUNs on the secondary server (a few seconds more) + the recovery of the database (which can go from seconds to minutes depending on the volume of pending transactions to be rolled forward or rolled back) + the load of the "minimum number of tables" necessary for the system to come online (see the sketch right after this list for how these components add up).

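As a way to visualize that breakdown, here is a minimal, purely illustrative Python sketch; the individual timings are placeholder assumptions, not measured values, and each deployment will see different numbers:

```python
# Illustrative only: the values below are placeholder assumptions, not
# measurements; they simply show how the individual steps of a Host
# Auto-failover event add up to the overall RTO described above.

def host_auto_failover_rto_seconds(nameserver_takeover_s: float = 5.0,
                                   scsi_cleanup_and_mount_s: float = 10.0,
                                   db_recovery_s: float = 60.0,
                                   minimum_table_load_s: float = 120.0) -> float:
    """Sum the RTO components of an SAP HANA Host Auto-failover event."""
    return (nameserver_takeover_s + scsi_cleanup_and_mount_s
            + db_recovery_s + minimum_table_load_s)


if __name__ == "__main__":
    total = host_auto_failover_rto_seconds()
    print(f"Estimated RTO with these placeholders: ~{total / 60:.1f} minutes")
```
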
So, both in the case of HANA System Replication in "sync mode" without pre-load, and HANA host auto-failover, you would need to load a minimum number of tables for the system to become available.

Those tables are the system tables (which can be column or row) and all the other ROW tables. The problem in previous versions of HANA was that the ROW tables, while being reloaded, also triggered the rebuild of all their "in-memory indexes", which made the whole process take quite some time. I have heard values like 2 GB per second, but you need to remember that this number depends a lot on the number and speed of the HBAs on the server, the number and speed of the CPUs on the server, and the configured front-end connectivity on the storage, and these are just a few examples. This means the number can be higher or lower depending on each customer case.
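
Purely as an illustration of how such a throughput figure translates into reload time (pre-SPS10 behavior, and remember the 2 GB per second value is hearsay and highly hardware-dependent), here is a small sketch:

```python
# Illustrative only: turns the "roughly 2 GB per second" reload figure quoted
# above (pre-SPS10 behavior, heavily dependent on HBAs, CPUs and storage
# front-end connectivity) into an approximate row-store reload time.

def row_store_reload_minutes(row_store_gb: float, gb_per_second: float = 2.0) -> float:
    """Approximate time to reload the row store at a given throughput."""
    return row_store_gb / gb_per_second / 60.0


if __name__ == "__main__":
    for size_gb in (100, 500, 1000):
        print(f"{size_gb} GB row store: ~{row_store_reload_minutes(size_gb):.1f} minutes")
```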

Now with SAP HANA SPS10, this rebuild of the indexes is triggered asynchronously after the basic tables are loaded and the instance comes online, and I've heard of 6 TB systems coming up in times like 2 minutes. Of course, the size of the data to be reloaded will also play a great role here, and the fact that the system has 6 TB doesn't mean it has a very big or small row store, as that depends on each customer system. This means that the 2 GB/second figure no longer makes sense, and I don't yet have numbers on SPS10 from customers.

Final aspect: if you think of a storage replication scenario, you need to add to the "HANA Host Auto-failover" scenario I described above the boot time of the system, which can be considerable for a physical server and can be under 3 minutes for a virtual machine.

Conclusion: for a scenario of SAP HANA Host Auto-Failover, I believe it is reasonable - with SAP HANA SPS10, for a typical 6TB node - to expect failover times between 2 and 5 minutes.

So, I cannot give you a "rule of thumb" as this changes a lot with the version and system topology, but I hope this explanation helps you understand the variables involved.

Good "housekeeping" by the customer, such as reducing the volume of data in the ROW store, avoiding the creation of ROW tables unless absolutely necessary, and avoiding marking column tables for pre-load unless absolutely necessary, will all help keep the RTO very low with a clean and simple solution such as SAP HANA Host Auto-failover or storage replication.

I believe the key thing is to have the system available again as fast as possible, even assuming that not all data will be in memory, as it will be loaded in "lazy mode", and so also assuming that performance won't be as good until the whole "working data" is back in RAM.

If the question is "how much time to load my whole data back into RAM and have 100% performance", I have no better numbers than what Allan has provided, although I know Allan's numbers pre-date SPS10, and I'll be curious to see how that looks with the new ROW store load optimizations SAP came up with in this latest version.

Let me know if this helps, ok?

onx1

Re: Ask the Expert: SAP HANA TDI with EMC - Implementation, Rules and Details

Hi, are there any rules around running SAP HANA in an infrastructure partially shared with a hosting/cloud provider? Does TDI extend to cloud providers whose resources at some layers, like compute, may be shared with their customers? Or is there another certification they need to obtain before they could win our business?

Thanks.

Shu


Re: Ask the Expert: SAP HANA TDI with EMC - Implementation, Rules and Details

For production, the same rules apply even if virtualized on TDI, i.e. a dedicated host for the compute.

The same rules also apply for the shared storage, e.g. the number of production nodes for the storage model concerned.

VMARK11

Re: Ask the Expert: SAP HANA TDI with EMC - Implementation, Rules and Details

Antonio, this is spot on, I appreciate it!
