
October 27th, 2014 03:00

Which HA/DR solution for SQL 2014

Hi all,

We have an environment that uses SRDF/CE in conjunction with Windows Server Failover Clustering to build an HA solution for their SQL Server 2012 / 2014 environment. The question has been raised why not use Cluster Shared Volumes with SQL Server 2014 instead? [1]

From my understanding, CSV basically allows the passive node to access the DB files on a LUN as well (read-only), which makes failover easier. Cluster Enabler does basically the same thing in an automated way, making the R2 device R/W accessible.
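For reference, a clustered disk is promoted to a CSV with the FailoverClusters PowerShell module. A minimal sketch, assuming Windows Server 2012 R2; the resource name "Cluster Disk 2" is a placeholder:

```powershell
Import-Module FailoverClusters

# List the clustered disk resources available for conversion.
Get-ClusterResource | Where-Object ResourceType -eq "Physical Disk"

# Promote one of them to a Cluster Shared Volume
# ("Cluster Disk 2" is a hypothetical resource name).
Add-ClusterSharedVolume -Name "Cluster Disk 2"

# CSVs surface under C:\ClusterStorage\ on every node, so each node
# reaches the same NTFS volume without a drive-ownership change.
Get-ClusterSharedVolume
```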

Anyway, there are plans to move to a converged infrastructure. So, one Vblock per DC with a VMAX for the storage layer. The goal is to only use VPLEX for any replication and application HA/DR purpose. SRDF is not officially supported with a Vblock anyway.

This raises the question of how to implement HA for their SQL clusters in that environment, leveraging VPLEX and preferably virtualizing everything on VMware ESXi (stretched clusters across sites). I read the white paper "HA for extreme performance of MS SQL Server", which implements Microsoft AlwaysOn Failover Cluster Instances (FCI) on distributed VPLEX volumes. It seems to be a pretty neat solution. How does this solution compare to using CSV? What are the differences, i.e. advantages/disadvantages?

The goal must be to leverage VPLEX and virtualize the SQL environment (same technology and concept for all application clusters on the Vblock).

Any input is highly appreciated.

Patrik

[1] Cluster Shared Volume (CSV) Inside Out - Clustering and High-Availability - Site Home - MSDN Blogs


October 27th, 2014 05:00

What a great topic you've posted here! And you've framed the questions and scenarios beautifully.

To restate, you have two potential solutions to pick from:

1. Implement CSVs in a stretched cluster with VPLEX replicating the distributed volume. Either of the two SQL Server instances would be able to access the database files on either side and start the databases. In this case, we need to address what will be presenting the CSV. Will it be a pair of EMC gateways? Will it be a Windows 2012 R2 cluster? Whatever the choice is, know that it will create an additional layer of complexity in your IO chain.

2. A single volume or set of volumes directly managed by the SQL AlwaysOn FCI. This is a direct IO path to the volumes that the SQL instance will be using. No middle layer.

Conclusion: if your Vblock were running Hyper-V, the CSV option might be an interesting choice. Since the Vblock will be managed by vSphere, I see a bias toward the FCI route.

Jim Cordes


October 27th, 2014 10:00

Hi Jim

Thank you very much for outlining the two possible solutions. Indeed an interesting topic, but not so easy to fully understand just from reading a couple of white papers.

Can you explain in more detail why that additional layer is needed for the CSV option? There seems to be an SMB server/share requirement, since CSV uses the SMB protocol to communicate (metadata) with the coordinator node owning the shared device. So, is that the reason one would need another Windows 2012 R2 cluster? I've never heard of an EMC gateway.

Do both options provide the same level of availability? During a failover, database recovery and client re-connection in the SQL AlwaysOn FCI option take a minute or so, right? Would the behavior be different for the CSV option? The MS documentation indicates that CSV routes traffic over the network (SMB), allowing the SQL instance to remain operational - so, no SQL Server instance failover in case the storage has failed on one site. [1]

Thanks,

Patrik

[1] Whether one wants all traffic crossing sites over the ISL for a longer period of time is another question, though.

October 27th, 2014 11:00

One advantage of CSV is that standard Failover Cluster disks are only active-passive. With CSV, failover times are reduced in part because no change in drive ownership, dismount, or remount is required.

One distinction needs to be made, however. It is my understanding that SOFS (Scale-Out File Server) for application data must be used on top of CSV + NTFS (called CSVFS), so that the Continuous Availability bit can be set. Once set, CA further reduces failover time. This is the flavor of CSV that SQL Server 2012+ supports.

This is determined at the time of share creation. You either select "General Use" (not supported for SQL), or Scale-Out for Applications (SOFS).
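For what it's worth, the CA flag is also visible from PowerShell when the share is created. A hedged sketch - the share name, path, and account below are hypothetical, and on a Scale-Out File Server the flag is typically on by default:

```powershell
# Create an SMB share for SQL data files on a CSV path, with the
# Continuous Availability flag set explicitly.
New-SmbShare -Name "SQLData" -Path "C:\ClusterStorage\Volume1\SQLData" `
    -FullAccess "CONTOSO\sqlsvc" -ContinuouslyAvailable $true

# Verify the CA bit on the share.
Get-SmbShare -Name "SQLData" | Select-Object Name, ContinuouslyAvailable
```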

You are correct that, if a link issue prevented access to the disk from a SQL node, the cluster network can/would be used as a redirected access path to the disk. That capability is inherent in CSV functionality. One can see how this might prove problematic in some scenarios (one might prefer to fail over instead, to ensure that a host is not attempting all IO operations over a WAN link).
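That per-node state is observable; assuming Windows Server 2012 R2, something like:

```powershell
# Show, for each node, whether CSV IO is going direct to the disk or
# being redirected over the cluster network (SMB) to the coordinator node.
Get-ClusterSharedVolumeState |
    Select-Object Name, Node, StateInfo, FileSystemRedirectedIOReason
```

A StateInfo of FileSystemRedirected on a node would indicate exactly the over-the-network access pattern described above.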

This can be created with Windows Server hosts using SAN storage. Alternatively our Converged VNX arrays have data movers - NAS heads - that support SMB 3 with the CA bits required for SQL support. These might be used in place of Windows hosts for the SMB 3 CA presentation.

When an application that has open files on a CIFS server fails over to another node in the cluster, or a VNX data mover fails over, the application attempts to re-open those files. Prior to SMB CA, the CIFS server would see this as a different client trying to access the locked files and reject the request. SMB 3.0 includes a unique identifier called an Application Instance ID, which indicates to the server that it is the same application, shortening failover times that way.

I think this is what James was referring to.


October 28th, 2014 05:00

Alright, thank you very much for the clarification. I now understand exactly what was meant by the additional layer of complexity.

Thanks again to both of you for the detailed explanations.

Cheers,

Patrik
