EdwinVanMierlo
2 Bronze

Re: Ask the Expert: EMC Cluster Enabler multi-site clustering

All,

Please submit all your questions, about:

- Cluster Enabler multi-site clusters (SRDF, MirrorView, RecoverPoint)

- and any other Windows Server Failover Cluster on EMC storage questions

Thanks,

Edwin.

EdwinVanMierlo
2 Bronze

Re: Ask the Expert: EMC Cluster Enabler multi-site clustering

All,

Let me detail one of the many questions that come in from time to time.

I am often asked to explain why:
"Cluster Enabler failed over the cluster groups"

Let's start with a statement: Cluster Enabler does not make any decisions (with one exception), does not cause the cluster groups to fail over (with one exception), and does not actively fail over any cluster groups.

Apart from the exceptions noted, which I will come back to further down, Cluster Enabler is merely a resource in the cluster and will follow the groups wherever the cluster wants them to "go".

Microsoft Windows Server Failover Cluster is in control; all the normal rules for cluster failover are honoured, and the cluster determines the failover based on its own configured error-detection properties.

The cluster still behaves like a normal cluster. It performs error detection, using the following three methods:

  • Network error detection; this is what we used to call the "heartbeat". It detects whether a node is up or down in the cluster, causes a revote (regroup) for quorum, and takes appropriate action.
  • Resource error detection; through the "LooksAlive" and "IsAlive" methods, the cluster checks the health of every resource in the cluster groups and detects failures.
  • User-mode hang detection; based on a "heartbeat" between the NetFT.sys network driver and the Clussvc.exe Cluster service, it can detect whether the node is hanging.
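To make the resource error detection a bit more concrete, here is a minimal Python sketch of the LooksAlive/IsAlive idea: a cheap check that runs often, and a thorough check that runs less often and is also used to confirm a negative cheap check. This is only an illustration, not how the Cluster service is implemented; the `Resource` class, the `check_resource` function, and the 5 and 60 second intervals are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Resource:
    """Hypothetical stand-in for a cluster resource with two health checks."""
    name: str
    looks_alive: Callable[[], bool]  # cheap check, run frequently
    is_alive: Callable[[], bool]     # thorough check, run less frequently


def check_resource(res: Resource, now_s: int,
                   looks_alive_every_s: int = 5,
                   is_alive_every_s: int = 60) -> str:
    """Return 'ok', 'failed' or 'skipped' for a poll at second now_s.

    The intervals are illustrative assumptions, not values from this thread.
    """
    if now_s % is_alive_every_s == 0:
        # Time for the thorough check.
        return "ok" if res.is_alive() else "failed"
    if now_s % looks_alive_every_s == 0:
        if res.looks_alive():
            return "ok"
        # The cheap check was negative: confirm with the thorough check
        # before declaring the resource failed.
        return "ok" if res.is_alive() else "failed"
    return "skipped"


healthy = Resource("disk", lambda: True, lambda: True)
dead = Resource("ce", lambda: False, lambda: False)
print(check_resource(healthy, 5), check_resource(dead, 5))
```

A resource that reports "failed" here is what would feed into the cluster's own failover logic.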

Based on the error detection routines within the cluster, it is the cluster that determines:

  • do we need to fail over?
  • where do we need to fail over to?

As the cluster is a highly customisable engine itself, many factors influence its failover decision. Here are the most common ones:

  • Per Group: the "Group Failover Threshold"; for each group you can set a threshold and period which determine how many times the group can fail over due to failures in a given period. This setting prevents a group from "failing over continuously" when there is a problem.
  • Per Group: "AutoStart / Priority"; this setting determines whether a group will fail over or not; if the Priority (or AutoStart) is set to "Do not AutoStart" then the group will not fail over.
  • Per Group: "Preferred Owners"; this setting determines the next node in line to fail over to (advisory only).
  • Per Group: "AntiAffinityClassNames"; this setting determines whether a group will stay offline if another group with a similar name is already on the target node. It is used to keep "similar applications apart" (advisory only, the cluster may override this setting).

  • Per Resource: "Resource Restart" property; if the resource property is set to "do not restart", then a failure of the resource will not affect the group and the cluster will not fail over the group.
  • Per Resource: "Persistent State" property; if the resource was brought offline manually, and the group fails over to another node, the resource will not come online on the other node, as it was already offline (and therefore persistent) to begin with.
  • Per Resource: "Possible Owners"; prior to failing over to a node, the cluster enumerates all resources in the group and determines whether every resource is "possible" to bring online on that node. If one (or more) resources cannot come online there due to this "Possible Owners" setting, the cluster will not move the group to that node.
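As a rough illustration of how a few of these settings interact, the following Python sketch picks a failover target from AutoStart, the failover threshold and period, Possible Owners, and Preferred Owners. It is a simplified model written for this post, not the cluster's actual algorithm; the `Group` class, its field names, and `pick_target` are hypothetical, and AntiAffinityClassNames and the per-resource restart policy are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Group:
    """Hypothetical model of a cluster group and its failover settings."""
    name: str
    auto_start: bool = True
    failover_threshold: int = 1        # max failovers allowed per period
    failover_period_h: float = 6.0     # length of the period, in hours
    preferred_owners: List[str] = field(default_factory=list)
    # Possible owners per resource: resource name -> set of node names.
    possible_owners: Dict[str, Set[str]] = field(default_factory=dict)
    recent_failovers_h: List[float] = field(default_factory=list)


def pick_target(group: Group, current_node: str,
                up_nodes: List[str], now_h: float) -> Optional[str]:
    """Return the node to fail over to, or None if the group stays put."""
    if not group.auto_start:
        return None
    # Count failovers that fall inside the current period.
    recent = [t for t in group.recent_failovers_h
              if now_h - t < group.failover_period_h]
    if len(recent) >= group.failover_threshold:
        return None
    # A node qualifies only if every resource lists it as a possible owner.
    candidates = [n for n in up_nodes
                  if n != current_node
                  and all(n in owners
                          for owners in group.possible_owners.values())]
    if not candidates:
        return None
    # Preferred owners are advisory: honour their order when possible.
    for node in group.preferred_owners:
        if node in candidates:
            return node
    return candidates[0]


dhcp = Group("DHCP", preferred_owners=["N2", "N3"],
             possible_owners={"disk": {"N1", "N2", "N3"},
                              "ce": {"N1", "N3"}})
print(pick_target(dhcp, "N1", ["N1", "N2", "N3"], now_h=10.0))
```

In the usage example, N2 is the first preferred owner but is skipped because one resource does not list it as a possible owner.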

All of this (the error detection, the decision to fail over or not, and where to move to) is done by the cluster.
Cluster Enabler does not come into play in this logic; it is merely a resource in the cluster.

I hope this clears up the situation of "Cluster Enabler failed over the group". The answer is: the cluster detected an error, and the cluster failed over the group. That will give you a good start when looking at those situations.

I mentioned two exceptions:

  • Cluster Enabler does not cause a cluster to fail over; the one exception is that Cluster Enabler is a resource in the cluster and therefore has "LooksAlive" and "IsAlive" checks. While the checks we do are quite basic (just checking connectivity to the frame), it is possible that they detect a failure and the resource fails. When this happens, you will probably notice other resources failing at the same time, because you have a connectivity problem; disk resources will be failing as well at this point.
  • Cluster Enabler does not make any decisions; the one exception is that, at the very moment the resource comes online, the Cluster Enabler software checks the replication status of the "link" (could be RDF, MirrorView or RP), and if the "link" is not replicating/up, the Cluster Enabler resource will decide not to come online. This is to protect the data, but it can be configured. (I will do another post detailing this.)
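The second exception boils down to a single check at online time. A minimal Python sketch of that check, assuming a simple two-state link and a hypothetical `allow_online_when_link_down` switch standing in for the configurable behaviour (the real setting name and its options are not covered in this post):

```python
from enum import Enum


class LinkState(Enum):
    """Simplified replication link states, assumed for this example."""
    REPLICATING = "replicating"
    DOWN = "down"


def ce_may_come_online(link: LinkState,
                       allow_online_when_link_down: bool = False) -> bool:
    """Decide whether the CE resource should come online.

    By default the resource refuses to come online when the link is not
    replicating, to protect the data. The override flag is hypothetical.
    """
    if link is LinkState.REPLICATING:
        return True
    return allow_online_when_link_down


print(ce_may_come_online(LinkState.DOWN))
```

If the resource stays offline, everything that follows (the group failing, a move to another node) is again plain cluster behaviour.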

And there you have it: failover of a cluster in a multi-site cluster using Cluster Enabler.
It is basically not much more than any other cluster!
All the normal cluster logic applies.

Please let me know if you have any questions
Rgds,
Edwin.

RobertoAraujo1
3 Cadmium

Re: Ask the Expert: EMC Cluster Enabler multi-site clustering

Great discussion, everyone! Help us promote this debate. Please, share the following tweet:

Join the discussion NOW! Ask the Expert: EMC Cluster Enabler multi-site clustering. http://bit.ly/1jLn8wq 4/9 - 5/2 #EMCATE


Thanks!


Re: Ask the Expert: EMC Cluster Enabler multi-site clustering

Hi Edwin,

Great session so far.

My question is on SRDF/CE with an old(er) DMX and new VMAX arrays. With a disk expansion for a disk that is already SRDF-ed, I think the question of active/passive comes in, especially with the DMX: if I remember correctly, the cluster needs to be brought down on one node, then the disk is expanded (and BCV etc. if it's a striped meta) and re-presented, during which the cluster is failed over, resources are brought up, etc.

Is the setup still considered active/passive, since the node needs to be brought down?

I am also curious to know whether this is still the case with the VMAX (it has been some time since I worked on SRDF/CE on the VMAX) and whether there have been any improvements.

Thanks

KC

EdwinVanMierlo
2 Bronze

Re: Ask the Expert: EMC Cluster Enabler multi-site clustering

Hi KC,

A very good question.

Extending a volume in an SRDF/CE cluster is quite a simple process, and the process is no different whether you are running on VMAX or the older DMX.

The process is described in https://support.emc.com/kb/15140

  1. Open the Cluster Enabler GUI and select the group that contains the device you wish to expand. Choose More Actions > Deconfigure CE Group. This is done to avoid potential issues while the mirrors are being updated.
  2. Execute the RDF deletepair.
  3. Expand both metas (R1 and R2).
  4. Execute the RDF createpair.
  5. Full establish the RDF pair.
  6. Open Cluster Enabler and run the Configure CE wizard. This should successfully convert the group back to an SRDF/CE configuration.
  7. Test failover.
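Because the order matters here, it can help to treat the procedure as a checklist that stops at the first step that does not complete. The Python sketch below only models the ordering; `run_steps` and `confirm` are hypothetical helpers, and the actual work for each step (GUI actions and SRDF operations) is deliberately not scripted.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_steps(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order and stop at the first one that fails.

    Returns the descriptions of completed steps and the description of
    the failed step (None if every step completed).
    """
    done: List[str] = []
    for description, action in steps:
        if not action():
            return done, description
        done.append(description)
    return done, None


def confirm() -> bool:
    """Hypothetical stand-in for the operator confirming a step is done."""
    return True


EXPANSION_STEPS: List[Step] = [
    ("Deconfigure CE group", confirm),
    ("RDF deletepair", confirm),
    ("Expand both metas (R1 and R2)", confirm),
    ("RDF createpair", confirm),
    ("Full establish the RDF pair", confirm),
    ("Run the Configure CE wizard", confirm),
    ("Test failover", confirm),
]

completed, failed = run_steps(EXPANSION_STEPS)
print(len(completed), failed)
```

The point of stopping early is simply that you do not recreate or establish the pair before the earlier steps have succeeded.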

After the above processes, the cluster should pick up the extended device, after which you need to extend the partition and NTFS volume in the host.


If the Windows OS is not picking up the larger space on the devices, the nodes of the cluster might require a reboot.

HTH,
Edwin.

poodle1
1 Copper

Re: Ask the Expert: EMC Cluster Enabler multi-site clustering

Hi Experts,

We need some urgent help with a SAN migration for Windows 2008 clusters that use the EMC Cluster Enabler plug-in. We're migrating SAN disks for W2k8 clusters (the clusters use a file share witness) from EMC CX4 to VNX using SAN Copy. The EMC components we're running on the cluster nodes are listed below.

EMC Cluster Enabler MirrorView plug-in 4.1.0.22

EMC PowerPath 5.5

EMC Solutions Enabler 7.3.0

EMC Cluster Enabler base component 4.1.0

There are two sites that use a stretched VLAN; both sites are in the same location. It is a two-node W2k8 cluster with one node in each site, and each site has a CX4 and a VNX array using MirrorView/S replication.

We would appreciate it if you could help with the SAN disk migration procedure, mainly from the EMC Cluster Enabler perspective. Thank you.

Here are the high-level steps we plan to follow. For EMC Cluster Enabler, I think we only need to add the new LUN to the existing resource group for the service (the cluster resource group for DHCP, for example) in the Cluster Enabler Manager console, delete the old LUN from Cluster Enabler Manager, and make sure the EMC Cluster Enabler resource in the cluster management console is added as a dependency for the new disk.

  • Establish SAN COPY
  • Wait for the Volume to be in SYNC
  • Prepare the Zoning for the Server to the new Storage
  • Ensure that the backup is completed
  • Map the new VNX volumes to the server
  • Verify the SYNC Status and split the mirror
  • Put the resource for the service offline in failover cluster manager
  • Add the new disks to cluster
  • Remove drive letter from old volume or assign any other
  • Assign drive letter of the old volume to the new volume
  • Remove the old volume from the cluster management console
  • Add new volume in service group in EMC enabler manager on each node including fileshare witness server
  • Delete old volume service group in EMC enabler manager on each node and FSW server
  • Make sure the new disk is listed as a dependency for the clustered service
  • Make sure EMC enabler resource is listed as a dependency for the new volume in failover cluster console
  • Put the resource for the service online in failover cluster manager
  • Unmap the old volumes from the servers
  • Restart the server and check whether OS and services start correctly
Sushma_M1
1 Nickel

Re: Ask the Expert: EMC Cluster Enabler multi-site clustering

Hi Edwin,

I am new to SRDF/CE, and we have Cluster Enabler in our environment. Two weeks ago we had DR testing, and CE did not fail over a few of the devices; they were write disabled.

After checking, we found that those devices had not been added to the cluster group. We now need to add those devices to the cluster group and would like to know the procedure. I was given the KB article below, but it is a little confusing.

These are existing devices and the mount points are already created, so I want to know whether we can add the disks without unmounting the drives. Since a critical application is running, we cannot get a long outage window. Please help me with this.

We are running Windows Server 2008.

The article I referred to was:
