KTelep
2 Bronze

Issues with MSCS in Clariion Environment (Extremely Long Failovers)

We've been around the block multiple times with EMC Support and Microsoft Support. There's a lot of finger-pointing between the two orgs on our issue, and we've been getting nowhere with either vendor.

I'm curious if anyone has experienced this:

We occasionally have EXTREMELY long failovers on our MSCS clusters attached to both our CX3-80 and CX4-960.

Host OS is Win2k3 64-bit

PowerPath is 5.5

Emulex HBA Drivers are latest and greatest

Switches are Cisco running SAN-OS 3.3(4a)

CX3 is running Flare 28, CX4 is running Flare 29

What we see is that during a failover our larger LUNs (2 TB) will go Online Pending for an extended period of time, then come online on their own after 45 minutes or so.  If we shut down all nodes of the cluster and then bring up just one node, we get the same behavior, so I don't think we're having SCSI reservation conflicts.

Removing the host/disks from the storage group and re-adding them also has no impact.  We just have to "wait it out" for MSCS to decide to bring the disks online.

Has anyone else EVER run into this situation before that can provide us a direction to continue working/researching?

14 Replies
RRR
5 Osmium

Re: Issues with MSCS in Clariion Environment (Extremely Long Failovers)

Sorry, I don’t have any experience with MirrorView/CE just yet.

dynamox
6 Thallium

Re: Issues with MSCS in Clariion Environment (Extremely Long Failovers)

So your cluster nodes are attached to both arrays? Do you experience this issue if your cluster is attached to only one of the arrays? Are the disks formatted GPT?

Re: Issues with MSCS in Clariion Environment (Extremely Long Failovers)

Are you using Cluster Enabler to cluster across both Arrays? Or do you have separate clusters, isolated to individual arrays?

sasamir1
3 Argentium

Re: Issues with MSCS in Clariion Environment (Extremely Long Failovers)

Flare 28 on a CX3? Can you please check again?

regards,

Samir

KTelep
2 Bronze

Re: Issues with MSCS in Clariion Environment (Extremely Long Failovers)

This is what happens when you post after working an issue for waaaay too long.

Two separate clusters, running on two separate Clariions.   No MirrorView between them, no MirrorView/CE.  Simplest cluster config imaginable.

CX3-80 is running Flare 26

dynamox
6 Thallium

Re: Issues with MSCS in Clariion Environment (Extremely Long Failovers)

Are the cluster nodes zoned identically (each HBA zoned to SPA and SPB)?  Is failover mode on the Clariion set to 1?

KTelep
2 Bronze

Re: Issues with MSCS in Clariion Environment (Extremely Long Failovers)

Yep, all the zoning is correct for all nodes of the clusters. Failover mode is set properly and has been validated.

Re: Issues with MSCS in Clariion Environment (Extremely Long Failovers)

What is the NTFS allocation unit size for the large LUNs? 4 KB (the default)? Or something else?
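For a sense of scale behind that question, here is a rough sketch of how many allocation units a volume of this size contains at a few allocation unit sizes. It assumes the "2 TB" LUN is exactly 2 TiB and that the whole LUN is one NTFS volume; whether the cluster count has anything to do with the slow failovers is not established in this thread.

```python
def cluster_count(volume_bytes, allocation_unit_bytes):
    """Number of whole allocation units (clusters) that fit in the volume."""
    return volume_bytes // allocation_unit_bytes

TIB = 1024 ** 4  # assumption: "2 TB" is treated as 2 TiB

for unit_kib in (4, 16, 64):
    count = cluster_count(2 * TIB, unit_kib * 1024)
    print(f"{unit_kib:>2} KB allocation unit: {count:,} clusters")
```

At the 4 KB default that comes to roughly half a billion clusters, and each 4x increase in allocation unit size cuts the count by 4x.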

Richard J Anderson

christopher_ime
4 Beryllium

Re: Issues with MSCS in Clariion Environment (Extremely Long Failovers)

Could you run the following PowerPath command on all nodes in the cluster?

emcphostid check

I'm not entirely certain of the exact behavior when there are collisions, but I do know to look for it in a clustered environment when installing PowerPath, so I figured it would be worth a quick look.  Nodes would only end up with the same hostid entry if the systems were created from a cloned disk image.  There is a procedure to change it using the set parameter.
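Once the host ID has been noted down from each node, spotting a collision is just a matter of grouping nodes by ID. A minimal sketch follows; the node names and ID values are made up for illustration, and the IDs are assumed to be copied by hand from each node's output rather than parsed from it, since the exact output format isn't shown here.

```python
from collections import defaultdict

def find_hostid_collisions(hostids):
    """Group node names by host ID; return only IDs shared by two or more nodes."""
    nodes_by_id = defaultdict(list)
    for node, hostid in hostids.items():
        # Normalize so case or stray whitespace doesn't hide a match.
        nodes_by_id[hostid.strip().lower()].append(node)
    return {hid: sorted(nodes) for hid, nodes in nodes_by_id.items() if len(nodes) > 1}

# Hypothetical values, typed in by hand from each cluster node.
observed = {
    "node-a": "0A1B2C3D",
    "node-b": "0a1b2c3d",
    "node-c": "11223344",
}

for hostid, nodes in find_hostid_collisions(observed).items():
    print(f"host ID {hostid} is shared by: {', '.join(nodes)}")
```

An empty result means every node reported a distinct ID, which would rule this particular cause out.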
