Unsolved

June 3rd, 2013 04:00

Vplex Metro scsi-2 reservation issue in VMware MSCS environment

Hi All,

We upgraded a VPLEX Metro configuration to GeoSynchrony 5.1P3 (5.1.0.03.00.07). After the upgrade the environment ran stably.

However, last week an issue arose after an ESX host reboot (due to an HBA replacement): the Windows MSCS cluster VMs on that host would not start again after the reboot.

My suspicion at this moment is that the VPLEX upgrade to 5.1P3 changed its behavior towards MSCS cluster VMs, and that this change causes the dead paths.

 

What happened?

During normal operations, the second nodes of the Windows 2003 MSCS clusters (the B-nodes) were shut down.

  1. Maintenance was performed on the server (a defective HBA was replaced and the zoning was adapted to that change), and the ESX server was rebooted.
  2. All VMFS datastores came up normally, but the MSCS cluster nodes could not start. The cause is that all paths to the RDM LUNs are dead.
  3. A reboot/rescan of the ESX host does not help.
  4. A check of the zoning, VPLEX configuration etc. does not indicate any issues at those layers. (EMC support was involved; no VPLEX configuration issues or errors were found.)
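To confirm which devices have every path dead (step 2 above), the path listing from the host can be grouped per device. Below is a minimal Python sketch, assuming the text comes from `esxcli storage core path list` and that each path entry carries a `Device:` line followed by a `State:` line. The sample text and the device names in it are made up for illustration and are not taken from this environment.

```python
from collections import defaultdict

# Illustrative sample only: hypothetical device names, not real output
# from the environment described in this thread.
SAMPLE_OUTPUT = """\
fc.example-hba1-fc.example-target1-naa.example_rdm
   Runtime Name: vmhba1:C0:T0:L10
   Device: naa.example_rdm
   Device Display Name: EMC Fibre Channel Disk (naa.example_rdm)
   State: dead

fc.example-hba2-fc.example-target2-naa.example_rdm
   Runtime Name: vmhba2:C0:T0:L10
   Device: naa.example_rdm
   Device Display Name: EMC Fibre Channel Disk (naa.example_rdm)
   State: dead

fc.example-hba1-fc.example-target1-naa.example_vmfs
   Runtime Name: vmhba1:C0:T0:L1
   Device: naa.example_vmfs
   Device Display Name: EMC Fibre Channel Disk (naa.example_vmfs)
   State: active

fc.example-hba2-fc.example-target2-naa.example_vmfs
   Runtime Name: vmhba2:C0:T0:L1
   Device: naa.example_vmfs
   Device Display Name: EMC Fibre Channel Disk (naa.example_vmfs)
   State: dead
"""


def paths_by_device(text):
    """Map each device to the list of states of its paths."""
    states = defaultdict(list)
    device = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Device:"):
            # "Device Display Name:" does not match this prefix.
            device = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and device is not None:
            states[device].append(line.split(":", 1)[1].strip())
            device = None  # wait for the next path entry
    return dict(states)


def devices_with_all_paths_dead(states):
    """Return the devices for which every listed path is dead."""
    return sorted(
        dev for dev, path_states in states.items()
        if path_states and all(s == "dead" for s in path_states)
    )


if __name__ == "__main__":
    print(devices_with_all_paths_dead(paths_by_device(SAMPLE_OUTPUT)))
```

A device with at least one active path is not reported, which separates "one path lost after the HBA swap" from the "all paths dead" state seen on the RDM LUNs.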

The current state is that the MSCS clusters still have only one node active, at the primary location (the A-nodes), and the second nodes are still unable to start due to the dead paths.

Further investigation proved that the issue only occurs with RDM-luns used by Windows2003 clusters. The issue does not occur on RDM-luns used by Windows2008 clusters and on RDM-luns not used for MSCS.

The difference I see at VPLEX level is that the RDM LUNs used by the Win2003 clusters have SCSI-2 locks active, while the RDMs in use by the Win2008 clusters have SCSI-3 locks.

My feeling now is that VPLEX behavior for SCSI-2 reservations changed in 5.1P3 in such a way that, while a SCSI-2 lock is active on a LUN, the metadata of that LUN can no longer be read by other hosts, which causes the dead paths at ESX level.
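To make that hypothesis concrete, here is a toy Python model of a LUN with a SCSI-2 style reservation. It is only a sketch of the suspected behavior, not of VPLEX internals: the class name, the host names, and the short list of commands that remain allowed for other hosts are simplifying assumptions.

```python
class Scsi2Lun:
    """Toy LUN with a SCSI-2 style reservation: one holder, no registrants."""

    # Assumption for this sketch: only these commands are let through
    # for initiators that do not hold the reservation.
    ALLOWED_WHEN_RESERVED = {"INQUIRY", "REQUEST_SENSE"}

    def __init__(self):
        self.holder = None  # initiator currently holding the reservation

    def reserve(self, initiator):
        if self.holder not in (None, initiator):
            return "RESERVATION_CONFLICT"
        self.holder = initiator
        return "GOOD"

    def release(self, initiator):
        # A release from a non-holder is accepted but changes nothing.
        if self.holder == initiator:
            self.holder = None
        return "GOOD"

    def submit(self, initiator, command):
        if self.holder in (None, initiator):
            return "GOOD"
        if command in self.ALLOWED_WHEN_RESERVED:
            return "GOOD"
        return "RESERVATION_CONFLICT"


if __name__ == "__main__":
    lun = Scsi2Lun()
    lun.reserve("esx-site-a")                       # active A-node holds the lock
    print(lun.submit("esx-site-b", "INQUIRY"))       # device is still visible
    print(lun.submit("esx-site-b", "READ_CAPACITY")) # refused while reserved
```

If the rebooted host gets conflicts on the commands it uses to probe the device, it could end up marking every path dead even though zoning and masking are correct, which would match what we see.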

Does anybody know whether there is a parameter or option at VPLEX level that controls how SCSI-2 reservations are enforced, and whether changing it could resolve this issue?

Thanks in advance for your thoughts on this.

Vplex-config:

SiteA: Vplex-engine (2dirs), VNX5500, ESX5.0 host with MSCS-nodes-A.

SiteB: Vplex-engine (2dirs), VNX5500, ESX5.0 host with MSCS-nodes-B.

Each site has a dedicated FC SAN. The VPLEX interconnect is 10GbE IP.

30 Posts

February 5th, 2015 07:00

Did you ever find a definitive answer to this? I have some unusual SCSI reservation activity causing paths to go down on vSphere after upgrading from 5.0 to 5.3.

I was only trying to add new VPLEX storage to the metro cluster and had major problems.

Thanks!

14 Posts

April 23rd, 2015 02:00

Perhaps this article (KB#85130) has something to do with it, though it only describes behavior during an upgrade.

Issue:

During upgrade from GeoSynchrony 4.2 to 5.0 (although this can happen during any upgrade), connectivity loss to VPLEX volumes occurs during first upgrade actions.

Cause:

ESXi servers, hosting Microsoft Windows Cluster 2008 nodes, due to their use of RDMP and NMP devices, did not, and could not, handle the loss of their paths (and loss of SCSI-3 reservations) during the upgrade.

This is due to how both MSCS 2008 and ESXi handle the initial disk reservations when the cluster starts up. Using ESXi's NMP, the cluster only sees one path on which to place the SCSI-3 (PER) reservations, thus locking those disks to that cluster, down that single/primary path. When that path failed during the VPLEX NDU, ESXi did fail the I/O over to the secondary path on the server, but MSCS was unable to establish new reservations to the disks down the second path, due to VPLEX locking out any reservation modifications during the upgrade operation on the VPLEX directors. This inability to put new reservations on the clustered disks, caused the MSCS cluster nodes to panic and brought down the entire cluster.

Resolution:

Final result of the analysis has shown that the optimal solution is to install current version of PowerPath/VE on the ESXi servers that host MSCS cluster nodes. This permits the SCSI-3 disk reservations to be applied down all active paths to storage. This will prevent any issues during future NDU events.
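The cause and resolution described in the KB text can be illustrated with a toy Python model of per-path SCSI-3 registrations. This is a sketch under assumptions, not VPLEX or MSCS code: the class, the function, the path names, and the `registrations_locked` flag (standing in for the upgrade window during which reservation changes are refused) are all hypothetical.

```python
class PersistentReservationLun:
    """Toy LUN where I/O is allowed only on paths that hold a registration."""

    def __init__(self):
        self.registered = set()            # paths with a registration
        self.registrations_locked = False  # models the upgrade window

    def register(self, path):
        if self.registrations_locked:
            return False  # reservation changes refused during the upgrade
        self.registered.add(path)
        return True

    def write(self, path):
        return "GOOD" if path in self.registered else "RESERVATION_CONFLICT"


def failover_outcome(paths_registered_at_startup, surviving_path):
    """Register on the given paths, start the upgrade, then fail over."""
    lun = PersistentReservationLun()
    for path in paths_registered_at_startup:
        lun.register(path)
    lun.registrations_locked = True  # upgrade begins, primary path drops
    if surviving_path not in lun.registered:
        lun.register(surviving_path)  # refused while locked
    return lun.write(surviving_path)


if __name__ == "__main__":
    # Single registered path, as the KB describes for NMP.
    print(failover_outcome(["path1"], "path2"))
    # Registrations on all active paths, as the KB describes for PowerPath/VE.
    print(failover_outcome(["path1", "path2"], "path2"))
```

In the first case the surviving path has no registration and cannot obtain one, so the cluster disk becomes unusable. In the second case the surviving path was registered before the upgrade started, so I/O continues.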
