January 18th, 2013 22:00
Verify multipathing
We recently had a failure on one of our Service Processors, and one LUN did not fail over to the secondary SP. The problem was solved by resetting the SP. Afterwards I did a general check of the vSphere 4.1 iSCSI configuration and could not find anything wrong. On the VMware side I can see 4 paths to each datastore, two of them active (to the respective active SP for that virtual disk group). I switched one virtual disk over to the other SP and did not notice any interruption on the VMware side. I also confirmed that the correct paths are active. Is there any other way to double-check from the VMware side that SP redundancy works, short of actually shutting down one of the SPs?
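The manual check described above (4 paths per datastore, active paths all on the owning SP) can be written down as a small script so it is repeatable after each change. This is only a minimal sketch: the `PATHS` inventory, the datastore names, and the state strings are hypothetical and would have to be transcribed from the vSphere Client path view; nothing here talks to ESX or to the array.

```python
from collections import defaultdict

# Hypothetical inventory: (datastore, storage processor, path state).
# Transcribe the real values from the vSphere Client "Manage Paths" view.
PATHS = [
    ("datastore1", "SPA", "active"),
    ("datastore1", "SPA", "active"),
    ("datastore1", "SPB", "standby"),
    ("datastore1", "SPB", "standby"),
    ("datastore2", "SPB", "active"),
    ("datastore2", "SPB", "active"),
    ("datastore2", "SPA", "standby"),
    ("datastore2", "SPA", "dead"),
]


def check_redundancy(paths, sps=("SPA", "SPB")):
    """Return {datastore: [problems]}; an empty list means no problem found."""
    by_ds = defaultdict(list)
    for ds, sp, state in paths:
        by_ds[ds].append((sp, state))

    report = {}
    for ds, entries in by_ds.items():
        problems = []
        # Every SP must be reachable over at least one non-dead path.
        for sp in sps:
            usable = [s for p, s in entries if p == sp and s in ("active", "standby")]
            if not usable:
                problems.append(f"no usable path to {sp}")
        # On an active-passive array, active paths should all point at one SP.
        active_sps = sorted({p for p, s in entries if s == "active"})
        if len(active_sps) != 1:
            problems.append(f"expected active paths on exactly one SP, found {active_sps}")
        dead = sum(1 for _, s in entries if s == "dead")
        if dead:
            problems.append(f"{dead} dead path(s)")
        report[ds] = problems
    return report


if __name__ == "__main__":
    for ds, problems in check_redundancy(PATHS).items():
        print(ds, "OK" if not problems else problems)
```

A check like this only confirms that the reported path states look sane; it does not prove that a trespass will actually happen when an SP fails.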



christopher_ime
4 Operator
2K Posts
1
January 19th, 2013 00:00
BerLip,
First, welcome to the forums, and thank you for being an EMC customer.
Yes, you can disable individual paths as noted in the following KB article:
http://kb.vmware.com/kb/2000552
Skip to the section "Enable or disable path" for the version of ESX/ESXi that you are running. As you will notice as you follow the steps, it is granular at the LUN level. This allows you to disable all paths for a LUN to, for example, SP A.
You didn't mention the array or, more relevantly, the version of FLARE it is running. Keep in mind, though, that if you have an array that supports ALUA (VMware added ALUA support in ESX/ESXi 4.x) and the environment is configured for it, the LUN will not immediately trespass: the current owning SP for that LUN is still online, and only the optimal/direct paths to it have been disabled. The upper redirector in the array accepts the I/O that the host now sends via the non-optimal/indirect paths to the peer SP (SP B in this example) and, as the name suggests, redirects it to the current owning SP (SP A) via the CMI, which is a channel between the SPs. However, after 128,000 I/Os, as per the specification, the array will do an implicit trespass of the LUN, because it recognizes that the optimal paths to the then-current owner have been down for an extended period. The configuration then returns to a single path to the new owner (now SP B in this example).
Then again, if you don't have a version of FLARE that supports ALUA and are instead using failover mode 1 (PNR), the comments above don't apply, and the LUN should trespass shortly after you disable the paths from the host. One other thing to keep in mind: disabling paths this way immediately triggers the dead-path response, whereas a real failure would first be subject to the RecoveryTimeout, NoopTimeout, and NoopInterval iSCSI Advanced Settings before the path is declared dead.
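To make the ALUA behavior described above concrete, here is a toy model of the redirect-then-trespass sequence. It is a sketch under stated assumptions, not array firmware: the `Lun` class and its counter are hypothetical, the 128,000 threshold is simply the figure quoted above, and the real array's accounting is certainly more involved than a single counter.

```python
IMPLICIT_TRESPASS_THRESHOLD = 128_000  # figure quoted in the post above


class Lun:
    """Toy model of one LUN on a two-SP ALUA array (hypothetical)."""

    def __init__(self, owner="SPA"):
        self.owner = owner
        self.redirected = 0  # I/Os received on the non-owning SP

    def io(self, via_sp):
        """Handle one I/O arriving at via_sp; return the SP that serviced it."""
        if via_sp == self.owner:
            return self.owner  # optimal path, no redirection needed
        # Non-optimal path: the peer SP forwards the I/O to the owner (over the CMI).
        serviced_by = self.owner
        self.redirected += 1
        if self.redirected >= IMPLICIT_TRESPASS_THRESHOLD:
            # Implicit trespass: the SP that keeps receiving the I/O becomes owner.
            self.owner = via_sp
            self.redirected = 0
        return serviced_by


if __name__ == "__main__":
    lun = Lun(owner="SPA")
    count = 0
    while lun.owner == "SPA":
        lun.io("SPB")  # host has disabled its paths to SPA
        count += 1
    print(f"LUN trespassed to {lun.owner} after {count} redirected I/Os")
```

The point for testing is that, on an ALUA array, disabling the optimal paths from the host will not show a trespass right away; the owner changes only once enough I/O has been redirected.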
HTH
BerLip
January 19th, 2013 22:00
Thanks - I already did the path disabling on one of the test LUNs in preparation for the more controlled test, which will also include physical tests such as pulling the SP to simulate a power or hardware failure. I just thought there would be something more formal for this type of testing and verification.
I am running FLARE version 02.23.050.5.710 on both units. I noticed that each LUN has 4 paths to it from the VMware side: 2 via SPA and 2 via SPB. Both active paths always go to one SP, while the other SP has the standby paths. The path selection policy is Most Recently Used. What bugs me, compared to other storage units, is that of the two active paths only one actually carries I/O according to the VMware storage path details. Furthermore, shouldn't the pathing be set up in a cross relation to its host, in other words with one active path on SPA and one on SPB? Is this a limitation of the AX4-5i? Is there no best-practice paper on iSCSI setup with this unit and VMware 4.1?
Are those values added manually or automatically? I am not using PowerPath with VMware integration. Is there no reference from EMC regarding these values? The best I found is one called "EMC CLARiiON iSCSI Server Setup Guide for VMware ESX Server 3i and 3.x Hosts" (300-003-807), but it is old and very basic.
christopher_ime
January 20th, 2013 03:00
Thank you for providing more details about your environment.
As you perform your failover testing, for your specific environment (AX4, ESX/ESXi 4.x, and using the native multipathing host solution), keep in mind that there are certain back-end conditions that may not prompt the host PSP to initiate a trespass of the LUN, and you would therefore have to manually do so. This is documented in the following KB article:
emc253491: "Loss of backend devices on a CLARiiON AX4 Series array could result in ESX 4.x not failing over properly."
I am simply bringing it to your attention.
When purchased with 2 SPs, as in your case (and with the "dualsp" Navisphere Express enabler loaded), the AX4 is an active-passive array: for each LUN, only a single SP services/owns that LUN at any time. Only when necessary will the LUN "trespass" to the peer SP, which then takes over ownership, for instance when all the paths to the owning SP are dead or the owning SP itself goes down. For the AX4, which is not ALUA compliant, the only host registration supported for ESX/ESXi is what is known as "failover mode" 1 (aka Passive-Not-Ready). Therefore, for a given LUN, only the paths to the owning SP are advertised to the ESX/ESXi server as "Active" (and can therefore be candidates for I/O), while the paths to the peer are seen in the vSphere Client as "Standby", as you've already noted.
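The path states you see follow directly from LUN ownership, which the short sketch below illustrates. The path names and the `advertised_states` helper are hypothetical and only mirror the rule described above for failover mode 1: paths to the owning SP show as "Active", paths to the peer as "Standby".

```python
# Hypothetical path list for one LUN: (path name, SP the path terminates on).
LUN_PATHS = [
    ("path-A0", "SPA"),
    ("path-A1", "SPA"),
    ("path-B0", "SPB"),
    ("path-B1", "SPB"),
]


def advertised_states(owner, paths):
    """Map each path to the state the host would see for the given owning SP."""
    return {name: ("Active" if sp == owner else "Standby") for name, sp in paths}


if __name__ == "__main__":
    print("owner SPA:", advertised_states("SPA", LUN_PATHS))
    # After a trespass to SPB the Active and Standby sets swap.
    print("owner SPB:", advertised_states("SPB", LUN_PATHS))
```

So two Active paths on one SP and two Standby paths on the other is the expected picture for this array, not a misconfiguration.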
This is how the ESX/ESXi Path Selection Policy MRU works; it will only use a single active path for I/O. Therefore, your observations of "Active (I/O)" for just a single path despite having multiple Active paths would be the same behavior regardless of the array.
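The single "Active (I/O)" path can be illustrated with a simplified model of a Most Recently Used policy. This is a sketch of the general idea only, not VMware's implementation: the `MruSelector` class and path names are hypothetical, and the step where the host activates Standby paths (prompting a trespass) when no Active path is left is deliberately not modeled.

```python
class MruSelector:
    """Simplified Most Recently Used path selection (hypothetical model)."""

    def __init__(self, paths):
        # paths: mapping of path name -> state ("active", "standby" or "dead")
        self.paths = dict(paths)
        self.current = None

    def set_state(self, name, state):
        self.paths[name] = state

    def select(self):
        """Return the path used for I/O, or None if no active path exists."""
        # Stay on the most recently used path for as long as it is active.
        if self.current is not None and self.paths.get(self.current) == "active":
            return self.current
        # Otherwise move to the first remaining active path.
        for name, state in self.paths.items():
            if state == "active":
                self.current = name
                return name
        self.current = None
        return None


if __name__ == "__main__":
    mru = MruSelector({"path-A0": "active", "path-A1": "active",
                       "path-B0": "standby", "path-B1": "standby"})
    print(mru.select())               # one path carries all I/O
    mru.set_state("path-A0", "dead")
    print(mru.select())               # moves to the other active path
    mru.set_state("path-A0", "active")
    print(mru.select())               # no failback to the recovered path
```

Note the last step: once the policy has moved to another path, it stays there even after the original path recovers, which is why the "Active (I/O)" path after a test may differ from the one before it.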
As you probably realize now based on the comments above, the two Active paths are those associated with the SP that is the current owner for that LUN.
Begin by generating a custom document at the following site:
http://www.emc.com/microsites/clariion-support/ax45-support.htm
Click on "Install" at the top and then build your document.
Also, complement this with the "Host Connectivity Guide for VMware ESX Server" found on support.emc.com.
Finally, sorry, but can you explain further about what you are asking about with the question: "Are those values added manually or automatically?"
kelleg
4 Operator
4.5K Posts
0
January 21st, 2013 15:00
Another document that might be helpful is the "EMC Host Connectivity Guide for VMware ESX Server" - this is from Oct. 2012 and should have the most current data:
http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/300-002-304.pdf
Also, we recommend upgrading your AX4 to the latest array FLARE release - patch 711.
glen