
Broadcom/QLogic NPAR iSOE and MPIO policy on Windows 2012 R2

August 18th, 2016 23:00

Hi there,

I've enabled iSOE on partition 0 of the Broadcom/QLogic 10GbE card in our FC630. After configuring everything in the MS iSCSI initiator, all paths were active for a while (10-20 min), then one of the paths just disappeared and never reconnected.

I can see "logout request was received from the initiator" message on the EQL log but that is not very helpful. Interestingly enough, this was just happened to the LUN; I've seen no issue for vss-control. So I was checking out the option and I've noticed that if I change the MPIO from "least queue depth" to "fail over only" then I dont see any path disappearance.

As I'm not quite sure why the default option "least queue depth" is not working, I'd appreciate for any insight and details.
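
For reference, this is roughly how the policy can be inspected and switched on the host side (a sketch using the built-in mpclaim tool and the Microsoft MPIO cmdlets; the disk number is just an example from my host, and the HIT DSM may manage its own policy on top of this):

    # List MPIO disks with their current load-balance policy
    mpclaim -s -d

    # Show the paths and policy for one MPIO disk (0 is just my disk number)
    mpclaim -s -d 0

    # Switch that disk from Least Queue Depth (4) to Fail Over Only (1)
    mpclaim -l -d 0 1

    # Or change the Microsoft DSM's global default policy from PowerShell
    Get-MSDSMGlobalDefaultLoadBalancePolicy
    Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy FOO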

Thanks,

 


August 19th, 2016 19:00

Hello, 

If you open a support case, they will likely tell you it's working as designed. Since EQL FW 6.x, our MPIO code negotiates connections as needed, to ensure proper performance and minimize connection count.

The event message you included confirms that. The "requested logout" is coming from the HIT MPIO code. The 6210 has vertical failover, so with little traffic only one connection is needed; a cable or switch failure will not bring down your volume. If the load goes up, another connection will be established.

The settings in the registry that Michael mentioned will no longer override the negotiation code in the firmware and HIT.

There is a feature request to make the minimum two connections, to give our customers better peace of mind about this behavior.

Since you are dealing with a single member, the greatest benefit of our MPIO code isn't being leveraged; that requires multiple members in one pool. With a single member, you won't notice any difference with LQD over Round Robin.
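
If you want to watch that negotiation from the host, something along these lines will show the session and connection counts scaling with load (a rough sketch using the built-in iSCSI cmdlets; Group Manager remains the authoritative view):

    # Sample host-side iSCSI session/connection counts every 30 seconds
    while ($true) {
        $s = (Get-IscsiSession    | Measure-Object).Count
        $c = (Get-IscsiConnection | Measure-Object).Count
        Write-Output ("{0}  sessions={1}  connections={2}" -f (Get-Date -Format s), $s, $c)
        Start-Sleep -Seconds 30
    }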

 Regards, 

Don 

August 19th, 2016 10:00

Are you using HIT/Microsoft MPIO DSM?

Are you using single-port 10 GB arrays? 

If so, connections can default to 1 per member, and this can be modified in the MPIO DSM config on the Windows host.

Also, are you approaching the group/pool limitations for iSCSI connection count? If so, connections could potentially be scaled down automatically, to allow other at-risk volumes to remain online.
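
Each Windows host can only report its own share of the group-wide count (Group Manager shows the total), but a quick per-host breakdown looks something like this:

    # Per-host view: how many iSCSI sessions go to each target
    Get-IscsiSession |
        Group-Object -Property TargetNodeAddress |
        Select-Object Count, Name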


August 19th, 2016 11:00

Hi,

Yes, HIT KIT version 4.9 is installed. 

There are 2 x 10GbE ports (eth0 and eth1) on the 6210 and both are active. There is only one member in the group, and a couple of hosts (3 servers) are connecting to the array.

The MPIO connection settings are at the HIT Kit defaults: max sessions per entire volume 6, max sessions per volume slice 2.

I'll go through the documents to see if I missed anything before simulating a network/NIC failure. I'll post my findings as well.
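
The failover test I have in mind is roughly: pull the cable (or shut the switch port) on one path and watch the paths and sessions from the host, along these lines (the disk number is just an example from my host):

    # Watch MPIO path state and iSCSI sessions before/during the failover test
    mpclaim -s -d 0          # path states for the EQL volume (disk 0 on my host)
    Get-IscsiConnection | Select-Object InitiatorAddress, TargetAddress
    Get-IscsiSession    | Select-Object InitiatorPortalAddress, TargetNodeAddress, IsConnected, NumberOfConnections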

thanks,

August 19th, 2016 12:00

You should not use Teaming on the NICs that you are using for iSCSI MPIO.

Also, ensure that the server's SAN interfaces are included in the MPIO DSM configuration. By the same token, ensure that the non-SAN interfaces are excluded from it. This can be managed via ASM/ME or the PowerShell Tools.


August 19th, 2016 13:00

Hi Michael,

Each CNA on the server is configured like this:

  • partition 0: iSCSI enabled, Ethernet and FCoE disabled
  • partition 1: Ethernet enabled, iSCSI and FCoE disabled
  • partitions 2 and 3: all disabled

So I don't use NIC teaming for partition 0, and since this partition is presented as a storage HBA, it can only be configured through BASP/QLogic Control Suite. I've just enabled jumbo frames (9600 as recommended) and assigned an IP address from the storage network.

Partition 1, however, is configured with Microsoft teaming (2012 R2 built-in) in switch-independent mode with dynamic load balancing, for LAN/server (non-storage) traffic.
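
For reference, the LAN team on partition 1 was created roughly like this (the team and adapter names here are placeholders, not the real ones on our servers):

    # Switch-independent team with dynamic load balancing, LAN (non-storage) traffic only
    New-NetLbfoTeam -Name "LAN-Team" `
        -TeamMembers "Partition1-PortA", "Partition1-PortB" `
        -TeamingMode SwitchIndependent `
        -LoadBalancingAlgorithm Dynamic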

In the HIT Kit, I've already excluded all the other subnets as you mentioned.

As for "ensure that the Server's SAN interfaces are appropriately included in the MPIO DSM configuration": I'm not quite sure if this is what you're asking, but the storage subnet is already included in the Dell ASM/HIT Kit, and both partition 0 interfaces are in the same subnet. Please let me know if you need more details.

Thanks,

 

August 19th, 2016 14:00

If, after making those changes, you're still having issues, I'd recommend that you open a support case. Best of luck.


August 19th, 2016 19:00

A small footnote about not noticing a performance difference: this is especially true since you are partitioning your NICs. For best iSCSI performance, the SAN should be on its own subnet, with dedicated switches and NICs.

Right now your environment is small; should it grow much larger over time, this could become more of an issue.

 Don 


August 22nd, 2016 12:00

Hi Don,

" A cable or switch failure will not bring down your volume.   If the load goes up another connection will be established. " 

Thanks for the details you provided; however, I've noticed that losing the active path actually causes the operating system to lose the disk/LUN connection, as the second path never re-establishes/reconnects.

The whole idea of using NIC partitioning and enabling iSOE was to make sure DCB is configured all the way through (storage, switches, server). Based on my research, the easiest way to handle DCB on the operating system (Windows 2012 R2) is to let the CNA deal with it by enabling iSOE.
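
For comparison, doing DCB in the OS itself (instead of offloading it to the CNA) would look roughly like the sketch below; the priority value, bandwidth share, and adapter name are only illustrative and would have to match what the switches and the array expect:

    # OS-level DCB sketch for software iSCSI (the iSOE/CNA approach bypasses all of this)
    Install-WindowsFeature Data-Center-Bridging

    # Tag iSCSI traffic (TCP 3260) with 802.1p priority 4 -- illustrative value
    New-NetQosPolicy "iSCSI" -iSCSI -PriorityValue8021Action 4

    # Give that priority its own ETS traffic class and make it lossless with PFC
    New-NetQosTrafficClass "iSCSI" -Priority 4 -BandwidthPercentage 50 -Algorithm ETS
    Enable-NetQosFlowControl -Priority 4

    # Accept the switch's DCBX settings and enable QoS on the storage adapter
    Set-NetQosDcbxSetting -Willing $true
    Enable-NetAdapterQos -Name "StorageNIC"    # placeholder adapter name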

Unfortunately, I've been told:

  1. this is not a very common setup, even though it is published in one of the best practice guides
  2. having two (iSOE adapter) initiator adapters accessing the same group IP (the discovery portal in the MS initiator) somehow does not work well with EQL MPIO.

As you mentioned, EQL MPIO will not make much of a difference in terms of performance, so I'll try the MS native MPIO to see if both paths stay active and survive a network/switch outage.
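
The rough plan for moving the volume to the native Microsoft DSM is below (the vendor/product string must be taken from what mpclaim -e actually reports for the array rather than typed from memory, and the -r switch reboots the host):

    # See which storage hardware IDs MPIO can claim, then claim the EQL devices with the Microsoft DSM
    mpclaim -e
    mpclaim -r -i -d "VENDOR  PRODUCT "    # substitute the exact 8+16 character string from mpclaim -e

    # After the reboot, set the default policy and verify the paths
    Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy RR
    mpclaim -s -d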

Thanks,


August 22nd, 2016 14:00

Hello, 

Re: DCB. I think there's some confusion here. DCB is not a suggested or best-practice configuration; if you are in a DCB environment, the array will operate properly. It's more expensive and more complicated to implement, but in very large-scale environments, like a data center, it offers priority handling of iSCSI traffic, potentially at the cost of all other traffic, since there are only two classes, lossless and lossy. These are among the reasons support does not often get calls where DCB is being used.

In smaller environments, using dedicated NICs and switches for iSCSI means that iSCSI traffic is prioritized, and the other switches can be configured to properly handle other traffic; e.g., VoIP can have its own priority level set.

The Broadcom firmware should be updated to the current version, along with the drivers.

I strongly suggest you open a support case. They can review the configuration and possibly determine why failover isn't working properly. This should work fine regardless of DCB, etc., as it occurs at the physical layer.

 Regards,

Don 
