March 28th, 2016 09:00

IO port imbalance

Hello,

I need to correct an IO port imbalance on one of our storage arrays. Is this done through zoning only, or by migrating LUNs from one SP to the other one that isn't getting as much traffic? I try to keep a good balance when zoning; however, I am perplexed by the following graph from M&R. Why would initiators be logged into an IO port with no traffic reported?

Thanks in advance,

Admingirl

[Attached image: IO.PNG.png]

March 30th, 2016 14:00

Good information from Glen and Zaphod already. I'd look at the pathing software and configuration. That's what's ultimately going to decide what ports receive the traffic.

From what I can see in the screenshot we're only looking at one SP so trespassing won't affect these numbers. Pathing is probably what is ruling this.

One point of clarification I'd add: that last column where you seem to be having problems is not throughput or bandwidth. The label says "Queue Full/Busy." So it's not measuring whether you have traffic on those ports; it's measuring whether you are exceeding the maximum queue of SCSI commands on those ports. You could very well have some traffic (probably lesser amounts) on those ports, just not to the point of overloading them. You may want to look at the KBA below and check some of your HBA execution throttle numbers.

https://support.emc.com/kb/456863

4.5K Posts

March 29th, 2016 08:00

Not knowing the array configuration, I'd guess that some of the ports are the standby ports in case the active port fails. Or all the LUNs are on one SP. There could be any number of reasons for this.

Have you looked at the NAR data from the array to see if there is IO on all the ports?

glen

195 Posts

March 30th, 2016 07:00

In my most general case:

> Each host will have two FC connections, each going to a separate FC fabric.

> Each connection will be zoned for a port on SPA, and a port on SPB.

> If I am using more than two FC ports on each SP, I will make an attempt to balance the zoning so that roughly the same number of hosts are zoned to each port in use.

> Every host will login, and become registered, on every port that it is zoned for.

Thus, each host will have four paths to each LUN, and in most cases for VNX it will have two active and two standby paths.  For VNX2 with RAID group based LUNs all four paths may be active (due to the MCx Symmetric Active Active feature).
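
To make the path count above concrete, here is a minimal sketch of that layout. The HBA and SP port names are hypothetical, and the active/standby split assumes the simple case where only paths through the owning SP are active (not the VNX2 Symmetric Active/Active case).

```python
# Hypothetical zoning for one host: each HBA sits on its own fabric and is
# zoned to one port on SPA and one port on SPB (all names are made up).
ZONING = {
    "hba0": ["SPA-0", "SPB-0"],  # fabric A
    "hba1": ["SPA-1", "SPB-1"],  # fabric B
}


def enumerate_paths(zoning):
    """Return every (hba, sp_port) pair the host logs in on."""
    return [(hba, port) for hba, ports in zoning.items() for port in ports]


def split_by_owner(paths, owning_sp):
    """Split paths into active (through the owning SP) and standby."""
    active = [p for p in paths if p[1].startswith(owning_sp)]
    standby = [p for p in paths if not p[1].startswith(owning_sp)]
    return active, standby


paths = enumerate_paths(ZONING)
active, standby = split_by_owner(paths, "SPA")
print(len(paths), len(active), len(standby))
```

For a LUN owned by SPA this gives four paths in total, two active and two standby, which matches the general case described above.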

I will make sure that the LUNs in the array are split fairly evenly between the two controllers, but that does not guarantee that the IO activity for those LUNs will also be split evenly. 

In all cases the MPIO policy at the host dictates which/how many paths get used.  If (using ESX as an example) the policy is MRU, or fixed, then only one path will be used by that host.  If RR is selected, then the two, or four active paths should all be used; this is my preferred policy.
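
As a rough illustration of why the policy matters for port balance, the sketch below models how a single host would spread IOs over its active paths. It is a simplified toy model, not how any real multipathing driver is implemented; the function name, the path names, and the `ios_per_switch` parameter are assumptions made for the example.

```python
from collections import Counter


def distribute_io(policy, active_paths, io_count, ios_per_switch=1000):
    """Count IOs per path under a simplified model of the MPIO policy.

    MRU and FIXED send everything down one path; RR moves to the next
    active path after every `ios_per_switch` IOs.
    """
    counts = Counter({path: 0 for path in active_paths})
    if policy in ("MRU", "FIXED"):
        counts[active_paths[0]] = io_count
    elif policy == "RR":
        for i in range(io_count):
            path = active_paths[(i // ios_per_switch) % len(active_paths)]
            counts[path] += 1
    else:
        raise ValueError(f"unknown policy: {policy}")
    return counts


active_paths = ["SPA-0", "SPA-1"]  # hypothetical active paths for one LUN
fixed_counts = distribute_io("FIXED", active_paths, 4000)
rr_counts = distribute_io("RR", active_paths, 4000)
print(dict(fixed_counts))
print(dict(rr_counts))
```

With MRU or fixed, one array port sees all of that host's IO and the other sees none even though the host is logged in on both; with RR the load is shared across the active paths.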

It looks like your zoning is, at least a little, asymmetric.  As I read it (and please correct me if this is wrong) you have 89 logged in and registered connections to each SP, but they aren't evenly distributed.  I would suspect that is due to how they are zoned.  And there may be good reasons, but I'd look into it.
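
One way to put a number on that kind of asymmetry is to compare each port's login count against the per-port average. This is only a sketch: the port names and the per-port split are hypothetical and are not read from the screenshot; only the total of 89 comes from the discussion above.

```python
def login_imbalance(logins_per_port):
    """Return each port's deviation from the mean number of logins."""
    mean = sum(logins_per_port.values()) / len(logins_per_port)
    return {port: count - mean for port, count in logins_per_port.items()}


# Hypothetical distribution of 89 registered connections over four SPA ports.
spa_logins = {"A-0": 40, "A-1": 25, "A-2": 14, "A-3": 10}

deviation = login_imbalance(spa_logins)
busiest = max(deviation, key=deviation.get)
print(sum(spa_logins.values()), busiest, deviation[busiest])
```

Ports with a large positive deviation are candidates for having some hosts rezoned to the ports with a negative one.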

I would also check your LUNs to ensure that you don't have the most heavily used LUNs all bunched up on one controller, and I would have you (or your server admins) check to ensure that the proper MPIO policy is in use for all attached hosts.

4.5K Posts

March 30th, 2016 07:00

As a follow-up: on all the VNX arrays, the supported failover mode for the hosts is 4 (ALUA). For Pool LUNs this is the active/passive mode on the array. On VNX2, the failover mode 4 setting on the host will use active/active on the array if you're using RAID groups instead of Pools.

For ESX 5.x and higher, the only supported path policy on the host is Round Robin, with failover mode 4. As an aside, you can change the "IO Operations Limit" for Round Robin from the default setting of 1000 to 1 to get better performance (see the attached White Paper).

Additionally, if you have SAN Copy installed (that is, the enabler is installed), you must use only single initiator/single target zoning.

glen

1 Attachment
