470 Posts
1
8051
LUN trespass and round robin
What would be the effect on the ESX host and VMs when trespassing a LUN on a CX4 when the multipath policy is set to Round Robin?
How would it differ if the policy were set to MRU or Fixed?
Thanks
christopher_ime
2K Posts
1
July 3rd, 2012 19:00
The answer is in how you have your HBAs zoned, and the behavior you reported actually validates that it was architected per best practice.
With the setting now reverted to the preferred value (which is also the default), the 4 "Active (I/O)" paths are the optimized paths, i.e., those with a direct path (via the switch, of course) to the SP that currently owns the LUN. The other 4 Active (but without I/O) paths are the non-optimized paths: they terminate on the peer SP, and any I/O sent down them is redirected to the owning SP through the CMI channel.
Since each HBA has paths to both SPA and SPB per best practice (again, the reported behavior validates this), and Round-Robin alternates I/O down each optimized path (1,000 I/Os per path by default, but never both paths at the same time), that is why you are seeing activity on each HBA. If you were to then disable all paths to the current LUN owner, with ALUA configured, you should experience the following (a quick way to verify the path states from the ESXi shell is sketched after the list):
1) The current Active (I/O) paths should update with status "Disabled" or "Dead" depending on how you are testing or simulating path failures
2) The I/O should transfer to the other 4 Active paths
3) The LUN should not trespass, which indicates that the I/O is being redirected by the upper-director via the CMI interface
4) After 128,000 I/Os via the non-optimized paths, the array will decide to trespass the LUN (an implicit trespass) rather than continuing to service I/O over the non-optimized paths, which incur the extra hop.
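For reference, a quick way to verify the path states is to list the NMP paths for the device from the ESXi shell. This is only a sketch assuming ESXi 5.x syntax and a placeholder device ID (on ESX 4.x the equivalent commands live under "esxcli nmp" instead of "esxcli storage nmp"):
esxcli storage nmp device list
esxcli storage nmp path list --device naa.xxxxxxxxxxxxxxxx
In the path list output, the optimized paths (to the owning SP) report a Group State of "active" while the non-optimized paths through the peer SP report "active unoptimized".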
clintonskitson
116 Posts
1
July 1st, 2012 21:00
I am going to assume that Failover Mode 4 is being used, which enables ALUA-based LUN access and an Active/Active presentation of the storage processors. I am also assuming 4 paths per LUN/datastore from each ESX host (2 HBAs, 4 array ports).
Let's assume I have VMs on a LUN that is owned by SPA. All paths will show as Active; the two going to SPA will show as Optimized, while the other two going to SPB will be Non-Optimized. If I issue a trespass command to the LUN, the SPB paths become the Optimized paths and all I/O will be processed down them.
With MRU, a trespass of a LUN will change the Optimized/Non-Optimized state per path and will cause I/O to move to the first available Optimized path. This is very similar to RR, except that with MRU there is no balancing across Optimized paths.
With FIXED, there is a preferred path that is set when the ESX host first talks to the LUN (or is set manually), let's say to SPA (the current owner). If I trespass the LUN, the ESX host will continue to send I/O down the now Non-Optimized path to SPA, and the storage processor will forward that I/O to the new owner (SPB) through the internal CMI bus. There is then an algorithm that will force the LUN to trespass back to SPA if I/O continues to arrive on the Non-Optimized path.
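If you want to see this behavior for yourself, the trespass can be driven from the array side and the path-state change observed on the host. This is a rough sketch only; the SP addresses and LUN number are placeholders, and the exact switches should be verified against your Navisphere CLI release:
naviseccli -h <SPB address> trespass lun 54
naviseccli -h <SPA address> getlun 54 -owner
The trespass command makes the SP you address take ownership of the LUN, and getlun with -owner should then report the new current owner. On the ESX side, the Optimized/Non-Optimized flags will follow once the paths are re-evaluated.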
tdubb123
470 Posts
0
July 2nd, 2012 13:00
I followed this blog:
http://www.boche.net/blog/index.php/2010/02/04/configure-vmware-esxi-round-robin-on-emc-storage/
So is it better to set useANO to 0?
tdubb123
470 Posts
0
July 2nd, 2012 13:00
Hello,
where do you see the path showing "Optimized"? In VMware, it shows all 8 paths as Active (I/O).
I have failover mode 4 enabled as well as useANO=1.
mattcowger1
61 Posts
0
July 2nd, 2012 13:00
Because you have useANO enabled, every path that is 'active' (and in an ALUA config like yours every path will be marked as active) will be sent I/O, even the non-optimized ones; that is what useANO does.
If you were to disable useANO, you'd see what Clint is talking about.
Why do you have that enabled? It's generally not a best practice.
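For what it's worth, useANO is a per-device Round Robin setting, so turning it off means changing it on every LUN. A hedged example using the ESXi 5.x syntax, with a placeholder device ID (on ESX 4.x the older form was "esxcli nmp roundrobin setconfig --device <naa> --useANO 0"):
esxcli storage nmp psp roundrobin deviceconfig get --device naa.xxxxxxxxxxxxxxxx
esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxxxxxxxxxx --useano=0
The get command reports "Use Active Unoptimized Paths" so you can confirm the change took effect.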
mattcowger1
61 Posts
0
July 2nd, 2012 13:00
Jason's post is from 2010 and relates to CLARiiON/Celerra.
It's generally NOT a good idea to have this set; it can cause excessive usage of the CMI and increase latency.
christopher_ime
2K Posts
1
July 2nd, 2012 22:00
I'd like to direct you to a decent comprehensive post regarding ESX integration and block storage from a CX/VNX at the following location:
https://community.emc.com/message/598672#598672
It relates specifically to iSCSI, but even though you didn't mention your transport medium, many portions are relevant to both FC and iSCSI.
tdubb123
470 Posts
0
July 3rd, 2012 15:00
Hi,
I changed useANO to 0. Now in vSphere all 8 paths show as Active, but only 4 show (I/O).
In esxtop, however, both HBAs still show traffic at the same time.
Is this expected behavior? I thought only one HBA should be active.
christopher_ime
2K Posts
0
July 3rd, 2012 21:00
Do you have multiple LUNs? Remember, the PSP is configured per LUN.
tdubb123
470 Posts
0
July 4th, 2012 11:00
Yes, multiple LUNs. I have changed useANO on all of them to 0.
christopher_ime
2K Posts
0
July 4th, 2012 13:00
That explains why you are seeing activity on each of your HBAs: each LUN has its own queue, and its I/O is managed separately.
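If you want to confirm this per LUN rather than infer it from esxtop, listing the NMP devices shows each LUN's path selection policy and its Round Robin device config, including the useANO setting and the I/O operation limit. A sketch assuming ESXi 5.x:
esxcli storage nmp device list
Each device entry should show "Path Selection Policy: VMW_PSP_RR" and a device config line along the lines of "{policy=rr,iops=1000,...,useANO=0;...}".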
tdubb123
470 Posts
0
July 6th, 2012 08:00
Is there a diagram for the CLARiiON that shows the 8 paths to both SPs?
tdubb123
470 Posts
0
July 7th, 2012 15:00
So with RR enabled, failover mode 4, and useANO=0,
should I expect any traffic on the non-owner SP?
naviseccli -h x.x.x.x getlun 54
only shows SPA traffic and no SPB:
---------------------------------------------------------
Prefetch size (blocks) = 0
Prefetch multiplier = 4
Segment size (blocks) = 0
Segment multiplier = 4
Maximum prefetch (blocks) = 4096
Prefetch Disable Size (blocks) = 4097
Prefetch idle count = 40
Variable length prefetching YES
Prefetched data retained YES
Read cache configured according to
specified parameters.
Total Hard Errors: 0
Total Soft Errors: 0
Total Queue Length: 3745610040
Name LUN 54
Minimum latency reads N/A
Read Histogram[0] 290451
Read Histogram[1] 61478
Read Histogram[2] 168788
Read Histogram[3] 2389310
Read Histogram[4] 7899389
Read Histogram[5] 431628
Read Histogram[6] 2622510
Read Histogram[7] 1229403
Read Histogram[8] 28183
Read Histogram[9] 7974
Read Histogram overflows 16598
Write Histogram[0] 72533563
Write Histogram[1] 9007154
Write Histogram[2] 4271866
Write Histogram[3] 733678145
Write Histogram[4] 9298012
Write Histogram[5] 3019329
Write Histogram[6] 634647
Write Histogram[7] 4998888
Write Histogram[8] 202331
Write Histogram[9] 84174
Write Histogram overflows 128670
Read Requests: 15008985
Write Requests: 835166521
Blocks read: 530453223
Blocks written: 2903275122
Read cache hits: 10930559
Read cache misses: N/A
Prefetched blocks: 453863488
Unused prefetched blocks: 29191349
Write cache hits: 821884244
Forced flushes: 75202
Read Hit Ratio: N/A
Write Hit Ratio: N/A
RAID Type: RAID1/0
RAIDGroup ID: 25
State: Bound
Stripe Crossing: 2524122
Element Size: 128
Current owner: SP A
Offset: 0
Auto-trespass: DISABLED
Auto-assign: DISABLED
Write cache: ENABLED
Read cache: ENABLED
Idle Threshold: 0
Idle Delay Time: 20
Write Aside Size: 2048
Default Owner: SP A
Rebuild Priority: High
Verify Priority: Medium
Prct Reads Forced Flushed: 0
Prct Writes Forced Flushed: 0
Prct Rebuilt: 100
Prct Bound: 100
LUN Capacity(Megabytes): 512000
LUN Capacity(Blocks): 1048576000
UID: 60:06:01:60:5A:91:28:00:62:84:AF:23:3F:A6:E1:11
Bus 0 Enclosure 7 Disk 13 Queue Length: 32979292
Bus 0 Enclosure 7 Disk 11 Queue Length: 36054202
Bus 0 Enclosure 7 Disk 9 Queue Length: 37404138
Bus 0 Enclosure 7 Disk 7 Queue Length: 40380190
Bus 0 Enclosure 7 Disk 12 Queue Length: 24831953
Bus 0 Enclosure 7 Disk 10 Queue Length: 27888966
Bus 0 Enclosure 7 Disk 8 Queue Length: 29799748
Bus 0 Enclosure 7 Disk 6 Queue Length: 31939528
Bus 0 Enclosure 7 Disk 13 Hard Read Errors: 0
Bus 0 Enclosure 7 Disk 11 Hard Read Errors: 0
Bus 0 Enclosure 7 Disk 9 Hard Read Errors: 0
Bus 0 Enclosure 7 Disk 7 Hard Read Errors: 0
Bus 0 Enclosure 7 Disk 12 Hard Read Errors: 0
Bus 0 Enclosure 7 Disk 10 Hard Read Errors: 0
Bus 0 Enclosure 7 Disk 8 Hard Read Errors: 0
Bus 0 Enclosure 7 Disk 6 Hard Read Errors: 0
Bus 0 Enclosure 7 Disk 13 Hard Write Errors: 0
Bus 0 Enclosure 7 Disk 11 Hard Write Errors: 0
Bus 0 Enclosure 7 Disk 9 Hard Write Errors: 0
Bus 0 Enclosure 7 Disk 7 Hard Write Errors: 0
Bus 0 Enclosure 7 Disk 12 Hard Write Errors: 0
Bus 0 Enclosure 7 Disk 10 Hard Write Errors: 0
Bus 0 Enclosure 7 Disk 8 Hard Write Errors: 0
Bus 0 Enclosure 7 Disk 6 Hard Write Errors: 0
Bus 0 Enclosure 7 Disk 13 Soft Read Errors: 0
Bus 0 Enclosure 7 Disk 11 Soft Read Errors: 0
Bus 0 Enclosure 7 Disk 9 Soft Read Errors: 0
Bus 0 Enclosure 7 Disk 7 Soft Read Errors: 0
Bus 0 Enclosure 7 Disk 12 Soft Read Errors: 0
Bus 0 Enclosure 7 Disk 10 Soft Read Errors: 0
Bus 0 Enclosure 7 Disk 8 Soft Read Errors: 0
Bus 0 Enclosure 7 Disk 6 Soft Read Errors: 0
Bus 0 Enclosure 7 Disk 13 Soft Write Errors: 0
Bus 0 Enclosure 7 Disk 11 Soft Write Errors: 0
Bus 0 Enclosure 7 Disk 9 Soft Write Errors: 0
Bus 0 Enclosure 7 Disk 7 Soft Write Errors: 0
Bus 0 Enclosure 7 Disk 12 Soft Write Errors: 0
Bus 0 Enclosure 7 Disk 10 Soft Write Errors: 0
Bus 0 Enclosure 7 Disk 8 Soft Write Errors: 0
Bus 0 Enclosure 7 Disk 6 Soft Write Errors: 0
Bus 0 Enclosure 7 Disk 13 Enabled
Reads: 9819903
Writes: 3785192
Blocks Read: 2616023789
Blocks Written: 696161897
Queue Max: N/A
Queue Avg: N/A
Avg Service Time: N/A
Prct Idle: 96.24
Prct Busy: 3.75
Remapped Sectors: N/A
Read Retries: N/A
Write Retries: N/A
Bus 0 Enclosure 7 Disk 11 Enabled
Reads: 9777037
Writes: 4068287
Blocks Read: 2616735191
Blocks Written: 695087069
Queue Max: N/A
Queue Avg: N/A
Avg Service Time: N/A
Prct Idle: 96.21
Prct Busy: 3.78
Remapped Sectors: N/A
Read Retries: N/A
Write Retries: N/A
Bus 0 Enclosure 7 Disk 9 Enabled
Reads: 9651814
Writes: 4178275
Blocks Read: 2609343868
Blocks Written: 694916327
Queue Max: N/A
Queue Avg: N/A
Avg Service Time: N/A
Prct Idle: 96.26
Prct Busy: 3.73
Remapped Sectors: N/A
Read Retries: N/A
Write Retries: N/A
Bus 0 Enclosure 7 Disk 7 Enabled
Reads: 9475994
Writes: 4315509
Blocks Read: 2608548756
Blocks Written: 694674816
Queue Max: N/A
Queue Avg: N/A
Avg Service Time: N/A
Prct Idle: 96.24
Prct Busy: 3.75
Remapped Sectors: N/A
Read Retries: N/A
Write Retries: N/A
Bus 0 Enclosure 7 Disk 12 Enabled
Reads: 4396828
Writes: 3785192
Blocks Read: 2351184628
Blocks Written: 696161897
Queue Max: N/A
Queue Avg: N/A
Avg Service Time: N/A
Prct Idle: 97.07
Prct Busy: 2.92
Remapped Sectors: N/A
Read Retries: N/A
Write Retries: N/A
Bus 0 Enclosure 7 Disk 10 Enabled
Reads: 4379956
Writes: 4068287
Blocks Read: 2354732909
Blocks Written: 695087069
Queue Max: N/A
Queue Avg: N/A
Avg Service Time: N/A
Prct Idle: 97.03
Prct Busy: 2.96
Remapped Sectors: N/A
Read Retries: N/A
Write Retries: N/A
Bus 0 Enclosure 7 Disk 8 Enabled
Reads: 4407318
Writes: 4178275
Blocks Read: 2353188774
Blocks Written: 694916327
Queue Max: N/A
Queue Avg: N/A
Avg Service Time: N/A
Prct Idle: 97.03
Prct Busy: 2.96
Remapped Sectors: N/A
Read Retries: N/A
Write Retries: N/A
Bus 0 Enclosure 7 Disk 6 Enabled
Reads: 4318452
Writes: 4315509
Blocks Read: 2353607569
Blocks Written: 694674816
Queue Max: N/A
Queue Avg: N/A
Avg Service Time: N/A
Prct Idle: 96.99
Prct Busy: 3.00
Remapped Sectors: N/A
Read Retries: N/A
Write Retries: N/A
Is Private: NO
Snapshots List: Not Available
MirrorView Name if any: Not Available
tdubb123
470 Posts
0
July 7th, 2012 16:00
On page 11 of
http://www.emc.com/collateral/hardware/white-papers/h2890-emc-clariion-asymm-active-wp.pdf
it does mention that there should be traffic on both SPs, but I am only seeing SPA.
Why is that? Statistics logging is enabled.
christopher_ime
2K Posts
0
July 7th, 2012 22:00
Please check the current owner for each of the LUNs assigned to the ESX/ESXi host you are monitoring. It is possible that they are all owned by the same SP, either because they were assigned that way or because, over time, those assigned to the peer SP trespassed over; without a mechanism to fail back to the default owner, that would explain what you are reporting. Remember that with Round Robin you have to manage trespassed LUNs yourself, whether the trespass was explicit or implicit.
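A hedged sketch of how you might check and correct this from the array side (the SP address and LUN number are placeholders, and the switch names should be verified against your Navisphere CLI release):
naviseccli -h <SPA address> getlun 54 -owner -default
naviseccli -h <SPA address> trespass mine
getlun with -owner and -default reports the current and default owner for the LUN, and "trespass mine" issued against an SP asks it to take back every LUN for which it is the default owner, which is a simple way to restore the intended SPA/SPB balance after implicit trespasses.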