
June 30th, 2012 12:00

LUN trespass and Round Robin

What would be the effect on the ESX host and VMs when trespassing a LUN on a CX4 while the multipath policy is set to Round Robin?

How would it differ if the policy were set to MRU or Fixed?

thanks

July 3rd, 2012 19:00

The answer is in how you have your HBAs zoned, and the behavior you reported actually validates that it was properly architected per best practice.

With the setting now reverted back to the preferred setting (which is also the default), the 4x "Active (I/O)" paths are the optimized paths, i.e., those that are direct paths (via the switch, of course) to the current SP owner of the LUN. The other 4x Active (but without I/O) paths are the non-optimized paths: they reach the current SP owner via its peer SP, with I/O redirected as necessary through the CMI channel.

Since each HBA has paths to both SPA and SPB per best practice (again, the reported behavior validates this), with Round Robin configured and I/O alternating down each optimized path (1,000 I/Os per path by default, but never down both at the same time), that is why you are seeing activity on each HBA. If you were to then disable all paths to the current LUN owner, with ALUA configured, you should experience the following (see the example after the list for how to observe this from the host):

1) The current Active (I/O) paths should update with status "Disabled" or "Dead" depending on how you are testing or simulating path failures

2) The I/O should transfer to the other 4 Active paths

3) The LUN should not trespass, which suggests that the I/O is being redirected by the upper redirector via the CMI interface

4) After 128,000 I/Os via the non-optimized paths, the system will make the decision to trespass the LUN (an implicit trespass) rather than keep servicing I/O over non-optimal paths that incur an extra hop.
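If you want to watch this from the host side, here is one way to do it. This is ESXi 5.x syntax from memory and the device ID is a placeholder, so verify against your build:

esxcli storage nmp path list -d naa.6006016xxxxxxxxx

esxcli storage nmp device list -d naa.6006016xxxxxxxxx

The first command shows the ALUA group state per path ("active" for optimized vs "active unoptimized"), and the second shows the PSP in use and the current working paths, so you can confirm where the I/O is going during your failure test.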

July 1st, 2012 21:00

I am going to assume that Failover Mode 4 is being used, which enables ALUA-based LUN access and an active/active presentation by the storage processors. I am also assuming 4 paths per LUN/datastore from each ESX host (2 HBAs, 4 array ports).

Let's assume I have VMs on a LUN that is owned by SPA. All paths will show as Active; the two going to SPA will show Optimized, while the other two going to SPB will be Non-Optimized. If I issue a trespass command to the LUN, the SPB paths then become the Optimized ones and all I/Os will process down those paths.

With MRU, a trespass of a LUN will change the Optimized/Non-Optimized setting per path and will cause I/O to move to the first available Optimized path. This is very similar to RR, except that with MRU there is no balancing between Optimized paths.

With FIXED, there is a preferred path that is set when the ESX host first talks to the LUN (or that is manually set), let's say to SPA (the current owner). If I trespass the LUN, the ESX host will continue to send I/O down the now Non-Optimized path, and the storage processor receiving it will redirect that I/O to SPB (the new owner) over the internal CMI bus. There is then an algorithm that will force a trespass of the LUN back to SPA if I/Os continue to arrive on the Non-Optimized path.
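For reference, here is how you would check and change the PSP per device, and issue a manual trespass. The esxcli lines are ESXi 5.x syntax and the device ID / SP address are placeholders, so treat this as a sketch:

esxcli storage nmp device list -d naa.6006016xxxxxxxxx

esxcli storage nmp device set -d naa.6006016xxxxxxxxx --psp VMW_PSP_RR

naviseccli -h <SP-IP> trespass lun 54

Note that the trespass command moves the LUN to the SP you address the command to.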

July 2nd, 2012 13:00

Hello,

Where do you see the path showing "Optimized"? In VMware, it shows all 8 paths as Active (I/O).

I have failover mode 4 enabled as well as useANO=1


July 2nd, 2012 13:00

Because you have useANO enabled, every path that is 'active' (and every path will be marked as active in an ALUA config like yours) will be sent I/O, even the non-optimized ones; that's what useANO does.

If you were to disable ANO, you'd see what Clint is talking about.

Why do you have that enabled? It's generally not a best practice...
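In case it helps, the per-device Round Robin settings (including useANO) can be read and changed from esxcli. This is ESXi 5.x syntax from memory with a placeholder device ID, so double-check the options with --help on your host:

esxcli storage nmp psp roundrobin deviceconfig get -d naa.6006016xxxxxxxxx

esxcli storage nmp psp roundrobin deviceconfig set -d naa.6006016xxxxxxxxx --useano=0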


July 2nd, 2012 13:00

Jason's post is from 2010 and relates to CLARiiON/Celerra.

It's generally NOT a good idea to have this set; it can cause excessive usage of the CMI and increase latency.

July 2nd, 2012 22:00

I'd like to direct you to a decent, comprehensive post regarding ESX integration with block storage on a CX/VNX at the following location:

https://community.emc.com/message/598672#598672

It relates specifically to iSCSI, but even though you didn't mention your transport medium, many portions of it are relevant to both FC and iSCSI.


July 3rd, 2012 15:00

Hi,

So I changed useANO to 0. Now in vSphere all 8 paths show Active, but only 4 show (I/O).

However, in esxtop both HBAs still show traffic at the same time.

Is this expected behavior? I thought only 1 HBA should be active?

July 3rd, 2012 21:00

tdubb wrote:

Hi,

So I changed useANO to 0. Now in vSphere all 8 paths show Active, but only 4 show (I/O).

However, in esxtop both HBAs still show traffic at the same time.

Is this expected behavior? I thought only 1 HBA should be active?

Do you have multiple LUNs?  Remember, the PSP is configured per LUN.
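One quick way to confirm (ESXi 5.x syntax; the grep pattern is just a convenience) is to dump the PSP for every device at once:

esxcli storage nmp device list | grep -E 'Device Display Name|Path Selection Policy'

Each LUN keeps its own PSP and Round Robin state, so two LUNs can be driving I/O down different HBAs at the same moment.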


July 4th, 2012 11:00

Yes, multiple LUNs. I have changed useANO on all of them to 0.

July 4th, 2012 13:00

That explains why you are seeing activity on each of your HBAs, as each LUN has its own queue and its I/O is managed separately.
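To see that breakdown in esxtop, use the standard interactive views (these keys are the standard esxtop bindings):

esxtop
# press 'd' for the storage adapter (per-HBA) view
# press 'u' for the storage device (per-LUN) view

With multiple Round Robin LUNs, the adapter view will show traffic on both vmhbas, while the device view should show each individual LUN driving only its optimized paths.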


July 6th, 2012 08:00

Is there a diagram for the CLARiiON that shows the 8 paths to both SPs?


July 7th, 2012 15:00

So with RR enabled, failover mode 4, and useANO=0, should I expect any traffic on the non-owner SP?

naviseccli -h x.x.x.x getlun 54

only shows SPA traffic and no SPB:

---------------------------------------------------------

Prefetch size (blocks) =         0

Prefetch multiplier =            4

Segment size (blocks) =          0

Segment multiplier =             4

Maximum prefetch (blocks) =      4096

Prefetch Disable Size (blocks) = 4097

Prefetch idle count =            40

Variable length prefetching YES

Prefetched data retained    YES

Read cache configured according to

specified parameters.

Total Hard Errors:          0

Total Soft Errors:          0

Total Queue Length:         3745610040

Name                        LUN 54

Minimum latency reads N/A

Read Histogram[0] 290451

Read Histogram[1] 61478

Read Histogram[2] 168788

Read Histogram[3] 2389310

Read Histogram[4] 7899389

Read Histogram[5] 431628

Read Histogram[6] 2622510

Read Histogram[7] 1229403

Read Histogram[8] 28183

Read Histogram[9] 7974

Read Histogram overflows 16598

Write Histogram[0] 72533563

Write Histogram[1] 9007154

Write Histogram[2] 4271866

Write Histogram[3] 733678145

Write Histogram[4] 9298012

Write Histogram[5] 3019329

Write Histogram[6] 634647

Write Histogram[7] 4998888

Write Histogram[8] 202331

Write Histogram[9] 84174

Write Histogram overflows 128670

Read Requests:              15008985

Write Requests:             835166521

Blocks read:                530453223

Blocks written:             2903275122

Read cache hits:            10930559

Read cache misses:          N/A

Prefetched blocks:          453863488

Unused prefetched blocks:   29191349

Write cache hits:           821884244

Forced flushes:             75202

Read Hit Ratio:             N/A

Write Hit Ratio:            N/A

RAID Type:                  RAID1/0

RAIDGroup ID:               25

State:                      Bound

Stripe Crossing:            2524122

Element Size:               128

Current owner:              SP A

Offset:                     0

Auto-trespass:              DISABLED

Auto-assign:                DISABLED

Write cache:                ENABLED

Read cache:                 ENABLED

Idle Threshold:             0

Idle Delay Time:            20

Write Aside Size:           2048

Default Owner:              SP A

Rebuild Priority:           High

Verify Priority:            Medium

Prct Reads Forced Flushed:  0

Prct Writes Forced Flushed: 0

Prct Rebuilt:               100

Prct Bound:                 100

LUN Capacity(Megabytes):    512000

LUN Capacity(Blocks):       1048576000

UID:                        60:06:01:60:5A:91:28:00:62:84:AF:23:3F:A6:E1:11

Bus 0 Enclosure 7  Disk 13  Queue Length:               32979292

Bus 0 Enclosure 7  Disk 11  Queue Length:               36054202

Bus 0 Enclosure 7  Disk 9  Queue Length:               37404138

Bus 0 Enclosure 7  Disk 7  Queue Length:               40380190

Bus 0 Enclosure 7  Disk 12  Queue Length:               24831953

Bus 0 Enclosure 7  Disk 10  Queue Length:               27888966

Bus 0 Enclosure 7  Disk 8  Queue Length:               29799748

Bus 0 Enclosure 7  Disk 6  Queue Length:               31939528

Bus 0 Enclosure 7  Disk 13  Hard Read Errors:           0

Bus 0 Enclosure 7  Disk 11  Hard Read Errors:           0

Bus 0 Enclosure 7  Disk 9  Hard Read Errors:           0

Bus 0 Enclosure 7  Disk 7  Hard Read Errors:           0

Bus 0 Enclosure 7  Disk 12  Hard Read Errors:           0

Bus 0 Enclosure 7  Disk 10  Hard Read Errors:           0

Bus 0 Enclosure 7  Disk 8  Hard Read Errors:           0

Bus 0 Enclosure 7  Disk 6  Hard Read Errors:           0

Bus 0 Enclosure 7  Disk 13  Hard Write Errors:          0

Bus 0 Enclosure 7  Disk 11  Hard Write Errors:          0

Bus 0 Enclosure 7  Disk 9  Hard Write Errors:          0

Bus 0 Enclosure 7  Disk 7  Hard Write Errors:          0

Bus 0 Enclosure 7  Disk 12  Hard Write Errors:          0

Bus 0 Enclosure 7  Disk 10  Hard Write Errors:          0

Bus 0 Enclosure 7  Disk 8  Hard Write Errors:          0

Bus 0 Enclosure 7  Disk 6  Hard Write Errors:          0

Bus 0 Enclosure 7  Disk 13  Soft Read Errors:           0

Bus 0 Enclosure 7  Disk 11  Soft Read Errors:           0

Bus 0 Enclosure 7  Disk 9  Soft Read Errors:           0

Bus 0 Enclosure 7  Disk 7  Soft Read Errors:           0

Bus 0 Enclosure 7  Disk 12  Soft Read Errors:           0

Bus 0 Enclosure 7  Disk 10  Soft Read Errors:           0

Bus 0 Enclosure 7  Disk 8  Soft Read Errors:           0

Bus 0 Enclosure 7  Disk 6  Soft Read Errors:           0

Bus 0 Enclosure 7  Disk 13  Soft Write Errors:          0

Bus 0 Enclosure 7  Disk 11  Soft Write Errors:          0

Bus 0 Enclosure 7  Disk 9  Soft Write Errors:          0

Bus 0 Enclosure 7  Disk 7  Soft Write Errors:          0

Bus 0 Enclosure 7  Disk 12  Soft Write Errors:          0

Bus 0 Enclosure 7  Disk 10  Soft Write Errors:          0

Bus 0 Enclosure 7  Disk 8  Soft Write Errors:          0

Bus 0 Enclosure 7  Disk 6  Soft Write Errors:          0

Bus 0 Enclosure 7  Disk 13   Enabled

Reads:            9819903

Writes:           3785192

Blocks Read:      2616023789

Blocks Written:   696161897

Queue Max:        N/A

Queue Avg:        N/A

Avg Service Time: N/A

Prct Idle:        96.24

Prct Busy:        3.75

Remapped Sectors: N/A

Read Retries:     N/A

Write Retries:    N/A

Bus 0 Enclosure 7  Disk 11   Enabled

Reads:            9777037

Writes:           4068287

Blocks Read:      2616735191

Blocks Written:   695087069

Queue Max:        N/A

Queue Avg:        N/A

Avg Service Time: N/A

Prct Idle:        96.21

Prct Busy:        3.78

Remapped Sectors: N/A

Read Retries:     N/A

Write Retries:    N/A

Bus 0 Enclosure 7  Disk 9   Enabled

Reads:            9651814

Writes:           4178275

Blocks Read:      2609343868

Blocks Written:   694916327

Queue Max:        N/A

Queue Avg:        N/A

Avg Service Time: N/A

Prct Idle:        96.26

Prct Busy:        3.73

Remapped Sectors: N/A

Read Retries:     N/A

Write Retries:    N/A

Bus 0 Enclosure 7  Disk 7   Enabled

Reads:            9475994

Writes:           4315509

Blocks Read:      2608548756

Blocks Written:   694674816

Queue Max:        N/A

Queue Avg:        N/A

Avg Service Time: N/A

Prct Idle:        96.24

Prct Busy:        3.75

Remapped Sectors: N/A

Read Retries:     N/A

Write Retries:    N/A

Bus 0 Enclosure 7  Disk 12   Enabled

Reads:            4396828

Writes:           3785192

Blocks Read:      2351184628

Blocks Written:   696161897

Queue Max:        N/A

Queue Avg:        N/A

Avg Service Time: N/A

Prct Idle:        97.07

Prct Busy:        2.92

Remapped Sectors: N/A

Read Retries:     N/A

Write Retries:    N/A

Bus 0 Enclosure 7  Disk 10   Enabled

Reads:            4379956

Writes:           4068287

Blocks Read:      2354732909

Blocks Written:   695087069

Queue Max:        N/A

Queue Avg:        N/A

Avg Service Time: N/A

Prct Idle:        97.03

Prct Busy:        2.96

Remapped Sectors: N/A

Read Retries:     N/A

Write Retries:    N/A

Bus 0 Enclosure 7  Disk 8   Enabled

Reads:            4407318

Writes:           4178275

Blocks Read:      2353188774

Blocks Written:   694916327

Queue Max:        N/A

Queue Avg:        N/A

Avg Service Time: N/A

Prct Idle:        97.03

Prct Busy:        2.96

Remapped Sectors: N/A

Read Retries:     N/A

Write Retries:    N/A

Bus 0 Enclosure 7  Disk 6   Enabled

Reads:            4318452

Writes:           4315509

Blocks Read:      2353607569

Blocks Written:   694674816

Queue Max:        N/A

Queue Avg:        N/A

Avg Service Time: N/A

Prct Idle:        96.99

Prct Busy:        3.00

Remapped Sectors: N/A

Read Retries:     N/A

Write Retries:    N/A

Is Private:                 NO

Snapshots List:             Not Available

MirrorView Name if any:     Not Available


July 7th, 2012 16:00

On page 11 of

http://www.emc.com/collateral/hardware/white-papers/h2890-emc-clariion-asymm-active-wp.pdf

it does mention that there should be traffic on both SPs, but I am only seeing SPA. Why is that? Statistics logging is enabled.

July 7th, 2012 22:00

Please check the current owner for each of the LUNs assigned to the ESX/ESXi host you are monitoring. It is possible that they are all owned by the same SP, either because they were assigned that way or because, over time, LUNs assigned to the peer SP trespassed over and, with no mechanism to fail back to the default owner, stayed there. That would explain what you are reporting. Remember, with Round Robin you have to manage trespassed LUNs yourself, whether the trespass was explicit or implicit.
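If you want to push everything back to its default owner, the classic Navisphere CLI way (SP addresses are placeholders; verify against your CLI version) is to ask each SP to reclaim its own LUNs:

naviseccli -h <SPA-IP> trespass mine

naviseccli -h <SPB-IP> trespass mine

'trespass mine' tells the SP you address to take back ownership of every LUN for which it is the default owner.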
