Unsolved

May 24th, 2013 08:00

2x PS6110XV + 2x Force10 S4820T + R620/vmware

Hello,

I have the following setup:

2x PS6110XV in 1 pool (RAID 10), connected to

2x S4820T stacked through 2x 40GbE

2x 1Gb LACP LAG to:

4x PC62XX, which have 3 other EqualLogics connected

All EqualLogics are in the same group, and the group lead is one of the EQLs on the PC62XX stack.

My test server is an R620 with the latest MEM driver, with delayed ACK and LRO disabled.
The server is connected to the Force10s through a BCM57810 (NPAR enabled).

I run an Iometer test: 4KB, 100% random reads.


When I look at esxtop I see that only one path is used (the same happens with the VMware Round Robin PSP).

Besides that, in the SAN HQ live view I can see that all I/O is handled by one of the PS6110s.

But when I look in Group Manager I can see that the volume is evenly distributed over the 2 members.

What am I doing wrong?

30 Posts

May 24th, 2013 08:00

Oh, and on the Force10 ports that my ESX servers are connected to I see a lot of throttles...

30 Posts

May 24th, 2013 10:00

I'm using the software iSCSI initiator.

I'm seeing 2 paths (so vmnic1 to eql1 and vmnic2 to eql2).

They told me that's OK: because it's 10Gb it won't open more connections (I would have expected 4).

30 Posts

May 24th, 2013 10:00

I don't know why, but I also seem to be losing connections to LUNs:

2013-05-24T16:40:33.779Z cpu16:8237)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x2a (0x41240edc3c80, 13889) to dev "naa.68b7b2bce60f90f7ce18d5368a058072" on path "vmhba40:C1:T3:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2013-05-24T16:40:43.437Z cpu4:8379)VSCSI: 2779: Retry 0 on handle 8196 still in progress after 62 seconds

2013-05-24T16:41:03.955Z cpu16:8237)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x2a (0x41240edc2980, 13889) to dev "naa.68b7b2bce60f90f7ce18d5368a058072" on path "vmhba40:C1:T3:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2013-05-24T16:41:39.124Z cpu16:8237)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x2a (0x4124003e20c0, 13889) to dev "naa.68b7b2bce60f90f7ce18d5368a058072" on path "vmhba40:C1:T3:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2013-05-24T16:41:43.552Z cpu4:8379)VSCSI: 2779: Retry 0 on handle 8196 still in progress after 122 seconds

2013-05-24T16:42:10.369Z cpu11:14323)WARNING: UserLinux: 1331: unsupported: (void)

2013-05-24T16:42:13.642Z cpu16:8237)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x2a (0x41240ee5db40, 13889) to dev "naa.68b7b2bce60f90f7ce18d5368a058072" on path "vmhba40:C1:T3:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2013-05-24T16:42:43.669Z cpu4:8379)VSCSI: 2779: Retry 0 on handle 8196 still in progress after 183 seconds

2013-05-24T16:42:48.832Z cpu16:8237)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x2a (0x4124026f1b80, 13889) to dev "naa.68b7b2bce60f90f7ce18d5368a058072" on path "vmhba40:C1:T3:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2013-05-24T16:43:18.997Z cpu16:8237)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x2a (0x41240da45b80, 13889) to dev "naa.68b7b2bce60f90f7ce18d5368a058072" on path "vmhba40:C1:T3:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2013-05-24T16:43:43.787Z cpu0:8379)VSCSI: 2779: Retry 0 on handle 8196 still in progress after 243 seconds

2013-05-24T16:43:49.183Z cpu16:8237)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x2a (0x41240efeb3c0, 13889) to dev "naa.68b7b2bce60f90f7ce18d5368a058072" on path "vmhba40:C1:T3:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2013-05-24T16:44:19.338Z cpu16:8237)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x2a (0x41240040c1c0, 13889) to dev "naa.68b7b2bce60f90f7ce18d5368a058072" on path "vmhba40:C1:T3:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0. Act:EVAL

2013-05-24T16:44:43.906Z cpu0:8379)VSCSI: 2779: Retry 0 on handle 8196 still in progress after 303 seconds

2013-05-24T16:44:49.483Z cpu16:8237)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x2a (0x4124003f4740, 13889) to dev "naa.68b7b2bce60f90f7ce18d5368a058072" on path "vmhba40:C1:T3:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2013-05-24T16:44:49.484Z cpu2:8878)WARNING: iscsi_vmk: iscsivmk_TaskMgmtAbortCommands: vmhba40:CH:1 T:3 L:0 : Abort task response indicates task with itt=0x575f3 has been completed on the target but the task response has not arrived

2013-05-24T16:44:49.484Z cpu24:8378)VSCSI: 2450: handle 8196(vscsi1:0):Completing reset (0 outstanding commands)

2013-05-24T16:44:49.586Z cpu16:8237)HBX: 255: Reclaimed heartbeat for volume 518cefa8-e743aadf-777c-d4ae529c8195 (ESX-SAS-10G-02): [Timeout] [HB state abcdef02 offset 3706880 gen 443 stampUS 4976043363 uuid 519f864f-95771a78-8124-d4ae529c8195 jrnl

2013-05-24T16:44:50.340Z cpu4:13888)NetPort: 1574: disabled port 0x2000008

2013-05-24T16:44:50.342Z cpu4:13888)VSCSI: 6340: handle 8196(vscsi1:0):Destroying Device for world 13889 (pendCom 0)
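Every NMP failure in the log above names the same path and the same status codes, which is easier to see once the lines are tallied. Here is a minimal Python sketch for doing that, assuming the log lines have the same shape as the ones pasted in this thread. The names `NMP_RE`, `summarize` and the two lookup tables are illustrative only and not part of any VMware tool. The tables are deliberately partial: 0x2a is the SCSI WRITE(10) opcode, and H:0x8 is, to my knowledge, the host status VMware documents as a reset, so treat that label as an assumption to verify.

```python
import re
from collections import Counter

# Matches the "NMP: nmp_ThrottleLogForDevice ... Failed: H:.. D:.. P:.." lines.
NMP_RE = re.compile(
    r'Cmd (?P<cmd>0x[0-9a-f]+) .*?'
    r'to dev "(?P<dev>[^"]+)" on path "(?P<path>[^"]+)" Failed: '
    r'H:(?P<host>0x[0-9a-f]+) D:(?P<device>0x[0-9a-f]+) P:(?P<plugin>0x[0-9a-f]+)'
)

# Partial lookup tables: only the codes that appear in this thread.
SCSI_OPCODES = {"0x2a": "WRITE(10)"}
HOST_STATUS = {"0x8": "host reset (assumed label, verify against VMware docs)"}


def summarize(lines):
    """Count NMP command failures per (path, opcode, host status)."""
    counts = Counter()
    for line in lines:
        m = NMP_RE.search(line)
        if m:
            counts[(m["path"], m["cmd"], m["host"])] += 1
    return counts


# Three lines copied from the log above; the VSCSI line is ignored by the regex.
sample = [
    '2013-05-24T16:40:33.779Z cpu16:8237)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x2a (0x41240edc3c80, 13889) to dev "naa.68b7b2bce60f90f7ce18d5368a058072" on path "vmhba40:C1:T3:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL',
    '2013-05-24T16:40:43.437Z cpu4:8379)VSCSI: 2779: Retry 0 on handle 8196 still in progress after 62 seconds',
    '2013-05-24T16:41:03.955Z cpu16:8237)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x2a (0x41240edc2980, 13889) to dev "naa.68b7b2bce60f90f7ce18d5368a058072" on path "vmhba40:C1:T3:L0" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL',
]

for (path, cmd, host), n in summarize(sample).items():
    print(path, SCSI_OPCODES.get(cmd, cmd), HOST_STATUS.get(host, host), n)
```

Run against the full vmkernel log, a single path dominating the tally would suggest looking at that one NIC, switch port or array port first.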

7 Technologist

 • 

729 Posts

May 24th, 2013 10:00

Can you confirm whether you are using the software iSCSI initiator?

When you look at the volume's Connections tab (in the Group Manager GUI), how many connections do you see for this datastore?

-joe

7 Technologist

 • 

729 Posts

May 24th, 2013 13:00

First, take a look at this link and double-check your setup:

en.community.dell.com/.../3615.rapid-equallogic-configuration-portal-by-sis.aspx

(see both the esx and switch setup information).

Additionally, check out Don’s post here about setting up the ESX hosts, which covers items beyond those listed in the configuration portal link:

en.community.dell.com/.../20008239.aspx

(the one marked as the answer by DELL-Donald Wi)

Without seeing the array diags, it’s very hard to see both sides of the problem. If you feel that your setup is correct, and you have gone over the docs and links that I posted, please open a support case for this so we can take a closer look at what is going on.

-joe

30 Posts

May 24th, 2013 13:00

Yeah, that's how I configured it. Apart from disabling LRO and delayed ACK, Don's post is mostly about the VMware Round Robin PSP. I believe that if I use the MEM driver I don't need to change the IOPS value to 3...

I opened the case 2 hours ago. They couldn't find the problem and have escalated it to the performance team, in your building I guess :)

5 Practitioner

 • 

274.2K Posts

May 24th, 2013 14:00

Just to add something. When you look at esxtop, it shows you the number of paths (virtual controllers) for the SW iSCSI adapter. It's the first number after vmhba41, in this case 68. That's the total number of paths to all the devices it's connecting to.

vmhba41 -                      68  1024     0.64     0.00     0.64     0.00     0.00     0.86     0.01     0.88     0.00    0.00    0.00    0.00    0.00    0.86    0.01    0.88    0.00

5 Practitioner

 • 

274.2K Posts

May 24th, 2013 14:00

When you use esxtop, it shows you the HBA for the SW iSCSI adapter, not the individual paths.

8:01:31pm up 36 days  5:18, 313 worlds, 1 VMs, 4 vCPUs; CPU load average: 0.01, 0.01, 0.02

ADAPTR PATH                 NPTH AQLEN   CMDS/s  READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd

vmhba0 -                       1   127     0.40     0.00     0.40     0.00     0.00     7.32     0.01     7.32     0.00

vmhba1 -                       1     1     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

vmhba33 -                       2     1     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

vmhba34 -                       0  1024     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

vmhba35 -                       0  1024     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

vmhba36 -                      22  1024     8.23     0.00     4.42     0.00     0.02     1.40     0.01     1.41     0.00

vmhba37 -                      20  1024     0.40     0.00     0.00     0.00     0.00     0.33     0.00     0.34     0.00

vmhba38 -                       0  1024     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

vmhba39 -                       0  1024     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

vmhba40 -                      26  1024     4.42     0.00     4.42     0.00     0.03     1.48     0.01     1.49     0.00

vmhba41 -                       0  1024     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

vmhba42 -                       0     1     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
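To pick out which adapters actually carry paths and I/O in a listing like the one above, the rows can be parsed mechanically. This is a rough Python sketch that assumes the column order shown in the header line (ADAPTR, PATH, NPTH, AQLEN, CMDS/s, ...); `parse_adapter_rows` is a hypothetical helper name, not an esxtop feature.

```python
def parse_adapter_rows(text):
    """Parse esxtop adapter rows into dicts, keeping only the first few columns.

    Assumes whitespace-separated columns in the order:
    ADAPTR PATH NPTH AQLEN CMDS/s ...
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, blank lines and anything that is not an adapter row.
        if len(fields) < 5 or not fields[0].startswith("vmhba"):
            continue
        rows.append({
            "adapter": fields[0],
            "npth": int(fields[2]),          # number of paths on this adapter
            "aqlen": int(fields[3]),         # adapter queue length
            "cmds_per_s": float(fields[4]),  # commands per second
        })
    return rows


# Four rows copied from the esxtop output above.
sample = """
vmhba36 -                      22  1024     8.23     0.00     4.42     0.00     0.02     1.40     0.01     1.41     0.00
vmhba37 -                      20  1024     0.40     0.00     0.00     0.00     0.00     0.33     0.00     0.34     0.00
vmhba40 -                      26  1024     4.42     0.00     4.42     0.00     0.03     1.48     0.01     1.49     0.00
vmhba41 -                       0  1024     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
"""

rows = parse_adapter_rows(sample)
with_paths = [r["adapter"] for r in rows if r["npth"] > 0]
print(with_paths)
```

The NPTH column is a per-adapter total, so it says how many paths exist but not how I/O is spread across them.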

So if you look at the iSCSI adapters, you'll see there's only one software iSCSI HBA, vmhba41.

esxcli iscsi adapter list

Adapter  Driver     State    UID                                        Description

-------  ---------  -------  -----------------------------------------  ----------------------

vmhba38  bnx2i      unbound  iscsi.vmhba38                              Broadcom iSCSI Adapter

vmhba39  bnx2i      unbound  iscsi.vmhba39                              Broadcom iSCSI Adapter

** vmhba41  iscsi_vmk  online   iqn.1998-01.com.vmware:andromeda-740386a7  iSCSI Software Adapter

vmhba32  bnx2i      unbound  iscsi.vmhba32                              Broadcom iSCSI Adapter

vmhba33  bnx2i      unbound  iscsi.vmhba33                              Broadcom iSCSI Adapter

vmhba34  bnx2i      unbound  iscsi.vmhba34                              Broadcom iSCSI Adapter

vmhba35  bnx2i      unbound  iscsi.vmhba35                              Broadcom iSCSI Adapter

vmhba36  bnx2i      unbound  iscsi.vmhba36                              Broadcom iSCSI Adapter

vmhba37  bnx2i      unbound  iscsi.vmhba37                              Broadcom iSCSI Adapter

When you look at the devices themselves, you see that vmhba41 creates a virtual controller for each path: vmhba41:C0, C1, etc. You'll see them under Working Paths below.

# esxcli storage nmp device list

naa.6090a098703e204c76065583f0bb5914

  Device Display Name: DemoTool-Storage-Volume

  Storage Array Type: VMW_SATP_EQL

  Storage Array Type Device Config: SATP VMW_SATP_EQL does not support device configuration.

  Path Selection Policy: DELL_PSP_EQL_ROUTED

  Path Selection Policy Device Config: PSP DELL_PSP_EQL_ROUTED does not support device configuration.

  Path Selection Policy Device Custom Config:

  Working Paths: vmhba41:C3:T16:L0, vmhba41:C2:T16:L0, vmhba41:C1:T16:L0, vmhba41:C0:T16:L0

On the Connections tab in the EQL GUI, you should see the data sent/received increase on all the connections from that server.
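For checking many volumes at once, the `esxcli storage nmp device list` output can be reduced to the path selection policy and working paths per device. This is a minimal Python sketch, assuming the output layout shown above and handling only identifiers that start with `naa.`; `parse_nmp_device_list` and `PATH_RE` are illustrative names.

```python
import re

# A runtime path name such as vmhba41:C3:T16:L0 (adapter, channel, target, LUN).
PATH_RE = re.compile(r"vmhba\d+:C\d+:T\d+:L\d+")


def parse_nmp_device_list(text):
    """Map each naa.* device to its PSP and list of working paths."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("naa."):
            current = {"psp": None, "working_paths": []}
            devices[line] = current
        elif current is None:
            continue
        elif line.startswith("Path Selection Policy:"):
            # The trailing colon keeps "Path Selection Policy Device Config" out.
            current["psp"] = line.split(":", 1)[1].strip()
        elif line.startswith("Working Paths:"):
            current["working_paths"] = PATH_RE.findall(line)
    return devices


# Excerpt copied from the output above.
sample = """
naa.6090a098703e204c76065583f0bb5914
  Device Display Name: DemoTool-Storage-Volume
  Storage Array Type: VMW_SATP_EQL
  Path Selection Policy: DELL_PSP_EQL_ROUTED
  Path Selection Policy Device Config: PSP DELL_PSP_EQL_ROUTED does not support device configuration.
  Working Paths: vmhba41:C3:T16:L0, vmhba41:C2:T16:L0, vmhba41:C1:T16:L0, vmhba41:C0:T16:L0
"""

for dev, info in parse_nmp_device_list(sample).items():
    print(dev, info["psp"], len(info["working_paths"]))
```

A device whose policy differs from the others, or that lists fewer working paths than expected, is the one to look at first.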

30 Posts

May 24th, 2013 14:00

In esxtop, if I press u and then Shift+P, I can see the individual paths...

5 Practitioner

 • 

274.2K Posts

May 24th, 2013 14:00

Ah. I don't use that view, since in the past I've seen it not work correctly.

I just ran it on my system and it shows correctly. My apologies.

Will you please send me the output from esxcli storage nmp device list? I just need the output for one EQL volume, not the whole list. They should all show the same path selection policy.

One thing with VMware Round Robin is that the default IOs-per-path value is very high: 1000. That tends to put all the IO onto one path. You can change it to our recommended value of 3 with a script you run on each node. MEM should automatically optimize the available paths.
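The effect of the IOs-per-path value can be illustrated with a toy model. The Python sketch below is a simplification under stated assumptions, not the actual VMware PSP logic: it switches to the next path after a fixed number of commands and ignores queueing and timing. The burst size of 600 and the function name `round_robin` are hypothetical. It applies only to VMware Round Robin, not to MEM's DELL_PSP_EQL_ROUTED.

```python
from collections import Counter


def round_robin(num_ios, paths, iops_per_path):
    """Distribute num_ios commands, moving to the next path every iops_per_path commands."""
    counts = Counter({p: 0 for p in paths})
    for i in range(num_ios):
        path = paths[(i // iops_per_path) % len(paths)]
        counts[path] += 1
    return counts


paths = ["vmhba40:C1:T2:L0", "vmhba40:C2:T2:L0"]
burst = 600  # hypothetical number of commands issued in one short burst

default_limit = round_robin(burst, paths, 1000)  # default value mentioned above
tuned_limit = round_robin(burst, paths, 3)       # recommended value mentioned above

print(dict(default_limit))
print(dict(tuned_limit))
```

In this model the 600-command burst lands entirely on the first path with a limit of 1000, and splits 300/300 with a limit of 3, which is why short or bursty workloads can look like they use a single path under the default.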

30 Posts

May 25th, 2013 01:00

naa.68b7b2bce60f90f7ce18d5368a058072

  Device Display Name: EQLOGIC iSCSI Disk (naa.68b7b2bce60f90f7ce18d5368a058072)

  Storage Array Type: VMW_SATP_EQL

  Storage Array Type Device Config: SATP VMW_SATP_EQL does not support device configuration.

  Path Selection Policy: DELL_PSP_EQL_ROUTED

  Path Selection Policy Device Config: PSP DELL_PSP_EQL_ROUTED does not support device configuration.

  Path Selection Policy Device Custom Config:

  Working Paths: vmhba40:C1:T2:L0, vmhba40:C2:T2:L0

  Is Local SAS Device: false

  Is Boot USB Device: false

So that looks good.

30 Posts

May 25th, 2013 01:00

I think I made a mistake: my Iometer test file is only 4MB, so maybe it's too small to get load balanced? I tried a bigger file (4GB), and after a while I can see it load balance correctly.

But I still don't understand why I sometimes lose connections to the LUNs for over 10 minutes...

30 Posts

May 25th, 2013 02:00

Well, forget what I just said: with the 4GB test file I also see all traffic going to 1 member (so only 1 path is used).

5 Practitioner

 • 

274.2K Posts

May 28th, 2013 05:00

At this point I think it's best that you reopen your case. It seems like you are concerned about three issues.

1.) Connections. Just to affirm, with MEM 1.1.x and FW 6.0.x the connection count is more directly negotiated between server and array. The 6110 has a single 10GbE interface, so having a bunch of extra connections isn't helpful. The MEM/FW interaction will trim or not start connections that don't provide additional performance.

2.) Volumes not being properly distributed (data only going to one member).

3.) This one is more concerning to me: connections being lost for over 10 minutes.

Support will need diags from the arrays, vm-support bundles from the ESX nodes, and the switch configuration to work through these issues.

Regards,
