Masterzen
1 Copper

5324 LACP not working with firmware > 1.0.0.47?

Hi,

I have had a Dell PowerConnect 5324 switch for about 3 years. Two servers are connected to it through two different LACP LAGs (i.e. 802.3ad channel groups): one Apple Xserve running OS X 10.4 and one Linux server (Debian Etch with a 2.6.26 kernel).

Today, I wanted to take advantage of the new (in firmware 2.0.X.Y) option to set up 802.3ad load balancing for channel groups, so I upgraded the switch to 2.0.1.3. Immediately, the switch could no longer establish the 802.3ad links with either server.

The switch reported no failure, but the ports wouldn't associate (sh lacp port-channel reported no peers), even though both servers reported a successful association. Moreover, sh lacp ethernet reported that there was no synchronization. No setting on either the switch or the servers could make the association work.

I also tried 2.0.0.40, with the same result.

So I went back to 1.0.0.47 (which left me with a non-working switch when it encountered the newer configuration) and no load-balancing hash algorithm.

I guess that's the difference between a $3000 Cisco switch and a rebranded $1000 Dell switch 🙂

If someone has a solution to my issue, please speak up.

Thanks,

Brice

10 Replies
bh1633
3 Cadmium

Re: 5324 LACP not working with firmware > 1.0.0.47?

Did you update the bootcode when you updated the firmware?

Masterzen
1 Copper

Re: 5324 LACP not working with firmware > 1.0.0.47?

Did you update the bootcode when you updated the firmware?

Yes, otherwise you get a non-bootable switch and you have to connect with a serial cable to clean up the mess 🙂

 

bh1633
3 Cadmium

Re: 5324 LACP not working with firmware > 1.0.0.47?

Please post the config file from the recent firmware version that does not work for you.

Masterzen
1 Copper

Re: 5324 LACP not working with firmware > 1.0.0.47?

Hi,

Please post the config file from the recent firmware version that does not work for you.

 

Here is all the information I collected during my tests this morning:

With the working firmware (1.0.0.47):

 

Working Configuration:

interface ethernet g17
channel-group 2 mode auto
exit
interface ethernet g18
channel-group 2 mode auto
exit
interface ethernet g21
channel-group 3 mode auto
exit
interface ethernet g22
channel-group 3 mode auto
exit
interface vlan 1
ip address 172.16.10.249 255.255.255.0
exit
ip default-gateway 172.16.10.1
logging buffered debugging
aaa authentication enable default line
aaa authentication login default line
ip ssh server
ip https server
clock source sntp
sntp unicast client enable
sntp server 172.16.10.180

 

LACP information on the switch:

 

console# sh lacp port-channel 2
Port-Channel ch2
Port Type Gigabit Ethernet
Attached Lag id:
Actor
System Priority:1
MAC Address: 00:13:72:4a:bf:d2
Admin Key: 26
Oper Key: 26
Partner
System Priority:65535
MAC Address: 00:30:48:70:43:48
Oper Key: 17

# one of the ports that is part of port-channel 2:

console# sh lacp ethernet g17
g17 LACP parameters:
Actor
system priority: 1
system mac addr: 00:13:72:4a:bf:d2
port Admin key: 26
port Oper key: 26
port Oper number: 17
port Admin priority: 1
port Oper priority: 1
port Admin timeout: LONG
port Oper timeout: LONG
LACP Activity: ACTIVE
Aggregation: AGGREGATABLE
synchronization: TRUE
collecting: TRUE
distributing: TRUE
expired: FALSE
Partner
system priority: 65535
system mac addr: 00:30:48:70:43:48
port Admin key: 0
port Oper key: 17
port Oper number: 1
port Admin priority: 0
port Oper priority: 255
port Oper timeout: LONG
LACP Activity: ACTIVE
Aggregation: AGGREGATABLE
synchronization: TRUE
collecting: TRUE
distributing: TRUE
expired: FALSE
g17 LACP statistics:
LACP Pdus sent: 5609
LACP Pdus received: 5586
g17 LACP Protocol State:
LACP State Machines:
Receive FSM: Current State
Mux FSM: Collecting Distributing State
Periodic Tx FSM: Slow Periodic State
Control Variables:
BEGIN: FALSE
LACP_Enabled: TRUE
Ready_N: FALSE
Selected: SELECTED
Port_moved: FALSE
NNT: FALSE
Port_enabled: TRUE
Timer counters:
periodic tx timer: 6
current while timer: 65
wait while timer: 0

 

With the same config and the working firmware, the Linux server gives the following information:

 

Ethernet Channel Bonding Driver: v3.2.5 (March 21, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 17
Partner Key: 26
Partner Mac Address: 00:13:72:4a:bf:d2

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:30:48:70:43:48
Aggregator ID: 1

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:30:48:70:43:49
Aggregator ID: 1

 

Now the same information with firmware 2.0.1.03 (non-working LACP):

 

Configuration

 

interface range ethernet g(17-18)
channel-group 2 mode auto
exit
interface range ethernet g(21-22)
channel-group 3 mode auto
exit
interface range ethernet g(17-18,21-22)
lacp timeout short
exit
interface vlan 1
ip address 172.16.10.249 255.255.255.0
exit
ip default-gateway 172.16.10.1
logging buffered debugging
aaa authentication enable default line
aaa authentication login default line
ip ssh server
ip https server
clock source sntp
sntp unicast client enable
sntp server 172.16.10.180

 

Note: when converting the config to the version supported by the new firmware, the switch put my port-channels in ON mode instead of AUTO, so I manually changed the config to turn LACP back on. The conversion also added lacp timeout short to the 4 ports involved. I tried the LONG timeout as well, but it didn't change anything.
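
To spot this kind of silent change after a firmware conversion, a small script can list the mode of every channel-group in a saved config. This is only a sketch in Python: the `converted` text below is a hypothetical reconstruction of what the converted config looked like (it is not shown in this thread), and it assumes the static mode is written as `mode on`.

```python
import re

def channel_group_modes(config_text):
    """Map each interface stanza header to the (group, mode) it sets."""
    modes = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            current = line[len("interface "):]
        elif line == "exit":
            current = None
        else:
            match = re.match(r"channel-group (\d+) mode (\w+)$", line)
            if match and current is not None:
                modes[current] = (int(match.group(1)), match.group(2))
    return modes

# Hypothetical converted config (assumption, not copied from the switch).
converted = """\
interface range ethernet g(17-18)
channel-group 2 mode on
exit
interface range ethernet g(21-22)
channel-group 3 mode on
exit
"""

# Anything not in 'auto' mode will not negotiate LACP with the servers.
static = {name: gm for name, gm in channel_group_modes(converted).items()
          if gm[1] != "auto"}
for name, (group, mode) in static.items():
    print(f"{name}: channel-group {group} is in mode {mode}, no LACP negotiation")
```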

Switch LACP information

 

console# sh lacp port-channel 2
Port-Channel ch2
Port Type Gigabit Ethernet
Attached Lag id:
Actor
System Priority:1
MAC Address: 00:13:72:4a:bf:d2
Admin Key: 26
Oper Key: 26
Partner
System Priority:0
MAC Address: 00:00:00:00:00:00
Oper Key: 0
console# sh lacp ethernet g17
g17 LACP parameters:
Actor
system priority: 1
system mac addr: 00:13:72:4a:bf:d2
port Admin key: 26
port Oper key: 26
port Oper number: 17
port Admin priority: 1
port Oper priority: 1
port Admin timeout: SHORT
port Oper timeout: SHORT
LACP Activity: ACTIVE
Aggregation: INDIVIDUAL
synchronization: FALSE
collecting: FALSE
distributing: FALSE
expired: FALSE
Partner
system priority: 65535
system mac addr: 00:30:48:70:43:48
port Admin key: 0
port Oper key: 17
port Oper number: 1
port Admin priority: 0
port Oper priority: 255
port Oper timeout: LONG
LACP Activity: ACTIVE
Aggregation: AGGREGATABLE
synchronization: TRUE
collecting: FALSE
distributing: FALSE
expired: FALSE
g17 LACP statistics:
LACP Pdus sent: 2
LACP Pdus received: 56
g17 LACP Protocol State:
LACP State Machines:
Receive FSM: Current State
Mux FSM: Detached State
Periodic Tx FSM: Slow Periodic State
Control Variables:
BEGIN: FALSE
LACP_Enabled: TRUE
Ready_N: FALSE
Selected: UNSELECTED
Port_moved: FALSE
NNT: FALSE
Port_enabled: TRUE
Timer counters:
periodic tx timer: 4
current while timer: 2
wait while timer: 0

And on the Linux side:

 

Ethernet Channel Bonding Driver: v3.2.5 (March 21, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Active Aggregator Info:
Aggregator ID: 2
Number of ports: 2
Actor Key: 17
Partner Key: 26
Partner Mac Address: 00:13:72:4a:bf:d2

Slave Interface: eth0
MII Status: up
Link Failure Count: 5
Permanent HW addr: 00:30:48:70:43:48
Aggregator ID: 2

Slave Interface: eth1
MII Status: up
Link Failure Count: 5
Permanent HW addr: 00:30:48:70:43:49
Aggregator ID: 2

 

So it seems the Linux server sees the switch, but not the reverse.
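
For anyone comparing two such dumps, here is a rough Python sketch that parses the Actor and Partner blocks of sh lacp ethernet output and lists the fields that differ. The two excerpts are shortened copies of the g17 output above; the function names are made up for this example, and the parser only assumes the `key: value` layout seen in this thread.

```python
def parse_lacp_ethernet(text):
    """Parse 'sh lacp ethernet' output into {'Actor': {...}, 'Partner': {...}}."""
    sections = {"Actor": {}, "Partner": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in sections:
            current = line
        elif current and ":" in line:
            key, _, value = line.partition(":")
            if not value.strip():
                current = None  # a header such as 'g17 LACP statistics:' ends the block
            else:
                sections[current][key.strip()] = value.strip()
    return sections

def diff_fields(before, after):
    """Return {field: (before, after)} for fields whose values changed."""
    return {k: (before[k], after[k]) for k in before
            if k in after and before[k] != after[k]}

working = """g17 LACP parameters:
Actor
port Oper timeout: LONG
Aggregation: AGGREGATABLE
synchronization: TRUE
collecting: TRUE
distributing: TRUE
Partner
synchronization: TRUE
collecting: TRUE
distributing: TRUE
g17 LACP statistics:
LACP Pdus sent: 5609
"""

failing = """g17 LACP parameters:
Actor
port Oper timeout: SHORT
Aggregation: INDIVIDUAL
synchronization: FALSE
collecting: FALSE
distributing: FALSE
Partner
synchronization: TRUE
collecting: FALSE
distributing: FALSE
g17 LACP statistics:
LACP Pdus sent: 2
"""

old, new = parse_lacp_ethernet(working), parse_lacp_ethernet(failing)
for role in ("Actor", "Partner"):
    for field, (was, now) in diff_fields(old[role], new[role]).items():
        print(f"{role} {field}: {was} -> {now}")
```

On these excerpts the Actor side (the switch) differs in timeout, aggregation, synchronization, collecting and distributing, while the Partner side (the server) still reports synchronization TRUE.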

Any idea?

The switch has the following characteristics (taken after a reload to 1.0.0.47 and a fresh configuration):

 

*********************************************************************
*** Running SW Ver. 1.0.0.47 Date 02-Nov-2005 Time 09:53:18 ***
*********************************************************************

HW version is 00.00.02
Base Mac address is: 00:13:72:4a:bf:d2
Dram size is : 64M bytes
Dram first block size is : 40960K bytes
Dram first PTR is : 0x1800000
Flash size is: 16M
Loading running configuration.
Number of configuration items loaded: 0

Loading startup configuration.
Number of configuration items loaded: 0
Device configuration:
Prestera based system
Slot 1 - powerConnect 5324 HW Rev. 0.0
Tapi Version: v1.2.10
Core Version: v1.2.10

Thanks for any help!

bh1633
3 Cadmium

Re: 5324 LACP not working with firmware > 1.0.0.47?

Odd.

I set this up (not with Linux, with another switch) and I was unable to reproduce it.  There is one odd thing I notice in your data for the failing case on the 5324:

 Aggregation:           INDIVIDUAL

The MIB says: "Returns a value of TRUE if more than 1 port is configured in the channel; otherwise, returns a value of FALSE."

The config file obviously has 2 ports in this port-channel.  So I suspect the automatic configuration file conversion when going from 1.0.0.47 to 2.0.1.3.

Can you load 2.0.1.3, delete the config file, reboot, manually type in only the LACP config lines and see if it works with Linux?

Masterzen
1 Copper

Re: 5324 LACP not working with firmware > 1.0.0.47?

Odd.

I set this up (not with Linux, with another switch) and I was unable to reproduce it.

Is it with another Dell switch or another brand/model?

I suspect the switch doesn't conform exactly to the standard. I don't have proof of that, of course; I'd have to capture the 802.3ad PDUs to see it, which I'll certainly do if it's possible to capture such packets from a non-active bond on Linux (or from its individual members).
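
If such frames can be captured on a bond member (LACPDUs use the Slow Protocols EtherType 0x8809, so a capture filter along the lines of `ether proto 0x8809` should isolate them), decoding the state byte is straightforward. Below is a minimal Python sketch of a decoder for the Actor and Partner TLVs, following the LACPDU layout from the 802.3ad standard. The sample frame is hand-built from the addresses and keys shown in this thread, not a real capture, and it assumes an untagged frame.

```python
import struct

SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 0x01
# State bits, least significant bit first (timeout: set = short).
STATE_BITS = ["activity", "timeout", "aggregation", "synchronization",
              "collecting", "distributing", "defaulted", "expired"]

def decode_state(byte):
    """Expand the LACP state byte into named boolean flags."""
    return {name: bool((byte >> bit) & 1) for bit, name in enumerate(STATE_BITS)}

def decode_lacpdu(frame):
    """Decode the Actor and Partner TLVs of an untagged LACPDU Ethernet frame."""
    ethertype, subtype = struct.unpack_from("!HB", frame, 12)
    if ethertype != SLOW_PROTOCOLS_ETHERTYPE or subtype != LACP_SUBTYPE:
        raise ValueError("not a LACPDU")
    result = {}
    offset = 16  # 14-byte Ethernet header + subtype + version
    for role, expected_type in (("actor", 1), ("partner", 2)):
        tlv_type, length, sys_prio, mac, key, port_prio, port, state = \
            struct.unpack_from("!BBH6sHHHB", frame, offset)
        if tlv_type != expected_type or length != 20:
            raise ValueError(f"unexpected {role} TLV")
        result[role] = {
            "system_priority": sys_prio,
            "system": ":".join(f"{b:02x}" for b in mac),
            "key": key,
            "port_priority": port_prio,
            "port": port,
            "state": decode_state(state),
        }
        offset += length
    return result

def mac_bytes(text):
    return bytes.fromhex(text.replace(":", ""))

# Hand-built sample (assumption): the Linux server as actor, the switch as partner.
sample = (
    mac_bytes("01:80:c2:00:00:02") + mac_bytes("00:30:48:70:43:48")
    + struct.pack("!HBB", SLOW_PROTOCOLS_ETHERTYPE, LACP_SUBTYPE, 1)
    + struct.pack("!BBH6sHHHB3x", 1, 20, 65535, mac_bytes("00:30:48:70:43:48"), 17, 255, 1, 0x0D)
    + struct.pack("!BBH6sHHHB3x", 2, 20, 1, mac_bytes("00:13:72:4a:bf:d2"), 26, 1, 17, 0x03)
)

decoded = decode_lacpdu(sample)
for role, info in decoded.items():
    flags = [name for name, is_set in info["state"].items() if is_set]
    print(role, info["system"], "key", info["key"], "flags", flags)
```

Comparing the decoded state bytes from frames sent under each firmware version would show whether the switch really advertises itself as INDIVIDUAL on the wire.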

There is one odd thing I notice in your data for the failing case on the 5324:

 Aggregation:           INDIVIDUAL

The MIB says: "Returns a value of TRUE if more than 1 port is configured in the channel; otherwise, returns a value of FALSE."

The config file obviously has 2 ports in this port-channel.  So I suspect the automatic configuration file conversion when going from 1.0.0.47 to 2.0.1.3.

Can you load 2.0.1.3, delete the config file, reboot, manually type in only the LACP config lines and see if it works with Linux?

Yes, I'll try this tomorrow morning (the switch is used as a core switch for a small network so I can't interrupt traffic whenever I want).

The thing is, after conversion, the port-channel is in mode ON (no LACP), so I have to remove the individual Ethernet ports from the port-channel and form the port-channel again. OK, that's not quite the same thing as a full erase.

I no longer have support on this switch, but is there a way I can submit a bug report to Dell? I don't think the engineers who designed the firmware browse this forum.

Thanks for your help.

 

bh1633
3 Cadmium

Re: 5324 LACP not working with firmware > 1.0.0.47?

I did the test against a newer-model Dell switch (5448).

If you can capture and compare the LACP PDUs sent under the 1.0.0.47 and 2.0.1.3 firmware, that would be interesting data.

Since the config file conversion is giving odd results, I think deleting the config with 2.0.1.3 and then re-entering a minimal config will give us some good data.
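
To prepare that minimal config from a saved copy of the old one, something like the following Python sketch could pull out only the interface stanzas that mention channel-group or lacp. The sample text is an excerpt of the working configuration posted earlier in this thread; the function name is made up for the example.

```python
def lacp_stanzas(config_text):
    """Keep only 'interface ... exit' stanzas that contain channel-group or lacp lines."""
    kept, stanza = [], None
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            stanza = [line]
        elif stanza is not None:
            stanza.append(line)
            if line == "exit":
                if any(l.startswith(("channel-group", "lacp")) for l in stanza):
                    kept.extend(stanza)
                stanza = None
    return "\n".join(kept)

saved = """\
interface ethernet g17
channel-group 2 mode auto
exit
interface vlan 1
ip address 172.16.10.249 255.255.255.0
exit
ip default-gateway 172.16.10.1
ip ssh server
"""

# Prints only the g17 stanza; the VLAN stanza and global lines are dropped.
print(lacp_stanzas(saved))
```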

Do you have a newer Dell switch you can try?  54xx or 35xx?

Masterzen
1 Copper

Re: 5324 LACP not working with firmware > 1.0.0.47?

Hi,

I removed the configuration before upgrading as you suggested and indeed it fixed the issue!

Thank you very much for your help.

Now I have to convince the switch to load balance the traffic 🙂

bh1633
3 Cadmium

Re: 5324 LACP not working with firmware > 1.0.0.47?

Look at the "Configuring Load Balancing" section of this doc:

http://support.dell.com/support/edocs/network/pc5324/en/UG_Ad/UGAdd.pdf

For an in-depth look at the default LAG hashing algorithm used by most switches (this is written for a blade server switch, but the concepts are simple enough to apply to a standalone switch; the switch in this doc and the 5324 have the same hashing options):

http://support.dell.com/support/edocs/network/LAG1855/LAGConsiderationv0.5.pdf
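
As a rough illustration of why a MAC-based hash may not spread traffic the way one expects, here is a simplified Python sketch of a layer-2 hash: XOR the last byte of the source and destination MAC addresses, then take the result modulo the number of links. This approximates the idea behind the "layer2" transmit hash policy reported by the Linux bonding driver above; it is not the 5324's actual algorithm, and the peer MAC addresses are invented for the example.

```python
def layer2_link(src_mac, dst_mac, link_count):
    """Pick a LAG member from the XOR of the last MAC bytes (simplified layer-2 hash)."""
    src = bytes.fromhex(src_mac.replace(":", ""))
    dst = bytes.fromhex(dst_mac.replace(":", ""))
    return (src[-1] ^ dst[-1]) % link_count

server = "00:30:48:70:43:48"
# Hypothetical peers on the same LAN.
peers = ["00:11:22:33:44:10", "00:11:22:33:44:11", "00:11:22:33:44:12"]

for peer in peers:
    print(f"{server} -> {peer}: link {layer2_link(server, peer, 2)}")
```

With a hash like this, all frames between one given pair of MAC addresses always use the same link, so a single server-to-server flow never exceeds the speed of one member, and traffic that goes through a router hashes on the router's MAC address.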
