1 Rookie

 • 

8 Posts

October 6th, 2023 01:58

s4810 LACPDUs topsy-turvy behavior

Got two s4810 switches in a stack (OS 9.14), connected to a NAS with an Intel XL710-QDA2 NIC.

Dual 40Gbps DACs, one cable to each switch, in a dynamic (LACP) LAG.

I've been experiencing sporadic but recurring connection drops (every 4-90 hours).

Saw this article:

https://www.dell.com/support/kbdoc/en-us/000142785/dell-emc-networking-force10-os9-resolving-lacp-port-ungrouping

Not sure if what I see constitutes "ungrouping" (I see the Fo interfaces "exiting" the port-channel, it goes down, and 1 second later they return), but it sure seemed worth trying.

The thing is, I'm seeing the opposite behavior of what the article describes:

Without LACP long-timeout, slightly more PDUs were received than transmitted:

                  LACP PDU              Marker PDU         Unknown  Illegal  
Port           Xmit       Recv       Xmit       Recv       Rx       Rx       
-----------------------------------------------------------------------------
Fo 0/56        9050571    9365818    0          0          0        0        
Fo 1/56        9050563    9365811    0          0          0        0        

And after enabling long-timeout:

interface Port-channel 56
 no ip address
 switchport
 lacp long-timeout 
 no shutdown
!

…it appears that the Dell switch is sending out way more PDUs than the NIC is returning!

                  LACP PDU              Marker PDU         Unknown  Illegal
Port           Xmit       Recv       Xmit       Recv       Rx       Rx
-----------------------------------------------------------------------------
Fo 0/56        627        22         0          0          0        0
Fo 1/56        627        22         0          0          0        0

It's as if the long-timeout command tells the peer device to send fewer PDUs, at the long interval, while the switch keeps shooting 'em out every 1 second. Which is the opposite of what this command should do.

That now works out to about 28.7 seconds per received PDU… which is again just 1 second away from flapping the port-channel interface.

What gives?
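
For anyone who wants to check the arithmetic, here is a minimal Python sketch that computes the Xmit/Recv ratio from the pasted counters. The helper name `xmit_recv_ratios` and the regular expression are illustrative assumptions, not part of any Dell tool; the input is simply the two `Fo` rows from the long-timeout snapshot above. At that snapshot the ratio is 28.5, close to the 28.7 quoted, which was presumably taken from a slightly different reading.

```python
import re

# The two counter rows copied from the long-timeout snapshot in this post.
SNAPSHOT = """\
Fo 0/56        627        22         0          0          0        0
Fo 1/56        627        22         0          0          0        0
"""

def xmit_recv_ratios(text):
    """Return {port: LACP Xmit / LACP Recv} for each 'Fo <slot>/<port>' row."""
    ratios = {}
    for line in text.splitlines():
        # Columns: port name, LACP PDU Xmit, LACP PDU Recv (rest ignored).
        m = re.match(r"\s*(Fo \d+/\d+)\s+(\d+)\s+(\d+)", line)
        if m and int(m.group(3)) > 0:
            ratios[m.group(1)] = int(m.group(2)) / int(m.group(3))
    return ratios

for port, ratio in xmit_recv_ratios(SNAPSHOT).items():
    print(f"{port}: {ratio:.1f} PDUs sent per PDU received")
```

If the switch sends once per second, that ratio is also roughly the number of seconds between PDUs arriving from the NIC.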

Moderator

 • 

3.5K Posts

October 6th, 2023 11:44

Hello,

This kind of issue is hard to troubleshoot. First, which firmware version is the switch running?

Regarding the timeout option: the longer the timeout, the more PDUs are sent.

 

PDUs are transmitted at either a slow or fast transmission rate, depending upon the LACP timeout value. The timeout value is the amount of time that a LAG interface waits for a PDU from the remote system before bringing the LACP session down. The default timeout value is 1 second. You can configure the default timeout value to be 30 seconds. Invoking the longer timeout might prevent the LAG from flapping if the remote system is up but temporarily unable to transmit PDUs due to a system interruption.

 

 

1 Rookie

 • 

8 Posts

October 6th, 2023 20:06

Based on what you're saying, though, the longer the timeout, the fewer PDUs are sent, no? This is the issue or confusion at the heart of this.

If we swap the timeout from fast (1 second) to slow (30 seconds), and that makes PDUs transmit at a slow rate, then the switch will send 1 PDU every 30 seconds instead of 1 second, no?

If in fast mode the switch transmits a PDU every 1 second, and in slow mode it transmits a PDU every 30 seconds, how would more PDUs possibly be sent?
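
One reading that would fit the counters earlier in the thread: in IEEE 802.1AX, each end advertises its own timeout flag, and that flag sets how often the partner transmits, not how often the advertising end transmits. The sketch below models only that rule. The timer constants are the ones the standard defines; the class and function names are hypothetical, and the NAS advertising a short timeout is an assumption, since its configuration is not shown here. Whether OS 9.14 follows the standard exactly on this point is also an assumption.

```python
from dataclasses import dataclass

# Timer constants from IEEE 802.1AX, in seconds.
FAST_PERIODIC_TIME = 1
SLOW_PERIODIC_TIME = 30
SHORT_TIMEOUT_TIME = 3
LONG_TIMEOUT_TIME = 90

@dataclass
class LacpEnd:
    name: str
    long_timeout: bool  # the timeout flag this end advertises (True = long)

def tx_interval(sender: LacpEnd, partner: LacpEnd) -> int:
    """Seconds between LACPDUs from `sender`; chosen by the PARTNER's flag."""
    return SLOW_PERIODIC_TIME if partner.long_timeout else FAST_PERIODIC_TIME

def rx_timeout(receiver: LacpEnd) -> int:
    """Seconds `receiver` waits for a LACPDU before expiring the partner info."""
    return LONG_TIMEOUT_TIME if receiver.long_timeout else SHORT_TIMEOUT_TIME

switch = LacpEnd("s4810", long_timeout=True)  # after `lacp long-timeout`
nic = LacpEnd("NAS NIC", long_timeout=False)  # assumption: NAS left at short timeout

print(f"{switch.name} sends every {tx_interval(switch, nic)} s")
print(f"{nic.name} sends every {tx_interval(nic, switch)} s")
print(f"{switch.name} receive timeout: {rx_timeout(switch)} s")
```

Under these assumptions the switch keeps sending every second while the NIC slows to one PDU every 30 seconds, which is close to the roughly 28:1 Xmit/Recv ratio seen in the counters. It would also mean the switch's receive timeout in long mode is 90 seconds rather than 30.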

Moderator

 • 

8.7K Posts

October 6th, 2023 20:23

MediaComposer,
 
We will need to do some additional research on the issue. If you could share the output of show tech-support, it would help.
 
Thanks.
 
 
 

DELL-Chris H

Social Media and Communities Professional

Dell Technologies | Enterprise Support Services

#IWork4Dell

Did I answer your query? Please click on ‘Mark as Accepted Answer’. ‘Thumbs up’ the posts you like!

1 Rookie

 • 

8 Posts

October 6th, 2023 22:28

Moderator

 • 

3.5K Posts

October 9th, 2023 07:44

Hello,

Yes, sorry: the longer the timeout, the fewer PDUs are sent.

What is the firmware version?
Did you try the command clear lacp counters, and then show lacp port-channel-number counters again?

 

Thanks

DELL- Marco B


1 Rookie

 • 

8 Posts

October 10th, 2023 01:26

So we're back to the last observation: setting the timeout to long appears not to affect the PDUs being sent, but (asymmetrically) the ones being received, at a ratio of 1 to (30-1).

I reset the counters when I changed the timeout from short to long. It just keeps going at the same rate. Now it's at 332,337 Xmit, 11,474 Recv. Resetting just… resets it to show the same behavior.

Re "firmware version": please clarify what you mean. The software version? It's as mentioned in the OP and the link, v9.14. Are you referring to another version parameter that does not appear in show tech-support?

Thank you

Moderator

 • 

3.2K Posts

October 10th, 2023 05:29

Hi,

 

What is the NAS connected to the switch?

 

Can you remove lacp long-timeout from the port channel to check whether the issue persists?

DELL-Joey C
