S4810 LACPDUs: topsy-turvy behavior
Got two S4810 switches in a stack (OS 9.14), connected to a NAS with an Intel XL710-QDA2 NIC.
Dual 40 Gbps DACs, one cable to each switch, in a dynamic (LACP) LAG.
I've been experiencing sporadic but recurring connection drops (every 4-90 hours).
Saw this article:
Not sure if what I see constitutes "ungrouping" (I see the Fo interfaces "exit" the port-channel, it goes down, and 1 second later they return), but it sure seemed worth a try.
The thing is, I'm seeing the opposite behavior of what the article describes:
Without lacp long-timeout, the switch has been receiving slightly more PDUs than it transmits:
            LACP PDU              Marker PDU       Unknown   Illegal
Port        Xmit       Recv       Xmit    Recv     Rx        Rx
-----------------------------------------------------------------------------
Fo 0/56     9050571    9365818    0       0        0         0
Fo 1/56     9050563    9365811    0       0        0         0
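To put a number on "slightly more", here is a small Python sketch that parses counter rows in the layout shown above and prints the Recv/Xmit ratio per port. The row format and the helper name parse_lacp_counters are assumptions for illustration, not a Dell tool or API; the sample rows are copied from the output above.

```python
import re

# Counter rows copied from the short-timeout output above.
SAMPLE = """\
Fo 0/56 9050571 9365818 0 0 0 0
Fo 1/56 9050563 9365811 0 0 0 0
"""

# Assumed row layout: port, LACP Xmit, LACP Recv, then Marker Xmit,
# Marker Recv, Unknown Rx and Illegal Rx, all separated by whitespace.
ROW = re.compile(r"^(Fo \d+/\d+)\s+(\d+)\s+(\d+)(?:\s+\d+){4}\s*$")


def parse_lacp_counters(text: str) -> dict[str, tuple[int, int]]:
    """Hypothetical helper: map each port to its (LACP Xmit, LACP Recv) counters."""
    counters: dict[str, tuple[int, int]] = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header and separator lines do not match and are skipped
            counters[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return counters


for port, (xmit, recv) in parse_lacp_counters(SAMPLE).items():
    # A ratio near 1.0 means both ends send at roughly the same rate.
    print(f"{port}: Recv/Xmit = {recv / xmit:.3f}")
```

For both ports this prints a ratio of about 1.035, i.e. over that counter period the switch received roughly 3.5% more LACPDUs than it sent.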
And after enabling long-timeout:
interface Port-channel 56
 no ip address
 switchport
 lacp long-timeout
 no shutdown
!
…It appears that the Dell switch is sending out way more PDUs than the NIC is returning!
            LACP PDU              Marker PDU       Unknown   Illegal
Port        Xmit       Recv       Xmit    Recv     Rx        Rx
-----------------------------------------------------------------------------
Fo 0/56     627        22         0       0        0         0
Fo 1/56     627        22         0       0        0         0
It's as if the long-timeout command tells the peer device to send fewer PDUs, at the long interval, while the switch keeps shooting them out every second. That is the opposite of what this command should do.
It now works out to 28.7 seconds per response… which again is just over 1 second away from flapping the port-channel interface.
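The seconds-per-response figure comes from a simple estimate: if the switch transmits once per second, its Xmit counter approximates the seconds elapsed since the reset, so dividing it by Recv gives the average gap between PDUs from the NIC. A minimal Python sketch of that arithmetic; the fixed 1-second transmit interval is an assumption taken from the observation above, and the function name is hypothetical.

```python
def seconds_per_received_pdu(xmit: int, recv: int, tx_interval_s: float = 1.0) -> float:
    """Estimate the average gap, in seconds, between received LACPDUs.

    Assumes the local port transmitted at a fixed interval, so that
    xmit * tx_interval_s approximates the time since the counters were reset.
    """
    if recv <= 0:
        raise ValueError("no LACPDUs received yet; gap is undefined")
    elapsed_s = xmit * tx_interval_s
    return elapsed_s / recv


# Counters from the long-timeout output above.
print(f"{seconds_per_received_pdu(627, 22):.1f} s")
```

With the snapshot posted above (627 Xmit, 22 Recv) this gives 28.5 s; later readings shift the figure slightly as the counters grow.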
What gives?
DELL-Marco B
Moderator
October 6th, 2023 11:44
Hello,
This kind of issue is hard to troubleshoot. First, which firmware version is the switch running?
Regarding the timeout option: the longer the timeout, the more PDUs are sent.
PDUs are transmitted at either a slow or fast transmission rate, depending upon the LACP timeout value. The timeout value is the amount of time that a LAG interface waits for a PDU from the remote system before bringing the LACP session down. The default timeout value is 1 second. You can configure the default timeout value to be 30 seconds. Invoking the longer timeout might prevent the LAG from flapping if the remote system is up but temporarily unable to transmit PDUs due to a system interruption.
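It may help to separate the two timers in that description. In the IEEE 802.1AX (formerly 802.3ad) LACP state machines, how often a port sends periodic LACPDUs is set by its partner's LACP_Timeout bit (1 s fast, 30 s slow), while how long a port waits before expiring its partner's information is set by its own bit (3 s short, 90 s long, i.e. three periodic intervals). Whether OS 9.14 follows the standard exactly is not confirmed in this thread, and the NAS side's setting is unknown, so the short timeout used for the NIC below is a hypothetical. A minimal Python sketch of that rule:

```python
from dataclasses import dataclass

# Timer constants from IEEE 802.1AX, in seconds.
FAST_PERIODIC_TIME = 1
SLOW_PERIODIC_TIME = 30
SHORT_TIMEOUT_TIME = 3
LONG_TIMEOUT_TIME = 90


@dataclass
class LacpPort:
    name: str
    short_timeout: bool  # this port's own LACP_Timeout bit (True = short/fast)


def periodic_tx_interval(port: LacpPort, partner: LacpPort) -> int:
    """Seconds between periodic LACPDUs sent by `port`.

    Driven by the *partner's* timeout bit: the partner asks to be
    refreshed fast or slow, and `port` transmits accordingly.
    """
    return FAST_PERIODIC_TIME if partner.short_timeout else SLOW_PERIODIC_TIME


def partner_expiry_time(port: LacpPort) -> int:
    """Seconds `port` waits without an LACPDU before expiring partner info.

    Driven by the port's *own* timeout bit.
    """
    return SHORT_TIMEOUT_TIME if port.short_timeout else LONG_TIMEOUT_TIME


# Switch with "lacp long-timeout"; NAS side hypothetically left at short timeout.
switch = LacpPort("S4810 Fo 0/56", short_timeout=False)
nas = LacpPort("XL710 port", short_timeout=True)

print(periodic_tx_interval(switch, nas))  # interval for switch -> NAS PDUs
print(periodic_tx_interval(nas, switch))  # interval for NAS -> switch PDUs
print(partner_expiry_time(switch))        # how long the switch waits before expiring
```

Under these assumptions the switch keeps sending every second (because the NAS asks for fast updates) while the NAS drops to one PDU every 30 seconds, the same asymmetric pattern as the long-timeout counters posted earlier in the thread.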
MediaComposer
1 Rookie
October 6th, 2023 20:06
Based on what you're saying, though, the longer the timeout, the fewer PDUs are sent, no? This is the issue, or confusion, at the heart of this.
If we switch the timeout from fast (1 second) to slow (30 seconds), and that makes PDUs transmit at a slow rate, then the switch will send 1 PDU every 30 seconds instead of every second, no?
If in fast mode the switch transmits a PDU every 1 second, and in slow mode it transmits a PDU every 30 seconds, how would more PDUs possibly be sent?
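The arithmetic behind that question, as a short Python sketch that counts only periodic LACPDUs (real counters also include PDUs triggered by state changes, which this ignores; the one-hour window is an arbitrary choice):

```python
FAST_INTERVAL_S = 1   # one periodic LACPDU per second
SLOW_INTERVAL_S = 30  # one periodic LACPDU every 30 seconds


def periodic_pdus(window_s: int, interval_s: int) -> int:
    """Number of periodic LACPDUs sent in window_s seconds at a fixed interval."""
    return window_s // interval_s


ONE_HOUR_S = 3600
fast = periodic_pdus(ONE_HOUR_S, FAST_INTERVAL_S)
slow = periodic_pdus(ONE_HOUR_S, SLOW_INTERVAL_S)
print(f"fast: {fast} PDUs/hour, slow: {slow} PDUs/hour, ratio {fast // slow}:1")
```

This prints 3600 versus 120 PDUs per hour: in steady state, slowing the rate cuts the periodic count by a factor of 30.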
MediaComposer
October 6th, 2023 22:28
Here. Thank you!
https://drive.google.com/file/d/1-yaOmZWUjO_-LHZnWDci5rcd1Ufm4Pq5/view?usp=sharing
MediaComposer
October 10th, 2023 01:26
So we're back to the last observation: setting the timeout to long appears not to affect the PDUs being sent, only (asymmetrically) the ones being received, at a ratio of about 1 to 29 (30 - 1).
I reset the counters when I changed the timeout from short to long. It just keeps going at the same rate. Now it's at 332,337 Xmit, 11,474 Recv. Resetting just… resets it to show the same behavior.
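Between resets the counters only grow, so the rate can also be measured without resetting them: take two readings and divide the change in Xmit by the change in Recv. A minimal Python sketch, assuming the 627/22 and 332,337/11,474 readings belong to the same counter period (i.e. no reset happened between them); Reading and xmit_per_recv are hypothetical names.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    xmit: int  # LACP PDUs transmitted
    recv: int  # LACP PDUs received


def xmit_per_recv(earlier: Reading, later: Reading) -> float:
    """PDUs sent per PDU received between two readings of the same counters."""
    d_xmit = later.xmit - earlier.xmit
    d_recv = later.recv - earlier.recv
    if d_xmit < 0 or d_recv < 0:
        raise ValueError("counters went backwards; were they reset between readings?")
    if d_recv == 0:
        raise ValueError("no PDUs received between the two readings")
    return d_xmit / d_recv


first = Reading(xmit=627, recv=22)
latest = Reading(xmit=332_337, recv=11_474)
print(f"{xmit_per_recv(first, latest):.2f} PDUs sent per PDU received")
```

Under that assumption this comes out at roughly 29 PDUs sent per PDU received, consistent with the 1 to (30 - 1) ratio described above.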
Re: "firmware version", please clarify what you mean. The software version? As mentioned in the OP and the link, it's v9.14. Are you referring to another version parameter that does not appear in show tech-support?
Thank you