July 7th, 2018 02:00
S4048-ON VLT and LACP issue while reloading VLT peer
We have a plain setup: a pair of S4048-ON switches (SA and SB) in a VLT domain.
From these two switches we have connected a set of LACP port channels towards upstream C6800s (VSS), ToR switches and F5 BIG-IPs.
Our issues start when we want to upgrade the S4048-ON pair.
While one VLT peer is reloading, we get massive packet loss (about 50%) on the LACP port channels.
The connected LACP devices don't register that the links to the reloading VLT peer are down, and it takes about 1-2 minutes for the network to converge.
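For what it's worth, both numbers are at least consistent with how LACP behaves when a partner stops sending LACPDUs while the link itself stays up. The following is a minimal sketch in Python, not a diagnosis. It assumes the neighbours use the standard IEEE 802.1AX timers with the long (slow-rate) timeout, and that flows hash evenly over member links split equally between the two VLT peers. Neither assumption is confirmed in this thread, and the function names are made up for illustration.

```python
# Standard LACP timer values from IEEE 802.1AX, in seconds.
FAST_PERIODIC_TIME = 1   # LACPDU interval with the short timeout ("fast" rate)
SLOW_PERIODIC_TIME = 30  # LACPDU interval with the long timeout ("slow" rate)
TIMEOUT_MULTIPLIER = 3   # partner info expires after 3 missed intervals


def lacp_timeout(periodic_time: int) -> int:
    """Seconds until a silent partner is declared expired (hypothetical helper)."""
    return periodic_time * TIMEOUT_MULTIPLIER


def blackholed_fraction(links_to_reloading_peer: int, total_links: int) -> float:
    """Share of flows still hashed onto links towards the reloading peer.

    Assumes an even hash distribution over all member links.
    """
    return links_to_reloading_peer / total_links


if __name__ == "__main__":
    # Assumed layout: 4 member links in the neighbour's LAG, 2 per VLT peer.
    print("loss fraction:", blackholed_fraction(2, 4))
    print("long timeout (s):", lacp_timeout(SLOW_PERIODIC_TIME))
    print("short timeout (s):", lacp_timeout(FAST_PERIODIC_TIME))
```

Under these assumptions, half of the traffic is lost for up to 90 seconds, which falls within the 1-2 minutes observed. With the short timeout the same failure would be detected after 3 seconds.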
As I am not that familiar with DNOS: how can it be that the S4048 doesn't shut down its interfaces immediately when reloading?
As a workaround, we have tried manually shutting down the interfaces on the VLT peer that we want to reload. As expected, that works like a charm, with convergence in under 1 s.
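The workaround can be scripted so that the same ports are shut every time before a reload. Below is a minimal sketch in Python that only builds the list of CLI lines. The helper name is hypothetical, and the command sequence (`configure terminal`, `interface ...`, `shutdown`, `end`) is an assumption modelled on the DNOS 9 style configuration in this post, so check it against your own release before using it.

```python
from typing import List


def pre_reload_shutdown_commands(interfaces: List[str]) -> List[str]:
    """Build the CLI lines that administratively shut the given ports.

    Hypothetical helper: it only generates text, it does not talk to a switch.
    """
    commands = ["configure terminal"]
    for name in interfaces:
        commands.append(f"interface {name}")
        commands.append("shutdown")
    commands.append("end")
    return commands


if __name__ == "__main__":
    # Member ports of Po51 on the peer that is about to be reloaded.
    members = ["TenGigabitEthernet 1/47", "TenGigabitEthernet 1/48"]
    print("\n".join(pre_reload_shutdown_commands(members)))
```

The resulting lines can be pasted into a console session or handed to whatever automation tool you already use. The list should include every LACP member port on the peer that is going to be reloaded.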
interface Port-channel 51
 description Po51 - ny LACP til cisco
 no ip address
 switchport
 no spanning-tree pvst err-dis cause invalid-pvst-bpdu
 lacp fast-switchover
 vlt-peer-lag port-channel 51
 no shutdown
interface TenGigabitEthernet 1/47
 description Po51 - cisco lacp
 no ip address
 mtu 9216
!
 port-channel-protocol LACP
  port-channel 51 mode active
interface TenGigabitEthernet 1/48
 description Po51 - cisco lacp
 no ip address
 mtu 9216
!
 port-channel-protocol LACP
  port-channel 51 mode active
S4048-ON-2-01#sh vlt br
VLT Domain Brief
------------------
Domain ID: 2
Role: Primary
Role Priority: 1
ICL Link Status: Up
HeartBeat Status: Up
VLT Peer Status: Up
Local Unit Id: 0
Version: 6(8)
Local System MAC address: f4:8e:38:35:1c:08
Remote System MAC address: f4:8e:38:35:1d:08
Remote system version: 6(8)
Delay-Restore timer: 4 seconds
Delay-Restore Abort Threshold: 60 seconds
Peer-Routing : Disabled
Peer-Routing-Timeout timer: 0 seconds
Multicast peer-routing timeout: 150 seconds



HenrikS.
July 10th, 2018 03:00
1: We are now running version 9.13(0.3P1), but we have seen this behavior on earlier versions as well.
2: We use PVST, which should be supported, right?
3: We have not defined a specific VLT system-mac.
genoscope2
November 16th, 2018 07:00
HenrikS.
December 20th, 2018 00:00
janwillem.molenaar
October 29th, 2020 08:00
Hi Henrik,
We are experiencing the same issue with OS10 switches (10.5.2.0).
Did you manage to fix this issue?
We are about to open an SR with Dell support.
Regards, Jan-Willem Molenaar
HenrikS.
October 29th, 2020 12:00
Hello,
No, we have not yet been able to confirm a potential fix.
The main reason is that the packet loss caused by the issue is so severe that further testing isn't worth the impact on production systems.
The issue does not happen if you just cut power or manually shut down the LACP interfaces before the reload.
On another note, we started to have other issues with one of our 4 pairs of these switches this fall. Shortly after opening an SR, we were told that we had been hit by the Intel clock bug (after checking all serial numbers, all 8 (!) switches turned out to be affected), ref:
https://www.dell.com/support/article/no-no/qna44095/networking-clock-signal-q-a?lang=en
(Note: "A: No you do not need to proactively contact Dell to get replacement product. Dell will contact individual customers to coordinate for product replacement as material becomes available.")
We were never proactively contacted by Dell on this matter, and for all I know the clock bug might be some kind of root cause here.
We have just replaced the units that are known to have failed and are planning to replace 6 more. After that, we may run another reload test on a replaced and fully updated pair.
I advise you to open an SR and to explicitly ask them to check for known issues with your units.
-Henrik
bealdrid2
October 30th, 2020 13:00
Henrik,
Regarding the clock bug issue, can you share what kind of problems you noticed on the switches that were affected by it? So far I have one pair that is affected but has not shown any problems to date, and I'm just trying to get a feel for what to look out for.
HenrikS.
October 30th, 2020 14:00
Well, as it affects the Intel Atom CPU, it's related to the control plane rather than the data plane.
For us, it was discovered during normal operations: we were adding a new VLAN and tagging interfaces, and right after a 'write mem' the hostname in the CLI prompt suddenly reverted to the default name:
Dell-EMC-something #
After that, we checked the serial output, saw error messages referring to writing to NVRAM, and opened an SR.
The KB article states: "Once encountered it is likely that the unit will not boot, and will not be recoverable. Typically the system or card will stop functioning and will hang or reboot continuously. The issue may not be observed until a reboot or power cycle occurs." Since the data plane was still operational at the time, we ran no more tests on the units until they were replaced.
-Henrik
bealdrid2
October 30th, 2020 16:00
Henrik,
Thanks for that information. Our units are right about 4 years old now. We have been contacted regarding proactive replacement, but due to supply chain issues we have not received the new units yet. Hoping we can hold out until we do.
stefangs
February 18th, 2021 00:00
Hello all, is there a fix for this issue yet? Is there currently an SR open with Dell for it? Kind regards, Stefan Geutjes
DELL-Josh Cr
Moderator
February 18th, 2021 10:00
Hi Stefan,
Is it also an S4048-ON that you are having issues with? What version of the OS are you running?
stefangs
February 19th, 2021 01:00
Hello Josh,
We are experiencing the same issue on this version:
OS Version: 10.5.2.3
Build Version: 10.5.2.3.304
System Type: S4148F-ON
Architecture: x86_64
I hope you can provide a solution.
Kind regards,
Stefan Geutjes
DELL-Josh Cr
Moderator
February 19th, 2021 09:00
The best option would be to call phone support and have them look at the configuration live.
Tomas Kalabis
March 9th, 2021 03:00
Hi guys,
Same issue here:
S4112 with OS10, software version 10.5.1.2
This is my post about it (with logs):
http://tomaskalabis.com/wordpress/dell-emc-s4112-on-vlt-and-lacp-issue-while-reloading-vlt-peer/
Tomas.
DELL-Stefan R
Moderator
March 9th, 2021 05:00
Hi Tomas.
Thanks for the info.
As Josh said in his last post, I also recommend calling phone support so they can look at the configuration live.