Dell P6224 STP issues
For some reason, our switches change ports from the forwarding state to the blocking state.
Without a clear reason, one of the switches seems to elect itself as the new STP root. This causes the ports the router is connected to to be blocked, so all network traffic fails.
The switches are PC 6224, 6224P and related models, running Software Version 3.3.7.3 and 3.3.17.1.
Any ideas where to start to get this sorted out?
Anonymous
5 Practitioner
274.2K Posts
0
May 14th, 2018 06:00
Is there more than one connection from the router to the switch? Even if there are topology changes, a port is only going to stay blocking if there is a loop.
As devices are added to and removed from the network, some topology changes can be normal. To help control the behavior of each switch, I suggest manually configuring the spanning tree priority.
If you set the priority to 4096, the switch becomes the root switch as long as no other switch has an equal or lower priority (on a tie, the switch with the lowest MAC address wins). I suggest assigning this priority to the switch that connects to the router.
console(config)#spanning-tree priority 4096
Sometimes topology changes can be caused by rogue devices being plugged into the network. Is this a possibility?
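To illustrate why the priority matters, here is a minimal Python sketch of how spanning tree picks a root: the bridge with the lowest (priority, MAC address) pair wins. The switch names and MAC addresses below are hypothetical and only serve the example; the range check reflects the usual rule that the priority is a multiple of 4096 between 0 and 61440.

```python
from typing import Iterable, NamedTuple, Tuple


class Bridge(NamedTuple):
    name: str       # hypothetical label, not taken from the thread
    priority: int   # configured spanning tree priority
    mac: str        # hypothetical MAC address, colon separated


def validate_priority(priority: int) -> None:
    """Reject values that are not multiples of 4096 in the range 0..61440."""
    if not 0 <= priority <= 61440 or priority % 4096 != 0:
        raise ValueError(f"invalid spanning tree priority: {priority}")


def bridge_id(bridge: Bridge) -> Tuple[int, int]:
    """Return the comparison key: priority first, then MAC as an integer."""
    validate_priority(bridge.priority)
    return (bridge.priority, int(bridge.mac.replace(":", ""), 16))


def elect_root(bridges: Iterable[Bridge]) -> Bridge:
    """The bridge with the numerically lowest bridge ID becomes root."""
    return min(bridges, key=bridge_id)


bridges = [
    Bridge("switch-a", 4096, "00:00:00:00:00:0a"),
    Bridge("switch-b", 8192, "00:00:00:00:00:01"),
    Bridge("switch-c", 4096, "00:00:00:00:00:05"),
]

root = elect_root(bridges)
print(root.name)  # switch-c: same priority as switch-a, but a lower MAC
```

The tie between switch-a and switch-c shows why 4096 only guarantees the root role when no other switch uses the same or a lower value.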
WallieNgti
21 Posts
0
May 15th, 2018 03:00
There are 4 physical connections from the router to the switch, which are all in one channel-group.
Originally the priority of the first switch was 8192, which was the lowest of all the switches.
The issue happened again today. For now I changed the priority of the first switch to the suggested 4096, without success.
The other switches all have a higher priority: 16384, 20480, 24576, 28672 and 36864.
At some moments a connection between switches is closed (1/g19 is the port on sw-0 that sw-5 is connected to); at other moments one of the connections with the router is closed, which brings down the whole network.
Anonymous
5 Practitioner
274.2K Posts
0
May 15th, 2018 06:00
Sounds like you have a firm control on spanning tree priority. Have you checked the switch logs to see what messages are recorded around the time of the issue? What model router is this? What model are the other switches?
One thing that can help with possible rogue devices is to ensure all of the edge ports are set to portfast, and then enable spanning tree BPDU filtering. With this in place, a BPDU received on an edge port is discarded.
console#spanning-tree portfast bpdufilter default
You can use this command to see which interfaces have received BPDUs.
console#show interfaces detail Ethernet
Feel free to post up logs, running configs, etc. I can help look through them.
WallieNgti
21 Posts
0
May 18th, 2018 01:00
As a router we are using OpenBSD, running on a Dell PowerEdge R430.
Further details:
sw-0: stack of 2, 6224P, System Software Version 3.3.17.1
sw-1: stack of 4, 6248, System Software Version 3.3.17.1
sw-2: stack of 2, 6248P, System Software Version 3.3.17.1
sw-3: 6224, System Software Version 3.3.17.1
sw-4: 6224P, System Software Version 3.3.17.1
sw-5: 6224, System Software Version 3.3.17.1
sw-0 connects to the internet
All other switches connect to sw-0 directly
The connection between sw-0 and sw-4 uses sw-0 1/g20, 2/g20 and port-channel 20, and sw-4 1/g23, 1/g24 and port-channel 1
sw-0 => sw-4:
On sw-4 side:
Regular connection to external is using sw-0 1/g24:
The router is an OpenBSD machine, connected using 4 connections: sw-0 1/g1, 1/g2, 2/g1, 2/g2 and port-channel 1
At the moment two of the interfaces to the router are shut down, as this seems to reduce the issue for now.
Switch sw-0 carries the internet connections as well as the links to all the other switches, the router (rt-0), the authentication servers and the wifi access points.
The configuration has been running for months without any major changes or issues, and no recent configuration changes have been made that could explain the current issue.
After the issue was noticed, all the switches were upgraded to the latest software version, without resolving the problem.
Last time issue log messages:
Anonymous
5 Practitioner
274.2K Posts
0
May 18th, 2018 05:00
I don't see anything wrong with the configuration. Can you post up the output from the logs? Were you able to track down any ports receiving BPDUs that should not have?
WallieNgti
21 Posts
0
May 18th, 2018 06:00
On sw-0, the following BPDU counters are shown:
port 1/g17: BPDU: sent 4, received 3 (connection with sw-2)
port 1/g24: BPDU: sent 137679, received 0 (internet connection)
port 2/g5: BPDU: sent 137725, received 0 (connection with powerrail)
port 2/g6: BPDU: sent 137736, received 0 (Access point)
port 2/g12: BPDU: sent 137792, received 0 (camera device)
port 2/g17: BPDU: sent 3, received 0 (connection with sw-2)
port 2/g20: BPDU: sent 234, received 208 (connection with sw-4)
port 2/g24: BPDU: sent 137950, received 0 (backup internet connection)
port-channel 1: BPDU: sent 137512, received 0 (connection with router)
port-channel 3: BPDU: sent 137986, received 0 (connection with authentication server)
port-channel 17: BPDU: sent 138037, received 4 (connection with sw-2)
port-channel 19: BPDU: sent 138057, received 2 (connection with sw-3)
port-channel 20: BPDU: sent 93796, received 100 (connection with sw-4)
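For anyone who wants to check such a list quickly, here is a small Python sketch that parses a few of the counter lines above and reports which ports have received BPDUs. It assumes the summarized "port ...: BPDU: sent N, received M (note)" layout used in this post (not raw switch output), and it treats a note containing "sw-" as an inter-switch link, which is only a naming convention from this thread.

```python
import re
from typing import List, NamedTuple


class Counter(NamedTuple):
    port: str
    sent: int
    received: int
    note: str


# Matches the summary layout used in this post, e.g.
# "port 1/g17: BPDU: sent 4, received 3 (connection with sw-2)"
COUNTER_RE = re.compile(
    r"^(?P<port>port(?:-channel)? \S+): BPDU: "
    r"sent (?P<sent>\d+), received (?P<received>\d+)"
    r"(?: \((?P<note>.*)\))?$"
)


def parse_counters(text: str) -> List[Counter]:
    """Turn each matching line into a Counter; skip lines that do not match."""
    counters = []
    for line in text.strip().splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            counters.append(
                Counter(
                    port=match["port"],
                    sent=int(match["sent"]),
                    received=int(match["received"]),
                    note=match["note"] or "",
                )
            )
    return counters


SAMPLE = """
port 1/g17: BPDU: sent 4, received 3 (connection with sw-2)
port 1/g24: BPDU: sent 137679, received 0 (internet connection)
port 2/g20: BPDU: sent 234, received 208 (connection with sw-4)
port-channel 1: BPDU: sent 137512, received 0 (connection with router)
port-channel 20: BPDU: sent 93796, received 100 (connection with sw-4)
"""

counters = parse_counters(SAMPLE)
# Ports that have received at least one BPDU.
receiving = [c.port for c in counters if c.received > 0]
# Assumption: only links annotated with "sw-" are expected to receive BPDUs.
unexpected = [c.port for c in receiving_c] if (receiving_c := [
    c for c in counters if c.received > 0 and "sw-" not in c.note
]) else []

print(receiving)   # ['port 1/g17', 'port 2/g20', 'port-channel 20']
print(unexpected)  # []
```

On these sample lines only the links towards other switches have received BPDUs, so nothing is flagged as unexpected.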
I noticed that at the moment the issue occurred today, there was a high traffic spike on port-channel 20, incoming on sw-0 from sw-4. Cacti showed it as 100 Mbit/s; I am not sure how accurate that measurement is.
The logs don't show much: nothing before the issue, only entries from the moment the issue is there:
General spanning tree settings on the switches are set in the config as below (I can't post the full config, due to the maximum character count):
Apart from the priority value, all switches are configured the same in this respect.
Anonymous
5 Practitioner
274.2K Posts
0
May 18th, 2018 06:00
I was hoping there would be more information in the logs. Sometimes in these cases you will see messages indicating which interface a topology change notification is received on. The breadcrumbs seem to be pointing to switch 4. The next step I suggest is to look at switch 4 logs and interface counters, same as you have done on switch 0. Look for any interfaces receiving BPDUs that should not be receiving them.
WallieNgti
21 Posts
0
May 28th, 2018 06:00
It took a while, but the issue occurred again. This time I was able to disable communication with the failing switch without shutting it down, so I still had access to the logs on the switch.
I noticed the following on my syslog server (messages from all switches, filtered):
Nothing is logged there before the new STP root election.
From sw-4, I got the following log messages:
Around 07:42 I shut down sw-0 1/g20, after which the rest of the network kept working properly. The logs on sw-4 show nothing around this time.
On sw-, the following BPDU counters are shown:
port 1/g1: BPDU: sent 30, received 0
port 1/g5: BPDU: sent 1492, received 0
port 1/g6: BPDU: sent 11219, received 0
port 1/g7: BPDU: sent 1379, received 0
port 1/g10: BPDU: sent 138582, received 0
port 1/g11: BPDU: sent 66, received 0
port 1/g17: BPDU: sent 4050, received 0
port-channel 1: BPDU: sent 66, received 425637 (connected with sw-0)
WallieNgti
21 Posts
0
May 30th, 2018 03:00
One more try, since all my other posts seem to be marked as **bleep** and disappear into the black hole.
It happened again. All that can be seen on the syslog server is a message that sw-4 elected a new STP root.
After this (with a timestamp of 1 second earlier???) comes the message that the port-channel transitioned from the forwarding state to the blocking state.
Nothing is logged before the new STP root election. The logs of sw-4 show only the election of the new STP root and related messages, plus a single message: ipstkArpFlush: sending delete failed
Regarding BPDUs, only port-channel 1 shows any received:
BPDU: sent 103, received 472365
All other ports show zero received.
Anonymous
5 Practitioner
274.2K Posts
0
May 30th, 2018 10:00
It looks like the root keeps bouncing back and forth between two specific bridge IDs. What devices do these MAC addresses belong to?
7000:a4ba:db7a:8208
1000:d067:e5a9:76ef
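As a reading aid, the two values can be split into their parts with a short Python sketch. It assumes each string is a standard 8-byte bridge ID printed as four hex groups: a 16-bit field holding the priority (upper 4 bits) and the system ID extension (lower 12 bits), followed by the 48-bit MAC address. The function name is made up for this example.

```python
from typing import Tuple


def decode_bridge_id(text: str) -> Tuple[int, int, str]:
    """Split a bridge ID such as '7000:a4ba:db7a:8208' into
    (priority, system ID extension, MAC address).

    Assumes 16 hex digits: 4 for priority/extension, 12 for the MAC.
    """
    digits = text.replace(":", "").lower()
    if len(digits) != 16:
        raise ValueError(f"expected 16 hex digits, got {len(digits)}")
    raw = int(digits[:4], 16)
    priority = raw & 0xF000      # upper 4 bits, in steps of 4096
    extension = raw & 0x0FFF     # lower 12 bits
    mac = ":".join(digits[i:i + 2] for i in range(4, 16, 2))
    return priority, extension, mac


for bridge in ("7000:a4ba:db7a:8208", "1000:d067:e5a9:76ef"):
    print(bridge, "->", decode_bridge_id(bridge))
# 7000:a4ba:db7a:8208 -> (28672, 0, 'a4:ba:db:7a:82:08')
# 1000:d067:e5a9:76ef -> (4096, 0, 'd0:67:e5:a9:76:ef')
```

Under that assumption the two IDs carry priorities 28672 and 4096, both of which appear among the priorities listed earlier in the thread. The ID starting with 1000 is numerically lower, so it would be expected to win whenever both bridges can see each other's BPDUs.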
WallieNgti
21 Posts
0
May 31st, 2018 00:00
Daniel,
Those addresses are the CST Regional Root addresses shown in the spanning tree command output on sw-0 and sw-4.
and for sw-4: