We have two PowerConnect 6248 switches interconnected with a LAG of two 10GbE connections.
Should we enable the Spanning Tree Protocol (STP) on the LAG? What are the pros and cons?
Should we also enable it on the ports underlying the LAG?
A LAG behaves just like a port. If STP is enabled on the switch, it is enabled on the LAG by default.
For PowerConnect switches, when a port is a member of a LAG, its port-specific configuration is not used; the LAG's configuration is applied to the member ports instead.
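The inheritance rule above can be modeled in a few lines. This is a minimal Python sketch, not PowerConnect firmware behavior. The classes `PortConfig`, `Lag` and `Port` and the port names are hypothetical. Only the rule "a LAG member's own settings are ignored in favor of the LAG's" comes from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PortConfig:
    # Hypothetical subset of per-port settings.
    stp_enabled: bool = True
    portfast: bool = False


@dataclass
class Lag:
    name: str
    config: PortConfig = field(default_factory=PortConfig)


@dataclass
class Port:
    name: str
    config: PortConfig = field(default_factory=PortConfig)
    lag: Optional[Lag] = None

    def effective_config(self) -> PortConfig:
        # A LAG member's own settings are ignored; the LAG's settings apply.
        if self.lag is not None:
            return self.lag.config
        return self.config


lag3 = Lag("lag3", PortConfig(stp_enabled=True))
member = Port("port-a", PortConfig(stp_enabled=False), lag=lag3)
standalone = Port("port-b", PortConfig(stp_enabled=False))
print(member.effective_config().stp_enabled)      # True: taken from lag3
print(standalone.effective_config().stp_enabled)  # False: its own setting
```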
Whether or not to enable STP on any port or switch depends on the requirements of your network. PROS of enabling STP: no loops, redundancy. CONS of disabling it: possible loops.
If the two-port LAG is the only interconnection between the two switches, is it safe not to enable STP?
We're going to use the two PowerConnect switches for connecting ESX servers and EqualLogic arrays: each server/array will have at least one connection to each switch for redundancy.
Would you still enable STP in such a scenario?
What about the portfast option?
If there is no chance of a loop, you can disable STP.
Connecting multiple NICs from the servers and arrays to each of the switches will not create a loop, since the NICs do not bridge traffic.
LACP's main benefit is to help prevent cabling mistakes. It adds complexity to your simple setup, so I would not bother.
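Whether a loop is possible can be checked mechanically. Below is a minimal Python sketch. It assumes the network is described as a list of links plus the set of devices that bridge, meaning they forward frames between their own ports. The switches bridge; as stated above, server and array NICs do not. A LAG is listed once because its members act as one logical link. The device names are hypothetical. Whether a given device (for example, a firewall) bridges is an input the reader must determine; the code does not know it.

```python
from typing import Dict, Iterable, Set, Tuple


def has_layer2_loop(bridging: Set[str], links: Iterable[Tuple[str, str]]) -> bool:
    """Return True if the links between bridging devices contain a cycle.

    Links to non-bridging devices are skipped: such a device never forwards
    frames from one of its links to another, so no loop can pass through it.
    """
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        # Union-find root lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in links:
        if a not in bridging or b not in bridging:
            continue
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected: this link closes a loop
        parent[root_a] = root_b
    return False


links = [
    ("switch1", "switch2"),  # the LAG, counted as one logical link
    ("esx1", "switch1"), ("esx1", "switch2"),
    ("eql1", "switch1"), ("eql1", "switch2"),
]
print(has_layer2_loop({"switch1", "switch2"}, links))          # False
print(has_layer2_loop({"switch1", "switch2", "esx1"}, links))  # True: esx1 would bridge
```

The same check reports a loop if the switches are joined by two separate, non-aggregated cables (two `("switch1", "switch2")` entries). This is why the two links must form a single LAG.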
I have STP enabled (the default) on the two PowerConnect 6248 switches interconnected with the LAG (2x1GbE), and I hit an issue while testing the redundancy of the network.
After I reboot the STP root switch, when it comes back online, all the ports on the other switch stop working for a few seconds. For example, pinging the other switch's management IP address from a server directly connected to it gets no replies for a few seconds, then everything works normally again.
Since they are two separate switches connected with a LAG, I didn't expect that rebooting one of them could also affect the other!
(By the way, if I reboot the other switch, the ports on the switch that is the STP root don't suffer any interruption.)
I'm including the console logs from the switches in question below. Switch2 is the STP root switch that I reload. Switch1 is the other switch, which suffers the issue when the first switch comes back online. LAG 3 is the interconnection.
Is this behavior normal? Any way to prevent the issue (besides completely disabling STP)?
What I'm looking for is a way to manage the two switches completely separately for a fully redundant network (of course, each server and SAN array will have one connection to each switch). Rebooting one switch shouldn't bring down connectivity on the ports of the other.
I did some research, but I'm still not sure whether what I'm seeing is expected behavior or an incorrect configuration...
And, short of completely disabling STP, I still haven't found a way to prevent the issue.
Any help would be appreciated.
Yes, this is fully expected behaviour. Look at these lines from your log:
<189> MAY 26 21:59:29 192.168.10.34-1 TRAPMGR: traputil.c(906) 319 % Unit 1 Port 2 is transitioned from the Forwarding state to the Blocking state in instance 0
***PING REPLY STOPS AROUND HERE***
***PING REPLY STARTS WORKING AGAIN AROUND HERE***
<189> MAY 26 21:59:59 192.168.10.34-1 TRAPMGR: traputil.c(906) 345 % Unit 1 Port 2 is transitioned from the Learning state to the Forwarding state in instance 0
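A topology change by default puts all ports into the blocking state. When no BPDUs are received, STP needs 30 seconds (twice the default 15-second forward delay) to move a port back into the forwarding state. That is exactly the gap between the two log entries above (21:59:29 to 21:59:59).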
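The 30-second figure can be checked against the log. Below is a minimal Python sketch. It assumes the default forward delay of 15 seconds, since the switches' actual timer values are not shown in the thread. It parses the timestamps of the two log entries and compares the gap with two forward delays: one spent before the port starts learning and one more before it forwards.

```python
from datetime import datetime

FORWARD_DELAY_S = 15  # assumed default STP forward delay, in seconds


def log_time(line: str) -> datetime:
    # Entries look like "<189> MAY 26 21:59:29 192.168.10.34-1 TRAPMGR: ...".
    # The log has no year, so a placeholder year is added; only differences matter.
    month, day, clock = line.split()[1:4]
    return datetime.strptime(f"2000 {month} {day} {clock}", "%Y %b %d %H:%M:%S")


blocking = ("<189> MAY 26 21:59:29 192.168.10.34-1 TRAPMGR: traputil.c(906) 319 % "
            "Unit 1 Port 2 is transitioned from the Forwarding state to the Blocking state in instance 0")
forwarding = ("<189> MAY 26 21:59:59 192.168.10.34-1 TRAPMGR: traputil.c(906) 345 % "
              "Unit 1 Port 2 is transitioned from the Learning state to the Forwarding state in instance 0")

outage_s = (log_time(forwarding) - log_time(blocking)).total_seconds()
expected_s = 2 * FORWARD_DELAY_S
print(outage_s, expected_s)  # 30.0 30
```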
For your configuration, there's no need for spanning tree. But if you still want to use it, observe these rules:
- use RSTP for fast convergence
- configure all end-station/server ports with "spanning-tree portfast" to ensure that topology changes don't affect them
- ports that interconnect switches must not be configured with "spanning-tree portfast"
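The two per-port rules above can be expressed as a check over a port plan. Below is a minimal Python sketch. `PortPlan` and the port names are hypothetical. The code only validates a plan written by hand; it does not read or generate switch configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortPlan:
    name: str             # hypothetical port label
    peer_is_switch: bool  # True for inter-switch links (including LAGs)
    portfast: bool        # whether "spanning-tree portfast" is planned


def portfast_problems(ports: List[PortPlan]) -> List[str]:
    problems = []
    for p in ports:
        if p.peer_is_switch and p.portfast:
            problems.append(f"{p.name}: inter-switch port must not use portfast")
        elif not p.peer_is_switch and not p.portfast:
            problems.append(f"{p.name}: end-station port should use portfast")
    return problems


plan = [
    PortPlan("lag3", peer_is_switch=True, portfast=False),
    PortPlan("esx1-nic0", peer_is_switch=False, portfast=True),
    PortPlan("eql1-eth0", peer_is_switch=False, portfast=False),
]
print(portfast_problems(plan))  # ['eql1-eth0: end-station port should use portfast']
```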
We'll also have to connect a Juniper firewall to each switch, and the two Juniper firewalls will be interconnected with an Ethernet cable for HA (using NSRP).
Is it still safe to completely disable spanning tree on the two switches for our setup?
Also, for further testing, we connected an older PC5324 switch to the PC6248 switches using a LAG (2 x 1GbE ports). We didn't experience any network disruption on it while rebooting the STP root PC6248 switch, although we still experienced the issue on the other PC6248 switch. The PC5324 appears to be using the default spanning tree settings. Any ideas why it doesn't seem to suffer the same issue as the newer switch?
After upgrading both PowerConnect 6248 switches from firmware version 184.108.40.206 to 220.127.116.11 and keeping the same configuration, the downtime on the other switch when rebooting the STP root switch appears to have improved. The ping test described above now loses about 7-8 replies, whereas with the previous firmware version it got no replies for about 30 seconds.
We're still leaning towards completely disabling STP on the two switches.
Is that still safe even with the Juniper firewalls connected as outlined in my previous post?
Also, I still wonder why the older PowerConnect 5324 doesn't exhibit the same issue as the newer PowerConnect 6248 ...