
December 14th, 2016 01:00

Dell PowerConnect 6248 LAG Dropping Connection

We are currently having issues with our blade server LAG connection. The error we receive is "0/3/1 is transitioned from the Forwarding state to the Blocking state in instance 0". When this happens, the connection drops while the ports cycle through the blocking, learning, and forwarding states. This occurs a few times per day.

Connection Information:

  • We are not using Jumbo Frames for our connection to the blade servers
  • Portfast is not enabled.
  • Firmware is currently on 3.0.0.8 and the latest is 3.3.15.1
  • There is a high number of broadcast, multicast, and unicast packets on the LAG channel.

Do you have any suggestions for a fix?

Thanks in advance

5 Practitioner • 274.2K Posts

December 14th, 2016 11:00

Can you provide us with some more information on the topology you are working with?

- What blade chassis is the server installed in?

- What model server?

- What model switch is installed in the blade chassis?

- Is this error being seen on the 6248 or the blade chassis switch?

- Is port 0/3/1 the only interface with this message?

- What does port 0/3/1 connect to?

These transitions are typically caused by spanning tree topology changes or faulty cables/ports.

8 Posts

December 16th, 2016 06:00

Hey Daniel,

Sorry for the late reply. To answer your questions:

- What blade chassis is the server installed in?

M1000e

- What model server?

PowerEdge M600

- What model switch is installed in the blade chassis?

Where would I find this out? In the Dell Chassis Management Controller (CMC) interface?

- Is this error being seen on the 6248 or the blade chassis switch?

The error is being seen on the 6248 switch

- Is port 0/3/1 the only interface with this message?

This is the first port that flaps and fails ("Link on 0/3/1 is failed"), then a new STP root is elected, and then the rest of the ports follow, running in a loop of blocking and forwarding states. This happens about twice a day, depending on the amount of traffic that day.

- What does port 0/3/1 connect to?

Port 0/3/1 is the two-interface LAG CH1 group and connects to our blade servers. It has two member interfaces, but only one of them is actually cabled to the blade chassis. I did not set this up; I have inherited it, and these are the issues I am having.

I hope this makes the situation more clear.

5 Practitioner • 274.2K Posts

December 16th, 2016 07:00

The easiest method of identifying the switch is to physically observe which I/O module the 6248 plugs into. Logging into the M1000e CMC will also show you which modules are installed.

Are there any messages in the logs which refer to a topology change being received on a specific interface? Some topology changes are normal on a network as devices are added and removed. However, if all switches are left at the default spanning tree values, then root selection depends on MAC addresses. In that situation, a BPDU received on an interface can cause a network-wide spanning tree convergence.

Is the 6248 your core switch? A common practice is to assign the core switch a spanning tree priority of 4096. This value ensures that the core switch stays the root switch. Any secondary core switch, for example when using VRRP, would be set to 8192. The distribution layer would be set to 16384, and the access layer can be left at default values. This helps reduce the impact of topology change notifications.
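Something along these lines on the 6248 CLI would set the priority. This is only a sketch; double-check the exact syntax against the 62xx CLI Reference Guide for your firmware before applying it:

console# configure
! 4096 makes this bridge the preferred STP root
console(config)# spanning-tree priority 4096
console(config)# exit
! verify root status and save the change
console# show spanning-tree
console# copy running-config startup-config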

It may also be a good idea to rule out possible physical failures. Check the cable on 0/3/1 and ensure it is good. If possible, try another physical port on the 6248.

Updating the firmware can also help with overall operability. It may not get rid of the issue at hand, but it may help the switch handle these conditions more smoothly.

Keep us posted.

8 Posts

December 20th, 2016 01:00

Hi Daniel,

The modules installed are Dell PowerConnect M6220 switches.

The process seems to be: all is well, and then:

19888  Notice DEC 16 20:57:47 TRAPMGR 0/3/1 is transitioned from the Forwarding state to the Blocking state in instance 0
19889  Notice DEC 16 20:57:47 TRAPMGR Unit 1 elected as the new STP root
19890  Notice DEC 16 20:57:47 TRAPMGR Instance 0 has elected a new STP root: 8000:001e:c990:45b0
19891  Notice DEC 16 20:57:47 TRAPMGR Link on 0/3/1 is failed
19892  Notice DEC 16 20:57:47 TRAPMGR Instance 0 has elected a new STP root: 8000:001c:2eb2:d480
19893  Notice DEC 16 20:58:19 TRAPMGR 0/3/1 is transitioned from the Forwarding state to the Blocking state in instance 0
19894  Notice DEC 16 20:58:19 TRAPMGR Instance 0 has elected a new STP root: 4000:0021:9bae:6262
19895  Notice DEC 16 20:58:19 TRAPMGR 1/0/3 is transitioned from the Forwarding state to the Blocking state in instance 0
19896  Notice DEC 16 20:58:19 TRAPMGR 1/0/35 is transitioned from the Forwarding state to the Blocking state in instance 0
19897  Notice DEC 16 20:58:19 TRAPMGR 1/0/7 is transitioned from the Forwarding state to the Blocking state in instance 0
19898  Notice DEC 16 20:58:19 TRAPMGR 1/0/8 is transitioned from the Forwarding state to the Blocking state in instance 0
19899  Notice DEC 16 20:58:19 TRAPMGR 1/0/11 is transitioned from the Forwarding state to the Blocking state in instance 0
19900  Notice DEC 16 20:58:19 TRAPMGR 1/0/16 is transitioned from the Forwarding state to the Blocking state in instance 0
19901  Notice DEC 16 20:58:19 TRAPMGR 1/0/24 is transitioned from the Forwarding state to the Blocking state in instance 0
19902  Notice DEC 16 20:58:19 TRAPMGR 1/0/34 is transitioned from the Forwarding state to the Blocking state in instance 0
19903  Notice DEC 16 20:58:19 TRAPMGR 1/0/37 is transitioned from the Forwarding state to the Blocking state in instance 0
19904  Notice DEC 16 20:58:19 TRAPMGR 1/0/38 is transitioned from the Forwarding state to the Blocking state in instance 0
19905  Notice DEC 16 20:58:19 TRAPMGR 1/0/40 is transitioned from the Forwarding state to the Blocking state in instance 0
19906  Notice DEC 16 20:58:19 TRAPMGR 1/0/41 is transitioned from the Forwarding state to the Blocking state in instance 0
19907  Notice DEC 16 20:58:19 TRAPMGR 1/0/42 is transitioned from the Forwarding state to the Blocking state in instance 0
19908  Notice DEC 16 20:58:19 TRAPMGR 0/3/11 is transitioned from the Forwarding state to the Blocking state in instance 0
19909  Notice DEC 16 20:58:19 TRAPMGR 0/3/12 is transitioned from the Forwarding state to the Blocking state in instance 0
19910  Notice DEC 16 20:58:19 TRAPMGR 0/3/13 is transitioned from the Forwarding state to the Blocking state in instance 0
19911  Notice DEC 16 20:58:19 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19912  Notice DEC 16 20:58:19 TRAPMGR 0/3/1 is transitioned from the Learning state to the Forwarding state in instance 0
19913  Notice DEC 16 20:58:19 TRAPMGR 1/0/11 is transitioned from the Learning state to the Forwarding state in instance 0
19914  Notice DEC 16 20:58:23 TRAPMGR 1/0/3 is transitioned from the Learning state to the Forwarding state in instance 0
19915  Notice DEC 16 20:58:23 TRAPMGR 1/0/7 is transitioned from the Learning state to the Forwarding state in instance 0
19916  Notice DEC 16 20:58:23 TRAPMGR 1/0/8 is transitioned from the Learning state to the Forwarding state in instance 0
19917  Notice DEC 16 20:58:23 TRAPMGR 1/0/16 is transitioned from the Learning state to the Forwarding state in instance 0
19918  Notice DEC 16 20:58:23 TRAPMGR 1/0/24 is transitioned from the Learning state to the Forwarding state in instance 0
19919  Notice DEC 16 20:58:23 TRAPMGR 1/0/34 is transitioned from the Learning state to the Forwarding state in instance 0
19920  Notice DEC 16 20:58:23 TRAPMGR 1/0/35 is transitioned from the Learning state to the Forwarding state in instance 0
19921  Notice DEC 16 20:58:23 TRAPMGR 1/0/37 is transitioned from the Learning state to the Forwarding state in instance 0
19922  Notice DEC 16 20:58:23 TRAPMGR 1/0/38 is transitioned from the Learning state to the Forwarding state in instance 0
19923  Notice DEC 16 20:58:23 TRAPMGR 1/0/40 is transitioned from the Learning state to the Forwarding state in instance 0
19924  Notice DEC 16 20:58:23 TRAPMGR 1/0/41 is transitioned from the Learning state to the Forwarding state in instance 0
19925  Notice DEC 16 20:58:23 TRAPMGR 1/0/42 is transitioned from the Learning state to the Forwarding state in instance 0
19926  Notice DEC 16 20:58:23 TRAPMGR 0/3/11 is transitioned from the Learning state to the Forwarding state in instance 0
19927  Notice DEC 16 20:58:23 TRAPMGR 0/3/12 is transitioned from the Learning state to the Forwarding state in instance 0
19928  Notice DEC 16 20:58:23 TRAPMGR 0/3/13 is transitioned from the Learning state to the Forwarding state in instance 0
19929  Notice DEC 16 20:58:49 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19930  Notice DEC 16 20:58:53 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19931  Notice DEC 16 20:58:57 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19932  Notice DEC 16 20:59:01 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19933  Notice DEC 16 20:59:05 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19934  Notice DEC 16 20:59:09 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19935  Notice DEC 16 20:59:13 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19936  Notice DEC 16 20:59:17 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19937  Notice DEC 16 20:59:21 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1

Then it goes back to normal until the next period of high load on the switch (this happens roughly 2-3 times a day).

Yes, the 6248 is our core switch. I would like to change the spanning tree value as well.

I think I am along the lines of:

- Replace the cable to be certain
- Upgrade the firmware
- Change the STP priority value
- Enable portfast on the LAG

Can I just ask: we have two ports configured in the LAG CH1 group (port 13 and port 15), yet only one port is physically connected to the blade chassis, and it carries a lot of traffic from a blade DB server. Could this have anything to do with our issues?

Would reconnecting the second port help, or would it not make a difference? Like I said, I did not configure this; I am just looking into ways to resolve the STP loop and the packet loss on the blade server.

Thanks in advance.

5 Practitioner • 274.2K Posts

December 20th, 2016 06:00

Setting the STP priority so that the core switch stays the root should help out. With the spanning tree priority set to 4096, even if a topology change occurs, the root switch will not bounce around.

I advise against setting the switch-to-switch connection to portfast.

Link aggregation is designed to continue working when one link goes down. However, if the intention is to have only one connection between these switches, I would clean up the config by taking out the unnecessary lines.
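As a rough sketch of that cleanup on the 62xx CLI, assuming the unused member is 1/g15 and the LAG is port-channel 1 (the port name here is just an example, and the exact commands depend on whether the LAG is static or LACP, so verify against the CLI guide):

console# configure
console(config)# interface ethernet 1/g15
! remove the uncabled port so LAG membership matches the physical cabling
console(config-if-1/g15)# no channel-group
console(config-if-1/g15)# exit
console(config)# exit
! confirm only the cabled port remains a member
console# show interfaces port-channel 1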

Have you checked the logs on the M6220 to see what messages are occurring around the same time as the above messages?

5 Practitioner • 274.2K Posts

December 20th, 2016 07:00

Portfast transitions the interface straight into forwarding mode. The intention with portfast is that edge ports come up quicker. The concern with enabling portfast on switch-to-switch connections is that loops will not be caught, since the port goes straight into forwarding.

Cisco has some good documentation on portfast and topology changes.

http://bit.ly/1rdbj1B

http://bit.ly/1LiQTzK
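If you do end up using portfast anywhere, a minimal sketch of applying it only to a server-facing edge port would look like this (the interface name is just an example, and syntax may vary by firmware):

console# configure
console(config)# interface ethernet 1/g20
! portfast only on edge ports - never on the 6248-to-M6220 uplink
console(config-if-1/g20)# spanning-tree portfast
console(config-if-1/g20)# exit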

Proposed action plan:

- Set the core switch to priority 4096

- Remove the LAG configuration from both switches and use just the single connection

- Replace/test the cable that connects the two switches

- Monitor and record the behavior
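After those changes, it is worth confirming from the 6248 CLI that it now reports itself as the root bridge and that the topology change messages stop appearing, for example (output format varies by firmware):

console# show spanning-tree
console# show logging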

8 Posts

December 20th, 2016 07:00

Hi Daniel,

Once again, thanks for the response. Why would you advise against changing the connection to portfast?

OK, I had a look at the logs on the blade M6220 switch, and when our core switch is having its issues, this is what happens on the M6220:

11219 Notice DEC 19 12:45:50 TRAPMGR %% Unit 1 Port 17 is removed from the trunk LAG- 1
11220 Notice DEC 19 12:45:50 TRAPMGR %% LAG- 1 is transitioned from the Forwarding state to the Blocking state in instance 0
11221 Notice DEC 19 12:45:50 TRAPMGR %% Link on LAG- 1 is failed
11222 Notice DEC 19 12:45:53 TRAPMGR %% Unit 1 Port 17 is removed from the trunk LAG- 1
11223 Notice DEC 19 12:45:53 TRAPMGR %% Link on LAG- 1 is failed
11224 Notice DEC 19 12:45:53 TRAPMGR %% Unit 1 Port 17 is transitioned from the Forwarding state to the Blocking state in instance 0
11225 Notice DEC 19 12:45:53 TRAPMGR %% Link Up: Unit 1 Port 17
11226 Notice DEC 19 12:46:23 TRAPMGR %% Spanning Tree Topology Change: 0, Unit: 1
11227 Notice DEC 19 12:46:23 TRAPMGR %% Unit 1 Port 17 is transitioned from the Learning state to the Forwarding state in instance 0
11228 Notice DEC 19 12:46:25 TRAPMGR %% Unit 1 Port 17 is removed from the trunk LAG- 1
11229 Notice DEC 19 12:46:25 TRAPMGR %% Link on LAG- 1 is failed
11230 Notice DEC 19 12:46:25 TRAPMGR %% Unit 1 Port 17 is added to the trunk LAG- 1
11231 Notice DEC 19 12:46:25 TRAPMGR %% Unit 1 Port 17 is transitioned from the Forwarding state to the Blocking state in instance 0
11232 Notice DEC 19 12:46:25 TRAPMGR %% LAG- 1 is transitioned from the Forwarding state to the Blocking state in instance 0
11233 Notice DEC 19 12:46:55 TRAPMGR %% Spanning Tree Topology Change: 0, Unit: 1
11234 Notice DEC 19 12:46:55 TRAPMGR %% LAG- 1 is transitioned from the Learning state to the Forwarding state in instance 0

David

8 Posts

December 20th, 2016 08:00

Switch              Spanning Tree Priority
6248 Core Switch    32768
M6220 A1            Member
M6220 A2            16384
M6220 B1            32768
M6220 B2            32768
M6220 C1            Admin Status Down
M6220 C2            Admin Status Down

This is the current config on the aforementioned switches.

5 Practitioner • 274.2K Posts

December 20th, 2016 08:00

Is this a complete list of the switches in your network? Do B1 and B2 connect to the core switch? The core switch should be changed to 4096. The M6220 switches can be left at their current values.

8 Posts

May 10th, 2017 02:00

Hey Daniel,

Sorry for the late reply. Bit of an update:

I changed the core switch to 4096 and there was even more packet loss. I expected maybe a little at first, but this went on for about 40 minutes, so it is now back to 32768. I am really stuck on what to try next.

Just to remind you: the blade switch's physical port is in LAG 1, and only one port is a member of LAG 1 on the blade switch, while two ports are in LAG 1 on the core switch, but only one cable is connected! I did not set this up. The blade switch, of course, keeps complaining that 'Link on LAG- 1 is failed'.

Obviously, I know that I need to either remove the LAG altogether or add another member on the blade switch and plug in another cable. The latter would be my preferred option. This would have been done months ago, but the DC is halfway across the city and we rarely visit. Also, we are now in the process of decommissioning the blade server altogether, so there has been little progress, but now I want to get to the bottom of it.

I think first things first, we need the LAG sorted, but between now and my next visit to the data centre, do you have any advice?

P.S. B1 and B2 don't connect to the core switch. All of the servers on the blade chassis connect to our core switch through that one cable in LAG 1.

Thanks in advance.

David
