
December 14th, 2016 01:00

Dell PowerConnect 6248 LAG Dropping Connection

We are currently having issues with our blade server LAG connection. The error we receive is "0/3/1 is transitioned from the Forwarding state to the Blocking state in instance 0". When this happens, the connection drops while the ports cycle through the blocking, learning, and forwarding states. This occurs a few times per day.

Connection Information:

  • We are not using Jumbo Frames for our connection to the blade servers
  • Portfast is not enabled.
  • Firmware is currently on 3.0.0.8 and the latest is 3.3.15.1
  • There is a high number of broadcast, multicast, and unicast packets on the LAG channel.

Do you have any suggestions for a fix?

Thanks in advance

5 Practitioner • 274.2K Posts

December 14th, 2016 11:00

Can you provide us with some more information on the topology you are working with?

- What blade chassis is the server installed in?

- What model server?

- What model switch is installed in the blade chassis?

- Is this error being seen on the 6248 or the blade chassis switch?

- Is port 0/3/1 the only interface with this message?

- What does port 0/3/1 connect to?

These transitions are typically caused by spanning tree topology changes or faulty cables/ports.

8 Posts

December 16th, 2016 06:00

Hey Daniel,

Sorry for the late reply. To answer your questions:

- What blade chassis is the server installed in?

M1000e

- What model server?

PowerEdge M600

- What model switch is installed in the blade chassis?

Where would I find this out? In the Dell Chassis Management Controller (CMC) interface?

- Is this error being seen on the 6248 or the blade chassis switch?

The error is being seen on the 6248 switch

- Is port 0/3/1 the only interface with this message?

This is the first port that flaps and fails ("Link on 0/3/1 is failed"), then a new STP root is elected, and then the rest of the ports follow, running in a loop of blocking and forwarding states. This happens about twice a day, depending on the amount of traffic that day.

- What does port 0/3/1 connect to?

Port 0/3/1 is the two-interface LAG CH1 group and connects to our blade servers. It has two member interfaces, but only one of them is actually cabled to the blade chassis. I did not set this up; I have inherited it, and these are the issues I am having.

I hope this makes the situation more clear.

5 Practitioner • 274.2K Posts

December 16th, 2016 07:00

The easiest method of identifying the switch is to physically observe which I/O module the 6248 plugs into. Logging into the M1000e CMC will also show you which modules are installed.

Are there any messages in the logs which refer to a topology change being received on a specific interface? Some topology changes are normal on a network as devices are added and removed. However, if all switches are left at the default spanning tree values, then root selection depends on MAC addresses. In that situation, a BPDU received on an interface can cause a network-wide spanning tree convergence.

Is the 6248 your core switch? A common practice is to assign the core switch a spanning tree priority of 4096. This value ensures that the core switch stays the root switch. Any secondary core switch, for example when using VRRP, would be set to 8192. The distribution layer would be set to 16384, and the access layer can be left at default values. This helps reduce the impact of topology change notifications.
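Something along these lines on the 6248 CLI would set the priority. This is only a sketch; double-check the exact syntax against the 62xx CLI Reference Guide for your firmware before applying it:

console# configure
! 4096 makes this bridge the preferred STP root
console(config)# spanning-tree priority 4096
console(config)# exit
! verify root status and save the change
console# show spanning-tree
console# copy running-config startup-config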

It may also be a good idea to rule out possible physical failures. Check the cable on 0/3/1 and ensure it is good. If possible, try another physical port on the 6248.

Updating the firmware can also help with overall operability. It may not get rid of the issue at hand, but it may help the switch handle these conditions more smoothly.

Keep us posted.

8 Posts

December 20th, 2016 01:00

Hi Daniel,

The modules installed are Dell PowerConnect M6220 switches.

The process seems to be: all is well, and then:

19888  Notice DEC 16 20:57:47 TRAPMGR 0/3/1 is transitioned from the Forwarding state to the Blocking state in instance 0
19889  Notice DEC 16 20:57:47 TRAPMGR Unit 1 elected as the new STP root
19890  Notice DEC 16 20:57:47 TRAPMGR Instance 0 has elected a new STP root: 8000:001e:c990:45b0
19891  Notice DEC 16 20:57:47 TRAPMGR Link on 0/3/1 is failed
19892  Notice DEC 16 20:57:47 TRAPMGR Instance 0 has elected a new STP root: 8000:001c:2eb2:d480
19893  Notice DEC 16 20:58:19 TRAPMGR 0/3/1 is transitioned from the Forwarding state to the Blocking state in instance 0
19894  Notice DEC 16 20:58:19 TRAPMGR Instance 0 has elected a new STP root: 4000:0021:9bae:6262
19895  Notice DEC 16 20:58:19 TRAPMGR 1/0/3 is transitioned from the Forwarding state to the Blocking state in instance 0
19896  Notice DEC 16 20:58:19 TRAPMGR 1/0/35 is transitioned from the Forwarding state to the Blocking state in instance 0
19897  Notice DEC 16 20:58:19 TRAPMGR 1/0/7 is transitioned from the Forwarding state to the Blocking state in instance 0
19898  Notice DEC 16 20:58:19 TRAPMGR 1/0/8 is transitioned from the Forwarding state to the Blocking state in instance 0
19899  Notice DEC 16 20:58:19 TRAPMGR 1/0/11 is transitioned from the Forwarding state to the Blocking state in instance 0
19900  Notice DEC 16 20:58:19 TRAPMGR 1/0/16 is transitioned from the Forwarding state to the Blocking state in instance 0
19901  Notice DEC 16 20:58:19 TRAPMGR 1/0/24 is transitioned from the Forwarding state to the Blocking state in instance 0
19902  Notice DEC 16 20:58:19 TRAPMGR 1/0/34 is transitioned from the Forwarding state to the Blocking state in instance 0
19903  Notice DEC 16 20:58:19 TRAPMGR 1/0/37 is transitioned from the Forwarding state to the Blocking state in instance 0
19904  Notice DEC 16 20:58:19 TRAPMGR 1/0/38 is transitioned from the Forwarding state to the Blocking state in instance 0
19905  Notice DEC 16 20:58:19 TRAPMGR 1/0/40 is transitioned from the Forwarding state to the Blocking state in instance 0
19906  Notice DEC 16 20:58:19 TRAPMGR 1/0/41 is transitioned from the Forwarding state to the Blocking state in instance 0
19907  Notice DEC 16 20:58:19 TRAPMGR 1/0/42 is transitioned from the Forwarding state to the Blocking state in instance 0
19908  Notice DEC 16 20:58:19 TRAPMGR 0/3/11 is transitioned from the Forwarding state to the Blocking state in instance 0
19909  Notice DEC 16 20:58:19 TRAPMGR 0/3/12 is transitioned from the Forwarding state to the Blocking state in instance 0
19910  Notice DEC 16 20:58:19 TRAPMGR 0/3/13 is transitioned from the Forwarding state to the Blocking state in instance 0
19911  Notice DEC 16 20:58:19 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19912  Notice DEC 16 20:58:19 TRAPMGR 0/3/1 is transitioned from the Learning state to the Forwarding state in instance 0
19913  Notice DEC 16 20:58:19 TRAPMGR 1/0/11 is transitioned from the Learning state to the Forwarding state in instance 0
19914  Notice DEC 16 20:58:23 TRAPMGR 1/0/3 is transitioned from the Learning state to the Forwarding state in instance 0
19915  Notice DEC 16 20:58:23 TRAPMGR 1/0/7 is transitioned from the Learning state to the Forwarding state in instance 0
19916  Notice DEC 16 20:58:23 TRAPMGR 1/0/8 is transitioned from the Learning state to the Forwarding state in instance 0
19917  Notice DEC 16 20:58:23 TRAPMGR 1/0/16 is transitioned from the Learning state to the Forwarding state in instance 0
19918  Notice DEC 16 20:58:23 TRAPMGR 1/0/24 is transitioned from the Learning state to the Forwarding state in instance 0
19919  Notice DEC 16 20:58:23 TRAPMGR 1/0/34 is transitioned from the Learning state to the Forwarding state in instance 0
19920  Notice DEC 16 20:58:23 TRAPMGR 1/0/35 is transitioned from the Learning state to the Forwarding state in instance 0
19921  Notice DEC 16 20:58:23 TRAPMGR 1/0/37 is transitioned from the Learning state to the Forwarding state in instance 0
19922  Notice DEC 16 20:58:23 TRAPMGR 1/0/38 is transitioned from the Learning state to the Forwarding state in instance 0
19923  Notice DEC 16 20:58:23 TRAPMGR 1/0/40 is transitioned from the Learning state to the Forwarding state in instance 0
19924  Notice DEC 16 20:58:23 TRAPMGR 1/0/41 is transitioned from the Learning state to the Forwarding state in instance 0
19925  Notice DEC 16 20:58:23 TRAPMGR 1/0/42 is transitioned from the Learning state to the Forwarding state in instance 0
19926  Notice DEC 16 20:58:23 TRAPMGR 0/3/11 is transitioned from the Learning state to the Forwarding state in instance 0
19927  Notice DEC 16 20:58:23 TRAPMGR 0/3/12 is transitioned from the Learning state to the Forwarding state in instance 0
19928  Notice DEC 16 20:58:23 TRAPMGR 0/3/13 is transitioned from the Learning state to the Forwarding state in instance 0
19929  Notice DEC 16 20:58:49 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19930  Notice DEC 16 20:58:53 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19931  Notice DEC 16 20:58:57 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19932  Notice DEC 16 20:59:01 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19933  Notice DEC 16 20:59:05 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19934  Notice DEC 16 20:59:09 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19935  Notice DEC 16 20:59:13 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19936  Notice DEC 16 20:59:17 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1
19937  Notice DEC 16 20:59:21 TRAPMGR Spanning Tree Topology Change: 0, Unit: 1

Then it goes back to normal until the next period of high load on the switch (this happens roughly 2-3 times a day).

Yes, the 6248 is our core switch. I would like to change the spanning tree value as well.

I think I am along the lines of:

- Replace the cable to be certain
- Upgrade the firmware
- Change the STP priority value
- Enable portfast on the LAG

Can I just ask: we have two ports configured in the LAG CH1 group (port 13 and port 15), yet only one port is physically connected to the blade chassis, and it carries a lot of traffic from a blade DB server. Could this have anything to do with our issues?

Would reconnecting the second port help, or would it not make a difference? Like I said, I did not configure this; I am just looking into ways to resolve the STP loop and the packet loss on the blade server.

Thanks in advance.

5 Practitioner • 274.2K Posts

December 20th, 2016 06:00

Setting the STP priority so that the core switch stays the root should help out. With the spanning tree priority set to 4096, even if a topology change occurs, the root switch will not bounce around.

I advise against setting the switch-to-switch connection to portfast.

Link aggregation is designed to continue working when one link goes down. However, if the intention is to have only one connection between these switches, I would clean up the config by taking out the unnecessary lines.
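As a rough sketch of that cleanup on the 62xx CLI, assuming the unused member is 1/g15 and the LAG is port-channel 1 (the port name here is just an example, and the exact commands depend on whether the LAG is static or LACP, so verify against the CLI guide):

console# configure
console(config)# interface ethernet 1/g15
! remove the uncabled port so LAG membership matches the physical cabling
console(config-if-1/g15)# no channel-group
console(config-if-1/g15)# exit
console(config)# exit
! confirm only the cabled port remains a member
console# show interfaces port-channel 1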

Have you checked the logs on the M6220 to see what messages are occurring around the same time as the above messages?

5 Practitioner • 274.2K Posts

December 20th, 2016 07:00

Portfast transitions the interface straight into forwarding mode. The intention with portfast is that edge ports come up quicker. The concern with enabling portfast on switch-to-switch connections is that loops will not be caught, since the port goes straight into forwarding.

Cisco has some good documentation on portfast and topology changes.

http://bit.ly/1rdbj1B

http://bit.ly/1LiQTzK
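If you do end up using portfast anywhere, a minimal sketch of applying it only to a server-facing edge port would look like this (the interface name is just an example, and syntax may vary by firmware):

console# configure
console(config)# interface ethernet 1/g20
! portfast only on edge ports - never on the 6248-to-M6220 uplink
console(config-if-1/g20)# spanning-tree portfast
console(config-if-1/g20)# exit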

Proposed action plan:

- Set the core switch to priority 4096

- Remove the LAG configuration from both switches and use just the single connection

- Replace/test the cable that connects the two switches

- Monitor and record the behavior
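After those changes, it is worth confirming from the 6248 CLI that it now reports itself as the root bridge and that the topology change messages stop appearing, for example (output format varies by firmware):

console# show spanning-tree
console# show logging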

8 Posts

December 20th, 2016 07:00

Hi Daniel,

Once again, thanks for the response. Why would you advise against changing the connection to portfast?

OK, I had a look at the logs on the blade M6220 switch, and when our core switch is having its issues, this is what happens on the M6220:

11219 Notice DEC 19 12:45:50 TRAPMGR %% Unit 1 Port 17 is removed from the trunk LAG- 1
11220 Notice DEC 19 12:45:50 TRAPMGR %% LAG- 1 is transitioned from the Forwarding state to the Blocking state in instance 0
11221 Notice DEC 19 12:45:50 TRAPMGR %% Link on LAG- 1 is failed
11222 Notice DEC 19 12:45:53 TRAPMGR %% Unit 1 Port 17 is removed from the trunk LAG- 1
11223 Notice DEC 19 12:45:53 TRAPMGR %% Link on LAG- 1 is failed
11224 Notice DEC 19 12:45:53 TRAPMGR %% Unit 1 Port 17 is transitioned from the Forwarding state to the Blocking state in instance 0
11225 Notice DEC 19 12:45:53 TRAPMGR %% Link Up: Unit 1 Port 17
11226 Notice DEC 19 12:46:23 TRAPMGR %% Spanning Tree Topology Change: 0, Unit: 1
11227 Notice DEC 19 12:46:23 TRAPMGR %% Unit 1 Port 17 is transitioned from the Learning state to the Forwarding state in instance 0
11228 Notice DEC 19 12:46:25 TRAPMGR %% Unit 1 Port 17 is removed from the trunk LAG- 1
11229 Notice DEC 19 12:46:25 TRAPMGR %% Link on LAG- 1 is failed
11230 Notice DEC 19 12:46:25 TRAPMGR %% Unit 1 Port 17 is added to the trunk LAG- 1
11231 Notice DEC 19 12:46:25 TRAPMGR %% Unit 1 Port 17 is transitioned from the Forwarding state to the Blocking state in instance 0
11232 Notice DEC 19 12:46:25 TRAPMGR %% LAG- 1 is transitioned from the Forwarding state to the Blocking state in instance 0
11233 Notice DEC 19 12:46:55 TRAPMGR %% Spanning Tree Topology Change: 0, Unit: 1
11234 Notice DEC 19 12:46:55 TRAPMGR %% LAG- 1 is transitioned from the Learning state to the Forwarding state in instance 0

David

8 Posts

December 20th, 2016 08:00

Switch              Spanning Tree Priority
6248 Core Switch    32768
M6220 A1            Member
M6220 A2            16384
M6220 B1            32768
M6220 B2            32768
M6220 C1            Admin Status Down
M6220 C2            Admin Status Down

This is the current config on the aforementioned switches.

5 Practitioner • 274.2K Posts

December 20th, 2016 08:00

Is this a complete list of the switches in your network? Do B1 and B2 connect to the core switch? The core switch should be changed to 4096. The M6220 switches can be left at their current values.

8 Posts

May 10th, 2017 02:00

Hey Daniel,

Sorry for the late reply. Bit of an update:

I changed the core switch to 4096 and there was even more packet loss. I expected maybe a little at first, but this went on for about 40 minutes, so it is now back to 32768. I am really stuck on what to try next.

Just to remind you: the blade switch's physical port is in LAG 1, and only one port is a member of LAG 1 on the blade switch, while two ports are in LAG 1 on the core switch, but only one cable is connected! I did not set this up. The blade switch, of course, keeps complaining that 'Link on LAG- 1 is failed'.

Obviously, I know that I need to either remove the LAG altogether or add another member on the blade switch and plug in another cable. The latter would be my preferred option. This would have been done months ago, but the DC is halfway across the city and we rarely visit. Also, we are now in the process of decommissioning the blade server altogether, so there has been little progress, but now I want to get to the bottom of it.

I think first things first, we need the LAG sorted, but between now and my next visit to the data centre, do you have any advice?

P.S. B1 and B2 don't connect to the core switch. All of the servers on the blade chassis connect to our core switch through that one cable in LAG 1.

Thanks in advance.

David
