26 Posts

October 30th, 2019 02:00

OS10 - S41xx - high CPU usage

Hello,

We have a problem with multiple Dell S-series switches (4 of them).

We use them with VLT, and most of the time everything works fine. But sometimes we see the CPU usage of one process climb, and at the same time we see a lot of packet loss on the network.

All the S-series switches show the same problem at about the same time. They are interconnected with 2 x 100 Gbps links (a LAG between the 2 sites, plus 2 x 100 Gbps for VLT locally).

When the overload occurs, we see a lot of messages in the Linux subsystem (but they are not reported in the OS10 environment):

2019-10-30 10:29:01.142 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], HIGH MAC MOVE Rate detected
2019-10-30 10:29:01.143 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], StatsWindow: # Proc MAC Moves:0 AvgInterval:0(ms) Maxinterval:0(ms) Interval-exceeded:0
2019-10-30 10:29:01.144 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 0 seconds ago - 1
2019-10-30 10:29:01.145 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 1 seconds ago - 0
2019-10-30 10:29:01.146 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 2 seconds ago - 0
2019-10-30 10:29:01.149 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 3 seconds ago - 0
2019-10-30 10:29:01.150 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 4 seconds ago - 0
2019-10-30 10:29:01.155 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 5 seconds ago - 0
2019-10-30 10:29:01.157 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 6 seconds ago - 0
2019-10-30 10:29:01.161 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 7 seconds ago - 0
2019-10-30 10:29:01.162 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 8 seconds ago - 0
2019-10-30 10:29:01.163 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 9 seconds ago - 0
2019-10-30 10:29:01.165 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 10 seconds ago - 0
2019-10-30 10:29:01.168 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 11 seconds ago - 0
2019-10-30 10:29:01.169 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 12 seconds ago - 0
2019-10-30 10:29:01.172 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 13 seconds ago - 0
2019-10-30 10:29:01.173 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 14 seconds ago - 0
2019-10-30 10:29:01.176 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 15 seconds ago - 0
2019-10-30 10:29:01.177 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 16 seconds ago - 0
2019-10-30 10:29:01.179 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 17 seconds ago - 0
2019-10-30 10:29:01.180 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 18 seconds ago - 0
2019-10-30 10:29:01.183 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 19 seconds ago - 0
2019-10-30 10:29:01.185 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 20 seconds ago - 0
2019-10-30 10:29:01.186 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 21 seconds ago - 0
2019-10-30 10:29:01.189 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 22 seconds ago - 0
2019-10-30 10:29:09.167 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], HIGH MAC MOVE Rate detected
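To see how often these alerts fire and how many MAC moves are actually reported, the log lines could be tallied with a small script. This is only a minimal sketch, assuming the log format is exactly the one shown above; the names `summarize`, `ALERT_RE`, `EVENT_RE` and `SAMPLE` are made up for illustration and are not part of OS10.

```python
import re
from collections import Counter

# Assumed line format, copied from the excerpt above:
# <date> <time>.<ms> <host> dn_infra_afs[notice]: [...], <message>
PREFIX = r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (?P<host>\S+) .*"
ALERT_RE = re.compile(PREFIX + r"HIGH MAC MOVE Rate detected")
EVENT_RE = re.compile(
    PREFIX + r"# MAC move events received \d+ seconds ago - (?P<count>\d+)"
)


def summarize(lines):
    """Return (alerts per host, reported MAC move events per host)."""
    alerts = Counter()
    moves = Counter()
    for line in lines:
        match = ALERT_RE.search(line)
        if match:
            alerts[match["host"]] += 1
            continue
        match = EVENT_RE.search(line)
        if match:
            moves[match["host"]] += int(match["count"])
    return alerts, moves


# A few lines taken verbatim from the excerpt above.
SAMPLE = [
    "2019-10-30 10:29:01.142 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], HIGH MAC MOVE Rate detected",
    "2019-10-30 10:29:01.144 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 0 seconds ago - 1",
    "2019-10-30 10:29:01.145 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], # MAC move events received 1 seconds ago - 0",
    "2019-10-30 10:29:09.167 CORE-A1-2 dn_infra_afs[notice]: [INFRA_AFS:INFRA-AFS-MAC], HIGH MAC MOVE Rate detected",
]

alerts, moves = summarize(SAMPLE)
print(dict(alerts), dict(moves))
```

On the four sample lines this counts two alerts and a single reported MAC move for CORE-A1-2, which matches the excerpt: the alert fires even though the per-second counters are almost all zero.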

Any ideas on what the cause could be?

We are on an old version (10.4.0E R1.56), and the problem occurs without any regular schedule or any regular event that we know of.

Moderator

 • 

9.6K Posts

 • 

42.2K Points

October 30th, 2019 07:00

Hi,

Can you move to the latest version? You could also try rebooting each switch and see if that helps.

26 Posts

October 31st, 2019 01:00

Well, as you know, it's not so easy to upgrade core switches in a production environment.

Any other ideas?

Moderator

November 1st, 2019 06:00

You could try stopping the peer link and reconnecting it. It seems that the switches are getting out of sync.

26 Posts

November 4th, 2019 07:00

Hello,

What do you mean by "getting out of sync"?

We have found a workaround: we shut down one link (a member of the LAG) between the 2 sites, and the switches have been stable for 5 days.

So, could it be a LACP bug?

26 Posts

November 7th, 2019 03:00

Hi,

After a crash and reboot on one switch, we decided to attempt an upgrade to resolve this issue.

The upgrade failed because:

- When only one switch is on version 10.5, VLT does not function correctly: all VLANs are incorrect and all LAGs fail (probably due to a VLT mismatch on all LAGs).

- When we upgrade all switches to 10.5, VLT is OK, but we cannot use our QSFP28 modules, so no inter-DC links come up.

The GBICs are not "powered": there is no laser, yet we get a power reading of approximately -0.2 dB.

So we downgraded the switches and remain on the same version.

Question 1: Why is VLT between 10.4 and 10.5 not compatible?

Question 2: Why are some GBICs (i.e. our QSFP28 modules) not working?

26 Posts

December 12th, 2019 04:00

All switches have restarted.

We have also added new S5232F switches for the interconnection between sites.

But the problem still occurs. Packet loss increases when the S4148s are using a LAG with VLT.

On the Linux subsystem we see these errors (example):

S4148-1
2019-12-11 13:11:23.569 S4148-1 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-MAC], Error when fetching MAC entry (MAC:00:50:56:9b:6e:e5, VLAN:1003) to verify ageout from VLT peer-unit-id:1 name:VLT_NODE_1
2019-12-11 13:11:23.574 S4148-1 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-CMN], Datastore Get failed
2019-12-11 13:11:23.575 S4148-1 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-MAC], Error when fetching MAC entry (MAC:00:e0:20:15:0e:3c, VLAN:1003) to verify ageout from VLT peer-unit-id:1 name:VLT_NODE_1
2019-12-11 13:11:23.688 S4148-1 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-CMN], Datastore Get failed
2019-12-11 13:11:23.689 S4148-1 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-MAC], Error when fetching MAC entry (MAC:00:e0:20:15:0e:4f, VLAN:1003) to verify ageout from VLT peer-unit-id:1 name:VLT_NODE_1
2019-12-11 13:11:23.700 S4148-1 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-CMN], Datastore Get failed
2019-12-11 13:11:23.702 S4148-1 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-MAC], Error when fetching MAC entry (MAC:00:e0:20:15:0e:4f, VLAN:1002) to verify ageout from VLT peer-unit-id:1 name:VLT_NODE_1
2019-12-11 13:11:23.748 S4148-1 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-CMN], Datastore Get failed
2019-12-11 13:11:23.749 S4148-1 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-MAC], Error when fetching MAC entry (MAC:00:e0:20:11:0a:36, VLAN:1003) to verify ageout from VLT peer-unit-id:1 name:VLT_NODE_1


S4148-2
2019-12-11 12:27:46.900 S4148-2 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-CMN], Datastore Get failed
2019-12-11 12:27:46.901 S4148-2 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-MAC], Error when fetching MAC entry (MAC:b4:b6:86:2f:24:9b, VLAN:1003) to verify ageout from VLT peer-unit-id:2 name:VLT_NODE_2
2019-12-11 12:27:46.946 S4148-2 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-CMN], Datastore Get failed
2019-12-11 12:27:46.947 S4148-2 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-MAC], Error when fetching MAC entry (MAC:00:e0:20:15:0e:4f, VLAN:1003) to verify ageout from VLT peer-unit-id:2 name:VLT_NODE_2
2019-12-11 12:27:46.977 S4148-2 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-CMN], Datastore Get failed
2019-12-11 12:27:46.978 S4148-2 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-MAC], Error when fetching MAC entry (MAC:00:e0:20:15:0e:4f, VLAN:1002) to verify ageout from VLT peer-unit-id:2 name:VLT_NODE_2
2019-12-11 12:27:47.102 S4148-2 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-CMN], Datastore Get failed
2019-12-11 12:27:47.103 S4148-2 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-MAC], Error when fetching MAC entry (MAC:00:e0:20:11:0a:36, VLAN:1003) to verify ageout from VLT peer-unit-id:2 name:VLT_NODE_2
2019-12-11 12:57:46.120 S4148-2 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-CMN], Datastore Get failed
2019-12-11 12:57:46.121 S4148-2 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-MAC], Error when fetching MAC entry (MAC:14:b3:1f:1f:66:b8, VLAN:1003) to verify ageout from VLT peer-unit-id:2 name:VLT_NODE_2

 

We also see another abnormal thing.

Some MAC addresses should stay in their own VLAN and not be visible in all VLANs.
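The error lines above already hint at this: for example, 00:e0:20:15:0e:4f is reported under both VLAN 1003 and VLAN 1002. A small script could list every MAC address that shows up under more than one VLAN in these messages. This is a minimal sketch, assuming the "Error when fetching MAC entry" format shown above; `ENTRY_RE`, `macs_in_multiple_vlans` and `SAMPLE` are illustrative names only. Note that a MAC seen in two VLANs is not proof of a fault by itself, since a router or trunked host can legitimately appear in several VLANs.

```python
import re
from collections import defaultdict

# Assumed message format, copied from the excerpt above:
# ... Error when fetching MAC entry (MAC:<mac>, VLAN:<id>) to verify ageout ...
ENTRY_RE = re.compile(
    r"MAC entry \(MAC:(?P<mac>[0-9a-fA-F:]{17}), VLAN:(?P<vlan>\d+)\)"
)


def macs_in_multiple_vlans(lines):
    """Map each MAC seen under more than one VLAN to its sorted VLAN list."""
    seen = defaultdict(set)
    for line in lines:
        match = ENTRY_RE.search(line)
        if match:
            seen[match["mac"].lower()].add(int(match["vlan"]))
    return {mac: sorted(vlans) for mac, vlans in seen.items() if len(vlans) > 1}


# Three lines taken verbatim from the S4148-1 excerpt above.
SAMPLE = [
    "2019-12-11 13:11:23.689 S4148-1 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-MAC], Error when fetching MAC entry (MAC:00:e0:20:15:0e:4f, VLAN:1003) to verify ageout from VLT peer-unit-id:1 name:VLT_NODE_1",
    "2019-12-11 13:11:23.702 S4148-1 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-MAC], Error when fetching MAC entry (MAC:00:e0:20:15:0e:4f, VLAN:1002) to verify ageout from VLT peer-unit-id:1 name:VLT_NODE_1",
    "2019-12-11 13:11:23.749 S4148-1 dn_infra_afs[err]: [INFRA_AFS:INFRA-AFS-MAC], Error when fetching MAC entry (MAC:00:e0:20:11:0a:36, VLAN:1003) to verify ageout from VLT peer-unit-id:1 name:VLT_NODE_1",
]

suspects = macs_in_multiple_vlans(SAMPLE)
print(suspects)
```

On these three lines only 00:e0:20:15:0e:4f is flagged, with VLANs 1002 and 1003. Running the same function over the full log from both S4148 switches would give a list of addresses to compare against the intended VLAN assignment.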

 

 

26 Posts

December 16th, 2019 08:00

Thanks for your message.

We opened a ticket last week and upgraded all S4148s to 10.5.0.3.600.
It's an incredible change in resource consumption.

We still have an issue with one LAG over the links between sites. Everything else seems to be fixed.

There are some Dell GBICs that don't work with the new version, and we have found a workaround with another switch.

I will keep you updated and send you the service tags in private.

Moderator

December 16th, 2019 08:00

Hi,

Can you private message me the service tags?
