3548 CPU Utilization

Question

I have a 3548p that is showing abnormally high CPU utilization:

CPU utilization
---------------
five seconds: 97%; one minute: 100%; five minutes: 88%
five seconds: 100%; one minute: 80%; five minutes: 88%
five seconds: 62%; one minute: 94%; five minutes: 82%
five seconds: 100%; one minute: 81%; five minutes: 87%
five seconds: 100%; one minute: 100%; five minutes: 88%
five seconds: 100%; one minute: 100%; five minutes: 92%
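Since there is no per-process CPU command, one low-tech option is to collect the CPU utilization output repeatedly (for example by logging the console session) and summarize it. The Python sketch below is a hypothetical helper, not a Dell tool: it assumes every sample line looks exactly like the ones above, and it reports the average of each column plus how many 5-second readings were at or above a threshold (80% here, an arbitrary choice).

```python
import re
from statistics import mean

# Matches one sample line, e.g.
# "five seconds: 97%; one minute: 100%; five minutes: 88%"
CPU_LINE = re.compile(
    r"five seconds:\s*(\d+)%;\s*one minute:\s*(\d+)%;\s*five minutes:\s*(\d+)%"
)


def parse_cpu_samples(text):
    """Return a list of (five_sec, one_min, five_min) integer tuples found in text."""
    return [tuple(int(g) for g in m.groups()) for m in CPU_LINE.finditer(text)]


def summarize(samples, threshold=80):
    """Average each column and count 5-second readings at or above threshold."""
    five_sec, one_min, five_min = zip(*samples)
    return {
        "avg_5s": round(mean(five_sec), 1),
        "avg_1m": round(mean(one_min), 1),
        "avg_5m": round(mean(five_min), 1),
        "busy_samples": sum(1 for v in five_sec if v >= threshold),
    }


if __name__ == "__main__":
    captured = """
five seconds: 97%; one minute: 100%; five minutes: 88%
five seconds: 100%; one minute: 80%; five minutes: 88%
five seconds: 62%; one minute: 94%; five minutes: 82%
"""
    print(summarize(parse_cpu_samples(captured)))
```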

This is causing high latency on pings to the switch itself, as well as lag when managing it via the CLI:

Packets: Sent = 55797, Received = 55776, Lost = 21 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 13ms, Maximum = 3845ms, Average = 249ms
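The "0% loss" in that summary hides real loss: 21 of 55,797 echo requests never came back. A quick check of the arithmetic, assuming the summary simply shows the percentage as a whole number with the fraction dropped:

```python
def loss_percent_whole(sent, received):
    """Loss as a whole-number percentage (fractional part discarded)."""
    return (sent - received) * 100 // sent


def loss_percent_exact(sent, received):
    """Loss as an exact percentage."""
    return (sent - received) * 100 / sent


if __name__ == "__main__":
    sent, received = 55797, 55776
    print(sent - received, "lost")
    print(loss_percent_whole(sent, received), "% (whole number)")
    print(f"{loss_percent_exact(sent, received):.3f} % (exact)")
```

The exact figure is about 0.04%, small but not zero, and the 3845 ms maximum says more about the problem than the loss rate does.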

Pinging through the switch to connected devices seems fine. This switch has a very low load: it is powering about 7 IP phones and 3 IP cameras, and only about half of its ports are in use. It is not stacked and no VLANs are defined. It does have an EPS-470 attached for redundancy. Flow control is disabled globally, and STP is disabled selectively on some ports. Otherwise, its config is very nearly factory default.

I have rebooted the switch and the issue comes back. It is running the latest code (2.0.0.51). I noticed there is no "Show CPU Process" command in the CLI. How can I troubleshoot to see what is causing the load? I have enabled debug logging on the serial console and see no obvious cause; in fact, no messages are being generated at all except login/logoff events.

Conf:

....# show run
interface range ethernet e(34,38)
spanning-tree disable
exit
voice vlan oui-table add 0001e3 Siemens_AG_phone________
voice vlan oui-table add 00036b Cisco_phone_____________
voice vlan oui-table add 00096e Avaya___________________
voice vlan oui-table add 000fe2 H3C_Aolynk______________
voice vlan oui-table add 0060b9 Philips_and_NEC_AG_phone
voice vlan oui-table add 00d01e Pingtel_phone___________
voice vlan oui-table add 00e075 Polycom/Veritel_phone___
voice vlan oui-table add 00e0bb 3Com_phone______________
interface vlan 1
ip address ....
exit
ip default-gateway ....
hostname ....
logging console debugging
username admin password ... level 15 encrypted
sntp unicast client enable
sntp unicast client poll
sntp server a.b.c.d poll
ip domain-name ....
ip name-server ....

Default settings:
Service tag: ....

SW version 2.0.0.51 (date 11-Feb-2013 time 10:45:21)

Fast Ethernet Ports
==========================
no shutdown
speed 100
duplex full
negotiation
flow-control off
mdix auto
no back-pressure

Gigabit Ethernet Ports
=============================
no shutdown
speed 1000
duplex full
negotiation
flow-control off
mdix auto
no back-pressure

interface vlan 1
interface port-channel 1 - 15

spanning-tree
spanning-tree mode STP

qos basic
qos trust cos
....#

m0rdy · Answer

I swapped it with another 3548p unit and used the same config. When unracking the problematic one, I unplugged one port at a time to see whether the CPU load and ping times dropped when any particular device was disconnected, and they did not. At the end, with only a serial cable plugged in for console access and a single Ethernet cable to the firewall for internet access, CPU utilization was still high.

Turning it off, racking a replacement and reconfiguring the new one did indeed fix the problem: CPU utilization went back down to normal (under 10%), and ping times were normal as well. What's strange is that as soon as we did this (and not before), a second switch at the same site started exhibiting the same behavior as the first one! This led me to believe the problem has something to do with spanning tree, perhaps a problem that was propagating to whatever the root bridge was. Of course, disconnecting that first switch (which was a core switch) would have forced a root election, and the second switch (an edge switch) may have inherited the bad state.
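For reference, classic STP elects as root the bridge with the numerically lowest bridge ID, which compares the bridge priority first and the MAC address second. The sketch below models only that comparison (no BPDU exchange or timers); the switch names and MAC addresses are hypothetical. It shows why, with every switch left at the default priority, swapping out the core can move the root to whichever remaining switch has the lowest MAC, unless the intended root has a lower priority.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int  # STP bridge priority; 32768 is the usual default
    mac: str       # bridge MAC address, colon-separated

    def bridge_id(self):
        """Bridge ID as a sortable pair: priority first, then MAC as an integer."""
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges):
    """Return the bridge with the lowest bridge ID, i.e. the STP root."""
    return min(bridges, key=Bridge.bridge_id)


# Hypothetical site before and after the core swap, all at default priority.
edge = Bridge("edge", 32768, "00:1e:c9:00:00:20")
before = [Bridge("old-core", 32768, "00:1e:c9:00:00:10"), edge]
after = [Bridge("new-core", 32768, "00:1e:c9:00:00:30"), edge]
# Same swap, but with a lower priority configured on the new core.
after_pinned = [Bridge("new-core", 4096, "00:1e:c9:00:00:30"), edge]

if __name__ == "__main__":
    for label, site in (("before", before), ("after", after), ("pinned", after_pinned)):
        print(label, "->", elect_root(site).name)
```

In this thread the new core reportedly holds root, so on the real network either its MAC is the lowest or its priority is lower; the sketch just makes the tie-breaking rule concrete.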

Currently, the edge switch is still in this state. It reports itself as a non-root STP bridge (the new core holds the root role). Its CPU is running at maximum, and ping times are high:

From the core switch that is directly connected via fiber:

TOCoreSW# ping tocamerasw
Pinging tocamerasw.britzinc.com. (10.229.100.252) with 56 bytes of data:

56 bytes from 10.229.100.252: icmp_seq=1. time=1400 ms
PING: no reply from 10.229.100.252
PING: timeout
56 bytes from 10.229.100.252: icmp_seq=3. time=620 ms
56 bytes from 10.229.100.252: icmp_seq=4. time=680 ms

----10.229.100.252 PING Statistics----
4 packets transmitted, 3 packets received, 25% packet loss
round-trip (ms) min/avg/max = 620/900/1400
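When repeating this test from the core, a small parser can recompute the summary and show which probes were lost. This is a hypothetical sketch that assumes the reply lines keep the `icmp_seq=N. time=M ms` format shown above; the number of requests sent is passed in because timeout lines do not carry a sequence number.

```python
import re

# Matches reply lines such as "56 bytes from ...: icmp_seq=3. time=620 ms"
REPLY = re.compile(r"icmp_seq=(\d+)\.\s*time=(\d+)\s*ms")


def ping_stats(text, transmitted):
    """Recompute loss and round-trip stats from the switch's ping replies."""
    replies = {int(seq): int(ms) for seq, ms in REPLY.findall(text)}
    missing = [s for s in range(1, transmitted + 1) if s not in replies]
    if not replies:
        return {"received": 0, "loss_pct": 100, "missing_seq": missing}
    times = list(replies.values())
    received = len(times)
    return {
        "received": received,
        "loss_pct": (transmitted - received) * 100 // transmitted,
        "missing_seq": missing,
        "min": min(times),
        "avg": sum(times) // received,  # whole milliseconds, like the summary line
        "max": max(times),
    }
```

Applied to the run above with `transmitted=4`, it should reproduce the 25% loss and 620/900/1400 ms figures and flag sequence 2 as the lost probe.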

The port does show some multicast traffic, but only a very small percentage of the total, and without any errors:

TOCoreSW# show int counter eth g1

Port     InUcastPkts   InMcastPkts   InBcastPkts   InOctets
-------- ------------- ------------- ------------- ---------------
g1       611656928     14345         64008         914038994303

Port     OutUcastPkts  OutMcastPkts  OutBcastPkts  OutOctets
-------- ------------- ------------- ------------- ---------------
g1       307397488     168974        563212        19821710826

FCS Errors: 0
Single Collision Frames: 0
Late Collisions: 0
Excessive Collisions: 0
Oversize Packets: 0
Internal MAC Rx Errors: 0
Received Pause Frames: 0
Transmitted Pause Frames: 0
TOCoreSW#

Flow control is off.
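To put numbers on "a very small percentage", the sketch below computes the multicast and broadcast share of packets from the g1 counters above. One caveat: these are cumulative counters, so a recent burst could be diluted by a long history of normal traffic. The hypothetical `traffic_mix_delta` helper compares two snapshots taken some seconds apart to get the mix over just that interval.

```python
def traffic_mix(ucast, mcast, bcast):
    """Percent of packets that were multicast and broadcast."""
    total = ucast + mcast + bcast
    if total == 0:
        return {"multicast_pct": 0.0, "broadcast_pct": 0.0}
    return {
        "multicast_pct": 100 * mcast / total,
        "broadcast_pct": 100 * bcast / total,
    }


def traffic_mix_delta(before, after):
    """Same ratios over an interval, from two (ucast, mcast, bcast) snapshots."""
    return traffic_mix(*(a - b for a, b in zip(after, before)))


# Counters from "show int counter eth g1" on the core switch.
inbound = traffic_mix(611656928, 14345, 64008)
outbound = traffic_mix(307397488, 168974, 563212)

if __name__ == "__main__":
    for label, mix in (("in", inbound), ("out", outbound)):
        print(label, {k: f"{v:.4f}%" for k, v in mix.items()})
```

By these lifetime totals, broadcast is roughly 0.01% of inbound and under 0.2% of outbound packets, which supports the "very small percentage" reading; it does not rule out short recent bursts.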

As far as VoIP goes, yes, the phones have built-in switches and workstations are daisy-chained off them. Because a dumb switch could theoretically be plugged into one of those phone ports, we cannot use portfast or disable STP on ports feeding phones, as they are technically not endpoints. Our network is small enough that we have not split voice and workstation traffic into separate VLANs: there is only one VLAN on the switch, the default VLAN 1. We are not using the voice VLAN and have no OUI defined for our phones. This simplifies both switch and phone configuration, but we can test creating a voice VLAN if necessary.
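If a voice VLAN does get tested, phones would only be picked up automatically if their MAC prefix (the OUI, i.e. the first three bytes) matches an entry in the OUI table. The sketch below copies the default entries from the running config above (descriptions trimmed of their underscore padding) and checks a MAC against them. It assumes a simple prefix match on the phone's source MAC, and the example MACs are hypothetical.

```python
# Default voice VLAN OUI entries from the running config (prefix -> description).
DEFAULT_VOICE_OUIS = {
    "0001e3": "Siemens_AG_phone",
    "00036b": "Cisco_phone",
    "00096e": "Avaya",
    "000fe2": "H3C_Aolynk",
    "0060b9": "Philips_and_NEC_AG_phone",
    "00d01e": "Pingtel_phone",
    "00e075": "Polycom/Veritel_phone",
    "00e0bb": "3Com_phone",
}


def oui_of(mac):
    """First six hex digits of a MAC written with ':', '-' or '.' separators."""
    digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    return digits[:6]


def voice_vendor(mac):
    """Matching OUI description, or None if the MAC is not in the default table."""
    return DEFAULT_VOICE_OUIS.get(oui_of(mac))


if __name__ == "__main__":
    for mac in ("00:e0:75:12:34:56", "00-1E-C9-AA-BB-CC"):  # hypothetical MACs
        print(mac, "->", voice_vendor(mac))
```

A phone whose OUI returns `None` here would need its own OUI entry added before the voice VLAN could recognize it.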

m0rdy · Answer

I had actually already taken the original switch that showed this behavior and booted it up in an isolated environment, and the CPU utilization looked fine. So I agree there is probably something unique to that network causing this. Unfortunately, it does not appear that simply unplugging this device resolves the problem. Rebooting while each device is unplugged is an option, but it will take an extremely long time!

Wireshark is a good idea; I will capture and analyze packets on that switch. What is the CLI command to enable port mirroring, i.e. to put an interface into a promiscuous mode where it gets copies of all traffic? I see:

config
int eth e1
port monitor <port>

where <port> can be any single port, and I can add up to 4 ports to this config, but that's far too low a limit. Is there no way to make a port fully promiscuous?
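Once a capture from the mirror port exists, a first pass is simply counting frames by destination type, since broadcast, multicast, or STP BPDU floods are the usual suspects for a busy switch CPU. Wireshark can do this interactively (display filters such as `eth.dst == ff:ff:ff:ff:ff:ff` or `stp`); the sketch below does the same tally offline. It assumes the capture was saved in classic pcap format (Wireshark saves pcapng by default, which this parser does not handle) with Ethernet framing, and it reads the whole file into memory.

```python
import struct
from collections import Counter

STP_BPDU_MAC = bytes.fromhex("0180c2000000")  # IEEE 802.1D BPDU destination


def classify(dst):
    """Classify a 6-byte Ethernet destination MAC."""
    if dst == b"\xff" * 6:
        return "broadcast"
    if dst == STP_BPDU_MAC:
        return "stp_bpdu"
    if dst[0] & 0x01:  # group bit set on the first octet
        return "multicast"
    return "unicast"


def tally_pcap_bytes(data):
    """Count frames by destination type in classic pcap data (Ethernet only)."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (microsecond or nanosecond timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack_from(endian + "I", data, 20)
    if linktype & 0xFFFF != 1:  # low 16 bits hold the link type; 1 = Ethernet
        raise ValueError(f"expected Ethernet link type, got {linktype}")
    counts = Counter()
    offset = 24  # size of the global header
    while offset + 16 <= len(data):
        _sec, _frac, incl_len, _orig = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        frame = data[offset:offset + incl_len]
        offset += incl_len
        if len(frame) >= 6:
            counts[classify(frame[:6])] += 1
    return counts


def tally_pcap(path):
    """Read a classic pcap file from disk and tally it."""
    with open(path, "rb") as f:
        return tally_pcap_bytes(f.read())
```

A capture dominated by broadcast or BPDU frames would point back toward the spanning-tree or loop theory; a normal mix would suggest looking elsewhere.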
