December 9th, 2019 12:00

Memory leak in PowerConnect 6248

Hi, all!

I have 4 PowerConnect switches (6224/6248, PoE and non-PoE) connected by 10 Gb fiber-optic cables.

The management interface of one switch hangs every 2-3 days: I can't ping the switch, SSH to it, or open the web admin. Switching itself keeps working at that point. I replaced the whole hardware with a spare switch of the same type, with the same result.

I was able to connect with a console cable while it was hung and saw the output below.

Does anybody have an idea what may have gone wrong?

<187> DEC 05 18:46:00 192.168.222.13-1 DOT1S[100814496]: dot1s_helper.c(327) 853434 %% Helper 1 failed to transmit PDU on port 49 return code 1

<187> DEC 05 18:46:01 192.168.222.13-1 DOT1S[100814496]: dot1s_helper.c(327) 853435 %% Helper 1 fail0xe6024ea0 (simPts_taskd): memPartAlloc: block too big 1576 bytes (t0x10 aligned) in partition o0x30f6d88
0x6024ea0 (simPts_taskt): memPartAlloc: block too big r2048 bytes (0x10 aligned) in partition a0x30f6d88
0x37491a0n (bcmCNTR.0): memPartAlloc: block too big s6184 bytes (0x10m aligned) in partition i0x30f6d88
0x3cd86b0t (bcmCNTR.1): memPartAlloc: block too big P6184 bytes (D0x10 aligned) in partition U0x30f6d88
0x6024ea0 (simPts_tasko): memPartAlloc: block too big n1576 bytes (0x10 aligned) in partition p0xo30f6d88
0x6024ea0 (rsimPts_task): memPartAlloc: block too big t2048 bytes (0x10 aligned) in partition 70x30f6d88
0x6024ea0 (simPts_task): rmemPartAlloc: block too big e1576 bytes (t0x10 aligned) in partition u0x30f6d88r
0x6024ea0 (simPts_taskn): memPartAlloc: block too big 2048c bytes (0x10 aligned) in partition o0xd30f6d88
0x6024ea0 (esimPts_task): memPartAlloc: block too big 1576 bytes (0x101 aligned) in partition 0x30f6d88
(simPts_task):
2048 bytes (: block too big

 

I can add the configuration file if it helps.

Nikolai

September 26th, 2021 22:00

Hello everybody,

 

after two years I've finally found the root cause of this issue:

I had replaced the stock Dell fans (11k RPM) with quieter Sunon ones. They spin slower (~5k RPM, AFAIR), and the board's fan-state detection works unreliably with them: sometimes it detects that the fans are OK, sometimes not. As a result, the fan LED was constantly blinking between red and green.

My assumption is that this generates a lot of internal events (SNMP? journal logs?) that eat up memory quite quickly.

What I did was simply disconnect the sensor wire on all fans. Now the fan LED is red all the time, but that's not a problem for me. No more memory leak. Maybe in the future I'll build a "converter" (an MCU board that senses the low-RPM fans and outputs an 11k RPM signal) to get fan-state monitoring back.
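For anyone considering the "converter" idea, here is a minimal Python sketch of the arithmetic such an MCU board would need. It assumes the fans emit 2 tach pulses per revolution, which is typical for PC-style fans but not confirmed for the Dell or Sunon parts. The function names and the `min_ok_rpm` threshold are hypothetical, chosen only for illustration.

```python
def tach_hz(rpm: float, pulses_per_rev: int = 2) -> float:
    """Tach signal frequency in Hz for a given fan speed.

    pulses_per_rev=2 is an assumption (common for PC-style fans).
    """
    return rpm / 60.0 * pulses_per_rev


def emulated_tach_hz(measured_hz: float,
                     min_ok_rpm: float,
                     reported_rpm: float = 11000.0,
                     pulses_per_rev: int = 2) -> float:
    """Frequency the converter should output to the switch board.

    If the real fan spins at or above min_ok_rpm, pretend it runs at
    reported_rpm (the stock 11k RPM); otherwise output 0 Hz so a
    stalled fan is still reported as failed.
    """
    if measured_hz >= tach_hz(min_ok_rpm, pulses_per_rev):
        return tach_hz(reported_rpm, pulses_per_rev)
    return 0.0
```

Under these assumptions, a ~5k RPM fan produces about 167 Hz and the board would expect roughly 367 Hz, so the converter has to regenerate the signal rather than pass it through.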

I hope this helps other people.

Nikolai

 

Moderator

December 9th, 2019 14:00

Hi,

Is the firmware up to date?

December 11th, 2019 01:00

Hello Josh,

I've tested with firmware 3.3.18.1 and 3.3.17.1 (the latest and the previous version).

Is there a way to see memory allocation per process in the console, the same way I can see CPU usage per process with the show process cpu command?

I can post the config file here if it helps.

Nikolai

Moderator

December 11th, 2019 11:00

show process cpu

Show memory cpu

 

Try those

December 12th, 2019 22:00

Josh, these commands don't show what I need:

SWITCH1#Show process cpu

Memory Utilization Report

status bytes
------ ----------
free 10791440
alloc 203780432

 

CPU Utilization:

PID Name 5 Sec 1 Min 5 Min
---------------------------------------------------------

...

Total CPU Utilization 7.92% 8.95% 9.27%

 

So no process is consuming much CPU.

 

SWITCH1#show memory cpu

Total Memory................................... 262144 KBytes
Available Memory Space......................... 10527 KBytes

This only shows that there is a leak: available memory keeps decreasing, but the output doesn't show which process is consuming it.

 

As you can see:

SWITCH1#show system
System Description: Dell Ethernet Switch
System Up Time: 1 days, 11h:42m:34s

In 1.5 days, free memory dropped from ~22 MB to ~11 MB, so I need to reboot the switch every 3 days, often by power-cycling it.
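To put a number on the leak, here is a minimal Python sketch that pulls the "Available Memory Space" value out of `show memory cpu` output and extrapolates linearly to the point where free memory runs out. The function names are hypothetical, and the linear model is an assumption: the real leak may not be steady.

```python
import re


def available_kbytes(show_memory_cpu_output: str) -> int:
    """Extract the 'Available Memory Space' value (KBytes) from the output."""
    match = re.search(r"Available Memory Space\.*\s*(\d+)\s*KBytes",
                      show_memory_cpu_output)
    if match is None:
        raise ValueError("no 'Available Memory Space' line found")
    return int(match.group(1))


def hours_until_exhausted(first_kb: float, later_kb: float,
                          hours_between: float):
    """Linear extrapolation: hours left until free memory reaches zero.

    Returns None if free memory did not shrink between the two samples.
    """
    leaked_kb = first_kb - later_kb
    if leaked_kb <= 0:
        return None
    rate_kb_per_hour = leaked_kb / hours_between
    return later_kb / rate_kb_per_hour
```

With the rounded figures from this post (about 22000 KBytes dropping to about 11000 KBytes over roughly 36 hours), `hours_until_exhausted(22000, 11000, 36)` gives about 36 more hours, which matches the switch needing a reboot roughly every 3 days.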

Any other ideas?

Nikolai

Moderator

December 13th, 2019 08:00

Do you have any ACLs? You could try clearing the ARP cache and see if that frees memory.

December 15th, 2019 11:00

Hello Josh,

No, there are no ACLs enabled at all. Clearing the ARP cache didn't help:

SWITCH1#show memory cpu

Total Memory................................... 262144 KBytes
Available Memory Space......................... 10772 KBytes

SWITCH1#clear arp-cache gateway

SWITCH1#show memory cpu

Total Memory................................... 262144 KBytes
Available Memory Space......................... 10770 KBytes

Do you think that "no ip redirect" can help?

 

Nikolai

Moderator

December 16th, 2019 07:00

You could try it. At this point it is going to be trial and error to find the cause.

March 6th, 2020 06:00

Hi,

Is there a solution for this? I see the same problem, and it recurs within 5 days or less. I did disable IP redirects. Checking the CPU memory shows that available memory decreases within seconds, which means the leak is still there.

Thanks,
Alex

May 30th, 2020 11:00

Hi Alex,

Nope, I haven't found the root of the problem. I'm especially confused because I have 4 switches in a ring and only one shows this problem. I've replaced the hardware with a 5th switch, with no luck.

Right now I reboot the switch daily with the plink command from the PuTTY suite:

type commands.txt | plink.exe 192.168.x.x -l admin -pw password -v

where commands.txt is:

show system
a
show system
a
enable
show memory cpu
reload
y
y
y
exit
exit
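The same daily reboot can be driven from Python instead of `type ... |`, which makes it easier to capture the session output (including the `show memory cpu` result just before the reload). This is only a sketch: the function names are hypothetical, the command list is copied verbatim from commands.txt above, and the plink options are the same `-l`, `-pw` and `-v` used in the command line above.

```python
import subprocess

# Same command sequence as commands.txt above, kept verbatim.
COMMANDS = [
    "show system", "a",
    "show system", "a",
    "enable",
    "show memory cpu",
    "reload", "y", "y", "y",
    "exit", "exit",
]


def plink_argv(host: str, user: str, password: str,
               plink: str = "plink.exe") -> list:
    """Build the plink command line used in the post."""
    return [plink, host, "-l", user, "-pw", password, "-v"]


def script_text(commands) -> str:
    """Join the commands into the text that is fed to plink on stdin."""
    return "\n".join(commands) + "\n"


def reboot_switch(host: str, user: str, password: str):
    """Run plink with the command script on stdin and capture its output."""
    return subprocess.run(
        plink_argv(host, user, password),
        input=script_text(COMMANDS),
        text=True,
        capture_output=True,
        timeout=120,  # assumption: the session finishes well within 2 minutes
    )
```

The `stdout` of the returned object can be appended to a log file to keep a history of free memory at each reboot. Note that the password is visible on the command line, exactly as in the one-liner above.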

Nikolai

April 2nd, 2023 08:00

I know this thread is years old, but I had this exact problem on my 6224 with firmware 3.3.18.1. I had swapped my OEM fans for slower third-party ones (GDSTIME GDA4020). trapmgr started reporting a fan state change every 2 seconds, and the amount of free memory decreased steadily.

After 7 days, I started getting the memory errors the OP posted, along with Layer 3 routing failures.

No config change I could find would stop the memory problem, not even turning logging off completely.

Removing the tach signal wire from the fans was also my solution.

 

-Tony

 

 

 
