Unsolved

September 26th, 2009 20:00

6224 reloads when ssh session idle-timeout under 1.0.2.6

I have seen a 6224 reload several times when an SSH session closes.  Tonight, it happened when an SSH session timed out while sitting idle at a command prompt.  The syslog messages usually look like:

SEP 27 01:35:56 10.11.176.253-1 UNKN[126181056]: ssh_threads.c(641) 168173 %% SSHD: read_pkt failed: S_errno_EPIPE
SEP 27 01:35:56 10.11.176.253-1 UNKN[126181056]: ssh_sys_fastpath.c(568) 168174 %% tid 0x7855ec0, context 0x7844e28, deleted tid 0x781a0e0, retval = 1
SEP 27 01:36:17 10.11.176.253-1 UNKN[251867216]: user_mgr.c(1560) 168175 %% User Login Failed for $enab15$
SEP 27 01:36:17 10.11.176.253-1 TRAPMGR[251867216]: traputil.c(853) 168176 %% Failed User Login: Unit: 1 User ID: $enab15$
SEP 27 01:36:33 10.11.176.253-1 UNKN[126180336]: ssh_buffer.c(1196) 168179 %% SSHD: read(#1) failed: S_errno_EPIPE
SEP 27 01:36:33 10.11.176.253-1 UNKN[126180336]: ssh_threads.c(641) 168180 %% SSHD: read_pkt failed: S_errno_EPIPE
SEP 27 01:36:34 10.11.176.253-1 UNKN[125902992]: ssh_sys_fastpath.c(568) 168181 %% tid 0x7812090, context 0x7855f58, deleted tid 0x7855bf0, retval = 0
JAN 01 00:01:48 10.11.176.253-1 TRAPMGR[217761040]: traputil.c(853) 101 %% Port 24 is transitioned from the Learning state to the Forwarding state in instance 0
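For anyone sifting through a longer capture of these messages, here is a minimal Python sketch that splits each line into fields and flags likely reloads. The field layout is inferred from the sample lines above, not from any Dell documentation, and it assumes that the number before `%%` is a message counter that restarts after a reload (as the drop from 168181 to 101 suggests). The names `parse` and `find_reloads` are made up for illustration, and the sketch assumes all lines come from one switch.

```python
import re

# Layout inferred from the sample lines in this post; not an official format.
LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z]{3} \d{2} \d{2}:\d{2}:\d{2}) "   # e.g. SEP 27 01:36:34
    r"(?P<host>\S+) "                                   # e.g. 10.11.176.253-1
    r"(?P<component>\w+)\[(?P<task>\d+)\]: "            # e.g. UNKN[125902992]:
    r"(?P<file>[\w.]+)\((?P<line>\d+)\) "               # e.g. ssh_threads.c(641)
    r"(?P<seq>\d+) %% (?P<msg>.*)$"                     # counter, then message
)


def parse(line):
    """Return a dict of fields for one syslog line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["seq"] = int(rec["seq"])
    return rec


def find_reloads(lines):
    """Return (last_before, first_after) pairs where the counter goes backwards.

    Assumption: the counter only decreases when the switch has reloaded.
    """
    reloads = []
    prev = None
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue  # skip blank or unrecognised lines
        if prev is not None and rec["seq"] < prev["seq"]:
            reloads.append((prev, rec))
        prev = rec
    return reloads
```

Feeding it the lines of a saved syslog file, for example `find_reloads(open("switch.log"))` with a hypothetical file name, returns the last message before each suspected reload, which is usually the interesting one.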

 

On a 6224 with 2.2.0.3, the only message I see is:

SEP 27 02:22:51 10.11..209.250-1 UNKN[117316112]: ssh_sys_fastpath.c(572) 5159 %% tid 0x6fe1a10, context 0x6fd03f8, deleted tid 0x7004030, retval = 0

Is anyone aware of any system crash-causing bugs in SSH-related code, and if so, was it fixed in any releases?

 

98 Posts

October 2nd, 2009 14:00

Below are the SSH/telnet-related bugs that were fixed in the 2.1.0.x updates.  Some of these may also have been contributing to the issue you are seeing.  It is recommended that you upgrade to the 2.2.0.3 firmware.

Fixed in 2.1.1.0 and 2.1.1.3
============================
79392 - ssh: memory leak (2976 bytes) seen when enabling/disabling ssh
79395 - ssh login/logout caused memory leak
82238, 80831, 169536 - SSH connections using PuTTY 0.59 or 0.60 fail if the SSH server is set to use a specific port
85599, 85467, 180271 - 62xx: telnetting to a fully qualified domain name (FQDN) reboots the switch

-Victor

43 Posts

October 4th, 2009 16:00

Thanks.  I did see 79395 too: I had a script which logged in and out each night, and 62xx's under 1.0.2.6 would reload at least once per month.  The odd thing about this last reload was that this switch had only been up for two days.

I also saw a separate issue where, after about 246 SSH sessions, the 62xx would be pingable but not SSH'able.  This was discouraging from a QA point of view.  I used to test devices for another manufacturer, and their code was just awful at the start of each test cycle.  But a crash after 246 command iterations should be "low hanging fruit" for the testers.
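The "pingable but not SSH'able" state can be checked without a full login. Below is a rough Python sketch, not anything Dell supplies: it opens a TCP connection to the SSH port and reads the server's identification line (the `SSH-...` string an SSH server sends first). The function name `ssh_banner` and the default timeout are my own choices, and as a simplification it only looks at the first line the server sends. Keep in mind that each probe still opens and closes a connection to the SSH server, so on affected firmware it may not be harmless.

```python
import socket


def ssh_banner(host, port=22, timeout=5.0):
    """Return the server's SSH identification line, or None if SSH is not answering."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = b""
            # The identification line is at most 255 bytes including CR LF.
            while b"\n" not in data and len(data) < 255:
                chunk = sock.recv(255 - len(data))
                if not chunk:
                    break  # server closed the connection early
                data += chunk
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None
    line = data.split(b"\n", 1)[0].strip().decode("ascii", errors="replace")
    return line if line.startswith("SSH-") else None
```

A host that answers ping while `ssh_banner(host)` returns `None` matches the symptom described above.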

On the 2.2.0.3 code, I have five test harnesses running, each one SSH'ing into the test switch and executing "show running-config".  These five "users" each have a random delay in their session, to ensure that their sessions overlap each other in various combinations (e.g. one user connecting a socket while another user is closing a socket; two users executing "show run" at the same fraction of a second).  I have not seen any crashes yet, which is good.
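The shape of that kind of harness can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual test setup described above: `run_session` is a placeholder callable that you supply to do the real SSH login and "show running-config" (for example by driving an SSH client), and the names `run_user`, `run_harness`, and their parameters are hypothetical.

```python
import random
import threading
import time


def run_user(name, run_session, iterations, max_delay, results, lock, rng):
    """One simulated user: wait a random time, run a session, record the outcome."""
    for i in range(iterations):
        time.sleep(rng.uniform(0, max_delay))  # random delay so sessions overlap
        try:
            run_session(name)
            ok = True
        except Exception:
            ok = False  # treat any error from the session as a failed attempt
        with lock:
            results.append((name, i, ok))


def run_harness(run_session, users=5, iterations=10, max_delay=2.0, seed=None):
    """Run several simulated users concurrently; return (user, iteration, ok) tuples."""
    results = []
    lock = threading.Lock()
    threads = []
    for n in range(users):
        # Separate generator per user so a fixed seed gives repeatable delays.
        rng = random.Random(None if seed is None else seed + n)
        t = threading.Thread(
            target=run_user,
            args=(f"user{n}", run_session, iterations, max_delay, results, lock, rng),
        )
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each user sleeps for an independent random interval, connects, commands, and disconnects from different users end up interleaved in different orders on every pass, which is the point of the exercise. A run of failed attempts in the results would be the cue to check whether the switch has reloaded.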

43 Posts

October 25th, 2009 22:00

Sorry, upgrading has not fixed the problem.  

Would Dell like to see a Tcl/Expect script that crashes a Dell 6248 about 40% of the time?  I have it set to log in to the switch every minute, and I have seen 12 reboots in 35 minutes.

43 Posts

October 25th, 2009 22:00

BTW, I believe that this is after upgrading to 2.2.0.3:

 

 Images currently available on Flash
--------------------------------------------------------------------
 unit      image1      image2     current-active        next-active
--------------------------------------------------------------------
    2    2.0.0.12     2.2.0.3             image2             image2

43 Posts

October 30th, 2009 19:00

Problem confirmed.  If you require high uptime, disable SSH on PC62xx switches running firmware 1.0.2.6, 2.0.0.12, or 2.2.0.3, and use telnet instead.

7 Posts

March 23rd, 2010 18:00

We are experiencing the same issues on our 6248s.  I can't believe that this is still an issue, and that there isn't a firmware update out to address it.  We use RANCID to back up our switch configs, and our monitoring system scans (or was scanning) SSH for availability, and we would have entire stacks reboot randomly because of this SSH problem.  This is completely unacceptable for an enterprise switch.

These switches have a number of other limitations and bugs (bad/incorrect/missing SNMP data and broken BPDU guard to name a few).

We're going to burn these things and get real switches as soon as we can.

What a waste of money.  I'm a big Dell fan, but never again will I trust Dell network gear.

1 Message

April 10th, 2010 09:00
