January 25th, 2012 01:00

M6220 frozen management

Has anyone else had problems where the M6220 blade switch management interface stops working, typically after a couple of weeks of uptime?

We've seen this freeze on firmware versions ranging from very old 3.x.x.x up to 4.1.1.9. The latest, 4.2.0.4, hasn't been out long enough for us to tell whether it is affected. The issue seems to be triggered by a combination of internal flash access and an uptime of a couple of weeks or more.

We have a lot of M6220 switches, running fw version 4.1.1.9 (currently the second newest available). Every now and then while working on the CLI, the interface freezes so that it stops accepting input, and no more output is printed. At this point, all management methods (except SNMP) become unavailable: Out-of-band CLI (via the CMC module), direct serial connection, and web.

SNMP continues working just fine: snmpwalk, snmpget and snmpset all keep responding.
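For anyone who wants to catch this state automatically, here is a rough Python sketch of the check described above: management dead while SNMP still answers. The function names are made up for illustration, and it assumes SSH is enabled on the switch and that the net-snmp `snmpget` tool is installed on the monitoring host. Whether a frozen switch refuses the TCP connection or accepts it and then stays silent is not something this thread establishes, so the probe treats both as a failure.

```python
import socket
from typing import List


def ssh_banner_ok(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if the host sends an SSH identification string in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(64)
    except OSError:
        # Connection refused, host unreachable, or no banner before the timeout.
        return False
    return banner.startswith(b"SSH-")


def snmpget_command(host: str, community: str, timeout_s: int = 2) -> List[str]:
    """Build a net-snmp snmpget call for sysUpTime.0 (OID 1.3.6.1.2.1.1.3.0)."""
    return ["snmpget", "-v", "2c", "-c", community,
            "-t", str(timeout_s), "-r", "1", host, "1.3.6.1.2.1.1.3.0"]


def classify(mgmt_ok: bool, snmp_ok: bool) -> str:
    """Map the two probe results onto the states discussed in this thread."""
    if mgmt_ok:
        return "ok"
    # Management is dead: SNMP still answering matches the freeze symptom.
    return "management frozen" if snmp_ok else "unreachable"
```

The command list from `snmpget_command` can be passed to `subprocess.run`; a zero exit status would count as `snmp_ok`.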

What seems to trigger the freeze is operations that access the internal flash storage on the switch. Commands like "show running-config" or "copy running-config startup-config" seem to be the most common triggers (they might even be the only ones). Triggering the freeze this way also seems to become more likely as the switch uptime increases, typically beyond a couple of weeks.
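Since SNMP keeps working, the uptime side of this can be checked before touching the CLI. A minimal sketch, assuming the sysUpTime value has already been fetched as a raw TimeTicks integer; the 14-day threshold is only my reading of "a couple of weeks", not a measured limit, and the function names are illustrative.

```python
# sysUpTime is reported in TimeTicks, i.e. hundredths of a second.
TICKS_PER_DAY = 100 * 60 * 60 * 24


def uptime_days(timeticks: int) -> float:
    """Convert an SNMP sysUpTime TimeTicks value to days."""
    return timeticks / TICKS_PER_DAY


def in_risk_window(timeticks: int, threshold_days: float = 14.0) -> bool:
    """True if uptime has reached the (assumed) range where freezes show up."""
    return uptime_days(timeticks) >= threshold_days
```

Note that sysUpTime is a 32-bit counter and wraps after roughly 497 days, so a very long-running switch can report a misleadingly small value.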

The main concern here is that there's no way for us (nor Dell, presumably) to troubleshoot a switch once it's in the frozen state. I would think the hw/fw manufacturer (Broadcom) has some tricks up its sleeve, but they might not be practical. The sheer number of switches we have could also mean that we are simply much more likely to observe the issue than someone with two or ten of them, so incidents may be underreported.

It would help a lot to hear reports from other users who have run into the same issue, which would help us gain some momentum to get it tracked down and fixed.

February 20th, 2013 07:00

I'm running into a similar issue; what have you done to bring it out of the freeze? Power down? We're in the process of updating the f/w to v5, and one of my switches has frozen me out of the management interface.

19 Posts

February 20th, 2013 11:00

Yes, a reset or power cycle (chassisaction -m switch-N reset [or powercycle]) seems to be the only way of recovering.
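If you script the recovery, a small helper that builds the command keeps typos out of it. This is only a sketch: it assumes the command is run through `racadm` on the CMC, and the 1 to 6 slot range is my assumption for an M1000e chassis, so adjust it to your hardware. The function name is made up.

```python
from typing import List

ALLOWED_ACTIONS = ("reset", "powercycle")


def chassisaction_command(slot: int, action: str = "reset") -> List[str]:
    """Build the CMC command that resets or power cycles one switch module."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if not 1 <= slot <= 6:  # assumed I/O module slot range
        raise ValueError(f"slot out of range: {slot}")
    return ["racadm", "chassisaction", "-m", f"switch-{slot}", action]
```

For example, `" ".join(chassisaction_command(3, "powercycle"))` gives the string to paste into a CMC session.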

What version is the switch running currently, something newer than 4.2? I haven't worked with this for quite a while, so I'm not up to date on current status. Hopefully it was fixed somewhere in 4.x and not still present in version 5.

February 21st, 2013 03:00

We are updating it to v5; when we got it, it was version 1.  Thanks for the info, I'll do that.

19 Posts

January 24th, 2014 05:00

We just had another incident with frozen management on an M6220 stack running v5.1.2.3. I had hoped the issue was fixed long ago, but it appears to be still present; the symptoms are exactly like before.

This is a massive freakin' pain, and my patience with these switches has run out.

Moderator

8.5K Posts

January 24th, 2014 08:00

Hi,

Does it still pass network traffic when it is in the frozen state? If you use PuTTY to connect to the CMC with logging enabled and run dumplogs and racdump, are there any errors from the chassis?

1 Message

June 17th, 2014 04:00

Hi seventhsven,

I came across this post looking for a fix to the exact same problem I have with my M6348s running 4.2.0.4.

I understand this post is a bit old now but did you get any fix / response from Dell?

I'm hesitant to make a call yet, as I'll just receive the "are you running the up-to-date firmware" response, and updating doesn't seem to resolve the problem, going by others in this post.

My exact issue: when I use PuTTY to SSH into the switch and run "show run", it freezes the switch and all management interfaces stop working. The switch still routes traffic, but to get management back I have to re-seat the switches.

Cheers

Steve

19 Posts

June 17th, 2014 06:00

Hi, shiest. No fix or response from Dell, primarily due to us not pushing them on it. They will definitely reply with the 'latest firmware' canned response, which I can understand. The latest firmware currently is v5.1.4.5, while we run v5.1.2.3 (which, as noted above, also crashes) on most of our switches today. The changelog highlights some otherwise astonishing crashers that were fixed recently, but none that apply to this specific issue.

I'd love to hear what Dell says if you decide to contact them. It should boil down to having the Dell or Broadcom engineers getting the issue reproduced in their lab, which I'd assume is not all that hard if they'd just give it time.

5 Practitioner

274.2K Posts

June 17th, 2014 07:00

The latest firmware is 5.1.4.5, and its release notes list fixes for some crash scenarios. 5.1.3.7 also has quite a few crash scenarios fixed in it. I suggest updating the firmware to see if any of the fixes in the new firmware resolve the issue you are seeing. You are correct: if you call into support, they will ask you to do the same thing. The reason is that firmware updates are how most issue resolutions and new features are delivered. The release notes give pretty good detail on what is fixed in each release, but those fixes sometimes improve on and resolve other areas not directly discussed.

If the switch firmware is out of date the CMC might be out of date as well. I would check it and if out of date get it up to date.

www.dell.com/.../drivers

If the problem still persists after these updates are done I would grab a #show-tech from the switches affected. That file can be reviewed and further troubleshooting can be done to try and isolate the cause.

Thanks

1 Message

August 6th, 2014 05:00

seventhsven just pointed me to the issue,

and I can say "me, too!" but I'm seeing the crashes (not hangs, since they never recover) on my PC62xx and PC81xx (or now N4000); both types with latest FWs (and previous versions)

I haven't tested if I can then still access the snmp* functionality,

but yes, routing and switching stays alive,

but management is completely unreachable unless I reboot the switch via 'PowerCord'

5 Practitioner

274.2K Posts

August 6th, 2014 06:00

Are you using the Web GUI when this happens? If you try management through Telnet or SSH does the same thing happen? Does it happen when accessing a certain page or running a specific command? Or just randomly?

1 Message

August 14th, 2014 18:00

The same thing happens here on a PC7024, and it's very annoying. It happens when I use PuTTY to telnet in, most of the time when running show config. When the output lags and you press Enter twice, the session starts to hang. After that, none of the management methods are accessible, but the switch keeps working.

5 Practitioner

274.2K Posts

August 15th, 2014 07:00

Mcloudteo, what firmware is your switch at? Does it hang when accessing through http/https/ssh? Or is it just Telnet, when running the command # show run?

19 Posts

November 26th, 2014 03:00

The PowerConnect firmware release 5.1.5.1 contains the following listed under "issues resolved":

  • Summary: Switch stops responding to Serial console, Telnet, SSH and WebUI.
  • User impact: Occasionally, Users will not have access to the serial console, SSH, telnet sessions after a download.
  • Resolution: Corrected socket timeout problem.
  • Affected platforms: All Platforms.

Now, what counts as "a download"? Pretty much every time in the past, the console has frozen when trying something like "show running-config" or "copy running-config startup-config". Do those classify as downloads?

Are there more details available to Dell insiders about this particular issue? The release notes don't identify it with an issue number, which doesn't exactly inspire confidence, but perhaps an internal database is available to Dell/Broadcom?

We'll try the latest 5.1.6.3 firmware and see how it pans out.
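With a lot of switches on mixed versions, a quick way to see which ones already carry the 5.1.5.1 change is to compare dotted version strings numerically. A small sketch; the function names are mine, and it says nothing about whether the socket timeout fix actually cures this particular freeze, only whether a switch is at or past that release. It also assumes the plain four-part PowerConnect numbering used in this thread.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version such as '5.1.2.3' or 'v5.1.2.3' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


# Release whose notes list "Corrected socket timeout problem".
FIX_RELEASE = parse_version("5.1.5.1")


def has_socket_timeout_fix(version: str) -> bool:
    """True if the firmware version is 5.1.5.1 or newer."""
    return parse_version(version) >= FIX_RELEASE
```

Comparing tuples instead of raw strings avoids mistakes such as "5.1.10.0" sorting before "5.1.5.1".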
