Has anyone else had problems where the M6220 blade switch management interface stops working, typically after a couple of weeks of uptime?
We've seen this freeze on firmware versions ranging from very old 3.x.x.x up to 18.104.22.168. The latest 22.214.171.124 hasn't been out long enough for us to tell whether it is prone to the issue, which seems to be triggered by a combination of internal flash access and an uptime of a couple of weeks or more.
We have a lot of M6220 switches, running fw version 126.96.36.199 (currently the second newest available). Every now and then while working on the CLI, the interface freezes so that it stops accepting input, and no more output is printed. At this point, all management methods (except SNMP) become unavailable: Out-of-band CLI (via the CMC module), direct serial connection, and web.
SNMP continues working just fine: snmpwalk, snmpget and snmpset all keep responding.
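For anyone who wants to spot frozen switches automatically, here is a rough sketch of how the symptom pattern above could be turned into a check. It only encodes the logic (CLI dead while SNMP still answers); the function name and the state labels are made up for illustration, and how you actually probe the CLI and SNMP is left to whatever tooling you already use.

```python
def classify_switch(cli_responds: bool, snmp_responds: bool) -> str:
    """Map two probe results to a coarse state label.

    The labels are hypothetical; the "management-frozen" case mirrors the
    symptom described in this thread: CLI/web/serial dead, SNMP still alive.
    """
    if cli_responds and snmp_responds:
        return "healthy"
    if snmp_responds:
        # SNMP answers but the CLI does not: matches the freeze symptom.
        return "management-frozen"
    if cli_responds:
        return "snmp-down"
    return "unreachable"


if __name__ == "__main__":
    print(classify_switch(cli_responds=False, snmp_responds=True))
```

A switch that lands in "management-frozen" is the one worth logging before it gets reset, since that is the state nobody can currently troubleshoot.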
What seems to trigger the freeze is operations that access the internal flash storage on the switch. Commands like "show running-config" or "copy running-config startup-config" seem to be the most common (or might even be the sole) trigger. Triggering the freeze this way also seems to become more likely as switch uptime increases, typically beyond a couple of weeks.
The main concern here is that there's no way for us (nor Dell, presumably) to troubleshoot a switch once it's in the frozen state. I would think the hw/fw manufacturer (Broadcom) has some tricks up its sleeve, but they might not be practical. The sheer number of switches we have could also mean that we are simply much more likely to observe the issue than someone with two or ten of them, so incidents may be underreported.
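Since SNMP keeps working, uptime can be read from the standard sysUpTime object, which is reported in hundredths of a second. Below is a small sketch for flagging switches that have reached the "couple of weeks" window; the 14-day threshold is my own assumption taken from that rough observation, not a confirmed limit, and the function names are illustrative.

```python
TICKS_PER_SECOND = 100      # sysUpTime TimeTicks are hundredths of a second
SECONDS_PER_DAY = 86400


def uptime_days(sysuptime_ticks: int) -> float:
    """Convert a raw sysUpTime value (TimeTicks) to days."""
    return sysuptime_ticks / TICKS_PER_SECOND / SECONDS_PER_DAY


def in_risk_window(sysuptime_ticks: int, threshold_days: float = 14.0) -> bool:
    """True if uptime has reached the assumed threshold (default 14 days)."""
    return uptime_days(sysuptime_ticks) >= threshold_days


if __name__ == "__main__":
    ticks = 20 * SECONDS_PER_DAY * TICKS_PER_SECOND  # 20 days of uptime
    print(uptime_days(ticks), in_risk_window(ticks))
```

Something like this could be used to hold off on "show running-config" style commands on long-running switches until a maintenance window, though that is only a workaround idea, not a fix.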
It would help a lot to hear reports from other users who have run into the same issue, which would help us gain some momentum to get it tracked down and fixed.
I'm running into a similar issue; what have you done to bring it out of the freeze? Power down? We're in the process of updating the f/w to v5, and one of my switches has frozen me out of the management interface.
Yes, a reset or power cycle (chassisaction -m switch-N reset [or powercycle]) seems to be the only way of recovering.
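If you have many chassis and want to script the recovery, a minimal sketch for building that command could look like the following. It assumes the command is run through racadm on the CMC (the "racadm" prefix and the helper name are my additions, not something stated above), and it only builds the argument list; actually running it against a CMC is up to you.

```python
from typing import List

# Only the two actions mentioned in this thread are allowed here.
ALLOWED_ACTIONS = ("reset", "powercycle")


def chassisaction_command(slot: int, action: str = "reset") -> List[str]:
    """Build the argument list for resetting the switch in a given I/O slot.

    Assumes invocation via racadm; "switch-N" follows the form quoted above.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if slot < 1:
        raise ValueError("slot numbers start at 1")
    return ["racadm", "chassisaction", "-m", f"switch-{slot}", action]


if __name__ == "__main__":
    print(" ".join(chassisaction_command(3, "powercycle")))
```

Keeping the allowed actions to an explicit list is deliberate: a typo in a script that power-cycles switches is not something you want to discover in production.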
What version is the switch running currently, something newer than 4.2? I haven't worked with this for quite a while, so I'm not up to date on current status. Hopefully it was fixed somewhere in 4.x and not still present in version 5.
We just had another incident with frozen management on an M6220 stack running v188.8.131.52. I had hoped the issue was fixed long ago, but it appears to be still present; the symptoms are exactly like before.
This is a massive freakin' pain, and my patience with these switches has run out.
Does it still pass network traffic when it is in the frozen state? If you use putty to connect to the CMC with logging enabled and run dumplogs and racdump, are there any errors from the chassis?
I came across this post looking for a fix to the exact same problem I have with my M6348s running 184.108.40.206.
I understand this post is a bit old now but did you get any fix / response from Dell?
I'm hesitant to make a call yet as I'll just receive the "are you running the up-to-date firmware" response, and updating doesn't seem to resolve the problem, going by others in this post.
My exact issue: I use putty to SSH into the switch, and if I run "show run" it freezes the switch and all management interfaces stop working. The switch still routes traffic, but to get management back I have to re-seat the switches.
Hi, shiest. No fix or response from Dell, primarily due to us not pushing them on it. They will definitely reply with the 'latest firmware' canned response, which I can understand. The latest firmware currently is v220.127.116.11, while we run v18.104.22.168 (which, ref above, also crashes) on most of our switches today. The changelog highlights some otherwise astonishing crashers that were fixed recently, but none that apply to this specific issue.
I'd love to hear what Dell says if you decide to contact them. It should boil down to having the Dell or Broadcom engineers reproduce the issue in their lab, which I'd assume is not all that hard if they'd just give it time.
The latest firmware is 22.214.171.124, and its release notes show fixes for some crash scenarios. 126.96.36.199 has quite a few crash scenarios fixed in it. I suggest updating the firmware to see if any of the fixes in the new firmware resolve the issue you are seeing. You are correct: if you call into support, they will ask you to do the same thing. The reason is that this is how most issue resolutions and new features are delivered. The release notes give fairly good detail on what is fixed in each release, but those fixes sometimes also improve or resolve other areas not directly discussed.
If the switch firmware is out of date the CMC might be out of date as well. I would check it and if out of date get it up to date.
If the problem still persists after these updates are done I would grab a #show-tech from the switches affected. That file can be reviewed and further troubleshooting can be done to try and isolate the cause.
seventhsven just pointed me to the issue,
and I can say "me, too!" but I'm seeing the crashes (not hangs, since they never recover) on my PC62xx and PC81xx (or now N4000); both types with latest FWs (and previous versions)
I haven't tested if I can then still access the snmp* functionality,
but yes, routing and switching stays alive,
but management is completely unreachable unless I reboot the switch via 'PowerCord'