1 Rookie
•
9 Posts
0
61661
September 12th, 2008 23:00
6248 6224F random reboots
We recently upgraded a client's network with (2) 6248 switches and a 6224F. One 6248 runs in layer-2 mode only; the other is the site's 'core' layer-3 routing switch. The 6224F is basically a concentration point for (10) 5448 and 5424 edge switches in remote dorms and offices and is connected to the 'core' 6248 via a 2Gbps 2-port fiber LAG.
We have a ticket open with Dell and are currently waiting for a second response, this time to some additional information we sent.
The 6200-series switches are all running the latest firmware (2.1.0.13) but are randomly rebooting. The Dell tech I spoke to told me that "perhaps a broadcast storm or something is causing the reboots." But the reboots are random and the switches do not reboot at the same time. (Power is good: each switch is on a functioning UPS, and in one case the 6248 is currently the only device on an APC 1400.)
I connected the serial port of the 'core' 6248 switch to a server running minicom, increased the logging level and began logging to a file until the 6248 rebooted. At the same time, I had that server running tcpdump, capturing in promiscuous mode.
What I saw was that the serial port logged a stack trace at the time of the reboot, and that there was no broadcast storm, nor any significant broadcast traffic captured by tcpdump before the reboot.
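For anyone who wants to repeat the broadcast check on their own capture, here is a rough Python sketch. It assumes the capture was saved in the classic libpcap file format with an Ethernet link type, and it simply counts frames addressed to ff:ff:ff:ff:ff:ff per second. The function name `broadcast_per_second` is made up for this example, and what rate counts as a "storm" is left to the reader.

```python
import struct
from collections import Counter

BROADCAST_MAC = b"\xff" * 6


def broadcast_per_second(pcap_bytes):
    """Count Ethernet broadcast frames per second in a classic pcap capture.

    Returns a Counter mapping the capture timestamp (whole seconds) to the
    number of frames whose destination MAC is ff:ff:ff:ff:ff:ff.
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written on a big-endian host
    else:
        raise ValueError("not a classic (microsecond) pcap file")

    # The 24-byte global header ends with the link type; 1 means Ethernet.
    (linktype,) = struct.unpack(endian + "I", pcap_bytes[20:24])
    if linktype != 1:
        raise ValueError("this sketch only handles Ethernet captures")

    counts = Counter()
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-packet record header: ts_sec, ts_usec, incl_len, orig_len.
        ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16]
        )
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        # The first 6 bytes of an Ethernet frame are the destination MAC.
        if frame[:6] == BROADCAST_MAC:
            counts[ts_sec] += 1
    return counts
```

To use it, read the capture file as bytes (opened in "rb" mode), pass the bytes to the function, and look at the busiest seconds with `most_common()`.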
Dell quickly (almost too quickly) shipped us a replacement for one of the 6248 switches, which is exhibiting the same behaviour as the original. The original is currently connected to a server only via serial cable and has not rebooted in over 40 hours, while the new one has already rebooted twice in 24 hours.
Long story short... Is anyone else seeing this issue or am I the only one dealing with this right now?
--
Bill Arlofski
Reverse Polarity, LLC


cerbera_a84f2d
176 Posts
0
September 14th, 2008 22:00
can you do a
show version
and
show boot-version
on both switches and post the results
mandzo
53 Posts
0
September 15th, 2008 13:00
warlofski
1 Rookie
•
9 Posts
0
September 15th, 2008 13:00
Hi mandzo.
Yes, and it looks like there are 3 crashdump files on each switch too. Dell has not requested any information from us yet, but we are going to try to get this escalated today since these switches are in production in a 24/7 boarding school and things are NOT looking good right now. My guess is that they just might want to see these crashdump files. :)
(New 6248 switch)
server6248 #show dir
File name Size (in bytes)
------------- ---------------
vpd.bin 256
log2.bin 262132
slog2.txt 0
olog2.txt 0
boot.dim 49
slog1.txt 0
olog1.txt 0
image1 10106248
slog0.txt 0
olog0.txt 0
hpc_broad.cfg 148
asf.cfg 16
crashdump.ctl 356
sslt.rnd 1024
boot.cfg 16
dh512.pem 156
dh1024.pem 245
ssh_host_rsa_key 887
logNvmSave.bin 64
startup-config 11844
ssh_host_key 517
ssh_host_dsa_key 668
crashdump.0 33616
crashdump.1 33616
crashdump.2 33616
crashdump.3 33616
(Original 6248 switch)
ORIG-server6248 #
<189> JAN 05 00:11:38 10.1.0.254-1 UNKN[243338512]: cmd_logger_api.c(87) 97268 % CLI:EIA-232:----:en
show dir
File name Size (in bytes)
------------- ---------------
vpd.bin 256
log2.bin 262132
slog2.txt 0
olog2.txt 0
boot.dim 85
slog1.txt 0
olog1.txt 0
image1 10106248
slog0.txt 0
olog0.txt 0
hpc_broad.cfg 148
asf.cfg 16
crashdump.ctl 356
sslt.rnd 1024
boot.cfg 16
dh512.pem 156
dh1024.pem 245
startup-config 11787
logNvmSave.bin 64
asset.tag 17
image2 10106248
ssh_host_key 517
ssh_host_dsa_key 668
ssh_host_rsa_key 887
crashdump.0 33616
crashdump.1 33616
crashdump.2 33616
crashdump.3 33616
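If you have several switches to check, a small Python sketch like the one below can pull the crashdump.N entries out of pasted "show dir" output such as the listings above. It assumes each file line is just a name followed by a size in bytes, as shown here; the function name `crashdump_files` is made up for this example. The control file crashdump.ctl is deliberately not counted.

```python
import re

# Matches crashdump.0, crashdump.1, ... but not crashdump.ctl.
CRASHDUMP_NAME = re.compile(r"crashdump\.\d+")


def crashdump_files(show_dir_text):
    """Return the sorted crashdump.N file names found in `show dir` output."""
    names = []
    for line in show_dir_text.splitlines():
        parts = line.split()
        # A file line is expected to look like: <name> <size in bytes>
        if (len(parts) == 2
                and CRASHDUMP_NAME.fullmatch(parts[0])
                and parts[1].isdigit()):
            names.append(parts[0])
    return sorted(names)
```

Calling `len(crashdump_files(text))` on each switch's output gives a quick count to compare between units.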
--
Bill Arlofski
Reverse Polarity, LLC
warlofski
1 Rookie
•
9 Posts
0
September 15th, 2008 13:00
Ok, first the ORIGINAL 6248 switch (which is currently connected ONLY by serial port)
ORIG-server6248 #show version
Image Descriptions
image1 : default image
image2 :
Images currently available on Flash
--------------------------------------------------------------------
unit  image1    image2    current-active  next-active
--------------------------------------------------------------------
1     2.1.0.13  2.1.0.13  image2          image2
(Same version for both images because I uploaded it twice. :)
ORIG-server6248 #show boot-version
----------------------------------------
unit Boot Image Version
----------------------------------------
1 31 October 2007
And now the NEWLY installed 6248 currently running the network and randomly rebooting:
server6248 #show version
Image Descriptions
image1 : default image
image2 :
Images currently available on Flash
--------------------------------------------------------------------
unit  image1    image2    current-active  next-active
--------------------------------------------------------------------
1     2.1.0.13            image1          image1
server6248 #show boot-version
----------------------------------------
unit Boot Image Version
----------------------------------------
1 31 October 2007
warlofski
1 Rookie
•
9 Posts
0
September 15th, 2008 14:00
mandzo, thanks for the reply (and for corroborating what my colleagues and I believe as well).
My major concern, though, is that we have two 6248 switches and one 6224F switch, all exhibiting this same (random rebooting) issue. We have already been shipped a new 6248 as a replacement for one of them, and it is also rebooting.
If what you are saying is true, could Dell keep shipping us new 6248s and 6224Fs for the rest of the year before we get three properly functional 6200-series switches? Sort of like trying to win the networking lottery? Sigh...
This is very unfortunate since these are all in production in a 24/7 environment. Each replacement will need to be done after hours, and then we need to wait more than 48 hours after each replacement to "prove" that the switch is OK, since I have seen them run without rebooting for almost two days a couple of times.
I am not sure how hard you pushed, but when we escalate this today, we are going to try to get an honest and full answer to this. My client is NOT happy right now. They went from 100% network uptime to random downtime, corrupted files, etc.
Also, as far as I am concerned, NO packet or packets that one can put on the wire, either maliciously or otherwise should be able to reboot a properly functioning switch. But maybe that is just me :)
Thanks again for your response... I am curious... How long ago did you go through this process with Dell? Were they happily and quickly shipping you out replacements as soon as you requested them or did you have to jump through hoops? Also, is there anywhere else I can look (besides via serial console during boot) to show the HARDWARE version of these things?
--
Bill Arlofski
Reverse Polarity, LLC
warlofski
1 Rookie
•
9 Posts
0
September 15th, 2008 14:00
This is why I asked you that question.
When I called in, I spoke to a "Care resolution specialist" and didn't have to say too much more than "we have a 6248 that is randomly rebooting", give them our Service Tag #, and a new one was on the way that day.
I specifically had to request that I talk to a tech before they just ship me out a replacement.
When I got a tech, they never really "ASKED" me for anything. Matter of fact, I did most of the talking/explaining and I had to offer my config files, a sample network diagram, a serial console log showing a stack dump and some packet traces myself. Mostly because I am the type that figures if I did something wrong that might cause such issues, then I'll gladly take responsibility for it, and I don't want to waste a vendor's time and money having them just keep shipping me new parts for a problem that I caused.
Thanks again for your help.
--
Bill Arlofski
Reverse Polarity, LLC
mandzo
53 Posts
0
September 15th, 2008 14:00
Bill,
In general, you could load your config offline and test the switch with commands like "show dir" and "copy running-config tftp://......". If the switch still has only 1 dump file, it has a good chance of working OK.
It was easier to get a replacement a year ago; it happened almost immediately. Now it takes me a few hours of running different tests that Dell asks me to do before I get the replacement. I think they are trying to cut support spending, which is fine with me.
mandzo
53 Posts
0
September 15th, 2008 14:00
Bill,
In my opinion, the existence of 4 dump files is an indication of a hardware problem. I had to replace 2 6248s in a row (replacement switches, would you believe it) till I got a good one. Dell support can't explain this to me, but it looks like a DRAM problem or an addressing hardware problem. In a normal switch, I never saw more than 1 dump file. You should get an immediate replacement.
DRNO10
184 Posts
0
September 15th, 2008 21:00
warlofski
1 Rookie
•
9 Posts
0
September 19th, 2008 19:00
The three 6200-series switches have been up for over 3 days each now.
It appears that there is a problem with SSH packets directed at the switch's management IP that is causing the switches to reboot. I am not sure what other circumstances might also be contributing, what combination of time, SSH packets, and other traffic may be required, or whether it is just a certain amount of SSH traffic.
Once we stopped using SSH to manage and monitor these switches 3+ days ago, they have been stable.
These are all running 2.1.0.13 firmware.
We are waiting on final confirmation, and on what our next step(s) are, from the Dell "Complex Systems Technical Support Analyst" who has been working with us to resolve this issue.
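For tracking stability over a multi-day window like this, a simple approach is to record each switch's uptime at intervals and flag any sample where the uptime went backwards. Below is a minimal Python sketch of that idea. How the samples are collected is left open (given the SSH suspicion, something like the serial console log would be the safer source), and the function name `find_reboots` and the sample format are assumptions made for this example.

```python
def find_reboots(samples):
    """Detect reboots from periodic uptime readings.

    samples: list of (wall_clock_seconds, uptime_seconds) tuples in time order.
    Returns the estimated wall-clock boot time for each detected reboot.
    """
    boot_times = []
    for (_t_prev, up_prev), (t_cur, up_cur) in zip(samples, samples[1:]):
        # Uptime should only ever grow; a drop means the switch restarted
        # somewhere between the two samples.
        if up_cur < up_prev:
            boot_times.append(t_cur - up_cur)
    return boot_times
```

A reboot that happens between two samples is only caught if the new uptime is lower than the previous reading, so the sampling interval should be much shorter than the time between expected reboots.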
--
Bill Arlofski
Reverse Polarity, LLC
tlf01111
5 Posts
0
October 19th, 2008 15:00
I've also been having a consistent problem with a rebooting 6224. On occasion we also get a complete freeze of the switch which is only resolved by unplugging it and plugging it back in.
We've been in contact with Dell support, and they've mentioned the SSH issue above--however, we have SSH completely disabled. We can usually reproduce the problem by running "ip http server" and then hitting the switch's HTTP interface. That will usually cause an instant reboot.
It's also rebooted by just issuing regular commands on the console, which I find absolutely frustrating.
Anyone else experiencing issues like this with a 6224? I'm leaning towards a hardware problem as the original poster did, but Dell assures me it's an OS-related problem.
I'm getting near the end of my rope--a switch is the last thing I need to worry about rebooting randomly!