1 Rookie

 • 

9 Posts

September 12th, 2008 23:00

6248 6224F random reboots


We recently upgraded a client's network with (2) 6248 switches. One in layer2 mode only, the other is the site's 'core' layer-3 routing switch. The 6224F is basically a concentration point for (10) 5448 and 5424 edge switches in remote dorms and offices and is connected to the 'core' 6248 via a 2Gbps 2-port fiber LAG.

We have a ticket open with Dell, and are currently awaiting a second response after sending them some additional information.

The 6200-series switches are all running the latest firmware (2.1.0.13) but are randomly rebooting. The Dell tech I spoke to told me that "perhaps a broadcast storm or something is causing the reboots." But the reboots are random, and the switches do not reboot at the same time. (Power is good: each switch is on a functioning UPS, and in one case the 6248 is currently the only device on an APC 1400.)

I connected the serial port of the 'core' 6248 switch to a server running minicom, increased the logging level and began logging to a file until the 6248 rebooted. At the same time, I had that server running tcpdump, capturing in promiscuous mode.

What I saw was that the serial port logged a stack trace at the time of the reboot, and there was no broadcast storm, nor any significant broadcast traffic captured by tcpdump before the reboot.
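
For anyone who wants to repeat the broadcast check on their own capture, here is a minimal sketch in Python. It assumes the capture was saved in the classic libpcap file format with an Ethernet link type, and it simply counts frames addressed to ff:ff:ff:ff:ff:ff. The function name `count_broadcasts` is made up for illustration; this is not the procedure used in the post above, just one way to get a rough number.

```python
import struct

BROADCAST_MAC = b"\xff" * 6

def count_broadcasts(pcap_bytes):
    """Return (total_frames, broadcast_frames) for a classic pcap capture."""
    if len(pcap_bytes) < 24:
        raise ValueError("too short to be a pcap file")
    magic = pcap_bytes[:4]
    # The magic number tells us the byte order (micro- or nanosecond variant).
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    # The link type is the last field of the 24-byte global header.
    (linktype,) = struct.unpack(endian + "I", pcap_bytes[20:24])
    if linktype != 1:
        raise ValueError("expected an Ethernet capture (link type 1)")
    offset, total, broadcasts = 24, 0, 0
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_frac, captured length, original length.
        _sec, _frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16]
        )
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        total += 1
        # The destination MAC is the first six bytes of an Ethernet frame.
        if frame[:6] == BROADCAST_MAC:
            broadcasts += 1
    return total, broadcasts
```

Reading the capture with `open(path, "rb").read()` and passing the bytes in is enough for small files; a long capture would want a streaming reader instead.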

Dell quickly (almost too quickly) shipped us a replacement for one of the 6248 switches, which is exhibiting the same behaviour as the original. The original is currently connected to a server only via serial cable and has not rebooted in over 40 hours, while the new one has already rebooted twice in 24 hours.

Long story short... Is anyone else seeing this issue or am I the only one dealing with this right now?

--
Bill Arlofski
Reverse Polarity, LLC

September 14th, 2008 22:00

Can you run "show version" and "show boot-version" on both switches and post the results?

53 Posts

September 15th, 2008 13:00

If you type "show dir", are you getting any output?

1 Rookie

 • 

9 Posts

September 15th, 2008 13:00

 

Hi mandzo.

 

Yes, and it looks like there are 4 crashdump files (crashdump.0 through crashdump.3) on each switch too. Dell has not requested any information from us yet, but we are going to try to get this escalated today since these switches are in production in a 24/7 boarding school and things are NOT looking good right now. My guess is that they just might want to see these crashdump files. :)

 

 

(New 6248 switch)

server6248 #show dir


File name          Size (in bytes)
-------------      ---------------
vpd.bin            256
log2.bin           262132
slog2.txt          0
olog2.txt          0
boot.dim           49
slog1.txt          0
olog1.txt          0
image1             10106248
slog0.txt          0
olog0.txt          0
hpc_broad.cfg      148
asf.cfg            16
crashdump.ctl      356
sslt.rnd           1024
boot.cfg           16
dh512.pem          156
dh1024.pem         245
ssh_host_rsa_key   887
logNvmSave.bin     64
startup-config     11844
ssh_host_key       517
ssh_host_dsa_key   668
crashdump.0        33616
crashdump.1        33616
crashdump.2        33616
crashdump.3        33616

 

 

(Original 6248 switch)

ORIG-server6248 #
<189> JAN 05 00:11:38 10.1.0.254-1 UNKN[243338512]: cmd_logger_api.c(87) 97268 % CLI:EIA-232:----:en
show dir

File name          Size (in bytes)
-------------      ---------------
vpd.bin            256
log2.bin           262132
slog2.txt          0
olog2.txt          0
boot.dim           85
slog1.txt          0
olog1.txt          0
image1             10106248
slog0.txt          0
olog0.txt          0
hpc_broad.cfg      148
asf.cfg            16
crashdump.ctl      356
sslt.rnd           1024
boot.cfg           16
dh512.pem          156
dh1024.pem         245
startup-config     11787
logNvmSave.bin     64
asset.tag          17
image2             10106248
ssh_host_key       517
ssh_host_dsa_key   668
ssh_host_rsa_key   887
crashdump.0        33616
crashdump.1        33616
crashdump.2        33616
crashdump.3        33616
 

 

--

Bill Arlofski

Reverse Polarity, LLC

1 Rookie

 • 

9 Posts

September 15th, 2008 13:00

 

Ok, first the ORIGINAL 6248 switch (which is currently connected ONLY by serial port)

 

ORIG-server6248 #show version

Image Descriptions

 image1 : default image
 image2 :


Images currently available on Flash

--------------------------------------------------------------------
 unit      image1      image2     current-active        next-active
--------------------------------------------------------------------

    1    2.1.0.13    2.1.0.13             image2             image2

 

(Same version for both images because I uploaded it twice. :)

 

 

ORIG-server6248 #show boot-version
----------------------------------------
   unit       Boot Image Version
----------------------------------------

    1         31 October 2007

 

 

 

And now the NEWLY installed 6248 currently running the network and randomly rebooting:

 

server6248 #show version

Image Descriptions

 image1 : default image
 image2 :


 Images currently available on Flash

--------------------------------------------------------------------
 unit      image1      image2     current-active        next-active
--------------------------------------------------------------------

    1    2.1.0.13                   image1             image1

 

 

 server6248 #show boot-version
----------------------------------------
   unit       Boot Image Version
----------------------------------------

    1         31 October 2007

 

1 Rookie

 • 

9 Posts

September 15th, 2008 14:00

mandzo, thanks for the reply (and for corroborating what my colleagues and I believe as well).

 

The major concern that I have, though, is that we have two 6248 switches and one 6224F switch, all exhibiting this same (random rebooting) issue. We have already been shipped a new 6248 as a replacement for one of them, which is also rebooting.

 

If what you are saying is true, then Dell can ship us new 6248's and 6224F's for the rest of the year before we get three properly functional 6200-series switches?   Sort of like trying to win the networking lottery?   Sigh...

 

This is very unfortunate since these are all in production in a 24/7 environment. Each replacement will need to be done after hours, and then we need to wait more than 48 hours after each replacement to "prove" that the switch is OK, since I have seen them run without rebooting for almost two days a couple of times.
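
One way to make that 48-hour "proof" less manual is to probe the switch's management address on a schedule and look for gaps. The sketch below is only an illustration under stated assumptions: `probe`, `find_outages`, and `stable_for` are hypothetical names, the TCP port to probe is whatever management service is actually enabled on the switch, and a failed probe only suggests a reboot (it could also be a network path problem).

```python
import socket
from datetime import datetime, timedelta

def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def find_outages(samples):
    """samples: time-ordered list of (datetime, reachable) tuples.

    Returns a list of (down_since, back_up_at) pairs; back_up_at is None
    if the last samples were still failing.
    """
    outages = []
    down_since = None
    for ts, up in samples:
        if not up and down_since is None:
            down_since = ts
        elif up and down_since is not None:
            outages.append((down_since, ts))
            down_since = None
    if down_since is not None:
        outages.append((down_since, None))
    return outages

def stable_for(samples, window=timedelta(hours=48)):
    """True if samples cover at least `window` and none in that window failed."""
    if not samples:
        return False
    window_start = samples[-1][0] - window
    if samples[0][0] > window_start:
        return False  # not enough history yet to claim stability
    return all(up for ts, up in samples if ts >= window_start)
```

Appending `(datetime.now(), probe(host, port))` to a list once a minute from cron or a loop, then calling `stable_for` on it, gives a yes/no answer instead of watching the console.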

 

I am not sure how hard you pushed, but when we escalate this today, we are going to try to get an honest, and full answer to this. My client is NOT happy right now.  They went from 100% network uptime to random downtimes, and corrupted files etc etc.

 

Also, as far as I am concerned, NO packet or packets that one can put on the wire, either maliciously or otherwise should be able to reboot a properly functioning switch.  But maybe that is just me :)

 

Thanks again for your response...    I am curious... How long ago did you go through this process with Dell?  Were they happily and quickly shipping you out replacements as soon as you requested them or did you have to jump through hoops? Also, is there anywhere else I can look (besides via serial console during boot) to show the HARDWARE version of these things?

 

 

--

Bill Arlofski

Reverse Polarity, LLC

Message Edited by warlofski on 09-15-2008 11:22 AM

1 Rookie

 • 

9 Posts

September 15th, 2008 14:00


@mandzo wrote:

It was easier to get a replacement a year ago; it was almost immediate. Now it takes me a few hours of different tests that Dell asks me to run before I get the replacement.


 

This is why I asked you that question.

 

When I called in, I spoke to a "Care resolution specialist"  and didn't have to say too much more than "we have a 6248 that is randomly rebooting", give them our Service Tag #, and a new one was on the way that day. 

 

I specifically had to request that I talk to a tech before they just ship me out a replacement. 

 

When I got a tech, they never really "ASKED" me for anything.   Matter of fact, I did most of the talking/explaining and I had to offer my config files, a sample network diagram, a serial console log showing a stack dump and some packet traces myself.   Mostly because I am the type that figures if I did something wrong that might cause such issues, then I'll gladly take responsibility for it, and I don't want to waste a vendor's time and money having them just keep shipping me new parts for a problem that I caused.

 

Thanks again for your help.   

 

--

Bill Arlofski

Reverse Polarity, LLC

53 Posts

September 15th, 2008 14:00

Bill,

In general, you could load your config offline and test the switch with commands like "show dir" and "copy running-config tftp://......". If the switch still shows only 1 dump file, it has a good chance of working OK.

It was easier to get a replacement a year ago; it was almost immediate. Now it takes me a few hours of different tests that Dell asks me to run before I get the replacement. I think they are trying to cut support spending, which is fine with me.

Message Edited by mandzo on 09-15-2008 10:39 AM

53 Posts

September 15th, 2008 14:00

Bill,

In my opinion, the existence of 4 dump files is an indication of a hardware problem. I had to replace 2 6248s in a row (replacement switches, would you believe it) until I got a good one. Dell support can't explain this to me, but it looks like a DRAM problem or an addressing hardware problem. On a normal switch, I have never seen more than 1 dump file. You should get an immediate replacement.
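
That rule of thumb (more than 1 dump file means a suspect switch) is easy to check against saved "show dir" output. A minimal sketch in Python, assuming the two-column listing format posted earlier in this thread; the function names and the default threshold of 1 are illustrative, and the heuristic itself is an observation from this thread, not anything Dell has confirmed.

```python
import re

def crashdump_files(show_dir_output):
    """Return the crashdump.N file names found in saved 'show dir' output."""
    names = []
    for line in show_dir_output.splitlines():
        parts = line.split()
        # Expect "<name> <size>"; crashdump.ctl is a control file, not a dump.
        if (
            len(parts) == 2
            and re.fullmatch(r"crashdump\.\d+", parts[0])
            and parts[1].isdigit()
        ):
            names.append(parts[0])
    return names

def looks_suspect(show_dir_output, threshold=1):
    """True if the listing shows more dump files than the threshold."""
    return len(crashdump_files(show_dir_output)) > threshold
```

Run against either listing posted above, `crashdump_files` should pick out crashdump.0 through crashdump.3 and skip crashdump.ctl.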

Message Edited by mandzo on 09-15-2008 10:06 AM

184 Posts

September 15th, 2008 21:00

This is very odd. Do you know what hardware revision level you have? I wonder if this is a problem with the newer switches. I have only ever had 1 issue with the 62XX switches rebooting, and that was in a large stack and was fixed with a firmware update.

1 Rookie

 • 

9 Posts

September 19th, 2008 19:00

The three 6200-series switches have been up for over 3 days each now.

 

It appears that there is a problem with SSH packets directed at the switch's management IP that is causing them to reboot. I am not sure what other circumstances might also be contributing, what combination of time, SSH packets, and other traffic may be required, or whether it is just some certain amount of SSH traffic.

 

Once we stopped using SSH to manage and monitor these switches 3+ days ago, they have been stable.

 

These are all running 2.1.0.13 firmware.

 

We are waiting on a final confirmation, and what our next step(s) are from the Dell "Complex Systems Technical Support Analyst" who has been working with us to resolve this issue.

 

--

Bill Arlofski

Reverse Polarity, LLC

5 Posts

October 19th, 2008 15:00

I've also been having a consistent problem with a rebooting 6224.   On occasion we also get a complete freeze of the switch which is only resolved by unplugging it and plugging it back in.

 

We've been in contact with Dell support, and they've mentioned the SSH issue above; however, we have SSH completely disabled. We can usually reproduce the problem by running "ip http server" and then hitting the switch's HTTP interface. That will usually cause an instant reboot.

 

It's also rebooted by just issuing regular commands on the console, which I find absolutely frustrating.  

 

Anyone else experiencing issues like this with a 6224?  I'm leaning towards a hardware problem as the original poster did, but Dell assures me it's an OS-related problem.   

 

I'm getting near the end of my rope--a switch is the last thing I need to worry about rebooting randomly!

Message Edited by bdowne011 on 10-19-2008 11:24 AM