DELL-Josh Cr
Moderator
August 26th, 2019 09:00
Hi,
If everything is the same, it should establish the heartbeat and sync up fine without a reboot. Page 1085 has some failure information: https://downloads.dell.com/manuals/all-products/esuprt_ser_stor_net/esuprt_networking/esuprt_net_fxd_prt_swtchs/force10-s4048-on_setup-guide8_en-us.pdf
ledj
August 30th, 2019 06:00
Thanks for the reply.
I've finally got a serial console physically connected to the switch, and below is the prompt showing that it has somehow gone into "debugger" mode:
db{1}>
Running a dmesg shows:
WARNING: 3 errors while detecting hardware; check system log.
boot device: wd0
root on md0a dumps on wd0l
dump_misc_init: max_paddr = 0x7f800000
WARNING: clock lost 5578 days
WARNING: using filesystem time
WARNING: CHECK AND RESET THE DATE!
NMI ... going to debugger
The above makes it seem that the switch might have hit the networking clock signal bug in the Atom CPU component (I'm not sure, though, since I only see the phrase "clock lost..." and am connecting it to "clock signal bug").
"Once the component has failed, the system CPU will stop functioning but traffic may continue to flow. Once encountered it is likely that the unit will not boot, and will not be recoverable. Typically the system or card will stop functioning and will hang or reboot continuously. The issue may not be observed until a reboot or power cycle occurs." <ADMIN NOTE: Broken link has been removed from this post by Dell>
Can the above warnings happen without the CPU bug? (Nobody I've talked to has experienced the bug or heard of anyone who has.)
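To put the "clock lost 5578 days" figure in perspective, here is a minimal Python sketch that pulls the number out of the dmesg text and converts it. The helper name `clock_lost_days` and the reference date are assumptions for illustration: the date is simply the day of this post, not a value read from the switch, so the implied clock date is only a rough estimate.

```python
import re
from datetime import date, timedelta
from typing import Optional

# dmesg lines copied from this post
DMESG = """\
WARNING: 3 errors while detecting hardware; check system log.
boot device: wd0
root on md0a dumps on wd0l
dump_misc_init: max_paddr = 0x7f800000
WARNING: clock lost 5578 days
WARNING: using filesystem time
WARNING: CHECK AND RESET THE DATE!
NMI ... going to debugger
"""


def clock_lost_days(dmesg: str) -> Optional[int]:
    """Return N from a 'clock lost N days' warning, or None if absent."""
    match = re.search(r"clock lost (\d+) days", dmesg)
    return int(match.group(1)) if match else None


days = clock_lost_days(DMESG)

# Assumption: the filesystem time was close to the date of this post.
assumed_fs_date = date(2019, 8, 30)
implied_clock_date = assumed_fs_date - timedelta(days=days)

print(f"clock lost {days} days, roughly {days / 365.25:.1f} years")
print(f"implied hardware clock date: {implied_clock_date}")
```

A gap of about 15 years suggests the hardware clock returned a default or reset value rather than drifting, but that alone does not say whether the CPU bug is the cause.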
Thanks
DELL-Josh Cr
Moderator
August 30th, 2019 07:00
It could be that bug. Can you private message me the service tag?
ledj
August 30th, 2019 11:00
Hi Josh
Done. I sent you the log as well. It would be nice to know whether it is that bug.
ledj
September 1st, 2019 10:00
An update.
First, I ran a "reboot" command at the debugger command-line interface; this froze the switch instantly.
After that, I did a power cycle (waited a few minutes), but the switch showed nothing on the serial console, and all port lights went off. The switch is dead, and I believe the cause is the networking clock signal bug in the Atom CPU, since all signs of the bug are present.
I had a spare switch loaded with the same configuration (configured for a VLT setup). I powered off both the failed switch and the replacement, moved the cables over to the replacement, and powered it on; everything returned to a good state. Uptime on the other switch, which has been working the whole time, is:
Up Time : 3 yr, 0 wk, 4 day, 5 hr, 29 min
So I expect this switch will fail at some point. Luckily VLT works, so there was no interruption.
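For reference, the uptime string can be turned into an approximate last-boot date. This is a rough sketch under stated assumptions: `parse_uptime` is a hypothetical helper, a "yr" is counted as 365 days, and the reading time is taken to be the timestamp of this post rather than the switch's own clock.

```python
import re
from datetime import datetime, timedelta

# Minutes per unit; counting a year as 365 days is an assumption.
UNIT_MINUTES = {
    "yr": 365 * 24 * 60,
    "wk": 7 * 24 * 60,
    "day": 24 * 60,
    "hr": 60,
    "min": 1,
}


def parse_uptime(text: str) -> timedelta:
    """Sum every '<number> <unit>' pair found in an 'Up Time' line."""
    pairs = re.findall(r"(\d+)\s*(yr|wk|day|hr|min)", text)
    return timedelta(minutes=sum(int(n) * UNIT_MINUTES[u] for n, u in pairs))


uptime = parse_uptime("Up Time : 3 yr, 0 wk, 4 day, 5 hr, 29 min")

# Assumption: the uptime was read around the time of this post.
read_at = datetime(2019, 9, 1, 10, 0)
print("uptime in days:", uptime.days)
print("approximate last boot:", read_at - uptime)
```

Under those assumptions the surviving switch was last booted in late August 2016, so it has not gone through a reboot or power cycle in about three years.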