We have a fairly new Dell N3048 switch on our network. It has been in production for several months, working just fine. Yesterday it died without warning, with no known external anomalies. The device was powered off. We have two PSUs in the unit. The tech who investigated said one was on and blowing warm air and the other was off. He found that odd, as the room is quite chilly and the unit is always cool to the touch and blows out cold air. The switch is plugged into two separate backup systems.
Any ideas? Running 18.104.22.168
Logs pulled from the device (newest entries first, so read from the bottom up; sorry for the ordering):
|Info||May 28 19:09:25||CLI_WEB||Configuration file <startup-config> read from flash!|
|Info||May 28 19:09:24||CLI_WEB||Configuration file <startup-config> read from flash!|
|Notice||May 28 19:09:23||IP||Added RPPI routing policy client ospf:0.|
|Notice||May 28 19:09:23||TRAPMGR||Unit 1 is the new stack master, Old stack master unit is 0|
|Info||May 28 19:08:58||General||Application Started (traceroute-0, ID = 12, PID = 1447|
|Notice||May 28 19:08:58||General||Administrative Command:app-start traceroute-0|
|Info||May 28 19:08:57||General||Application Started (ping-0, ID = 11, PID = 1436|
|Notice||May 28 19:08:57||General||Administrative Command:app-start ping-0|
|Info||May 28 19:08:56||General||Application Started (ospf-00, ID = 10, PID = 1364|
|Notice||May 28 19:08:56||General||Administrative Command:app-start ospf-00 0|
|Info||May 28 19:08:56||General||Application Started (vr-agent-0, ID = 9, PID = 1352|
|Notice||May 28 19:08:56||General||Administrative Command:app-start vr-agent-0|
|Info||May 28 19:08:55||VR_AGENT||initialized the clnt addr:/tmp/fpcvragent.00,family:1|
|Info||May 28 19:08:55||UNITMGR||StatusInit: init done|
|Alert||May 28 19:08:54||SIM||Switch was reset due to software initiated exit.|
|Critical||May 28 19:08:54||General||Event(0xaaaaaaaa)|
|Notice||May 28 19:08:54||BSP||BSP initialization complete, starting switch firmware.|
|Info||May 28 19:08:53||DRIVER||Configuring CPUTRANS RX|
|Info||May 28 19:08:53||DRIVER||Configuring CPUTRANS TX|
|Info||May 28 19:08:53||DRIVER||Adding BCM transport pointers|
|Notice||May 28 19:08:48||General||Booting with default SDM template Data Center - IPv4 and IPv6.|
|Info||May 28 19:08:47||General||Application Terminated (user.start, ID = 7, PID = 1142|
|Alert||May 28 19:08:47||General||Error 0 (0x0)|
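Since the log above was pasted newest-first, here is a minimal Python sketch for putting rows like these into chronological order and pulling out the Alert and Critical entries. It assumes the pipe-delimited layout shown above (`|Severity||Timestamp||Component||Message|`). The names `parse_row`, `chronological`, `urgent` and `ASSUMED_YEAR` are made up for illustration, and the year is a placeholder because the log does not record one.

```python
from datetime import datetime

# A few rows copied from the log above, in the order posted (newest first).
RAW_LOG = """\
|Notice||May 28 19:09:23||TRAPMGR||Unit 1 is the new stack master, Old stack master unit is 0|
|Alert||May 28 19:08:54||SIM||Switch was reset due to software initiated exit.|
|Critical||May 28 19:08:54||General||Event(0xaaaaaaaa)|
|Notice||May 28 19:08:54||BSP||BSP initialization complete, starting switch firmware.|
|Alert||May 28 19:08:47||General||Error 0 (0x0)|
"""

# Assumption: the log carries no year, so a placeholder is used only for sorting.
ASSUMED_YEAR = 2000


def parse_row(line):
    """Split one '|Sev||Time||Component||Message|' row into a tuple."""
    body = line.strip()
    if body.startswith("|"):
        body = body[1:]
    if body.endswith("|"):
        body = body[:-1]
    severity, stamp, component, message = body.split("||", 3)
    when = datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
    return when, severity, component, message


def chronological(raw):
    """Return parsed rows oldest-first."""
    rows = [parse_row(line) for line in raw.splitlines() if line.strip()]
    # Reverse first so rows sharing a timestamp keep their bottom-up order,
    # then rely on the stable sort to order by time.
    rows.reverse()
    rows.sort(key=lambda row: row[0])
    return rows


def urgent(rows):
    """Keep only the Alert and Critical rows."""
    return [row for row in rows if row[1] in {"Alert", "Critical"}]


if __name__ == "__main__":
    for when, severity, component, message in urgent(chronological(RAW_LOG)):
        print(f"{when:%b %d %H:%M:%S}  {severity:<8}  {component:<8}  {message}")
```

Read in this order, the sample rows put `Error 0 (0x0)` at 19:08:47 ahead of the "software initiated exit" reset message at 19:08:54.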
Forgot to mention that the tech had to remove power from the device and plug it back in. The device came back up and everything is working properly. But now I'm afraid I'm going to have random non-recoverable failures in the future.
I could not find much information based on the log messages. Is this a PoE switch, or is the second power supply just for redundancy? In the Web GUI under System > General > Health, it should display the status of the hardware. Does everything there look OK?
The second PSU is just for redundancy; no PoE. The system health shows everything as fine. I can't find any issues or a cause, but it definitely crashed into a powered-down state.
Total CPU Utilization 5 Secs ( 18.4377%) 60 Secs ( 12.2220%) 300 Secs ( 11.2987%)
With every status showing as good, it may be hard to pinpoint the exact cause. I would continue to monitor; it could be a one-off situation. But it also wouldn't hurt to open a support ticket, just to document the occurrence.
Odd... We have lost connectivity to the management interface on the stack master. I can't even connect to it from PuTTY or a serial connection. It pings and seems to connect from telnet or serial, but no interface screen comes up. This is getting dire.