NetWorker - RabbitMQ crash during startup
Summary: NetWorker starts but goes down after a few seconds.
Symptoms
The customer found that NMC hangs and all workflows are stuck in scheduled mode.
Restarting the NetWorker server and rebooting the OS do not solve the issue.
Not all nsr processes are running (nsrexecd is up, but no nsrmmd is running).
Cause
In the daemon.raw log, we see many RabbitMQ crashes before NetWorker shuts down all nsrmmd processes.
This is the crash:
0 04/07/2020 09:17:57 AM 1 5 0 557012736 2127 0 NWServLnx nsrctld NSR notice rmq:
0 04/07/2020 09:17:57 AM 1 5 0 557012736 2127 0 NWServLnx nsrctld NSR notice rmq: RabbitMQ 3.6.14. Copyright (C) 2007-2017 Pivotal Software, Inc.
0 04/07/2020 09:17:57 AM 1 5 0 557012736 2127 0 NWServLnx nsrctld NSR notice rmq: ## ## Licensed under the MPL. See http://www.rabbitmq.com/
0 04/07/2020 09:17:57 AM 1 5 0 557012736 2127 0 NWServLnx nsrctld NSR notice rmq: ## ##
0 04/07/2020 09:17:57 AM 1 5 0 557012736 2127 0 NWServLnx nsrctld NSR notice rmq: ########## Logs: /opt/nsr/rabbitmq-server-3.6.14/var/log/rabbitmq/rabbit@NWServLnx.log
0 04/07/2020 09:17:57 AM 1 5 0 557012736 2127 0 NWServLnx nsrctld NSR notice rmq: ###### ## /opt/nsr/rabbitmq-server-3.6.14/var/log/rabbitmq/rabbit@NWServLnx-sasl.log
0 04/07/2020 09:17:57 AM 1 5 0 557012736 2127 0 NWServLnx nsrctld NSR notice rmq: ##########
0 04/07/2020 09:17:57 AM 1 5 0 557012736 2127 0 NWServLnx nsrctld NSR notice rmq: Starting broker...
0 04/07/2020 09:18:00 AM 1 5 0 557012736 2127 0 NWServLnx nsrctld NSR notice rmq: {"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{error,{{shutdown,{failed_to_start_child,rabbit_memory_monitor,{badarg,[{lists,member,[disk,{error,bad_module}],[]}, {rabbit_memory_monitor,init,1,[{file,\"src/rabbit_memory_monitor.erl\"},{line,111}]},{gen_server2,init_it,6,[{file,\"src/gen_server2.erl\"}, {line,546}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,247}]}]}}}, {child,undefined,rabbit_memory_monitor_sup,{rabbit_restartable_sup,start_link,[rabbit_memory_monitor_sup,{rabbit_memory_monitor,start_link,[]},false]},transient,infinity,supervisor,[rabbit_restartable_sup]}}}}}}}"}
0 04/07/2020 09:18:00 AM 1 5 0 557012736 2127 0 NWServLnx nsrctld NSR notice rmq: Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{error,{{shutdown,{failed_to_start_child,rabbit_memory_monitor,{badarg
0 04/07/2020 09:18:00 AM 1 5 0 557012736 2127 0 NWServLnx nsrctld NSR notice rmq:
0 04/07/2020 09:18:00 AM 1 5 0 557012736 2127 0 NWServLnx nsrctld NSR notice rmq: Crash dump is being written to: erl_crash.dump...
0 04/07/2020 09:18:02 AM 1 5 0 557012736 2127 0 NWServLnx nsrctld NSR notice rmq: done
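To check whether another server shows the same pattern, the crash can be spotted by a few fixed strings from the messages quoted above. The following is a minimal sketch, assuming the log is already available as plain text like the excerpt above; the function name scan_rmq_crashes is hypothetical and is not a NetWorker tool. It counts broker start attempts and crash dumps, and reports whether the rabbit_memory_monitor signature appears.

```python
def scan_rmq_crashes(log_text: str) -> dict:
    """Hypothetical helper: summarize RabbitMQ start/crash messages in a log text.

    Substring counts are used so the scan also works when several log entries
    were flattened onto one line.
    """
    return {
        # One "Starting broker..." message per RabbitMQ start attempt
        "broker_starts": log_text.count("Starting broker..."),
        # One crash dump message per Erlang VM crash
        "crash_dumps": log_text.count("Crash dump is being written to: erl_crash.dump"),
        # Signature seen in this case (rabbit_memory_monitor fails to start)
        "memory_monitor_signature": "failed_to_start_child,rabbit_memory_monitor" in log_text,
    }
```

A broker_starts value close to crash_dumps, together with the signature flag, suggests that RabbitMQ is crashing on nearly every start attempt, as in this case.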
This is the same signature as in KB 534881 and in some other cases where / was full.
In my case, NW was installed in /opt. This mount point was not completely full, but it had only 42MB of free space:
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda1   9.9G  3.6G  5.8G   39%   /
devtmpfs    16G   8.0K  16G    1%    /dev
tmpfs       16G   0     16G    0%    /dev/shm
/dev/sdc1   10G   9.9G  42M    96%   /opt   << NW was installed here
In fact, RabbitMQ requires a minimum of 50MB of free space.
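The free-space condition above can be checked directly on the NetWorker mount point. This is a minimal sketch, assuming "50MB" means 50,000,000 bytes (an assumption; adjust the constant if your RabbitMQ configuration uses a different limit) and assuming NetWorker is installed under /opt as in this case. The helper names are hypothetical. It uses Python's shutil.disk_usage, whose free value corresponds to the Avail column of df.

```python
import shutil

# Assumption: the 50MB minimum is taken as 50,000,000 bytes (decimal MB).
RMQ_MIN_FREE_BYTES = 50 * 1000 * 1000


def has_enough_space_for_rabbitmq(free_bytes: int, minimum: int = RMQ_MIN_FREE_BYTES) -> bool:
    """Hypothetical check: True if free_bytes meets the assumed RabbitMQ minimum."""
    return free_bytes >= minimum


def check_mount(path: str = "/opt") -> tuple[int, bool]:
    """Return (free bytes, meets minimum) for the filesystem holding `path`.

    The default path /opt is the NetWorker install location in this case only.
    """
    free = shutil.disk_usage(path).free  # bytes available, like df's Avail column
    return free, has_enough_space_for_rabbitmq(free)
```

With the values above, 42M as reported by df -h is about 44,000,000 bytes, so has_enough_space_for_rabbitmq(42 * 1024 * 1024) returns False for /opt.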
See here: https://community.pivotal.io/s/article/Unable-to-Start-rabbitmqserver-Node-after-Upgrading-RabbitMQ-for-Pivotal-Cloud-Foundry