NetWorker: Server intermittently unresponsive due to uncleared REST API calls
Summary: The NetWorker 19.13 or 19.14 server (nsrd) process becomes intermittently unresponsive because REST API requests create jsvc connections that do not clear.
Symptoms
- The NetWorker server (nsrd) process becomes unresponsive.
- NetWorker protection operations stall or fail, reporting various connection-related errors; for example, "RPC connection timed out" or "Connection limit reached".
- NetWorker user interfaces stop responding or fail to connect:
  - NetWorker Management Console (NMC)
  - NetWorker Web User Interface (NWUI)
  - nsrwatch
  - nsradmin
- The environment is configured with REST API reporting utilities that connect to the NetWorker server to generate reports or ticketing.
- The NetWorker server is 19.13.0.0 -> 19.13.0.3, or 19.14.0.0
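The affected version range above can be expressed as a small shell check. This is only a sketch: the function name `is_affected_version` is hypothetical, and it assumes the caller already has the server version as a plain four-part string (how that string is obtained is not covered here).

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the given four-part
# version string falls in the affected range listed in this article:
# 19.13.0.0 through 19.13.0.3, or 19.14.0.0.
is_affected_version() {
    case "$1" in
        19.13.0.[0-3] | 19.14.0.0) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage with a version string supplied by the caller:
#   if is_affected_version "19.13.0.2"; then echo "affected"; fi
```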
Cause
Each NetWorker REST API request performs an authentication check against the internal AUTHC service. The API requests are creating connections to the AUTHC service, but those connections are not being closed afterward.
[root@nsr ~]# netstat -apno | grep jsvc | head
tcp6 0 0 :::9090 :::* LISTEN 2406135/jsvc.exec off (0.00/0/0)
tcp6 0 0 :::36523 :::* LISTEN 2406135/jsvc.exec off (0.00/0/0)
tcp6 0 0 ::1:60705 ::1:8779 ESTABLISHED 2406135/jsvc.exec keepalive (6550.56/0/0)
tcp6 0 0 ::1:45705 ::1:8779 ESTABLISHED 2406135/jsvc.exec keepalive (6407.22/0/0)
tcp6 0 0 ::1:36707 ::1:8779 ESTABLISHED 2406135/jsvc.exec keepalive (6825.20/0/0)
tcp6 0 0 ::1:57761 ::1:8779 ESTABLISHED 2406135/jsvc.exec keepalive (6731.29/0/0)
tcp6 0 0 ::1:51473 ::1:8779 ESTABLISHED 2406135/jsvc.exec keepalive (6242.11/0/0)
tcp6 0 0 ::1:58453 ::1:8779 ESTABLISHED 2406135/jsvc.exec keepalive (6446.28/0/0)
tcp6 0 0 ::1:53133 ::1:8779 ESTABLISHED 2406135/jsvc.exec keepalive (6689.33/0/0)
tcp6 0 0 ::1:49749 ::1:8779 ESTABLISHED 2406135/jsvc.exec keepalive (6618.93/0/0)

[root@nsr ~]# netstat -apno | grep jsvc | wc -l
4050
In this example, there are 4,050 connections associated with the NetWorker AUTHC service (jsvc).
This is an exaggerated example from a lab environment where the REST API was queried once per second. In a production environment, the issue may become apparent with a much smaller number of AUTHC connections because other NetWorker operations, devices, and protection workflows also consume available connections and system resources.
Under normal conditions, two jsvc processes are present while the NetWorker server is running. Additional jsvc-related connections may appear temporarily depending on the operations being performed, but they should close when no longer needed. It is not expected to see dozens or thousands of jsvc connections remaining in the keepalive state for extended periods.
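To watch for this buildup over time, the netstat check above can be wrapped in a small script. The following is a minimal sketch, not a supported tool: the function names and the `THRESHOLD` default of 100 are hypothetical and should be tuned to the environment. It assumes the Linux `netstat -apno` column layout shown in the example output, where the sixth field is the connection state and the seventh is PID/program.

```shell
#!/bin/sh
# Count ESTABLISHED connections owned by a jsvc process.
# Reads `netstat -apno` output on stdin.
count_jsvc_established() {
    awk '$6 == "ESTABLISHED" && $7 ~ /jsvc/ { n++ } END { print n + 0 }'
}

# Hypothetical alert threshold; not a value from NetWorker documentation.
THRESHOLD="${THRESHOLD:-100}"

# Print a one-line status based on the count read from stdin.
check_jsvc() {
    count=$(count_jsvc_established)
    if [ "$count" -gt "$THRESHOLD" ]; then
        echo "WARNING: $count established jsvc connections (threshold $THRESHOLD)"
    else
        echo "OK: $count established jsvc connections"
    fi
}

# Usage (run as root so that -p can report process names):
#   netstat -apno | check_jsvc
```

Counting only ESTABLISHED entries leaves out the jsvc LISTEN sockets, which are present in normal operation.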
Resolution
Code fix NETWORKER-127181 is expected in the following NetWorker releases:
- 19.13.0.4
- 19.14.0.1
Workaround:
Restart NetWorker server services:
- Linux:
  systemctl restart networker
- Windows (PowerShell syntax):
  net stop nsrd ; net start nsrd
Additional Information
Ensure OS TCP and kernel best practices are in place: NetWorker: Best practices for networking configuration