NetWorker: Best Practices for Network Settings
Summary: This article provides a simple baseline of ideal standard network tunables for NetWorker hosts.
Symptoms
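- Errors related to network or host connectivity, including but not limited to:
- Backup failures that appear to occur after the actual data transfer has completed
- General resource exhaustion or communication breakdowns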
GSS warning Session information (number hex:hex) registered by user for nsrexecd has expired because a NetWorker daemon had not requested it after 120 minutes
GSS error Session information (number hex:hex) was requested by nsrmmd but the session has expired
RPC severe Unable to query NSR database for list of configured devices: RPC receive operation failed; peer = ip_addr:port, errno = Connection timed out
RPC severe Unable to query NSR database for list of configured devices: RPC send operation failed; peer = ip_addr:port, errno = Broken pipe
NSR notice Chunking ssid ssid failed, because saveset was aborted
ddp_open_file_ext() failed for File: //mtree/vol_dir/nn/nn/long_ssid, Err: 5004-nfs lookup failed (nfs: No such file or directory) ).
NSR critical Connectivity check request is failed for: SN_CONN_REPORT_DD type data_domain device
RPC error RPC client handle: No route to host.
RPC error RPC client handle: Connection refused.
RPC error Unable to create the connection with 'portmapper' to host 'hostname' with address 'ip_addr' at port number 7938.
RPC critical Aborting client connection from ip_addr: Connection timed out.
RPC critical Check whether the firewall is blocking the client ports on the host 'hostname'.
RPC critical Check whether the client services are running on the host 'hostname'.
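When many of these messages are scattered through a long log, a simple substring count helps show which failure type dominates. The following is a minimal sketch, assuming a plain-text log; NetWorker's daemon.raw is not plain text and would first need to be rendered (for example with nsr_render_log). The script name and the signature list are illustrative assumptions drawn from the sample messages above, not an exhaustive catalog.

```python
import sys
from collections import Counter

# Substrings drawn from the sample messages above; illustrative, not exhaustive.
SIGNATURES = (
    "has expired",
    "Connection timed out",
    "Broken pipe",
    "No route to host",
    "Connection refused",
    "Connectivity check request is failed",
    "nfs lookup failed",
)


def scan_log(lines):
    """Count the lines that contain each connectivity-related signature."""
    counts = Counter()
    for line in lines:
        for signature in SIGNATURES:
            if signature in line:
                counts[signature] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python scan_nw_log.py rendered_daemon.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for signature, count in scan_log(handle).most_common():
            print(f"{count:6d}  {signature}")
```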
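Cause
The NetWorker application creates many sockets on local and remote hosts during normal operation. Although servers and storage nodes typically create more sockets, client configuration also affects job success.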
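Keepalives: NetWorker invoking programs create sockets to connect to listener daemons, but idle connections can be dropped by network devices that reclaim unused resources. Typically, this calls for keepalives to be enabled by default on the NetWorker server and storage nodes, and on any clients experiencing issues. NetWorker has its own internal keepalive handling for some, but not all, binaries. The operating system also has its own keepalives, which should be enabled by default.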
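Why an idle time of 270 seconds matters can be checked with simple arithmetic. The sketch below assumes Linux-style keepalive semantics (idle time before the first probe, interval between probes, number of unanswered probes before giving up) and the 5-minute (300-second) router idle socket timer cited in this article's nsrrc comments; the function name is hypothetical. The 7200/75/9 values are the Linux kernel defaults.

```python
def keepalive_timeline(idle_s, interval_s, probes, device_idle_kill_s=300):
    """Summarize TCP keepalive timing against a network device's idle-socket timer."""
    return {
        # The first probe is sent after idle_s seconds without traffic.
        "first_probe_s": idle_s,
        # Traffic (data or probes) must recur before the device's idle timer expires.
        "survives_idle_kill": idle_s < device_idle_kill_s and interval_s < device_idle_kill_s,
        # An unresponsive peer is declared dead once every probe has gone unanswered.
        "dead_peer_after_s": idle_s + interval_s * probes,
    }


if __name__ == "__main__":
    print(keepalive_timeline(7200, 75, 9))  # Linux kernel defaults
    print(keepalive_timeline(270, 75, 20))  # values recommended in this article
```

With the recommended values, the first probe is sent 30 seconds before a 5-minute idle timer would fire, and a dead peer is detected after 270 + 75 × 20 = 1770 seconds (about 29.5 minutes). With the kernel defaults, the connection sits idle for two hours before the first probe, long after the device has dropped it.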
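Port availability: NetWorker sockets need ephemeral ports to communicate, but the operating system limits this range by default. Expand it fully to avoid throttling or interrupting communication. With nsrauth enabled (the default), each socket requires at least three ports, and failures are retried quickly, leaving ports in TIME_WAIT until a connection is successfully established. Therefore, the maximum number of available ports should be raised and, ideally, the TIME_WAIT period lowered.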
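The effect of the port range and the TIME_WAIT period can be estimated roughly. This is a back-of-the-envelope sketch, assuming three ports per nsrauth socket (the minimum stated above), a 60-second TIME_WAIT on Linux, and the commonly documented Windows defaults (dynamic range 49152-65535, TcpTimedWaitDelay of 240 seconds); the function name is hypothetical and the results are upper bounds, not measured capacities.

```python
def port_budget(low, high, ports_per_socket=3, time_wait_s=60):
    """Estimate the capacity of an ephemeral port range.

    ports_per_socket=3 reflects the "at least three ports" per nsrauth socket;
    time_wait_s is how long a closed port stays in TIME_WAIT (60 s on Linux).
    """
    ports = high - low + 1
    return {
        "ports": ports,
        # Rough ceiling on simultaneous nsrauth-authenticated sockets.
        "concurrent_sockets": ports // ports_per_socket,
        # Sustainable rate of new ports if every closed port passes through TIME_WAIT.
        "ports_per_second": ports / time_wait_s,
    }


if __name__ == "__main__":
    print(port_budget(32768, 60999))                   # common Linux default range
    print(port_budget(10000, 64000))                   # range recommended in this article
    print(port_budget(49152, 65535, time_wait_s=240))  # Windows defaults
    print(port_budget(49152, 65535, time_wait_s=30))   # Windows with TcpTimedWaitDelay=30
```

Widening the Linux range to 10000-64000 roughly doubles the ceiling (from 9410 to 18000 concurrent sockets), and on Windows, lowering TcpTimedWaitDelay from 240 to 30 seconds raises the sustainable rate of new ports eightfold.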
Other long-running sockets can also be hardened with specific internal software variables, providing greater resilience or improved buffering.
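Resolution
The following are commonly recommended settings for each operating system and host class, along with the commands that implement them. Applicability always varies: settings considered universally desirable are uncommented, while those with more variable applicability are commented out but can be enabled when needed. These settings are offered in good faith as general recommendations, but an operating system administrator should review them before implementation. For servers and storage nodes, they can be considered best-practice defaults in all cases. Client applicability depends on configuration and role; conflicting server roles may override the recommended settings, so role-specific requirements should take priority during deployment.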
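Linux: All appropriate settings should go in the /nsr/nsrrc file, which must have world read/execute permissions (755) to run at service startup. Default standard entries are uncommented; non-standard or situational options are commented out. To change whether a setting is active, add or remove the # prefix on the relevant line. Depending on where you deploy the file, trim the entries that relate to NetWorker clients, storage nodes, or the server. A service restart is required after making changes.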
### GENERAL USAGE AND CONFIRMATION OF ENGAGEMENT
# This file will only be read, parsed and executed if permissions are set to 755 EXACTLY.
# Note that for Linux / Unix systems, it will only be engaged at service startup time. Environment
# variables and ulimit values set here affect only the NetWorker runtime environment, overriding
# system settings but leaving them in place for all other operations; sysctl -w changes, however,
# apply kernel-wide until the next reboot. Changes to nsrrc require a service restart to be engaged.
# To confirm parameters are being engaged, check /nsr/nsrrc.log for datestamps and values.
# See end of script for details.

### LINUX - For all NetWorker hosts - Clients, Nodes and Server
NSR_KEEPALIVE_WAIT=10
export NSR_KEEPALIVE_WAIT
NSR_EXEC_MAX_AUTH_THREADS=50
export NSR_EXEC_MAX_AUTH_THREADS
# NSR_SOCK_BUF_SIZE=65536 # (262144 for 10 Gb Eth NICs)
# export NSR_SOCK_BUF_SIZE
# NetWorker internal keepalive settings for some, but not all, binaries - 4.5 minutes to ensure
# keepalives are passed before the increasingly common 5-minute router idle socket kill timer
NW_TCP_KEEPIDLE_SECS=270
export NW_TCP_KEEPIDLE_SECS
NW_TCP_KEEPINTVL_SECS=75
export NW_TCP_KEEPINTVL_SECS
NW_TCP_KEEPCNT=20
export NW_TCP_KEEPCNT
# OS-level keepalive values - also set to 4.5 minutes for the same reason
sysctl -w "net.ipv4.tcp_keepalive_intvl=75"
sysctl -w "net.ipv4.tcp_keepalive_probes=20"
sysctl -w "net.ipv4.tcp_keepalive_time=270"
# Set soft limits for open file descriptors and to ensure core dump generation
# (soft limits cannot exceed the current hard limits)
ulimit -Sn 262144
ulimit -Sc unlimited

### For NetWorker Storage Nodes and Server
# Set hard limits to provide maximum file descriptor availability
ulimit -Hn 262144
ulimit -Hc unlimited
# Globally disable IPv6, if it is not necessary for operation:
# sysctl -w "net.ipv6.conf.all.disable_ipv6=1"
# Disable dynamic TCP window scaling and ECN - both require compatible equipment in the data path
sysctl -w "net.ipv4.tcp_window_scaling=0"
sysctl -w "net.ipv4.tcp_ecn=0"
# Raise the SYN backlog and the network device input queue if desired
# sysctl -w "net.ipv4.tcp_max_syn_backlog=8192"
# sysctl -w "net.core.netdev_max_backlog=8192" # (For 10 Gb Eth use the value 30000)
# Raise memory size available for TCP buffers as needed
# sysctl -w "net.core.rmem_default=262144"
# sysctl -w "net.core.wmem_default=262144"
# sysctl -w "net.core.rmem_max=16777216"
# sysctl -w "net.core.wmem_max=16777216"
# sysctl -w "net.ipv4.tcp_rmem=8192 524288 16777216"
# sysctl -w "net.ipv4.tcp_wmem=8192 524288 16777216"
# Increase shared memory pool if required - particularly for immediate mode on Storage Nodes
# sysctl -w "kernel.shmmax=2147483648" # - in bytes, e.g. 2 GB
# sysctl -w "kernel.shmall=2147483648" # - note that shmall is counted in pages (typically 4 KB), not bytes
# Increase the available TCP client ephemeral port range from the default:
sysctl -w "net.ipv4.ip_local_port_range=10000 64000"
# Enable TCP TIME_WAIT reuse for very high load servers and nodes to increase socket reuse availability
# tcp_tw_recycle was removed in Linux 4.12; on newer kernels this line fails harmlessly
sysctl -w "net.ipv4.tcp_tw_recycle=0"
# tcp_tw_reuse: 1 = enable globally (on current kernels, 2 = loopback traffic only)
sysctl -w "net.ipv4.tcp_tw_reuse=1"
# Lower the FIN_WAIT_2 timeout so orphaned connections close more quickly (the Linux TIME_WAIT
# period itself is fixed at 60 s). This may not be necessary in concert with tw_reuse.
# sysctl -w "net.ipv4.tcp_fin_timeout=30"
# NFS I/O concurrency:
sysctl -w "sunrpc.tcp_slot_table_entries=128"
sysctl -w "sunrpc.udp_slot_table_entries=128"

### For NetWorker Server only
# Settings to increase device resilience for cloud operations or other potentially high-latency devices
# NSR_DEVOP_TIMEOUT=3600
# export NSR_DEVOP_TIMEOUT
# NSR_DEVOP_POLLING_INTERVAL=600
# export NSR_DEVOP_POLLING_INTERVAL
# NSR_DEVOP_INQUIRY_TIMEOUT=900
# export NSR_DEVOP_INQUIRY_TIMEOUT

### Media database tunables
# NSR_TCP_READ_LONG_WAIT=Y
# export NSR_TCP_READ_LONG_WAIT
# NSR_MAX_MEDIADB_RETRY=10
# export NSR_MAX_MEDIADB_RETRY
# MMDB_SQLITE_CONFIGURE_MEMORY=1
# export MMDB_SQLITE_CONFIGURE_MEMORY
# MMDB_SQLITE_PAGECACHE_SIZE=65536
# export MMDB_SQLITE_PAGECACHE_SIZE
# MMDB_SQLITE_PAGE_COUNT=65536
# export MMDB_SQLITE_PAGE_COUNT
# MMDB_SQLITE_HEAP_SIZE=1073741824
# export MMDB_SQLITE_HEAP_SIZE
# MDB_SQLITE_HEAP_MIN_ALLOC_SIZE=128
# export MDB_SQLITE_HEAP_MIN_ALLOC_SIZE

### NetWorker VMware Protection (NVP) Specific Tunables
## Increase the inventory (nsrvim) default timeout
# GST_VBA_TIMEOUT=7200
# export GST_VBA_TIMEOUT
# NSR_HYPERVISOR_QUERY_REQUEST_TIMEOUT=3600
# export NSR_HYPERVISOR_QUERY_REQUEST_TIMEOUT
## NW server interval to change VMware inventory (nsrvim) frequency (Default is 15 minutes)
## NOTE only supported in 19.10.0.0 and later
# NSRVIM_TIME_INTERVAL=60 # Interval is in minutes and can be set between 15 and 60.
# export NSRVIM_TIME_INTERVAL

# Confirmation parameters added to end of script - comment or modify as desired.
echo "### $(date +%FT%T) - Starting with nsrrc parameters ###" >> /nsr/nsrrc.log
sysctl -a >> /nsr/nsrrc.log
env >> /nsr/nsrrc.log
ulimit -a >> /nsr/nsrrc.log
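The confirmation lines at the end of the script make it possible to verify engagement after a restart. Below is a minimal sketch, assuming the log layout those lines produce (a dated marker line, then `sysctl -a` output as `key = value` and `env` output as `NAME=value`). It checks that /nsr/nsrrc has mode 755 exactly and compares a few of the uncommented values from the most recent startup against the recommendations; the function names and the `EXPECTED` subset are illustrative assumptions.

```python
import os
import re
import stat

NSRRC = "/nsr/nsrrc"          # path from the text above
NSRRC_LOG = "/nsr/nsrrc.log"  # written by the confirmation lines at the end of the script
MARKER = "Starting with nsrrc parameters"

# sysctl -a prints "key = value"; env prints "NAME=value".
SYSCTL_LINE = re.compile(r"^([a-z0-9_./-]+) = (.*)$")
ENV_LINE = re.compile(r"^([A-Z_][A-Z0-9_]*)=(.*)$")

# A subset of the uncommented values from the script above; extend as needed.
EXPECTED = {
    "net.ipv4.tcp_keepalive_time": "270",
    "net.ipv4.tcp_keepalive_intvl": "75",
    "net.ipv4.tcp_keepalive_probes": "20",
    "net.ipv4.ip_local_port_range": "10000 64000",
    "NW_TCP_KEEPIDLE_SECS": "270",
    "NSR_KEEPALIVE_WAIT": "10",
}


def nsrrc_mode_ok(path=NSRRC):
    """True only when the permission bits are exactly 0755."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o755


def last_run(text):
    """Return the lines logged after the most recent startup marker."""
    lines = text.splitlines()
    start = 0
    for i, line in enumerate(lines):
        if MARKER in line:
            start = i + 1
    return lines[start:]


def parse_values(lines):
    """Collect sysctl and environment values; whitespace runs collapse to one space."""
    values = {}
    for line in lines:
        match = SYSCTL_LINE.match(line) or ENV_LINE.match(line)
        if match:
            # e.g. "10000\t64000" -> "10000 64000"
            values[match.group(1)] = " ".join(match.group(2).split())
    return values


def mismatches(text, expected=EXPECTED):
    """Map each key whose logged value differs to (expected, logged-or-None)."""
    values = parse_values(last_run(text))
    return {key: (want, values.get(key)) for key, want in expected.items() if values.get(key) != want}


if __name__ == "__main__":
    if os.path.exists(NSRRC):
        print("nsrrc permissions are exactly 0755:", nsrrc_mode_ok())
    if os.path.exists(NSRRC_LOG):
        with open(NSRRC_LOG, encoding="utf-8", errors="replace") as fh:
            for key, (want, got) in sorted(mismatches(fh.read()).items()):
                print(f"{key}: expected {want!r}, logged {got!r}")
```

Only the lines after the last marker are compared, so values from earlier startups appended to the same log do not mask a current problem.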
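Windows: Because the /nsr/nsrrc file does not exist on Windows, changes must be made with a batch file, such as nsrrc.bat, or another deployment method. Commands are provided here wherever a command-driven option exists. These changes are global and do not need to be run repeatedly. As with the Linux nsrrc file, default standard entries are uncommented and non-standard or situational options are commented out. To change whether a setting is active, add or remove the REM prefix on the relevant line. Depending on where you deploy the file, trim the entries that relate to NetWorker clients, storage nodes, or the server. A service restart is required after making changes.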
REM ### WINDOWS - For all NetWorker hosts - Clients, Nodes and Server
REM # TCP window size tuning - greater throughput / Data Domain
REM reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters /v DefaultSendWindow /t REG_DWORD /d 262144 /f
REM reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters /v DefaultReceiveWindow /t REG_DWORD /d 262144 /f
REM reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v GlobalMaxTcpWindowSize /t REG_DWORD /d 262144 /f
REM reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpWindowSize /t REG_DWORD /d 262144 /f
REM # Global keepalive registry settings - 270s to fall below common idle socket timer kills of 300s
REM # (KeepAliveTime and KeepAliveInterval are in milliseconds)
reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v KeepAliveTime /t REG_DWORD /d 270000 /f
reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v KeepAliveInterval /t REG_DWORD /d 75000 /f
reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpMaxDataRetransmissions /t REG_DWORD /d 20 /f
REM # Global NetWorker keepalive and connectivity variables
setx /m NW_TCP_KEEPIDLE_SECS 270
setx /m NW_TCP_KEEPINTVL_SECS 75
setx /m NW_TCP_KEEPCNT 20
setx /m NSR_KEEPALIVE_WAIT 10
setx /m NSR_EXEC_MAX_AUTH_THREADS 50
REM setx /m NSR_SOCK_BUF_SIZE 65536
REM (use 262144 for 10 Gb Eth NICs)
REM ### For NetWorker Storage Nodes and Server
REM # Standard TCP features - disable in case of disconnections
REM netsh interface tcp set global rss=disabled
REM netsh interface tcp set global autotuning=disabled
REM netsh interface tcp set global ecncapability=disabled
REM netsh interface tcp set global timestamps=default
REM # Port range availability for TCP client callers
netsh int ipv4 set dynamicport tcp start=10000 num=54000
netsh int ipv4 set dynamicport udp start=10000 num=54000
netsh int ipv6 set dynamicport tcp start=10000 num=54000
netsh int ipv6 set dynamicport udp start=10000 num=54000
REM # Global port maximum (deprecated) and TIME_WAIT window (TcpTimedWaitDelay is in seconds)
REM reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v MaxUserPort /t REG_DWORD /d 65535 /f
reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpTimedWaitDelay /t REG_DWORD /d 30 /f
REM # Disable IPv6 if not required
REM reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters /v DisabledComponents /t REG_DWORD /d 0x000000ff /f
REM ### For NetWorker Server only
REM # Settings to increase device resilience for cloud operations or other potentially high-latency devices
REM setx /m NSR_DEVOP_TIMEOUT 3600
REM setx /m NSR_DEVOP_POLLING_INTERVAL 600
REM setx /m NSR_DEVOP_INQUIRY_TIMEOUT 900
REM ### Settings for media database tuning
REM setx /m NSR_TCP_READ_LONG_WAIT Y
REM setx /m NSR_MAX_MEDIADB_RETRY 10
REM setx /m MDB_SQLITE_HEAP_MIN_ALLOC_SIZE 128
REM setx /m MMDB_SQLITE_CONFIGURE_MEMORY 1
REM setx /m MMDB_SQLITE_HEAP_SIZE 1073741824
REM setx /m MMDB_SQLITE_PAGE_COUNT 65536
REM setx /m MMDB_SQLITE_PAGECACHE_COUNT 65536
REM setx /m MMDB_SQLITE_TMP path_to_temp_dir
REM ### NetWorker VMware Protection (NVP) Specific Tunables
REM ## Increase the inventory (nsrvim) default timeout
REM setx /m GST_VBA_TIMEOUT 7200
REM setx /m NSR_HYPERVISOR_QUERY_REQUEST_TIMEOUT 3600
REM ## NW server interval to change VMware inventory (nsrvim) frequency (Default is 15 minutes)
REM ## NOTE only supported in 19.10.0.0 and later
REM setx /m NSRVIM_TIME_INTERVAL 60
REM ## Interval is in minutes and can be set between 15 and 60.
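Before deploying different netsh values, it can help to confirm what range a given `start=`/`num=` pair produces. The sketch below assumes the limits Microsoft documents for the dynamic port range (start port at least 1025, at least 255 ports, end port no higher than 65535); verify them for your Windows version. It also converts seconds to the millisecond values that the KeepAliveTime and KeepAliveInterval registry entries expect. Both function names are hypothetical.

```python
# Assumed limits for "netsh int ipv4 set dynamicport"; verify for your Windows version.
MIN_START_PORT = 1025
MIN_NUM_PORTS = 255
MAX_END_PORT = 65535


def dynamic_port_range(start, num):
    """Return (first, last) port for netsh start=/num= values, or raise ValueError."""
    last = start + num - 1
    if start < MIN_START_PORT:
        raise ValueError(f"start={start} is below {MIN_START_PORT}")
    if num < MIN_NUM_PORTS:
        raise ValueError(f"num={num} is below {MIN_NUM_PORTS}")
    if last > MAX_END_PORT:
        raise ValueError(f"range ends at {last}, above {MAX_END_PORT}")
    return start, last


def seconds_to_reg_ms(seconds):
    """KeepAliveTime and KeepAliveInterval are REG_DWORD values in milliseconds."""
    return int(seconds * 1000)


if __name__ == "__main__":
    print(dynamic_port_range(10000, 54000))  # (10000, 63999)
    print(seconds_to_reg_ms(270), seconds_to_reg_ms(75))  # 270000 75000
```

The batch file's `start=10000 num=54000` therefore covers ports 10000 through 63999, essentially the same span as the Linux `ip_local_port_range` of 10000-64000.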