NetWorker: Troubleshooting Guide for Red Hat Cluster Service Issues
Summary: This article provides an overview of how to approach NetWorker service startup issues for NetWorker servers deployed on Red Hat pacemaker (pcs) clusters. It is intended for NetWorker backup administrators and NetWorker support personnel troubleshooting these issues.
Instructions
NetWorker servers can be deployed in a cluster failover configuration on Red Hat nodes using pacemaker (pcs) services. NetWorker is installed on multiple nodes. The server databases reside on shared storage, which is passed between nodes based on which node is active in the pacemaker configuration. The NetWorker server uses a shared cluster name and IP address, ensuring consistent naming and addressing regardless of the hosting node. See the NetWorker Cluster Integration Guide, available on the Dell Support product page, for details on how to set up NetWorker in a cluster.
Cluster Topology:
This article uses an example cluster with the following configuration:
NetWorker Cluster Topology

| Hostname | IP Address | Function |
|---|---|---|
| lnx-node1.amer.lan | 192.168.9.108 | Physical Node 1 |
| lnx-node2.amer.lan | 192.168.9.109 | Physical Node 2 |
| lnx-nwcluster.amer.lan | 192.168.9.110 | Logical Name used by NetWorker |
Each node's file system manages the NetWorker /nsr directory using symbolic links.
Active Node:
/nsr points to the shared storage location:
root@lnx-node1:~# ls -l / | grep nsr
lrwxrwxrwx. 1 root root 14 Oct 5 10:49 nsr -> /nsr_share/nsr
drwxr-xr-x. 11 root root 116 Aug 31 17:20 nsr.NetWorker.local
drwxr-xr-x. 3 root root 17 Aug 31 17:23 nsr_share
Passive Node:
/nsr points to /nsr.NetWorker.local:
root@lnx-node2:~# ls -l / | grep nsr
lrwxrwxrwx. 1 root root 20 Oct 3 17:08 nsr -> /nsr.NetWorker.local
drwxr-xr-x. 11 root root 116 Aug 31 17:19 nsr.NetWorker.local
drwxr-xr-x. 2 root root 6 Aug 31 17:18 nsr_share
When a node is in a passive state, the nsrexecd (NetWorker client) software runs using /nsr.NetWorker.local. Each physical node has its own client resource using the physical node's Domain Name System (DNS) resolvable name and IP address. The NetWorker server only runs using the shared storage (/nsr_share) and uses the shared IP address and hostname; the server can be active on only one node at a time.
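A node's role can be inferred from where its /nsr symlink points. A minimal sketch, assuming the symlink layout shown above (the nsr_role helper name is hypothetical; the symlink path is passed as an argument so the check is easy to test):

```shell
# Hypothetical helper: report a node's role from the target of its
# /nsr symbolic link.
nsr_role() {
    case "$(readlink "$1")" in
        */nsr_share/*)         echo "active"  ;;  # /nsr -> /nsr_share/nsr
        */nsr.NetWorker.local) echo "passive" ;;  # /nsr -> /nsr.NetWorker.local
        *)                     echo "unknown" ;;
    esac
}

# On a node, run: nsr_role /nsr
```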
The following pacemaker (pcs) commands are used to get an overview of the pacemaker configuration and status:
- Cluster status:
pcs status
root@lnx-node1:~# pcs status
Cluster name: rhelclus
Status of pacemakerd: 'Pacemaker is running' (last updated 2023-10-05 10:59:19 -04:00)
Cluster Summary:
* Stack: corosync
* Current DC: lnx-node1.amer.lan (version 2.1.5-9.3.el8_8-a3f44794f94) - partition with quorum
* Last updated: Thu Oct 5 10:59:20 2023
* Last change: Thu Oct 5 10:59:13 2023 by root via cibadmin on lnx-node1.amer.lan
* 2 nodes configured
* 3 resource instances configured
Node List:
* Online: [ lnx-node1.amer.lan lnx-node2.amer.lan ]
Full List of Resources:
* Resource Group: NW_group:
* fs (ocf::heartbeat:Filesystem): Started lnx-node1.amer.lan
* ip (ocf::heartbeat:IPaddr): Started lnx-node1.amer.lan
* nws (ocf::EMC_NetWorker:Server): Started lnx-node1.amer.lan
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
The output shows the cluster resource file system (fs), cluster resource IP address (ip), and the NetWorker services (nws). The resource names used here are the defaults used in the NetWorker Cluster Integration Guide; however, it is possible that different names are used. If so, make note of the resource names and substitute them as needed when following the instructions in this article.
- Pacemaker resource configuration:
pcs resource config
Example:
root@lnx-node1:~# pcs resource config
Group: NW_group
Resource: fs (class=ocf provider=heartbeat type=Filesystem)
Attributes: fs-instance_attributes
device=/dev/sdb1
directory=/nsr_share
fstype=xfs
Operations:
monitor: fs-monitor-interval-20 interval=20 timeout=300
start: fs-start-interval-0s interval=0s timeout=60s
stop: fs-stop-interval-0s interval=0s timeout=60s
Resource: ip (class=ocf provider=heartbeat type=IPaddr)
Attributes: ip-instance_attributes
cidr_netmask=24
ip=192.1xx.9.1x0
nic=ens192
Operations:
monitor: ip-monitor-interval-15 interval=15 timeout=120
start: ip-start-interval-0s interval=0s timeout=20s
stop: ip-stop-interval-0s interval=0s timeout=20s
Resource: nws (class=ocf provider=EMC_NetWorker type=Server)
Meta Attributes: nws-meta_attributes
is-managed=true
Operations:
meta-data: nws-meta-data-interval-0 interval=0 timeout=10
migrate_from: nws-migrate_from-interval-0 interval=0 timeout=120
migrate_to: nws-migrate_to-interval-0 interval=0 timeout=60
monitor: nws-monitor-interval-100 interval=100 timeout=1200
start: nws-start-interval-0 interval=0 timeout=600
stop: nws-stop-interval-0 interval=0 timeout=600
validate-all: nws-validate-all-interval-0 interval=0 timeout=10
The above command details each pcs resource's configuration. Important things to note during the initial overview:
- FS resource "device=": The block device that backs the shared storage on the node file system. This device must be the same on each node. This is discussed later in this KB.
- FS resource "directory=": The directory on which the shared NetWorker storage is mounted; it is the mountpoint for the "device=" field. This is discussed later in this KB.
- IP resource "ip=": This is the IP address which is associated with the logical (shared) hostname used by the NetWorker server. This IP address is hosted on the active node.
- Pacemaker visibility of the shared address and storage:
lcmap
Example:
root@lnx-node1:~# lcmap
type: NSR_CLU_TYPE;
clu_type: NSR_LC_TYPE;
interface version: 1.0;

type: NSR_CLU_VIRTHOST;
hostname: 192.168.9.110;
local: TRUE;
owned paths: /nsr_share;
clu_nodes: lnx-node1.amer.lan lnx-node2.amer.lan;
The hostname field should match the pcs resource config "ip=" field. The owned paths should match the pcs resource config "directory=" field. In some instances, when a startup issue is observed, the lcmap command does not return the hostname, local, or owned paths fields; this is indicative of an issue.
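When comparing these fields between nodes, the relevant attributes can be pulled out of saved pcs resource config output for a quick diff. A rough sketch, assuming the output has been saved to a file on each node (the pcs_attrs helper name is hypothetical):

```shell
# Hypothetical helper: extract the device=, directory=, and ip=
# attributes from a saved "pcs resource config" output file so the
# values can be compared between nodes.
pcs_attrs() {
    grep -oE '(device|directory|ip)=[^[:space:]]+' "$1" | sort -u
}

# Usage sketch (run on each node, then diff the results):
#   pcs resource config > /tmp/node1.cfg
#   pcs_attrs /tmp/node1.cfg
```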
Initial Diagnosis:
If NetWorker services fail to start, check the pcs resource status to see which resource is failing:
pcs status
root@lnx-node1:~# pcs status
...
Node List:
* Online: [ lnx-node1.amer.lan lnx-node2.amer.lan ]
Full List of Resources:
* Resource Group: NW_group:
* fs (ocf::heartbeat:Filesystem): Started lnx-node1.amer.lan
* ip (ocf::heartbeat:IPaddr): Started lnx-node1.amer.lan
* nws (ocf::EMC_NetWorker:Server): Started lnx-node1.amer.lan
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
If a failure is observed, a general failure error is returned and the failed resources show as FAILED.
- FS (Filesystem): If the Filesystem is in a failed state, see below section on Filesystem Failures.
- IP (IPaddr): If the IPaddr is in a failed state, see below section on IPaddr Failures.
- NWS (Server): If the NetWorker server is in a failed state, perform the following:
- Review the NetWorker server's daemon.raw for any failure messages which appear during startup. The server's daemon.raw (/nsr_share/nsr/daemon.raw) is located in the shared storage path. The physical node's client daemon.raw is in /nsr.NetWorker.local/logs/daemon.raw. See Dell article NetWorker: How to use nsr_render_log.
- If default logging is not sufficient, enable debug as follows:
- Attempt to restart the "Server" resource:
pcs resource cleanup nws
- Use the dbgcommand utility to enable debug on the nsrd process (set Debug=0 to turn debug logging off again when finished):
dbgcommand -n nsrd Debug=#
- Review the daemon.raw for any additional messages which may point to the issue.
- Review the /var/log/pcsd/pcsd.log for any errors.
- Review the /var/log/pacemaker/pacemaker.log for any errors.
- Review the /var/log/messages file for any errors.
In the pcsd, pacemaker, and messages logs, look for messages which were logged at the same timestamps as the NetWorker service startup attempts. Review for any errors or failures which coincide with the service startup failure.
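The log reviews above can be scripted roughly as follows. This is a sketch only (the scan_logs helper name is hypothetical); narrow the grep pattern to the timestamps of the startup attempt as needed:

```shell
# Hypothetical helper: print recent error/failure lines from each log
# passed to it; unreadable or missing logs are skipped silently.
scan_logs() {
    for log in "$@"; do
        [ -r "$log" ] || continue
        echo "=== $log ==="
        grep -iE 'error|fail' "$log" | tail -n 20
    done
}

# Example:
#   scan_logs /var/log/pcsd/pcsd.log /var/log/pacemaker/pacemaker.log /var/log/messages
```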
Filesystem Failures:
- Review the pacemaker resources:
pcs resource
- Review the pacemaker resource configuration for the Filesystem resource:
pcs resource config fs
root@lnx-node1:~# pcs resource
* Resource Group: NW_group:
* fs (ocf::heartbeat:Filesystem): Started lnx-node1.amer.lan
* ip (ocf::heartbeat:IPaddr): Started lnx-node1.amer.lan
* nws (ocf::EMC_NetWorker:Server): Started lnx-node1.amer.lan
root@lnx-node1:~# pcs resource config fs
Resource: fs (class=ocf provider=heartbeat type=Filesystem)
Attributes: fs-instance_attributes
device=/dev/sdb1
directory=/nsr_share
fstype=xfs
Operations:
monitor: fs-monitor-interval-20 interval=20 timeout=300
start: fs-start-interval-0s interval=0s timeout=60s
stop: fs-stop-interval-0s interval=0s timeout=60s
- Confirm whether the device is mounted on the node's file system:
df -h
Example:
root@lnx-node1:~# df -h | grep /nsr_share
/dev/sdb1        94G  1.5G   92G   2% /nsr_share
- Confirm that the mountpoint is configured correctly, associating the device with the path:
lsblk
Example:
root@lnx-node1:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 600M 0 part /boot/efi
├─sda2 8:2 0 1G 0 part /boot
└─sda3 8:3 0 38.4G 0 part
├─rhel-root 253:0 0 34.4G 0 lvm /
└─rhel-swap 253:1 0 4G 0 lvm [SWAP]
sdb 8:16 0 100G 0 disk
└─sdb1 8:17 0 93.1G 0 part /nsr_share
sr0 11:0 1 1024M 0 rom
- Confirm that the file system used by the device is correct:
blkid
root@lnx-node1:~# blkid
/dev/mapper/rhel-root: UUID="7cf2f957-18d8-45b8-bf8f-6361aadc3517" BLOCK_SIZE="512" TYPE="xfs"
/dev/sda3: UUID="QpZ2hK-OuE2-igN0-Ryba-EwMN-uxq1-LE48hD" TYPE="LVM2_member" PARTUUID="1193db91-4b63-4b33-a4d4-03a22317e064"
/dev/sda1: UUID="F243-AD41" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="6c81bd63-0249-4bdf-afdb-cdde72034162"
/dev/sda2: UUID="7677ad6b-8191-4a45-8a8a-16cf7d00d72c" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="57481b7a-83ec-4cd8-bf2d-bca09ac27040"
/dev/sdb1: UUID="600bca60-dd5d-4162-bf77-0537daa3b1e5" BLOCK_SIZE="512" TYPE="xfs" PARTLABEL="networker" PARTUUID="769aaac2-764b-431d-be21-3b5753d6a5d3"
/dev/mapper/rhel-swap: UUID="537962b6-07d4-4a40-9687-deab2e488936" TYPE="swap"
If no cause is found, review /var/log/pcsd/pcsd.log, /var/log/pacemaker/pacemaker.log, and /var/log/messages for errors which coincide with the failure.
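One cause of Filesystem resource failures is an fstype= in the resource configuration that does not match the filesystem actually on the device. A small sketch for pulling the TYPE value out of a blkid line so it can be compared with the resource configuration (the blk_type helper name is hypothetical):

```shell
# Hypothetical helper: extract the TYPE="..." value from a blkid
# output line for comparison with the fs resource's fstype= setting.
blk_type() {
    sed -n 's/.*TYPE="\([^"]*\)".*/\1/p'
}

# Example: blkid /dev/sdb1 | blk_type
# The printed type should match fstype= from "pcs resource config fs".
```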
IPaddr Failures:
- Review the pacemaker resources:
pcs resource
- Review the pacemaker resource configuration for the IPaddr resource:
pcs resource config ip
root@lnx-node1:~# pcs resource
* Resource Group: NW_group:
* fs (ocf::heartbeat:Filesystem): Started lnx-node1.amer.lan
* ip (ocf::heartbeat:IPaddr): Started lnx-node1.amer.lan
* nws (ocf::EMC_NetWorker:Server): Started lnx-node1.amer.lan
root@lnx-node1:~# pcs resource config ip
Resource: ip (class=ocf provider=heartbeat type=IPaddr)
Attributes: ip-instance_attributes
cidr_netmask=24
ip=192.1xx.9.1x0
nic=ens192
Operations:
monitor: ip-monitor-interval-15 interval=15 timeout=120
start: ip-start-interval-0s interval=0s timeout=20s
stop: ip-stop-interval-0s interval=0s timeout=20s
- Confirm that the NIC is available on the system:
ifconfig -a
root@lnx-node1:~# ifconfig -a
ens192: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.1xx.9.1x8 netmask 255.255.255.0 broadcast 192.1xx.9.255
inet6 fe80::250:56ff:fea5:48e1 prefixlen 64 scopeid 0x20<link>
ether 00:50:56:a5:48:e1 txqueuelen 1000 (Ethernet)
RX packets 953865 bytes 349705527 (333.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1190983 bytes 179749786 (171.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 129798 bytes 13274289 (12.6 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 129798 bytes 13274289 (12.6 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
The IP address shown with ifconfig is the physical node's address; however, the clustered IP is reachable through this NIC when the node is active. Ensure that both nodes are configured to use the same NIC name.
- Does the IP address resolve to the correct (logical) hostname used by the NetWorker server?
nslookup ip
nslookup logical_name_FQDN
nslookup logical_name_short
root@lnx-node1:~# nslookup 192.1xx.9.1x0
110.9.1xx.1x2.in-addr.arpa    name = lnx-nwcluster.amer.lan.

root@lnx-node1:~# nslookup lnx-nwcluster.amer.lan.
Server:   192.1xx.9.1x0
Address:  192.1xx.9.100#53
Name:     lnx-nwcluster.amer.lan
Address:  192.1xx.9.1x0

root@lnx-node1:~# nslookup lnx-nwcluster
Server:   192.1xx.9.1x0
Address:  192.1xx.9.100#53
Name:     lnx-nwcluster.amer.lan
Address:  192.1xx.9.1x0
It is also recommended to perform the same steps against the physical node's IP address, FQDN, and shortname. See Dell article NetWorker: Name Resolution Troubleshooting Best Practices.
- Can you reach the cluster IP address using ping?
ping -c 4 ip
root@lnx-node1:~# ping -c 4 192.1xx.9.1x0
PING 192.1xx.9.1x0 (192.1xx.9.1x0) 56(84) bytes of data.
64 bytes from 192.1xx.9.1x0: icmp_seq=1 ttl=64 time=0.051 ms
64 bytes from 192.1xx.9.1x0: icmp_seq=2 ttl=64 time=0.043 ms
64 bytes from 192.1xx.9.1x0: icmp_seq=3 ttl=64 time=0.033 ms
64 bytes from 192.1xx.9.1x0: icmp_seq=4 ttl=64 time=0.034 ms

--- 192.1xx.9.1x0 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3108ms
rtt min/avg/max/mdev = 0.033/0.040/0.051/0.008 ms
If no cause is found, review /var/log/pcsd/pcsd.log, /var/log/pacemaker/pacemaker.log, and /var/log/messages for errors which coincide with the failure.
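If the IPaddr resource starts on one node but not the other, a common cause is a NIC name mismatch between nodes. A minimal sketch for checking that the NIC named in the resource's "nic=" field exists on the local node (the nic_exists helper name is hypothetical; it assumes the iproute2 ip command is available):

```shell
# Hypothetical helper: confirm the NIC from the ip resource's "nic="
# field exists on this node; IPaddr cannot start on a missing NIC.
nic_exists() {
    if ip link show "$1" >/dev/null 2>&1; then
        echo "present"
    else
        echo "missing"
    fi
}

# Example: nic_exists ens192   (run on both nodes)
```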
Other PCS Commands:
| Operation | Command |
|---|---|
| Pacemaker or pcs version | pcs --version |
| Pacemaker overview | pcs status |
| Pacemaker resource overview | pcs resource |
| Determine path ownership in a cluster | lcmap |
| Enable (start) resource | pcs resource enable resource_name |
| Start pcs resource with debug | pcs resource debug-start resource_name |
| Review pcs resource configuration settings | pcs resource config resource_name |
| Disable (stop) resource | pcs resource disable resource_name |
| Restart failed resource | pcs resource cleanup resource_name |
| Stop pacemaker on node | pcs cluster stop node_name |
| Start pacemaker on node | pcs cluster start node_name |
| Put the node in standby | pcs node standby node_name |
| Bring the node out of standby | pcs node unstandby node_name |
Important Logs and Files:
| Path | Purpose | Supplemental Commands |
|---|---|---|
| /var/log/messages | Contains global system messages regarding system resources and services. | |
| /var/log/pacemaker/pacemaker.log | Default pacemaker information logging for pacemaker resources and functions. | N/A |
| /var/log/pcsd/pcsd.log | Default pacemaker service/daemon (pcsd) log. | N/A |
| /var/log/cluster/corosync.log | Default pacemaker node communication log. | N/A |
| /usr/sbin/nw_hae.log | NetWorker (nws) resource start log as defined in /usr/lib/ocf/resource.d/EMC_NetWorker/Server. | N/A |
| /usr/lib/ocf/resource.d/EMC_NetWorker/Server | NetWorker pacemaker resource script; defines the operations performed/managed by pcs. | N/A |