We operate a NetWorker server, version 7.6.4, on an IBM AIX cluster.
The installation runs on two cluster nodes, XTEST60 and XTEST61; both have been updated to NetWorker 18.104.22.168.
After the otherwise successful update of the failover nodes, XTEST61 fails to start the NetWorker daemons. All cluster resources currently reside on XTEST61.
The error message reads:
73248 1405510561 5 5 0 1 16056448 0 xtest61p.swd-ag.de nsrd critical NSR 148 Cannot start nsrd because %s (/nsr) is local, and is configured as a NetWorker cluster server. Use NetWorker cluster manager to check service status. 1 23 8 /nsr/res
A check of all config files showed that they are correct when compared against our current production NetWorker cluster.
After the error, we repeated the update from 7.6.4 to 22.214.171.124; the result was the same failure.
We intend to update our production clusters in the coming days and are therefore asking for support here.
With best regards,
AIX team, Stadtwerke Duesseldorf AG.
Of course, we have configured the cluster, and the GST as well.
The mystery to me is: why did it work on XTEST60, the primary cluster node, and not on the standby node?
I don't have experience updating AIX nodes from NW 7, as my whole AIX landscape went from TSM straight to NW 8. I do have two backup servers running on AIX clusters with no issues there (and a number of application clients in clusters as well). What I would try is to uninstall everything, remove the cluster files created by NetWorker, and make sure the symlink is in the correct state. Once you install again, networker.cluster should be run; make sure that nw_hacmp.lc on both nodes is correct (e.g. pointing to the correct IP address and shared storage). Then start the daemons via /etc/rc.nsr, and only the client should start (passive node). Then use clRGmove to test failover between the nodes. I suspect the NetWorker cluster config file was not updated on the passive node, and this is why the symlink phase failed, or it was not even aware of the cluster config.
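The check-and-reinstall sequence above can be sketched as a shell script. This is only an illustration: the `/nsr` path is from the error message, but the cluster commands are echoed rather than executed, and the `clRGmove` arguments are left as a placeholder.

```shell
#!/bin/sh
# Hedged sketch of the suggested recovery sequence on the passive node.
NSR_LINK=/nsr   # on the active node this should be a symlink to shared storage

# 1. Check the /nsr symlink state before reinstalling.
if [ -L "$NSR_LINK" ]; then
    state="symlink"
else
    state="local"   # a local /nsr matches the nsrd error seen on xtest61
fi
echo "/nsr state: $state"

# 2. The reinstall/failover steps themselves (sketch only, not executed here):
echo "would run: networker.cluster   # recreate the NetWorker cluster integration files"
echo "would run: /etc/rc.nsr         # passive node should start the client daemon only"
echo "would run: clRGmove ...        # move the resource group to test failover"
```

Running the symlink check on both nodes before and after the reinstall makes it obvious whether the cluster setup phase actually converted `/nsr`.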
Everything fresh: both nodes newly installed with NW 7.6.4, then updated to 126.96.36.199. Installation completed without errors. Cluster configured.
Same result as before: NW runs fine on the primary node but does NOT start on the standby node.
I'm getting frustrated with the level of information you give. I assume what doesn't work is the failover when you start the server, right? Running the client only does work? Now, can you confirm, as already asked, that nw_hacmp.lc is correct and has the correct values? If so, do a failover and provide the logs. Also capture the output of the actions when you call the start script, to get an idea of what it does and where it breaks.
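One way to capture what the start script actually does is to run it under a shell trace. A minimal sketch, using a stand-in script instead of the real `/etc/rc.nsr` so the technique is visible without a live cluster:

```shell
#!/bin/sh
# Sketch: capture a start script's actions via shell tracing.
# On the cluster you would trace /etc/rc.nsr itself; a stand-in is used here.
TRACE_LOG=/tmp/nsr_start_trace.$$
STANDIN=/tmp/fake_rc_nsr.$$

cat > "$STANDIN" <<'EOF'
#!/bin/sh
echo "starting NetWorker daemons"
EOF

# sh -x echoes every command the script runs; keep stdout and stderr together
sh -x "$STANDIN" > "$TRACE_LOG" 2>&1
echo "trace captured in $TRACE_LOG:"
cat "$TRACE_LOG"
```

The resulting trace shows the exact command at which the start script gives up, which is far more useful than the single nsrd error line.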
What do you mean by cluster config file? /usr/bin/nw_hacmp.lc?
If so, it has been working well and is identical on both cluster nodes.
What is your environment? AIX 6.1 TL09 1412 with virtualized Ethernet adapters?
PowerHA System Mirror 6.1 SP12?
After failover, nsrd dies with the error:
nsrd NSR critical 148 Can't start nsrd because %s (/nsr) is local, and NetWorker is configured as a cluster server. Use cluster manager to check NetWorker service status
There is no way to find out why nsrd dies; debug mode doesn't provide any more information than this primitive error.
The /nsr link is correct and points to the shared disk.
Everything looks normal, just like on the primary node, which starts without any problem.
I have both PowerHA 6.1 and 7.1 on LPARs. It is not 8.1.1 but 188.8.131.52; judging from the error, though, that should not matter at all.
Yes, the NW cluster config is nw_hacmp.lc, and that file gets nuked each time you do an update, so it is essential to update it after upgrades with valid information (two lines showing the server IP and the shared disk location). Please capture what the stop and start scripts are doing during failover to get more information. Before (and after) these actions, take note of the nsr symlinks (and make sure that /nsr/run on the passive node has correct information if only the client is running).
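A quick sanity check that the file was refilled after the upgrade could look like the sketch below. The two-value layout (server IP plus shared disk location) follows the description above, but the key names and the sample file are entirely hypothetical; check your own nw_hacmp.lc for the real syntax before adapting this.

```shell
#!/bin/sh
# Sketch: verify nw_hacmp.lc contains both expected values after an upgrade.
# Stand-in file; the real one lives at /usr/bin/nw_hacmp.lc.
LC_FILE=/tmp/nw_hacmp.lc.sample.$$

# Hypothetical key names, for illustration only:
cat > "$LC_FILE" <<'EOF'
SERVICE_IP=10.0.0.61
SHARED_DISK=/shared/nsr
EOF

ip=$(sed -n 's/^SERVICE_IP=//p' "$LC_FILE")
disk=$(sed -n 's/^SHARED_DISK=//p' "$LC_FILE")
if [ -n "$ip" ] && [ -n "$disk" ]; then
    echo "nw_hacmp.lc looks complete: IP=$ip disk=$disk"
else
    echo "nw_hacmp.lc incomplete -- rerun networker.cluster to regenerate it"
fi
```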
The nw_hacmp.lc has the right environment, as mentioned. The nw_hacmp.lc start/stop script only sets the environment and then starts /opt/nsr/admin/nsr_envexec, reading a file called lcmap.
These files are all the same on both cluster nodes.
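"The same on both nodes" is worth proving with checksums rather than by eye. A small sketch, with the remote comparison echoed rather than executed (hostname xtest60 taken from the thread):

```shell
#!/bin/sh
# Sketch: compare the NetWorker cluster files on both nodes by checksum.
for f in /usr/bin/nw_hacmp.lc /opt/nsr/admin/nsr_envexec; do
    echo "$f:"
    cksum "$f" 2>/dev/null || echo "  (not present on this host)"
    echo "  would run: ssh xtest60 cksum $f   # compare against the peer node"
done
```

Matching cksum output on both nodes rules out a stale copy on the passive node in one step.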
What the cluster does is bring the resource group online. An RG is an application server within a volume group plus a service IP label.
Nothing more. That is the duty of the cluster, and it does it well. The VG is online and the service IP is also reachable.
After the RG is online, the cluster exits with error code 0, as long as the application scripts return the same exit code.