Pavithraa

3 Posts

1886

December 2nd, 2014 16:00

Networker server goes into hung state after the upgrade from 7.6.1.3 to 8.1.1.9

Hi,

We recently upgraded the NetWorker server from 7.6.1.3 to 8.1.1.9. Since the upgrade, the server has gone into a hung state many times, usually when many jobs start in the evening, and restarting the service is the only workaround we have.

Our NetWorker server runs in a 2-node HP-UX cluster with 2000+ clients.

OS : HP-UX B.11.23 U ia64

Storage nodes : 4 (3 HPUX + 1 Redhat Linux)

Previous version : 7.6.1.3

Current Version : 8.1.1.9

Summary of the issue

The NetWorker server runs in a 2-node HP-UX cluster. On 15th Nov 2014, after the upgrade, we did a failover test from node A to the second node. The issue started after we switched the package back to node1, where it had been running before the upgrade.

We enabled nsrauth on the NetWorker server.
The server worked fine for almost an hour and then went into a hung state.
We stopped the services, cleared /nsr/tmp on the server and storage nodes, and restarted the services, but still had the issue.
We rebooted the server and the storage nodes.
We cleared the peer information on the server and storage nodes and restarted the services on both. After this, the server worked for 3 days.
During these 3 days we did not use the NMC, as we suspected that the NMC server might be causing the issue. Reference: Service Request Management #67222032.
Meanwhile, we found a lot of GSS-related error messages in daemon.log, and EMC suggested upgrading the NetWorker clients to 8.1.1.9 (SR #67222032).
We upgraded around 2000 clients, made the changes suggested by DSE support, and rebooted the NetWorker server and storage nodes on 26th Nov 2014.
After the server came up, we received an error for one storage node: "storage node not ready". We fixed this by adding the NetWorker server details to the /etc/hosts file on the storage node.
The server went into a hung state on Friday 26th Nov 2014 when the full backups started.
Since then, the server has gone into a hung state multiple times.
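
The /etc/hosts fix for the "storage node not ready" error can be checked on each storage node before the next restart. Below is a minimal sketch in Python, assuming a standard hosts-file layout (address first, then names and aliases). The function names are made up for illustration and are not part of NetWorker. The sketch only looks at hosts-file text and says nothing about DNS or how NetWorker itself resolves names.

```python
def hosts_file_names(hosts_text):
    """Return the set of hostnames and aliases listed in hosts-file text."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        # The first field is the address; the rest are the name and its aliases.
        names.update(name.lower() for name in fields[1:])
    return names


def missing_from_hosts(hosts_text, required_names):
    """Return the required names that have no entry in the hosts-file text."""
    present = hosts_file_names(hosts_text)
    return [name for name in required_names if name.lower() not in present]
```

To use it, read /etc/hosts on the storage node and pass the contents along with the names the NetWorker server is known by (physical node names and the cluster package name, for example). An empty list means every name has an entry.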

Actions performed

auth methods: "0.0.0.0/0,nsrauth/oldauth" is enabled on the NetWorker server and storage nodes
98% of the clients have been upgraded to version 8.1.1.9
Peer information has been cleared on the NetWorker server and storage nodes (multiple times)
The server and storage nodes have been rebooted
The ulimit on the NetWorker server has been changed from 8000 to 32768
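
To confirm that a ulimit change like the one above has taken effect, the limit can be read back from a process started in the same environment. Below is a minimal sketch in Python. It assumes the ulimit in question is the open-file-descriptor limit (the list does not say which one), and the helper names are hypothetical. It reports the limit of the process running the script, not of an nsrd that is already running.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_at_least(expected, limits=None):
    """Return True if the soft limit is unlimited or at least `expected`."""
    soft, _hard = limits if limits is not None else open_file_limits()
    # RLIM_INFINITY means "no limit", so it always satisfies the check.
    return soft == resource.RLIM_INFINITY or soft >= expected
```

Calling `limit_is_at_least(32768)` from the shell or startup environment that launches the NetWorker services shows whether the new value is actually inherited there.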

NW162230 ("NetWorker hangs intermittently after upgrading from 7.6.x to 8.1.1.9") has been raised with the EMC engineering team to investigate the issue.

Could someone please help me with this issue? Has anyone else come across it with version 8.1.1.9?

I was told that 8.1.1.9 has fixed many issues, but it doesn't seem to be true in our case.

Responses(7)

crazyrov

4 Operator

•

1.3K Posts

0

December 2nd, 2014 23:00

I have the exact same issue on the same OS on my backup server. However, I have only around 650 clients on the affected server. I see it mostly during my monthly backup schedules and sometimes on the weekly schedule. The server does not actually hang: nsrd crashes while the rest of the services keep running. I had a case open with EMC for this, and they gave me a patched binary that seemed to work until it crashed again, although the crashes are less frequent than they were before EMC gave me the binary.

I am blaming this on my very old HP-UX server and am planning to migrate it to Linux. Things do not move as fast as you would expect in the service industry, especially when there is cost involved, so it is taking a while for me to get there.

I hope your backup server is patched with the latest patches, or at least with the ones EMC lists as prerequisites. Also, OpenSSL should be at the latest version. Try this and see if it helps.

sreejith_pg

11 Posts

0

December 3rd, 2014 00:00

Is the backup server itself going into a hung state, or are the backup jobs hanging? We faced the same kind of issue after the NetWorker upgrade to 8.1.1.6. As a bug had been identified in 8.1.1.6, we upgraded to 8.1.1.9 and applied some patches suggested by the engineering team, and it is now stable.

sreejith_pg

11 Posts

0

December 3rd, 2014 00:00

We have two different data zones. One has around 450 clients and the other has fewer than 100. Currently all the backups are stable. Earlier we used to get errors such as "no tape label found" even though the volumes were available in the pools. The tapes would show up as unlabeled on their own. (There was no data loss: after an inventory the labels came back and manual backups completed successfully, but scheduled backups would fail or hang.)

crazyrov

4 Operator

•

1.3K Posts

0

December 3rd, 2014 00:00

@"Sreejith P G" - How many clients do you back up in your datazone?

ble1

2 Intern

•

14.3K Posts

0

December 3rd, 2014 02:00

I believe there is at least one HP-UX patch required for 11v2. What has been mentioned about SSL was fixed in 8.1.2.1, but there is still a recommendation to get SSL up to 0.9xy or something like that (I think xz is the latest one). I run 11v3 without issue on 8.0.3.9 (well, I did hit the bug with ssids not being removed, which is fixed in 8.0.4.1) and thought of jumping to 8.1.2.1, but I plan to move this to Linux as well, so most likely I won't be updating (due to some dependencies). Do you use dynamic nsrmmds? Do you have core dumps (check /nsr/core)? If yes, for which process?
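
For the core dump question, a quick listing of what sits under the core directory, newest first, makes it easier to match dumps against the times of the hangs. Below is a minimal sketch in Python. The function names are hypothetical, and nothing is assumed about how NetWorker names or organizes core files: it simply walks the directory and reports every regular file it finds.

```python
import os
import time


def list_core_files(core_dir):
    """Return (path, size_bytes, mtime) for every file under core_dir, newest first."""
    found = []
    for root, _dirs, files in os.walk(core_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                info = os.stat(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            found.append((path, info.st_size, info.st_mtime))
    found.sort(key=lambda item: item[2], reverse=True)
    return found


def describe(core_dir):
    """Return one printable line per file: timestamp, size in bytes, path."""
    lines = []
    for path, size, mtime in list_core_files(core_dir):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        lines.append("{}  {:>12}  {}".format(stamp, size, path))
    return lines
```

Printing `describe("/nsr/core")` on the server gives the paths and timestamps. The path of each dump should indicate the process if the dumps are stored per daemon, which is an assumption to verify on the actual system. A missing directory just yields an empty list.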

Pavithraa

3 Posts

0

January 19th, 2015 03:00

The issue was fixed after we upgraded NetWorker to 8.1.2.3 (root cause unknown).

crazyrov

4 Operator

•

1.3K Posts

0

January 19th, 2015 05:00

Mine too. This was a bug in the previous version and was fixed in the 8.1.2.2 release.
