Unsolved


1 Rookie • 20 Posts

April 14th, 2008 16:00

NetWorker used with Veritas Cluster Server

Hello,

I have a customer who is using Veritas Cluster Server (VCS), with NetWorker managed as a process within the VCS application. He is not able to fail over to the other node while a NetWorker (save) process is running. Would this be considered a NetWorker issue? What type of logs should I look at to determine the cause, or even to find a problem?

Thanks,
Danny Villanueva

159 Posts

April 15th, 2008 03:00

Hola Dani,

If there is a save process running, it is because a backup is in progress for the client you want to fail over.

Why do you want to fail over the client during a backup?

Have you tried killing the save process? Once it has finished, does the failover succeed?

Be aware that if the machine misbehaves during a backup, the save process can sometimes hang and refuse to die.

Perhaps you could create a script that kills the save process before the failover starts. I am not sure whether it is possible to detect that a machine is about to fail over, so that the script could be launched at that point.
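A pre-failover hook along those lines might look like the sketch below. The function name, the signal choice, and the retry count are illustrative, not a tested procedure; note also the warning later in this thread that killing save processes can leave the media database inconsistent, so this should be a last resort.

```shell
#!/bin/sh
# Hypothetical pre-failover hook: ask any running NetWorker 'save'
# processes to exit before the service group moves. SIGTERM only;
# a hard SIGKILL risks corrupting the media database and indexes.
stop_saves() {
    pids=$(pgrep -x save 2>/dev/null || true)
    [ -z "$pids" ] && return 0          # nothing to do
    kill -TERM $pids 2>/dev/null
    i=0
    while [ "$i" -lt 5 ]; do            # wait up to ~10 seconds
        pgrep -x save >/dev/null 2>&1 || return 0
        sleep 2
        i=$((i + 1))
    done
    echo "save processes still running after SIGTERM" >&2
    return 1
}

# Usage (e.g. from a VCS preonline trigger): stop_saves || exit 1
```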

Regards.

Inma Bernabe.

1 Rookie • 20 Posts

April 15th, 2008 08:00

Inma,

Thanks for your input. I understand what process is running during a backup, but in this case the customer's cluster needs to fail over for reasons outside of NetWorker. It just happens that NetWorker is running a backup and is preventing the cluster from failing over to the other host. Killing the save process is possible, but is this really how a cluster is designed to work? That is the customer's real question. He expects the save process to continue the backup on the second node after a failover.

Danny

4 Operator • 14.4K Posts

April 15th, 2008 12:00

That's not going to happen (unless you were lucky enough to get it working through the configured number of retries, but the chance of that is very slim and not something to rely on in this specific case). In theory this is possible, but NW was never coded that way.

10 Posts

April 16th, 2008 10:00

I'm the customer.
The situation is that a service group within the cluster contains a definition for an application, which in this case is Legato.
The only agent I've been able to obtain for this purpose is a simple bash script which VCS uses to stop, start, and monitor the application.
The monitor portion just queries the server using nsrmm.
The online/start and offline/stop entry points run nsrmm to mount or unmount the Legato volume.
I wish I had a better agent; I have heard from Symantec that one exists, but they don't support it, so it would have to come from EMC.
I have marked the application resource, the disk mount, and the disk group as not critical, yet VCS still hangs during a failover, waiting for the Legato resources, which are not freeing up because a backup is still running.
There are therefore two issues: 1) VCS should continue the failover, since those resources are marked not critical. 2) The bash script is not returning a successful return code that would allow the failover to continue.
It appears that while a backup is running, an nsrmm unmount will sometimes succeed and sometimes not.
Perhaps someone has modified this script to stop a running save, but I have never found a command-line option to do so, only the character-based or Java-based interactive interfaces. Just killing a running save process is not a good thing to do: the media database and indexes will often get broken as a result, the volume will have to be recycled, and all backups on it are lost at that point.
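For reference, a script-based VCS application agent of the kind described above might be sketched like this. The online/offline/monitor dispatch and the 110/100 monitor exit codes follow the usual VCS script-agent convention; the plain nsrmm query, the function names, and the init-script path are assumptions, not a supported agent.

```shell
#!/bin/sh
# Sketch of a script-based VCS application agent for NetWorker.
# NW_INIT is an assumed daemon start/stop script path.
NW_INIT=${NW_INIT:-/etc/init.d/networker}

nw_online()  { "$NW_INIT" start; }
nw_offline() { "$NW_INIT" stop; }

# VCS script agents report state via the monitor exit code:
# 110 = resource online, 100 = resource offline.
nw_monitor() {
    if nsrmm >/dev/null 2>&1; then   # server answers the media query
        return 110
    else
        return 100
    fi
}

case "${1:-}" in
    online)  nw_online ;;
    offline) nw_offline ;;
    monitor) nw_monitor ;;
esac
```

Even with this shape, the offline entry point will still block on an in-progress backup unless it also deals with running save sessions, which is the gap discussed in this thread.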

4 Operator • 14.4K Posts

April 16th, 2008 10:00

I have run a NW server under VCS in the past and had no issues. I can't give you the script here, as it was written for a specific customer who paid for it. Nevertheless, the script should take care of the following:
- if the failing node is still alive, shut down the NW daemons there (and on all storage nodes)
- start the NW daemons on the new node
- check whether the failed node is alive and, if so, start the client daemon

I didn't have to use any unmount commands, as the drives were presented on both nodes in the same order and with the same naming. The passive node was always a client, while the active one served as server/storage node.
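The three steps above could be sketched as follows. The ssh transport, the liveness check, and the function names are assumptions to adapt per site; nsr_shutdown and nsrexecd are the standard NetWorker daemon-shutdown command and client execution daemon.

```shell
#!/bin/sh
# Skeleton of the failover sequence described above (illustrative only).
node_alive() { ping -c 1 -w 2 "$1" >/dev/null 2>&1; }
start_server() { /etc/init.d/networker start; }   # assumed start path

failover_networker() {
    failing=$1
    # 1. If the failing node still answers, stop the NW server
    #    daemons there (repeat for any storage nodes).
    if node_alive "$failing"; then
        ssh "$failing" nsr_shutdown
    fi
    # 2. Start the NW server daemons on this (new active) node.
    start_server
    # 3. Restart only the client daemon on the failed node so it
    #    can still be backed up as a client.
    if node_alive "$failing"; then
        ssh "$failing" nsrexecd
    fi
}

# Usage: failover_networker nodeA
```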
