Autostart 5.2 Agent restarts

Question

Hi

I have a customer who has been running AutoStart 5.2 for over a year. All was OK until a few weeks ago, when the Agent started failing and restarting. I have seen this error before, and the simplest fix was to re-install AutoStart on both nodes.

We are running Windows Server 2003 SP1 and only use AutoStart to mirror disks between the servers.

Can you advise what would cause the agents to restart? Re-installing in a production environment is not always possible.

Many thanks in advance

tribicic · Answer

It is difficult to pinpoint the exact cause without the agent log. An agent restart is sometimes the expected way for the agent to recover from certain problems, but it should not happen often.
It is almost always related to either a communication problem or cluster database corruption. It usually helps to change the heartbeat communication from broadcast to point-to-point (this can be done in the node properties). Also, make sure that no third-party application, such as a virus scanner or backup software, is trying to access the files in the AutoStart directory.
Is the agent restarting continuously or just randomly?

yito1 · Answer

Which symptom is it? Does the Agent service stop? A Service Control Manager event is sure to be logged if the service has terminated abnormally. The problem might be in the configuration. Has the Agent heartbeat failed?
Yoshinobu Ito

swiftalliance · Answer

It seems to restart about every 10 minutes. We will look at network reliability and check the verification setting. Thanks for your help.

tribicic · Answer

Agent restarts at regular intervals like the one you describe can almost always be solved by switching the heartbeat communication from broadcast to point-to-point. Let us know if it worked.
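
If it is not obvious whether the restarts are regular or random, a quick check of the gaps between restart times can help. The following is only a rough sketch: it assumes the restart times have been copied by hand (for example from the Service Control Manager events) into a list, and the function name `restart_pattern` and the 20% tolerance are made up for illustration. It does not read any AutoStart log.

```python
from datetime import datetime
from statistics import mean, pstdev


def restart_pattern(timestamps, tolerance=0.2):
    """Classify restart times as 'regular', 'irregular' or 'unknown'.

    timestamps: datetime objects for each observed agent restart
                (hypothetical input, typed in by hand).
    tolerance:  assumed cut-off for the spread of the gaps relative to
                their average; 0.2 is an arbitrary illustrative value.
    """
    times = sorted(timestamps)
    # Seconds between each restart and the next one.
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        return "unknown"  # too few restarts to judge
    average = mean(gaps)
    if average == 0:
        return "unknown"  # identical timestamps, nothing to compare
    # Low relative spread means the restarts are evenly spaced.
    return "regular" if pstdev(gaps) / average <= tolerance else "irregular"


if __name__ == "__main__":
    observed = [datetime(2024, 1, 1, 9, m) for m in (0, 10, 20, 30)]
    print(restart_pattern(observed))
```

Evenly spaced restarts, such as one roughly every 10 minutes, would come back as "regular", which matches the pattern described above.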

smaf · Answer

I work with swiftalliance and what I can tell you is that the node communication has been switched from multicast to point-to-point.

Previously, the heartbeat setting was point-to-point, but the IP addresses used for both nodes no longer existed (and had not been used in at least 2 years), so we are unsure why no errors or warnings were reported. Because of this, the setting was changed to the 'default' multicast. It should be noted that when AutoStart was configured at this point, the AutoStart services were stable (they only started playing up a few hours after the data sources had finished their initial sync).

As it stands now, the heartbeat IP address has been set to the IP address of the opposite node in the cluster (i.e., on node 1, the point-to-point IP address is the IP address of node 2, and vice versa). We are not aware of any hard and fast rules governing these IP addresses, so we figured that using the physical IP address of a node in the cluster should suffice (since it should always be up and running anyway).
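
The stale-address situation described above could be caught with a simple cross-check. This is a minimal sketch, assuming the node addresses and heartbeat targets are entered by hand as two dictionaries; the function name `check_heartbeat_peers` and the example addresses are hypothetical, and nothing here reads AutoStart's own configuration.

```python
from ipaddress import ip_address


def check_heartbeat_peers(node_addresses, heartbeat_targets):
    """Return a list of problems in a point-to-point heartbeat layout.

    node_addresses:    {node name: current physical IP of that node}
    heartbeat_targets: {node name: IP that node sends heartbeats to}
    Both mappings are hypothetical, hand-entered inputs.
    """
    problems = []
    # Index the live addresses so each target can be matched to its owner.
    live = {ip_address(addr): name for name, addr in node_addresses.items()}
    for node, target in heartbeat_targets.items():
        owner = live.get(ip_address(target))
        if owner is None:
            problems.append(
                f"{node}: heartbeat target {target} is not a current node address")
        elif owner == node:
            problems.append(
                f"{node}: heartbeat target {target} is its own address")
    return problems


if __name__ == "__main__":
    nodes = {"node1": "10.0.0.1", "node2": "10.0.0.2"}    # example values only
    targets = {"node1": "10.0.0.2", "node2": "10.0.0.1"}  # each points at the other
    print(check_heartbeat_peers(nodes, targets))
```

An empty list means every node points at the current address of another node; a target that no longer belongs to any node is reported.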
