539

June 13th, 2008 02:00

Autostart 5.2 Agent restarts

Hi

I have a customer that has been running 5.2 for over a year, and all was OK until a few weeks ago, when the Agent started failing and restarting. I have seen this error before, and the simplest fix was to reinstall AutoStart on both nodes.

We are running Windows Server 2003 SP1 and use AutoStart only to mirror disks between servers.

Can you advise what would cause the agents to restart? Reinstalling in a production environment is not always possible.

Many thanks in advance

157 Posts

June 13th, 2008 02:00

It is difficult to pinpoint the exact cause without the agent log. An agent restart is sometimes the expected way for the agent to recover from certain problems, but it should not happen often.
It is almost always related to either a communication problem or cluster database corruption. It usually helps to change the heartbeat communication from broadcast to point-to-point (this can be done in the node properties). Also, make sure that no third-party application, such as a virus scanner or backup software, is trying to access the files in the AutoStart directory.
Is the agent restarting continuously or just randomly?

262 Posts

June 13th, 2008 02:00

Which symptom is it?

Does the Agent service stop?
If the service terminated abnormally, the Service Control Manager will have logged an event.
The problem might lie in the configuration.

Has the Agent heartbeat failed?

Yoshinobu Ito

June 13th, 2008 06:00

It seems to restart about every 10 minutes. We will look at network reliability and check the verification setting.

Thanks for your help

157 Posts

June 13th, 2008 06:00

Agent restarts at regular intervals like the one you describe can almost always be solved by switching heartbeat communication from broadcast to point-to-point. Let us know if it worked.
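To tell regular restarts from random ones, it can help to look at the gaps between restart times rather than eyeballing the log. The snippet below is only a rough sketch: the function names, the 20% tolerance, and the sample timestamps are all made up for illustration, and the times would have to be copied by hand from wherever the restarts are recorded (for example the agent log or the Windows event log). It is not part of AutoStart.

```python
from datetime import datetime
from statistics import mean, pstdev


def restart_intervals(timestamps):
    """Return the gaps, in minutes, between consecutive restart times."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]


def looks_periodic(timestamps, tolerance=0.2):
    """Return True if the gaps are roughly equal.

    `tolerance` is a hypothetical threshold: the standard deviation of the
    gaps must be at most this fraction of the mean gap.
    """
    gaps = restart_intervals(timestamps)
    if len(gaps) < 2:
        return False  # not enough data to judge
    avg = mean(gaps)
    return avg > 0 and pstdev(gaps) / avg <= tolerance


# Made-up restart times, roughly ten minutes apart.
sample = [datetime(2008, 6, 13, 6, m) for m in (0, 10, 21, 30, 40)]
print(restart_intervals(sample))
print(looks_periodic(sample))
```

If the gaps come out roughly equal, that points more towards a timer-driven cause such as the heartbeat than towards an occasional network glitch.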

5 Posts

June 18th, 2008 09:00

I work with swiftalliance and what I can tell you is that the node communication has been switched from multicast to point-to-point.

Previously, the heartbeat setting was point-to-point, but the IP addresses used for both nodes no longer existed (and had not been used in at least 2 years), so we are unsure why no errors or warnings were reported. Because of this, the setting was changed to the 'default' multicast. It should be noted that while AutoStart was being configured, the AutoStart services were stable; they only started playing up a few hours after the data sources had finished their initial sync.

As it stands now, the heartbeat IP address has been set to the IP address of the opposite node in the cluster (i.e., on node 1, the point-to-point IP address is the IP address of node 2, and vice versa). We are not aware of any hard and fast rules governing these IP addresses, so we figured that using the physical IP address of a node in the cluster should suffice (since it should always be up and running anyway).
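Since the stale addresses went unnoticed for so long, a quick cross-check of the heartbeat targets against the nodes' current addresses may be worth doing by hand or with a small script. The following is a minimal sketch under stated assumptions: the node names, the dictionaries, and the function `check_point_to_point` are hypothetical, the addresses are documentation-range placeholders, and the values would need to be typed in from the node properties. It does not read anything from AutoStart itself.

```python
import ipaddress


def check_point_to_point(node_ips, heartbeat_peers):
    """Return a list of problems found in a point-to-point heartbeat setup.

    node_ips:        node name -> that node's current physical IP address
    heartbeat_peers: node name -> IP address the node sends heartbeats to
    (Both mappings are hypothetical inputs entered by hand.)
    """
    problems = []
    for node, peer_ip in heartbeat_peers.items():
        if node not in node_ips:
            problems.append(f"{node}: no physical IP address listed for this node")
            continue
        try:
            target = ipaddress.ip_address(peer_ip)
        except ValueError:
            problems.append(f"{node}: '{peer_ip}' is not a valid IP address")
            continue
        own = ipaddress.ip_address(node_ips[node])
        others = {ipaddress.ip_address(ip)
                  for name, ip in node_ips.items() if name != node}
        if target == own:
            problems.append(f"{node}: heartbeat target is its own address")
        elif target not in others:
            problems.append(f"{node}: {peer_ip} does not belong to any other node")
    return problems


# Placeholder addresses from the 192.0.2.0/24 documentation range.
nodes = {"node1": "192.0.2.11", "node2": "192.0.2.12"}
current = {"node1": "192.0.2.12", "node2": "192.0.2.11"}  # each points at the other
stale = {"node1": "192.0.2.50", "node2": "192.0.2.51"}    # addresses no node owns

print(check_point_to_point(nodes, current))
print(check_point_to_point(nodes, stale))
```

This only checks that the configured targets match the addresses on record; it says nothing about whether the network path between the nodes is actually reliable.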
