September 17th, 2009 07:00

Lost Heartbeat - AutoStart 5.3

Hi People!
I'm using AutoStart 5.3, and every day server1 loses the heartbeat connection with server2. When the connection is lost, all services and data are lost.

What can cause this problem?

Thanks for the help.

Hugs,
Rogerio

September 17th, 2009 09:00

Hi!
The network is ok.


Log:

Info Fri Sep 04 07:19:18 GMT-03:00 2009 Connecting to server1
Info Fri Sep 04 07:19:20 GMT-03:00 2009 Connection established to server1
Info Fri Sep 04 07:19:20 GMT-03:00 2009 Installed Exchange200320.jar
Info Fri Sep 04 07:19:20 GMT-03:00 2009 Installed oraclewindows31.jar
Info Fri Sep 04 07:19:20 GMT-03:00 2009 Installed PrintServices11.jar
Info Fri Sep 04 07:19:20 GMT-03:00 2009 Installed sql200510.jar
Error Sat Sep 05 04:35:24 GMT-03:00 2009 Lost connection to server1
Info Sat Sep 05 04:35:27 GMT-03:00 2009 Connecting to server2
Warning Sat Sep 05 04:35:27 GMT-03:00 2009 Primary agent "server1" not found : Unable to connect to the agent.
Info Sat Sep 05 04:35:27 GMT-03:00 2009 Connecting to server2
Info Sat Sep 05 04:35:27 GMT-03:00 2009 Connection established to server2
Info Sat Sep 05 04:35:27 GMT-03:00 2009 Installed Exchange200320.jar
Info Sat Sep 05 04:35:27 GMT-03:00 2009 Installed oraclewindows31.jar
Info Sat Sep 05 04:35:27 GMT-03:00 2009 Installed PrintServices11.jar
Info Sat Sep 05 04:35:27 GMT-03:00 2009 Installed sql200510.jar
Warning Sat Sep 05 04:35:39 GMT-03:00 2009 ID00004744 Node server2 has stopped receiving heartbeats from Primary node server1 1/22. Declaring node as unresponsive.
Info Sat Sep 05 04:35:51 GMT-03:00 2009 ID00001155 Agent on server1 has started.
Info Sat Sep 05 04:35:58 GMT-03:00 2009 ID00001600 Verify Data Source State for Dados, VolumeType: AAM_Mirror on node server1, State = DETACHED
Info Sat Sep 05 04:36:02 GMT-03:00 2009 ID00004708 Node server1 has started receiving heartbeats from node server2.
Info Sat Sep 05 04:36:02 GMT-03:00 2009 ID00001350 Node server1 is running.
Info Sat Sep 05 04:36:02 GMT-03:00 2009 ID00004708 Node server2 has started receiving heartbeats from node server1.
Info Sat Sep 05 04:36:02 GMT-03:00 2009 ID00001567 Node server1 ftStateMon initialized.
Info Sun Sep 06 21:37:05 GMT-03:00 2009 ID00001567 Node server2 ftStateMon initialized.
Info Sun Sep 06 21:39:40 GMT-03:00 2009 ID00001567 Node server1 ftStateMon initialized.
Info Sun Sep 06 21:44:13 GMT-03:00 2009 ID00001567 Node server1 ftStateMon initialized.
Info Sun Sep 06 21:47:06 GMT-03:00 2009 ID00001567 Node server1 ftStateMon initialized.
Info Sun Sep 06 21:54:30 GMT-03:00 2009 ID00001567 Node server2 ftStateMon initialized.
Info Sun Sep 06 21:56:13 GMT-03:00 2009 ID00001567 Node server2 ftStateMon initialized.
Error Tue Sep 08 09:07:39 GMT-03:00 2009 Lost connection to server2
Info Tue Sep 08 09:07:40 GMT-03:00 2009 Connecting to server1
Info Tue Sep 08 09:07:41 GMT-03:00 2009 Connection established to server1
Info Tue Sep 08 09:07:42 GMT-03:00 2009 Installed Exchange200320.jar
Info Tue Sep 08 09:07:42 GMT-03:00 2009 Installed oraclewindows31.jar
Info Tue Sep 08 09:07:42 GMT-03:00 2009 Installed PrintServices11.jar
Info Tue Sep 08 09:07:42 GMT-03:00 2009 Installed sql200510.jar
Error Tue Sep 08 09:08:51 GMT-03:00 2009 Lost connection to server1
Info Tue Sep 08 09:08:51 GMT-03:00 2009 Connecting to server1
Warning Tue Sep 08 09:08:52 GMT-03:00 2009 Primary agent "server1" not found : Unable to connect to the agent.
Info Tue Sep 08 09:08:52 GMT-03:00 2009 Connecting to server2
Info Tue Sep 08 09:17:04 GMT-03:00 2009 Connection established to server2
Info Tue Sep 08 09:17:04 GMT-03:00 2009 Installed Exchange200320.jar
Info Tue Sep 08 09:17:04 GMT-03:00 2009 Installed oraclewindows31.jar
Info Tue Sep 08 09:17:04 GMT-03:00 2009 Installed PrintServices11.jar
Info Tue Sep 08 09:17:04 GMT-03:00 2009 Installed sql200510.jar
Warning Tue Sep 08 09:17:17 GMT-03:00 2009 ID00004744 Node server2 has stopped receiving heartbeats from Primary node server1 1/23. Declaring node as unresponsive.
Info Tue Sep 08 09:17:28 GMT-03:00 2009 ID00001155 Agent on server1 has started.
Info Tue Sep 08 09:17:34 GMT-03:00 2009 ID00001600 Verify Data Source State for Dados, VolumeType: AAM_Mirror on node server1, State = DETACHED
Info Tue Sep 08 09:17:38 GMT-03:00 2009 ID00001350 Node server1 is running.
Info Tue Sep 08 09:17:38 GMT-03:00 2009 ID00004708 Node server2 has started receiving heartbeats from node server1.
Info Tue Sep 08 09:17:38 GMT-03:00 2009 ID00004708 Node server1 has started receiving heartbeats from node server2.
Info Tue Sep 08 09:17:38 GMT-03:00 2009 ID00001567 Node server1 ftStateMon initialized.
Error Wed Sep 09 13:43:36 GMT-03:00 2009 Lost connection to server2
Info Wed Sep 09 13:43:36 GMT-03:00 2009 Connecting to server1
Info Wed Sep 09 13:43:37 GMT-03:00 2009 Connection established to server1
Info Wed Sep 09 13:43:37 GMT-03:00 2009 Installed Exchange200320.jar
Info Wed Sep 09 13:43:37 GMT-03:00 2009 Installed oraclewindows31.jar
Info Wed Sep 09 13:43:37 GMT-03:00 2009 Installed PrintServices11.jar
Info Wed Sep 09 13:43:37 GMT-03:00 2009 Installed sql200510.jar
Error Thu Sep 10 05:33:11 GMT-03:00 2009 Lost connection to server1
Info Thu Sep 10 05:33:11 GMT-03:00 2009 Connecting to server1
Warning Thu Sep 10 05:33:12 GMT-03:00 2009 Primary agent "server1" not found : Unable to connect to the agent.
Info Thu Sep 10 05:33:12 GMT-03:00 2009 Connecting to server2
Info Thu Sep 10 05:33:14 GMT-03:00 2009 Connection established to server2
Info Thu Sep 10 05:33:14 GMT-03:00 2009 Installed Exchange200320.jar
Info Thu Sep 10 05:33:14 GMT-03:00 2009 Installed oraclewindows31.jar
Info Thu Sep 10 05:33:14 GMT-03:00 2009 Installed PrintServices11.jar
Info Thu Sep 10 05:33:14 GMT-03:00 2009 Installed sql200510.jar
Warning Thu Sep 10 05:33:26 GMT-03:00 2009 ID00004744 Node server2 has stopped receiving heartbeats from Primary node server1 1/25. Declaring node as unresponsive.
Info Thu Sep 10 05:33:36 GMT-03:00 2009 ID00001155 Agent on server1 has started.
Info Thu Sep 10 05:33:40 GMT-03:00 2009 ID00001600 Verify Data Source State for Dados, VolumeType: AAM_Mirror on node server1, State = DETACHED
Info Thu Sep 10 05:33:43 GMT-03:00 2009 ID00004708 Node server1 has started receiving heartbeats from node server2.
Info Thu Sep 10 05:33:44 GMT-03:00 2009 ID00001350 Node server1 is running.
Info Thu Sep 10 05:33:44 GMT-03:00 2009 ID00004708 Node server2 has started receiving heartbeats from node server1.
Info Thu Sep 10 05:33:44 GMT-03:00 2009 ID00001567 Node server1 ftStateMon initialized.
Warning Thu Sep 10 06:30:56 GMT-03:00 2009 Node server1 heartbeat sender time warp of 410 seconds

Warning Thu Sep 10 06:30:57 GMT-03:00 2009 Node server1 heartbeat checker time warp of 411 seconds

Hugs,
Rogerio

2 Intern • 2K Posts

September 17th, 2009 09:00

We need to see some error logs here; it is difficult to say otherwise. Is your network OK?

2 Intern • 2K Posts

September 17th, 2009 12:00

What I see here is that the log first reports Primary agent "server1" not found, and then the heartbeat error from server1. It seems that the agent on server1 goes down or disconnects momentarily; the heartbeat going down as well could point to a network drop somewhere on server1's side.

Please check both the network and any possible reason for the server1 agent going down.
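
To make the order of events easier to check, the log can be filtered down to the lines that report a lost connection, a missing primary agent, or lost heartbeats. The following is only a rough sketch in Python: it assumes the log is available as plain text lines in exactly the format pasted above, the function names are made up for illustration, and it is not part of AutoStart. Parsing the month name with `%b` also assumes an English locale.

```python
import re
from datetime import datetime

# Matches lines in the format pasted above, for example:
# "Error Sat Sep 05 04:35:24 GMT-03:00 2009 Lost connection to server1"
LINE_RE = re.compile(
    r"^(?P<level>Info|Warning|Error) "
    r"\w{3} (?P<mon>\w{3}) (?P<day>\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"GMT[+-]\d{2}:\d{2} (?P<year>\d{4}) (?P<msg>.*)$"
)

def parse_line(line):
    """Return (level, timestamp, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # The GMT offset is ignored; timestamps stay in the log's local time.
    stamp = datetime.strptime(
        f"{m['mon']} {m['day']} {m['year']} {m['time']}", "%b %d %Y %H:%M:%S"
    )
    return m["level"], stamp, m["msg"]

def connection_events(lines):
    """Return (timestamp, message) pairs for connection and heartbeat losses."""
    keywords = ("Lost connection", "not found", "stopped receiving heartbeats")
    events = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and any(k in parsed[2] for k in keywords):
            events.append((parsed[1], parsed[2]))
    return events
```

Printing the result of `connection_events(...)` for the whole log gives a short timeline that can be compared against other scheduled activity on the servers.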

262 Posts

September 17th, 2009 17:00

Hi,

The log shows a time gap of 410 seconds on server1.

Please adjust the system clock.

Does the heartbeat still get interrupted afterwards?
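
The 410-second figure comes from the "time warp" warnings at the end of the log. A small sketch like the one below could pull every such warning out of a longer log to see how often the jumps happen and how large they are. It assumes the message wording is exactly as in the two warnings pasted above; the function name is made up for illustration.

```python
import re

# Matches messages such as:
# "Node server1 heartbeat sender time warp of 410 seconds"
WARP_RE = re.compile(
    r"Node (\S+) heartbeat (sender|checker) time warp of (\d+) seconds"
)

def time_warps(lines):
    """Return (node, component, seconds) for each time warp warning found."""
    warps = []
    for line in lines:
        m = WARP_RE.search(line)
        if m:
            warps.append((m.group(1), m.group(2), int(m.group(3))))
    return warps
```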

September 18th, 2009 07:00

Hi,
Where do I change the system clock in AutoStart?

Tks.

Hugs,
Rogerio

September 18th, 2009 11:00

I have noticed that the problem occurs at the same time that the VCB backup takes a snapshot of this VM (node 1).

I believe the lost connection on node 1 is expected, because the VM's network connection stops responding for a few seconds while the VCB snapshot is taken.

The problem is that when node 2 brings up the services, the data source shows warnings because the data is not synchronized.

Any idea how to solve this problem?

Tks.
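
One way to confirm that the disconnects line up with the VCB snapshots is to compare the timestamps of the "Lost connection" entries with the snapshot start times. The sketch below assumes both are already available as Python `datetime` values; the snapshot time in the usage example is invented for illustration and would have to come from the real backup schedule or logs. The five-minute tolerance is also an arbitrary choice.

```python
from datetime import datetime, timedelta

def events_near_snapshots(event_times, snapshot_times,
                          tolerance=timedelta(minutes=5)):
    """Return the events that fall within `tolerance` of any snapshot time."""
    return [
        e for e in event_times
        if any(abs(e - s) <= tolerance for s in snapshot_times)
    ]

# Two "Lost connection to server1" timestamps taken from the log above.
lost = [datetime(2009, 9, 5, 4, 35, 24), datetime(2009, 9, 8, 9, 8, 51)]
# Hypothetical snapshot start time, NOT taken from the thread.
snapshots = [datetime(2009, 9, 5, 4, 33, 0)]
matches = events_near_snapshots(lost, snapshots)
```

If most of the disconnects fall inside a snapshot window, that supports the backup as the trigger rather than a general network fault.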

2 Intern • 2K Posts

September 18th, 2009 19:00

Does your VCB depend on the heartbeat network? In a mirrored data source configuration, the data source will resync whenever it comes back up, or at least I would say it should.

September 24th, 2009 11:00

Hi People,
I was analyzing the VI and found a very old snapshot. After I deleted this snapshot, I had no more problems with VCB and AutoStart.

Tks.

Hugs,
Rogerio