June 27th, 2017 10:00
Random Oscillation alerts
I have a test setup running 2.0.1.2. It is a 3-node Ubuntu Linux 16.04 setup with two networks, one for management and one for data. The SDS configuration is set to use the data network (Communicate Both SDS and SDC). On one node I keep getting the SNMP trap MDM.MDM_Cluster.CLUSTER_DEGRADED, and in the SDS trc.0 log I have found events like the one below that match the exact time of the trap.
Is there a guide or documentation on what the oscillation types are and what they mean, or has anyone else seen this one and know what causes it? The event is so brief that I haven't even seen the admin interface display a problem, and as far as I can tell no rebuild occurs. It truly is just a blip, as the 1 sec reference in the message suggests. In the ScaleIO client, the SDS shows "none found" under Oscillating Failure Counters.
26/06 21:56:24.203890 0x7f0701bdaeb8:contNet_OscillationNotif:01720: Con ca6db64300000002 - Oscillation of type 5 (RPC_LINGERED_1SEC) reported
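To match these events against trap times in bulk, the trace line above can be picked apart with a small script. This is only a rough sketch: the field layout (timestamp, thread, function, source line, message) is inferred from the single line quoted above, the function name parse_oscillation is made up for illustration, and the year has to be supplied because the trace timestamp does not carry one.

```python
import re
from datetime import datetime

# Layout inferred from the one trc.0 line quoted in this post (an assumption):
#   DD/MM HH:MM:SS.ffffff 0xTHREAD:function:LINE: message
TRC_LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) "
    r"(?P<thread>0x[0-9a-f]+):(?P<func>\w+):(?P<line>\d+): ?(?P<msg>.*)$"
)
OSCILLATION = re.compile(
    r"Con (?P<con>[0-9a-f]+) - Oscillation of type (?P<type>\d+) "
    r"\((?P<name>\w+)\) reported"
)


def parse_oscillation(line, year=2017):
    """Return a dict describing an oscillation event, or None if the line is not one.

    The year is passed in because the trace timestamp has no year field.
    """
    m = TRC_LINE.match(line.strip())
    if m is None:
        return None
    o = OSCILLATION.search(m.group("msg"))
    if o is None:
        return None
    when = datetime.strptime(f"{year} {m.group('ts')}", "%Y %d/%m %H:%M:%S.%f")
    return {
        "time": when,
        "con": o.group("con"),
        "type": int(o.group("type")),
        "name": o.group("name"),
    }


sample = (
    "26/06 21:56:24.203890 0x7f0701bdaeb8:contNet_OscillationNotif:01720: "
    "Con ca6db64300000002 - Oscillation of type 5 (RPC_LINGERED_1SEC) reported"
)
event = parse_oscillation(sample)
```

Running parse_oscillation over every line of trc.0 and keeping the non-None results gives a list of events whose times can be compared with the trap timestamps.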


pawelw1
July 4th, 2017 02:00
Hi,
In most cases these errors indicate a "hiccup" in the network that causes temporary disconnections. Sometimes they can indicate disk problems (e.g. an MDM running on a slow or failing HDD) or CPU starvation (we mostly see that in virtualized environments, though). I would check and tune both networks used by the MDM and see whether there are any dropped packets on the interfaces.
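One way to check for dropped packets on a Linux node is to read the per-interface counters in /proc/net/dev. The snippet below is just a sketch of that idea; the helper names are invented for illustration, and it only reports the cumulative error and drop counters since boot, so two readings taken some time apart are needed to see whether the numbers are still growing.

```python
def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into per-interface drop/error counters."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        # Receive columns are fields 0-7, transmit columns are fields 8-15.
        stats[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return stats


def interfaces_with_drops(stats):
    """Return the sorted names of interfaces with a non-zero drop counter."""
    return sorted(n for n, s in stats.items() if s["rx_drop"] or s["tx_drop"])


def read_net_dev(path="/proc/net/dev"):
    """Read and parse the live counters on a Linux host."""
    with open(path) as fh:
        return parse_net_dev(fh.read())
```

On a node, interfaces_with_drops(read_net_dev()) lists the interfaces worth a closer look; the management and data interfaces are the ones that matter here.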
Oscillating errors are briefly described in the ScaleIO Deployment Guide. I don't think there's any documentation that covers them in depth; they are mostly for internal debugging.
Hope that helps!
Cheers,
Pawel
carterbury
July 10th, 2017 11:00
I finally found this data in the MDM logs for a recent event. Below is a view from MDM1, MDM2 and TB1. I am not seeing much in the TB logs; my guess is that the TB still sees MDM2 while MDM1 does not. I just want to narrow this down to the problem machine or machines. The lost lease is also interesting, and I'm not sure how the lease is determined.
MDM1:
10/07 10:38:50.268650 0x7fef80a10eb8:actorVoter_SetLeaseState:09171: Net object ID: 7b21663a21af5291, Voter ID: 35824603117c7161 lost lease (ticks expired). ticks beyond expiration: 0 ticks expiration 2139617646
10/07 10:38:50.268659 0x7fef80a10eb8:syncer_NotifySecondaryLost:02216: Notification that secondary is lost
10/07 10:38:50.268670 0x7fef80a10eb8:syncerDegrador_DoUnlocked:01614: Degrador asked to work. Degraded asked is 1. Block 0
10/07 10:38:50.268752 0x7fef80a19eb8:syncer_MoveToDegraded:00568: Syncer UMT start moving to degraded mode
10/07 10:38:50.268793 0x7fef80a19eb8:syncerDegrador_DoUnlocked:01614: Degrador asked to work. Degraded asked is 1. Block 1
10/07 10:38:50.268815 0x7fef809e3eb8:syncerDegrador_Umt:01408: Degrador asked to move to state MOVE_TO_DEGRADED
10/07 10:38:50.268819 0x7fef809e3eb8:syncerDegrador_Umt:01415: MOVE_TO_DEGRADED for 25b3bc8d1145f4f1
10/07 10:38:50.268852 0x7fef80a07eb8:voter_UpdateStateByMsg:01308: voter_KnownMasterRenew, RC: SUCCESS
Local: state KNOWN_MASTER actorId 469f2c096d70af70 actorGen 2 voterId 6894583c50312f90 oosIDs [25b3bc8d1145f4f1] DegradedGen 84 IsFrozen 0 clsID 36bf3246632439ba bHasLease 1
Candidate: state INVALID actorId 0 actorGen 0 DegradedGen 0 IsFrozen 0 clsID 0 bHasLease 0
10/07 10:38:50.268860 0x7fef80a07eb8:voter_UpdateStateByMsg:01308:
Msg: actorId 469f2c096d70af70 actorGen 2 voterId 6894583c50312f90 oosIDs [25b3bc8d1145f4f1] DegradedGen 84 IsFrozen 0 clsID 36bf3246632439ba bMeNoMaster 0 successor 0 bHasQuorum 1 startTick 2139617646
10/07 10:38:50.268944 0x7fef80a10eb8:actor_Loop:11958: Still handling the same trigger since we couldn't convince all voters yet, votersFullyUpdated=0, voterHalf=1
10/07 10:38:50.270315 0x7fef80a10eb8:actorVoter_ProcessOneRsp:09502: We have the lease. Update degradedLocal: 83 Msg: 84
10/07 10:38:50.270319 0x7fef80a10eb8:actorVoter_ProcessOneRsp:09776: voterID: 6894583c50312f90 out of order response. bHasLease: 1 newLeaseTime: 2139617696 ticksExpiration: 2139617696
10/07 10:38:50.270322 0x7fef80a10eb8:actor_Loop:11958: Still handling the same trigger since we couldn't convince all voters yet, votersFullyUpdated=1, voterHalf=1
10/07 10:38:50.272084 0x7fef80a10eb8:actorVoter_ProcessOneRsp:09502: We have the lease. Update degradedLocal: 83 Msg: 84
10/07 10:38:50.272086 0x7fef80a10eb8:actorVoter_ProcessOneRsp:09776: voterID: 473f63f47cf794d2 out of order response. bHasLease: 1 newLeaseTime: 2139617696 ticksExpiration: 2139617696
10/07 10:38:50.272097 0x7fef80a10eb8:mosEventLog_PostInternal:00590: New event added. Message: "MDM cluster node is now DEGRADED - node ID 7b21663a21af5291; IPs: [SecondaryMDMIP], Port: 9011 is offline.". Additional info: "" Severity: Error
10/07 10:38:50.272101 0x7fef809e3eb8:syncerDegrador_Umt:01445: Degrador finished to move to state MOVE_TO_DEGRADED. bNextDegraded 1
10/07 10:38:50.272115 0x7fef80a19eb8:syncer_MoveToDegraded:00593: Syncer UMT finished moving to degraded mode
10/07 10:38:50.272116 0x7fef80a19eb8:syncer_Degraded:00617: Syncer UMT start handling degraded mode
10/07 10:38:50.272121 0x7fef80a19eb8:actor_FillJoinClusterReq:05122: Cluster member 1 netObjID: 6375a1ef3cd1aed2 netObjType 2 ActorID: 0 VoterID: 473f63f47cf794d2
10/07 10:38:50.272124 0x7fef80a19eb8:actor_FillJoinClusterReq:05122: Cluster member 2 netObjID: 7b21663a21af5291 netObjType 1 ActorID: 25b3bc8d1145f4f1 VoterID: 35824603117c7161
10/07 10:38:50.272125 0x7fef80a19eb8:syncer_SendStartSync:00480: syncSize: 624848. Local: PID 3845. Gen 626Msg: PID 3845. Gen 626
MDM2:
10/07 10:38:50.218350 0x7f8ad0a10eb8:actorLoop_NeedCede:11174: Not ceding as there are still enough free voters. voterHalf 1 ownedByOthers: [0,0] voteStates: [INVALID=0,UNKNOWN=3,NOT_OWNED=0,OWNED_BY_ME=0,OWNED_BY_OTHER=0,BLOCKED=0,NO_ANSWER=0,ERROR=0]
10/07 10:38:50.218393 0x7f8ad0a10eb8:actorLoop_NeedCede:11174: Not ceding as there are still enough free voters. voterHalf 1 ownedByOthers: [1,0] voteStates: [INVALID=0,UNKNOWN=2,NOT_OWNED=0,OWNED_BY_ME=0,OWNED_BY_OTHER=1,BLOCKED=0,NO_ANSWER=0,ERROR=0]
10/07 10:38:50.218435 0x7f8ad0a10eb8:actorLoop_NeedCede:11174: Not ceding as there are still enough free voters. voterHalf 1 ownedByOthers: [1,0] voteStates: [INVALID=0,UNKNOWN=2,NOT_OWNED=0,OWNED_BY_ME=0,OWNED_BY_OTHER=1,BLOCKED=0,NO_ANSWER=0,ERROR=0]
10/07 10:38:50.218455 0x7f8ad0a10eb8:actorLoop_NeedCede:11174: Not ceding as there are still enough free voters. voterHalf 1 ownedByOthers: [0,0] voteStates: [INVALID=0,UNKNOWN=3,NOT_OWNED=0,OWNED_BY_ME=0,OWNED_BY_OTHER=0,BLOCKED=0,NO_ANSWER=0,ERROR=0]
10/07 10:38:50.319026 0x7f8ad0a10eb8:actorLoop_NeedCede:11174: Not ceding as there are still enough free voters. voterHalf 1 ownedByOthers: [0,0] voteStates: [INVALID=0,UNKNOWN=3,NOT_OWNED=0,OWNED_BY_ME=0,OWNED_BY_OTHER=0,BLOCKED=0,NO_ANSWER=0,ERROR=0]
10/07 10:38:50.319075 0x7f8ad0a07eb8:voter_UpdateStateByMsg:01308: voter_KnownMasterChallenged, RC: SUCCESS
Local: state KNOWN_MASTER actorId 25b3bc8d1145f4f1 actorGen 2 voterId 35824603117c7161 oosIDs [] DegradedGen 83 IsFrozen 0 clsID 36bf3246632439ba bHasLease 1
Candidate: state NO_MASTER actorId 0 actorGen 0 DegradedGen 0 IsFrozen 0 clsID 0 bHasLease 0
10/07 10:38:50.319079 0x7f8ad0a07eb8:voter_UpdateStateByMsg:01308:
Msg: actorId 25b3bc8d1145f4f1 actorGen 2 voterId 35824603117c7161 oosIDs [] DegradedGen 83 IsFrozen 0 clsID 36bf3246632439ba bMeNoMaster 0 successor 0 bHasQuorum 0 startTick 2105625910
10/07 10:38:50.319199 0x7f8ad0a10eb8:actor_Loop:11775: Set new degraded gen 84,old one was. 83.New degraded [25b3bc8d1145f4f1] old one []
10/07 10:38:50.319203 0x7f8ad0a10eb8:actorLoop_NeedCede:11174: Not ceding as there are still enough free voters. voterHalf 1 ownedByOthers: [1,0] voteStates: [INVALID=0,UNKNOWN=2,NOT_OWNED=0,OWNED_BY_ME=0,OWNED_BY_OTHER=1,BLOCKED=0,NO_ANSWER=0,ERROR=0]
10/07 10:38:50.323629 0x7f8ad0a07eb8:voter_ReleaseMaster:01534: Releasing master - no successor, RC: SUCCESS
Local: state KNOWN_MASTER actorId 25b3bc8d1145f4f1 actorGen 2 voterId 35824603117c7161 oosIDs [] DegradedGen 83 IsFrozen 0 clsID 36bf3246632439ba bHasLease 1
Candidate: state NO_MASTER actorId 0 actorGen 0 DegradedGen 0 IsFrozen 0 clsID 0 bHasLease 0
10/07 10:38:50.323635 0x7f8ad0a07eb8:voter_ReleaseMaster:01534:
Msg: actorId 25b3bc8d1145f4f1 actorGen 2 voterId 35824603117c7161 oosIDs [] DegradedGen 84 IsFrozen 0 clsID 36bf3246632439ba bMeNoMaster 1 successor 0 bHasQuorum 0 startTick 2105625910
10/07 10:38:50.323642 0x7f8ad0a10eb8:actorVoter_SetLeaseState:09176: Net object ID: 7b21663a21af5291, Voter ID: 35824603117c7161 gained lease
10/07 10:38:50.323746 0x7f8ad0a10eb8:actorVoter_SetLeaseState:09171: Net object ID: 7b21663a21af5291, Voter ID: 35824603117c7161 lost lease (rc: NO_MASTER) ticks expiration 2105625960
10/07 10:38:50.534973 0x7f8ad09bfeb8:replFile_WriteLocal:00465: WARNING: Harden took too long: 1700 ms
10/07 10:38:50.535005 0x7f8ad09eceb8:voter_UpdateStateByMsg:01308: voter_NoMaster, RC: SUCCESS
Local: state KNOWN_MASTER actorId 469f2c096d70af70 actorGen 2 voterId 35824603117c7161 oosIDs [] DegradedGen 83 IsFrozen 0 clsID 36bf3246632439ba bHasLease 1
Candidate: state NO_MASTER actorId 0 actorGen 0 DegradedGen 0 IsFrozen 0 clsID 0 bHasLease 0
10/07 10:38:50.535009 0x7f8ad09eceb8:voter_UpdateStateByMsg:01308:
Msg: actorId 469f2c096d70af70 actorGen 2 voterId 35824603117c7161 oosIDs [] DegradedGen 83 IsFrozen 0 clsID 36bf3246632439ba bMeNoMaster 0 successor 0 bHasQuorum 1 startTick 2139617606
10/07 10:38:50.535048 0x7f8ad09bfeb8:replFile_WriteLocal:00473: Optimization: No need to write to offset 495616 size 32768
10/07 10:38:50.535061 0x7f8ad09bfeb8:syncerSlaveRcvGrp_RecvRequestCB:01649: START_SYNC START
10/07 10:38:50.535062 0x7f8ad09bfeb8:syncerSlave_HandleStartSync:01159: SyncerSlave received start-sync
10/07 10:38:50.535063 0x7f8ad09bfeb8:actor_CtrlLock:14971: (syncerSlave_HandleStartSync) Locking
10/07 10:38:50.535065 0x7f8ad09bfeb8:actor_JoinCluster:06488: Entering: clsUniqueID: 36bf3246632439ba actorGen: 2 clusterMode: 3_Node Sender: actorID: 469f2c096d70af70 netObjID: 2be67615007ce650 Remote: actorID: 25b3bc8d1145f4f1 netObjID: 7b21663a21af5291 virtualIPs: [] bEnableClientSecureCommunication: 1 bErashConfig: 0
10/07 10:38:50.535072 0x7f8ad09bfeb8:actor_JoinCluster:06493: Entering2: ClusterMembers: [{netObjID: 2be67615007ce650, actorID: 469f2c096d70af70, voterID: 6894583c50312f90},{netObjID: 6375a1ef3cd1aed2, actorID: 0000000000000000, voterID: 473f63f47cf794d2},{netObjID: 7b21663a21af5291, actorID: 25b3bc8d1145f4f1, voterID: 35824603117c7161},{netObjID: 0000000000000000, actorID: 0000000000000000, voterID: 0000000000000000},{netObjID: 0000000000000000, actorID: 0000000000000000, voterID: 0000000000000000}]
10/07 10:38:50.535076 0x7f8ad09bfeb8:actor_JoinCluster:06516: Msg NetObj 0: ID: 2be67615007ce650 Type: 1 Control IPs: [MDM1IP], Port: 9011
10/07 10:38:50.535078 0x7f8ad09bfeb8:actor_JoinCluster:06516: Msg NetObj 1: ID: 7b21663a21af5291 Type: 1 Control IPs: [MDM2IP], Port: 9011
10/07 10:38:50.535079 0x7f8ad09bfeb8:actor_JoinCluster:06516: Msg NetObj 2: ID: 6375a1ef3cd1aed2 Type: 2 Control IPs: [TieBreakerIP], Port: 9011
10/07 10:38:50.535081 0x7f8ad09bfeb8:actor_JoinCluster:06597: Join cluster matches previous config. Refreshing network objects only. Local clsUniqueID: 36bf3246632439ba actorGen 2
10/07 10:38:50.535085 0x7f8ad09bfeb8:actor_JoinCluster:06750: No cluster changes
10/07 10:38:50.535153 0x7f8ad09eceb8:voter_UpdateStateByMsg:01308: voter_KnownMasterRenew, RC: SUCCESS
TB1:
10/07 10:38:50.213504 0x7fcef000ceb8:voter_HandleMeMaster:03202: MeMaster didn't succeed, RC: IGNORE
Local: state KNOWN_MASTER actorId 469f2c096d70af70 actorGen 2 voterId 473f63f47cf794d2 oosIDs [] DegradedGen 83 IsFrozen 0 clsID 36bf3246632439ba bHasLease 1
Candidate: state INVALID actorId 0 actorGen 0 DegradedGen 0 IsFrozen 0 clsID 0 bHasLease 0
10/07 10:38:50.213511 0x7fcef000ceb8:voter_HandleMeMaster:03202:
Msg: actorId 25b3bc8d1145f4f1 actorGen 2 voterId 473f63f47cf794d2 oosIDs [] DegradedGen 83 IsFrozen 0 clsID 36bf3246632439ba bMeNoMaster 1 successor 0 bHasQuorum 0 startTick 2105625900
10/07 10:38:50.213515 0x7fcef000ceb8:voter_HandleMeMaster:03204:
Reply: actorId 469f2c096d70af70 actorGen 2 voterId 473f63f47cf794d2 oosIDs [] DegradedGen 83 IsFrozen 0 clsID 36bf3246632439ba bMeNoMaster 1 successor 0 bHasQuorum 0 startTick 2105625900
10/07 10:38:50.263885 0x7fcef000ceb8:voter_UpdateStateByMsg:01308: voter_KnownMasterRenew, RC: SUCCESS
Local: state KNOWN_MASTER actorId 469f2c096d70af70 actorGen 2 voterId 473f63f47cf794d2 oosIDs [25b3bc8d1145f4f1] DegradedGen 84 IsFrozen 0 clsID 36bf3246632439ba bHasLease 1
Candidate: state INVALID actorId 0 actorGen 0 DegradedGen 0 IsFrozen 0 clsID 0 bHasLease 0
10/07 10:38:50.263890 0x7fcef000ceb8:voter_UpdateStateByMsg:01308:
Msg: actorId 469f2c096d70af70 actorGen 2 voterId 473f63f47cf794d2 oosIDs [25b3bc8d1145f4f1] DegradedGen 84 IsFrozen 0 clsID 36bf3246632439ba bMeNoMaster 0 successor 0 bHasQuorum 1 startTick 2139617646
10/07 10:38:50.314436 0x7fcef000ceb8:voter_HandleMeMaster:03202: MeMaster didn't succeed, RC: IGNORE
Local: state KNOWN_MASTER actorId 469f2c096d70af70 actorGen 2 voterId 473f63f47cf794d2 oosIDs [25b3bc8d1145f4f1] DegradedGen 84 IsFrozen 0 clsID 36bf3246632439ba bHasLease 1
Candidate: state INVALID actorId 0 actorGen 0 DegradedGen 0 IsFrozen 0 clsID 0 bHasLease 0
10/07 10:38:50.314440 0x7fcef000ceb8:voter_HandleMeMaster:03202:
Msg: actorId 25b3bc8d1145f4f1 actorGen 2 voterId 473f63f47cf794d2 oosIDs [] DegradedGen 84 IsFrozen 0 clsID 36bf3246632439ba bMeNoMaster 1 successor 0 bHasQuorum 0 startTick 2105625910
10/07 10:38:50.314442 0x7fcef000ceb8:voter_HandleMeMaster:03204:
Reply: actorId 469f2c096d70af70 actorGen 2 voterId 473f63f47cf794d2 oosIDs [25b3bc8d1145f4f1] DegradedGen 84 IsFrozen 0 clsID 36bf3246632439ba bMeNoMaster 1 successor 0 bHasQuorum 0 startTick 2105625910
10/07 10:38:50.812776 0x7fcef000ceb8:voter_HandleMeMaster:03202: MeMaster didn't succeed, RC: IGNORE
Local: state KNOWN_MASTER actorId 469f2c096d70af70 actorGen 2 voterId 473f63f47cf794d2 oosIDs [25b3bc8d1145f4f1] DegradedGen 84 IsFrozen 0 clsID 36bf3246632439ba bHasLease 1
Candidate: state INVALID actorId 0 actorGen 0 DegradedGen 0 IsFrozen 0 clsID 0 bHasLease 0
10/07 10:38:50.812789 0x7fcef000ceb8:voter_HandleMeMaster:03202:
Msg: actorId 25b3bc8d1145f4f1 actorGen 2 voterId 473f63f47cf794d2 oosIDs [] DegradedGen 84 IsFrozen 0 clsID 36bf3246632439ba bMeNoMaster 1 successor 0 bHasQuorum 0 startTick 2105625960
10/07 10:38:50.812795 0x7fcef000ceb8:voter_HandleMeMaster:03204:
Reply: actorId 469f2c096d70af70 actorGen 2 voterId 473f63f47cf794d2 oosIDs [25b3bc8d1145f4f1] DegradedGen 84 IsFrozen 0 clsID 36bf3246632439ba bMeNoMaster 1 successor 0 bHasQuorum 0 startTick 2105625960
10/07 10:38:50.904693 0x7fcef000ceb8:voter_UpdateStateByMsg:01308: voter_KnownMasterRenew, RC: SUCCESS
Local: state KNOWN_MASTER actorId 469f2c096d70af70 actorGen 2 voterId 473f63f47cf794d2 oosIDs [] DegradedGen 85 IsFrozen 0 clsID 36bf3246632439ba bHasLease 1
Candidate: state INVALID actorId 0 actorGen 0 DegradedGen 0 IsFrozen 0 clsID 0 bHasLease 0
10/07 10:38:50.904706 0x7fcef000ceb8:voter_UpdateStateByMsg:01308:
Msg: actorId 469f2c096d70af70 actorGen 2 voterId 473f63f47cf794d2 oosIDs [] DegradedGen 85 IsFrozen 0 clsID 36bf3246632439ba bMeNoMaster 0 successor 0 bHasQuorum 1 startTick 2139617710
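Since the three excerpts above cover the same second, it can help to interleave them into one timeline. The following is a rough sketch only: merge_traces and key_events are made-up helper names, the keyword list is just the set of phrases that stand out in the excerpts above, the year is supplied by hand because the trace timestamps have none, and the ordering is only meaningful if the clocks on the three nodes are synchronized.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the excerpts above: DD/MM HH:MM:SS.ffffff
STAMP = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) ")

# Phrases picked by eye from the excerpts above; not an official list.
KEYWORDS = ("lost lease", "gained lease", "Harden took too long", "DEGRADED")


def merge_traces(traces, year=2017):
    """Merge {node_name: trace_text} into one list of (time, node, line), oldest first.

    Continuation lines (Local:/Candidate:/Msg:/Reply:) have no timestamp of
    their own and are skipped.
    """
    events = []
    for node, text in traces.items():
        for line in text.splitlines():
            m = STAMP.match(line)
            if m is None:
                continue
            when = datetime.strptime(f"{year} {m.group(1)}", "%Y %d/%m %H:%M:%S.%f")
            events.append((when, node, line.rstrip()))
    events.sort(key=lambda e: e[0])
    return events


def key_events(events):
    """Keep only the events whose text contains one of the KEYWORDS."""
    return [e for e in events if any(k in e[2] for k in KEYWORDS)]
```

Feeding it the three trace files, for example merge_traces({"MDM1": mdm1_text, "MDM2": mdm2_text, "TB1": tb1_text}), and printing key_events of the result puts the lease changes, the degraded transition and the slow-harden warning side by side.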