ECS: xDoctor RAP014: Fabric Lifecycle Service is not healthy | Lifecycle Jetty server is not up and running on port 9241
Summary: ECS: xDoctor RAP014: Fabric Lifecycle Service is not healthy | Lifecycle Jetty server is not up and running on port 9241.
Symptoms
Issue #1:
The following output is seen in the Service Console after running an upgrade on ECS from version 3.0.X or earlier to version 3.1 or later:
20180309 01:49:28.456: | | | PASS (21 min 29 sec)
20180309 01:49:28.462: | | PASS (21 min 29 sec)
20180309 01:49:28.463: | Run Keyword If
20180309 01:49:28.464: | | Node Service Upgrade
Initializing...
Executing Program: NODE_SERVICE_UPGRADE
|-Disable CallHome
| +-[0.0.0.0] SetCallHomeEnabled PASS (1/7, 1 sec)
|-Push Service Image To Registries
| |-Push Service Image to Head Registry
| | |-[169.254.1.1] LoadImage PASS (2/7, 1 sec)
| | +-[169.254.1.1] PushImage PASS (3/7)
| +-Push Service Image to Remote Registries
|-Upgrade Object On Specified Nodes
| +-Initiate Object Upgrade if Required
| +-[0.0.0.0] UpdateApplicationOnNodes PASS (4/7, 1 sec)
|-Update Services Ownership To Lifecycle Manager on Specified Nodes
| +-Update Ownership For Object
| +-[169.254.1.1] UpdateOwnership PASS (5/7)
|-Post-check Services Health
| +-Validate Object Service on Specified Nodes
| +-[169.254.1.1] ServiceHealth PASS (6/7, 21 sec)
+-Enable CallHome
+-[0.0.0.0] SetCallHomeEnabled PASS (7/7, 3 sec)
Elapsed time is 30 sec.
NODE_SERVICE_UPGRADE completed successfully
Collecting data from cluster
Information has been written to the
Information has been written to the
Executing /configure.sh --start action in object-main container which may take up to 600 seconds.
20180309 01:52:51.711: | | | PASS (3 min 23 sec)
20180309 01:52:51.720: | | PASS (3 min 23 sec)
20180309 01:52:51.722: | Run Keyword If
20180309 01:52:51.724: | | Update manifest file
[ERROR] On node 169.254.1.1, Lifecycle Jetty server is not up and running on port 9241!
20180309 01:58:45.068: | | | FAIL (5 min 53 sec)
20180309 01:58:45.071: | | FAIL (5 min 53 sec)
20180309 01:58:45.072: | FAIL (45 min 43 sec)
20180309 01:58:45.075: Service Console Teardown
20180309 01:58:46.973: | PASS (1 sec)
================================================================================
Status: FAIL
Time Elapsed: 45 min 56 sec
Debug log: /
HTML log: /
================================================================================
Messages:
fabric-lifecycle service should be up and running
================================================================================
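To narrow down Issue #1, it can help to confirm whether anything is actually listening on the Lifecycle Jetty port on the node named in the error. The check below is a minimal sketch and is not part of the original upgrade output; the port number 9241 is taken from the error message, and netstat may need to be replaced by ss depending on what is installed on the node:
# viprexec 'netstat -tln | grep 9241'
If a node returns no matching line, the fabric-lifecycle service on that node is not listening on port 9241.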
Issue #2:
xDoctor may report the following:
Timestamp = 2015-09-25_092907
Category  = health
Source    = fcli
Severity  = WARNING
Message   = Fabric Lifecycle Service not Healthy
Extra     =
Monitoring the Fabric Lifecycle Service with "sudo docker ps -a" shows that the service keeps restarting:
venus2:~ # docker ps -a
CONTAINER ID   IMAGE                                                          COMMAND                CREATED       STATUS         PORTS   NAMES
7995f18ba27f   ip.ip.ip.ip:5000/emcvipr/object:2.0.1.0-62267.db4d4a8          "/opt/vipr/boot/boot   4 weeks ago   Up 21 hours            object-main
73f00ed0b6df   ip.ip.ip.ip:5000/caspian/fabric:1.1.1.0-1998.1391e7e           "./boot.sh lifecycle   4 weeks ago   Up 3 seconds           fabric-lifecycle
ba19a3c95151   ip.ip.ip.ip:5000/caspian/fabric-zookeeper:1.1.0.0-54.54a204e   "./boot.sh 2 1=169.2   4 weeks ago   Up 21 hours            fabric-zookeeper

venus2:~ # docker ps -a
CONTAINER ID   IMAGE                                                          COMMAND                CREATED       STATUS                     PORTS   NAMES
7995f18ba27f   ip.ip.ip.ip:5000/emcvipr/object:2.0.1.0-62267.db4d4a8          "/opt/vipr/boot/boot   4 weeks ago   Up 21 hours                        object-main
73f00ed0b6df   ip.ip.ip.ip:5000/caspian/fabric:1.1.1.0-1998.1391e7e           "./boot.sh lifecycle   4 weeks ago   Exited (1) 2 seconds ago           fabric-lifecycle
ba19a3c95151   ip.ip.ip.ip:5000/caspian/fabric-zookeeper:1.1.0.0-54.54a204e   "./boot.sh 2 1=169.2   4 weeks ago   Up 21 hours                        fabric-zookeeper
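A quick way to confirm the restart loop (a sketch, not part of the original article) is to watch the docker ps output change over a short interval, or to check the container's last start time and exit code directly:
# watch -n 2 'docker ps -a | grep fabric-lifecycle'
# docker inspect --format '{{.State.StartedAt}} {{.State.ExitCode}}' fabric-lifecycle
A STATUS column that alternates between "Up ... seconds" and "Exited (1) ... seconds ago" as in the output above, or a StartedAt timestamp that keeps advancing, confirms the service is cycling.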
Cause
Cause of Issue #1:
The ZooKeeper container could not start correctly because of the size of its snapshot.
Cause of Issue #2:
The ECS node IP address is being resolved to an incorrect hostname.
Resolution
Resolution for Issue #1:
This issue is resolved in ECS version 3.0, which improves compression and enables retention of ZK messages.
If the issue is encountered, contact ECS Support.
To confirm this issue, run the following command:
# viprexec 'cat /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log | grep "GC overhead limit exceeded"'
Example output:
admin@:~> viprexec 'cat /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log | grep "GC overhead limit exceeded"'

Output from host : 192.168.219.4
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded

Output from host : 192.168.219.5
java.lang.OutOfMemoryError: GC overhead limit exceeded

Output from host : 192.168.219.3
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded

Output from host : 192.168.219.7
cat: /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log: No such file or directory

Output from host : 192.168.219.2

Output from host : 192.168.219.8
cat: /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log: No such file or directory

Output from host : 192.168.219.6
cat: /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log: No such file or directory

Output from host : 192.168.219.1

admin@:~>
The following message is seen in the ZooKeeper log files:
OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOf(Arrays.java:3236)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
    at java.io.DataOutputStream.writeLong(DataOutputStream.java:224)
    at org.apache.jute.BinaryOutputArchive.writeLong(BinaryOutputArchive.java:59)
    at org.apache.zookeeper.data.Stat.serialize(Stat.java:129)
    at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
    at org.apache.zookeeper.proto.GetDataResponse.serialize(GetDataResponse.java:49)
    at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
    at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1067)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
    at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
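As a rough follow-up (a variation on the confirmation command above, not part of the original procedure), the number of GC-overhead errors can be counted per node to see which nodes are most affected; nodes that never ran ZooKeeper will report the log file as missing, as in the example output:
# viprexec 'grep -c "GC overhead limit exceeded" /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log'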
Resolution for Issue #2:
Verify DNS using nslookup with the IP address of the ECS node whose lifecycle service is restarting.
# nslookup <ip of ecs node>
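As an optional extension of this check (a sketch, not part of the original procedure), compare the reverse and forward lookups on the affected node; <hostname> stands for whatever name the first lookup returns, and /etc/resolv.conf shows which DNS servers the node is actually querying:
# nslookup <ip of ecs node>     (reverse lookup: the returned hostname should be the node's correct name)
# nslookup <hostname>           (forward lookup: should return the same IP address)
# cat /etc/resolv.conf          (confirms which DNS servers the node uses)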
If DNS is correct and the lifecycle issues persist, contact ECS Support.