ECS: xDoctor RAP014: Fabric lifecycle service not healthy | Lifecycle Jetty server is not up and running on port 9241
Summary: ECS: xDoctor RAP014: Fabric lifecycle service not healthy | Lifecycle Jetty server is not up and running on port 9241.
Symptoms
Issue #1:
The following output appears in the Service Console after upgrading ECS from version 3.0.x or earlier to version 3.1 or later:
20180309 01:49:28.456: | | | PASS (21 min 29 sec)
20180309 01:49:28.462: | | PASS (21 min 29 sec)
20180309 01:49:28.463: | Run Keyword If
20180309 01:49:28.464: | | Node Service Upgrade
Initializing...
Executing Program: NODE_SERVICE_UPGRADE
|-Disable CallHome
| +-[0.0.0.0] SetCallHomeEnabled PASS (1/7, 1 sec)
|-Push Service Image To Registries
| |-Push Service Image to Head Registry
| | |-[169.254.1.1] LoadImage PASS (2/7, 1 sec)
| | +-[169.254.1.1] PushImage PASS (3/7)
| +-Push Service Image to Remote Registries
|-Upgrade Object On Specified Nodes
| +-Initiate Object Upgrade if Required
| +-[0.0.0.0] UpdateApplicationOnNodes PASS (4/7, 1 sec)
|-Update Services Ownership To Lifecycle Manager on Specified Nodes
| +-Update Ownership For Object
| +-[169.254.1.1] UpdateOwnership PASS (5/7)
|-Post-check Services Health
| +-Validate Object Service on Specified Nodes
| +-[169.254.1.1] ServiceHealth PASS (6/7, 21 sec)
+-Enable CallHome
  +-[0.0.0.0] SetCallHomeEnabled PASS (7/7, 3 sec)
Elapsed time is 30 sec.
NODE_SERVICE_UPGRADE completed successfully
Collecting data from cluster
Information has been written to the
Information has been written to the
Executing /configure.sh --start action in object-main container which may take up to 600 seconds.
20180309 01:52:51.711: | | | PASS (3 min 23 sec)
20180309 01:52:51.720: | | PASS (3 min 23 sec)
20180309 01:52:51.722: | Run Keyword If
20180309 01:52:51.724: | | Update manifest file
[ERROR] On node 169.254.1.1, Lifecycle Jetty server is not up and running on port 9241!
20180309 01:58:45.068: | | | FAIL (5 min 53 sec)
20180309 01:58:45.071: | | FAIL (5 min 53 sec)
20180309 01:58:45.072: | FAIL (45 min 43 sec)
20180309 01:58:45.075: Service Console Teardown
20180309 01:58:46.973: | PASS (1 sec)
================================================================================
Status: FAIL
Time Elapsed: 45 min 56 sec
Debug log: /
HTML log: /
================================================================================
Messages:
fabric-lifecycle service should be up and running
================================================================================
Issue #2:
xDoctor may report the following:
Timestamp = 2015-09-25_092907
Category  = health
Source    = fcli
Severity  = WARNING
Message   = Fabric Lifecycle Service not Healthy
Extra     =
Monitoring the fabric Lifecycle service with "sudo docker ps -a" shows the service repeatedly restarting:
venus2:~ # docker ps -a
CONTAINER ID   IMAGE                                                          COMMAND                CREATED       STATUS         PORTS   NAMES
7995f18ba27f   ip.ip.ip.ip:5000/emcvipr/object:2.0.1.0-62267.db4d4a8          "/opt/vipr/boot/boot   4 weeks ago   Up 21 hours            object-main
73f00ed0b6df   ip.ip.ip.ip:5000/caspian/fabric:1.1.1.0-1998.1391e7e           "./boot.sh lifecycle   4 weeks ago   Up 3 seconds           fabric-lifecycle
ba19a3c95151   ip.ip.ip.ip:5000/caspian/fabric-zookeeper:1.1.0.0-54.54a204e   "./boot.sh 2 1=169.2   4 weeks ago   Up 21 hours            fabric-zookeeper

venus2:~ # docker ps -a
CONTAINER ID   IMAGE                                                          COMMAND                CREATED       STATUS                     PORTS   NAMES
7995f18ba27f   ip.ip.ip.ip:5000/emcvipr/object:2.0.1.0-62267.db4d4a8          "/opt/vipr/boot/boot   4 weeks ago   Up 21 hours                        object-main
73f00ed0b6df   ip.ip.ip.ip:5000/caspian/fabric:1.1.1.0-1998.1391e7e           "./boot.sh lifecycle   4 weeks ago   Exited (1) 2 seconds ago           fabric-lifecycle
ba19a3c95151   ip.ip.ip.ip:5000/caspian/fabric-zookeeper:1.1.0.0-54.54a204e   "./boot.sh 2 1=169.2   4 weeks ago   Up 21 hours                        fabric-zookeeper
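To confirm the restart loop without rerunning "docker ps -a" by hand, a minimal sketch such as the following can be used (watch, docker ps --filter, and docker logs are standard tools rather than ECS-specific commands; the 2-second interval is an arbitrary choice):

# watch -n 2 'docker ps -a --filter name=fabric-lifecycle'
# docker logs --tail 100 fabric-lifecycle

A STATUS column that keeps flipping between "Up N seconds" and "Exited (1) ..." confirms the loop, and the container log may indicate why the lifecycle service exits.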
Cause
Cause of Issue #1:
The ZooKeeper container could not start correctly because of the size of its snapshot.
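As a rough indication of whether the ZooKeeper data has grown unusually large, the on-disk footprint of the fabric ZooKeeper service can be checked on each node. This is only a sketch: the path is assumed from the log location referenced later in this article, and the exact directory layout can differ between ECS versions:

# viprexec 'du -sh /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/*'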
Cause of Issue #2:
The ECS IP address resolves to an incorrect hostname.
Resolution
Resolution for Issue #1:
Fixed in ECS 3.0. ECS 3.0 improves ZooKeeper (ZK) message compaction and retention.
If this issue occurs, contact ECS support.
How to verify this issue:
Run the following command:
# viprexec 'cat /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log | grep "GC overhead limit exceeded"'
Example output:
admin@:~> viprexec 'cat /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log | grep "GC overhead limit exceeded"'

Output from host : 192.168.219.4
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded

Output from host : 192.168.219.5
java.lang.OutOfMemoryError: GC overhead limit exceeded

Output from host : 192.168.219.3
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded

Output from host : 192.168.219.7
cat: /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log: No such file or directory

Output from host : 192.168.219.2

Output from host : 192.168.219.8
cat: /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log: No such file or directory

Output from host : 192.168.219.6
cat: /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log: No such file or directory

Output from host : 192.168.219.1

admin@:~>
This message appears in the ZooKeeper log files:
OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOf(Arrays.java:3236)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
        at java.io.DataOutputStream.writeLong(DataOutputStream.java:224)
        at org.apache.jute.BinaryOutputArchive.writeLong(BinaryOutputArchive.java:59)
        at org.apache.zookeeper.data.Stat.serialize(Stat.java:129)
        at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
        at org.apache.zookeeper.proto.GetDataResponse.serialize(GetDataResponse.java:49)
        at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1067)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
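To gauge how frequently each node hits the error, the same log file can be summarized with a per-node count. This is only a convenience variant of the verification command above, using standard grep options:

# viprexec 'grep -c "GC overhead limit exceeded" /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log'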
Resolution for Issue #2:
Check DNS using nslookup against the IP address of the ECS node on which the lifecycle service keeps restarting.
# nslookup <ip of ecs node>
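As a minimal sketch of the comparison, using one of the node IPs from the example output above (192.168.219.4) purely as an illustration: the reverse lookup should return the hostname that the node reports for itself, and a forward lookup of that hostname should return the same IP.

# nslookup 192.168.219.4
# hostname -f                                  (run on the ECS node itself)
# nslookup <hostname returned by the reverse lookup>

If the reverse lookup returns an unexpected or stale hostname, the DNS record for that node IP needs to be corrected (see Cause of Issue #2 above).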
If DNS is correct and the lifecycle service still has problems, contact ECS support.