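ECS: xDoctor RAP014: Fabric Lifecycle Service not Healthy | Lifecycle Jetty server is not up and running on port 9241
Summary: ECS: xDoctor RAP014: Fabric Lifecycle Service not Healthy | Lifecycle Jetty server is not up and running on port 9241.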
Symptoms
Issue #1:
After an upgrade on ECS from version 3.0.x or earlier to version 3.1 or later, the service console displays the following output:
20180309 01:49:28.456: | | | PASS (21 min 29 sec)
20180309 01:49:28.462: | | PASS (21 min 29 sec)
20180309 01:49:28.463: | Run Keyword If
20180309 01:49:28.464: | | Node Service Upgrade
Initializing...
Executing Program: NODE_SERVICE_UPGRADE
|-Disable CallHome
| +-[0.0.0.0] SetCallHomeEnabled PASS (1/7, 1 sec)
|-Push Service Image To Registries
| |-Push Service Image to Head Registry
| | |-[169.254.1.1] LoadImage PASS (2/7, 1 sec)
| | +-[169.254.1.1] PushImage PASS (3/7)
| +-Push Service Image to Remote Registries
|-Upgrade Object On Specified Nodes
| +-Initiate Object Upgrade if Required
| +-[0.0.0.0] UpdateApplicationOnNodes PASS (4/7, 1 sec)
|-Update Services Ownership To Lifecycle Manager on Specified Nodes
| +-Update Ownership For Object
| +-[169.254.1.1] UpdateOwnership PASS (5/7)
|-Post-check Services Health
| +-Validate Object Service on Specified Nodes
| +-[169.254.1.1] ServiceHealth PASS (6/7, 21 sec)
+-Enable CallHome
+-[0.0.0.0] SetCallHomeEnabled PASS (7/7, 3 sec)
Elapsed time is 30 sec.
NODE_SERVICE_UPGRADE completed successfully
Collecting data from cluster
Information has been written to the
Information has been written to the
Executing /configure.sh --start action in object-main container which may take up to 600 seconds.
20180309 01:52:51.711: | | | PASS (3 min 23 sec)
20180309 01:52:51.720: | | PASS (3 min 23 sec)
20180309 01:52:51.722: | Run Keyword If
20180309 01:52:51.724: | | Update manifest file
[ERROR] On node 169.254.1.1, Lifecycle Jetty server is not up and running on port 9241!
20180309 01:58:45.068: | | | FAIL (5 min 53 sec)
20180309 01:58:45.071: | | FAIL (5 min 53 sec)
20180309 01:58:45.072: | FAIL (45 min 43 sec)
20180309 01:58:45.075: Service Console Teardown
20180309 01:58:46.973: | PASS (1 sec)
================================================================================
Status: FAIL
Time Elapsed: 45 min 56 sec
Debug log: /
HTML log: /
================================================================================
Messages:
fabric-lifecycle service should be up and running
================================================================================
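Issue #2:
xDoctor may report the following:
Timestamp = 2015-09-25_092907
Category = health
Source = fcli
Severity = WARNING
Message = Fabric Lifecycle Service not Healthy
Extra =
Monitoring the fabric lifecycle service with "sudo docker ps -a" shows that the fabric-lifecycle container keeps restarting. In two consecutive runs, its status changes from "Up 3 seconds" to "Exited (1) 2 seconds ago":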
venus2:~ # docker ps -a
CONTAINER ID   IMAGE                                                          COMMAND                CREATED       STATUS                     PORTS   NAMES
7995f18ba27f   ip.ip.ip.ip:5000/emcvipr/object:2.0.1.0-62267.db4d4a8          "/opt/vipr/boot/boot   4 weeks ago   Up 21 hours                        object-main
73f00ed0b6df   ip.ip.ip.ip:5000/caspian/fabric:1.1.1.0-1998.1391e7e           "./boot.sh lifecycle   4 weeks ago   Up 3 seconds                       fabric-lifecycle
ba19a3c95151   ip.ip.ip.ip:5000/caspian/fabric-zookeeper:1.1.0.0-54.54a204e   "./boot.sh 2 1=169.2   4 weeks ago   Up 21 hours                        fabric-zookeeper

venus2:~ # docker ps -a
CONTAINER ID   IMAGE                                                          COMMAND                CREATED       STATUS                     PORTS   NAMES
7995f18ba27f   ip.ip.ip.ip:5000/emcvipr/object:2.0.1.0-62267.db4d4a8          "/opt/vipr/boot/boot   4 weeks ago   Up 21 hours                        object-main
73f00ed0b6df   ip.ip.ip.ip:5000/caspian/fabric:1.1.1.0-1998.1391e7e           "./boot.sh lifecycle   4 weeks ago   Exited (1) 2 seconds ago           fabric-lifecycle
ba19a3c95151   ip.ip.ip.ip:5000/caspian/fabric-zookeeper:1.1.0.0-54.54a204e   "./boot.sh 2 1=169.2   4 weeks ago   Up 21 hours                        fabric-zookeeper
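Rather than rerunning "docker ps -a" by hand, the restart loop can be watched by sampling the container status a few times. The following is only a sketch: classify_status and watch_lifecycle are hypothetical helper names, not ECS or xDoctor tools, and it assumes the Docker version on the node supports the --filter and --format options of "docker ps".

```shell
#!/bin/sh
# Sketch only: classify_status and watch_lifecycle are hypothetical helpers.

# Map one "docker ps -a" STATUS value to a coarse state.
classify_status() {
    case "$1" in
        Exited*|Restarting*)            echo "down" ;;
        "Up "*second*)                  echo "recently-started" ;;
        Up*)                            echo "stable" ;;
        *)                              echo "unknown" ;;
    esac
}

# Sample the fabric-lifecycle container several times.
# $1 = number of samples (default 5), $2 = seconds between samples (default 10).
# Assumption: this Docker version supports --filter and --format.
watch_lifecycle() {
    samples=${1:-5}
    interval=${2:-10}
    i=0
    while [ "$i" -lt "$samples" ]; do
        status=$(sudo docker ps -a --filter name=fabric-lifecycle --format '{{.Status}}')
        echo "$(date '+%H:%M:%S') $(classify_status "$status") ($status)"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, "watch_lifecycle 6 5" takes six samples five seconds apart. A run that alternates between "down" and "recently-started" matches the symptom shown above, whereas a healthy container reports "stable" in every sample.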
Cause
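Cause of Issue #1:
The ZooKeeper container cannot start properly because of the size of its snapshot.
Cause of Issue #2:
The ECS IP address resolves to an incorrect hostname.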
Resolution
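Resolution for Issue #1:
This issue is resolved in ECS 3.0: ECS 3.0 improves compression and enables retention for ZK messages.
Note: This resolution is effective only when this build has been installed on the hosts. It does not take effect when an upgrade to this build is performed on a degraded system.
If this issue occurs, contact ECS Support.
How to verify this issue:
Run the command: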
# viprexec 'cat /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log | grep "GC overhead limit exceeded"'
Sample output:
admin@:~> viprexec 'cat /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log | grep "GC overhead limit exceeded"'

Output from host : 192.168.219.4
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded

Output from host : 192.168.219.5
java.lang.OutOfMemoryError: GC overhead limit exceeded

Output from host : 192.168.219.3
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded

Output from host : 192.168.219.7
cat: /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log: No such file or directory

Output from host : 192.168.219.2

Output from host : 192.168.219.8
cat: /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log: No such file or directory

Output from host : 192.168.219.6
cat: /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log: No such file or directory

Output from host : 192.168.219.1
admin@:~>
The following message appears in the ZooKeeper log file:
OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOf(Arrays.java:3236)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
    at java.io.DataOutputStream.writeLong(DataOutputStream.java:224)
    at org.apache.jute.BinaryOutputArchive.writeLong(BinaryOutputArchive.java:59)
    at org.apache.zookeeper.data.Stat.serialize(Stat.java:129)
    at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
    at org.apache.zookeeper.proto.GetDataResponse.serialize(GetDataResponse.java:49)
    at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
    at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1067)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
    at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
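On a cluster with many nodes, the viprexec output for the verification command above can be long. As an illustrative sketch, the output can be piped through a small filter that counts the "GC overhead limit exceeded" lines per host. count_oom_per_host is a hypothetical helper name; it only parses text on standard input and relies on the "Output from host : <ip>" header lines that appear in the viprexec output.

```shell
#!/bin/sh
# Sketch only: count_oom_per_host is a hypothetical helper, not an ECS tool.
# Reads viprexec output on stdin and prints "<host> <count>" per host,
# in the order the hosts first appear.
count_oom_per_host() {
    awk '
        /^Output from host/ {
            host = $NF                      # last field is the host IP
            if (!(host in n)) { n[host] = 0; order[++k] = host }
            next
        }
        /GC overhead limit exceeded/ && host != "" { n[host]++ }
        END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }
    '
}
```

Usage: pipe the verification command into the filter, for example by appending "| count_oom_per_host" after the closing quote of the viprexec command. Hosts with a nonzero count have hit the out-of-memory condition; hosts that report "No such file or directory" show a count of 0, the same as hosts with a clean log, so check those separately.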
Resolution for Issue #2:
Use nslookup to verify DNS for the IP address of the ECS node on which the lifecycle service is restarting.
# nslookup <ip of ecs node>
If your DNS is correct but the lifecycle service still has issues, contact ECS Support.
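The DNS check can also be scripted so that the name returned for a node IP is compared with the hostname you expect. This is a minimal sketch, not an official procedure: ptr_name and check_node_dns are hypothetical helper names, and the parsing assumes the common nslookup reverse-lookup line format "... name = host.example.com.".

```shell
#!/bin/sh
# Sketch only: ptr_name and check_node_dns are hypothetical helpers.

# Extract the first PTR name from nslookup reverse-lookup output on stdin.
# Assumption: the output contains a line of the form "... name = <fqdn>."
ptr_name() {
    awk -F'name = ' '/name = / { sub(/\.$/, "", $2); print $2; exit }'
}

# Compare the name DNS returns for an IP with the expected hostname.
# $1 = IP address of the ECS node, $2 = hostname you expect it to resolve to.
check_node_dns() {
    ip=$1
    expected=$2
    actual=$(nslookup "$ip" | ptr_name)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $ip resolves to $actual"
    else
        echo "MISMATCH: $ip resolves to '$actual', expected '$expected'"
        return 1
    fi
}
```

Usage: check_node_dns <ip of ecs node> <expected hostname>. An empty name in the MISMATCH message means that nslookup returned no PTR record for the address.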
Affected Products
ECS Appliance
Products
ECS Appliance, ECS Appliance Hardware Gen1 U-Series, ECS Appliance Software with Encryption, ECS Appliance Software without Encryption
Article Properties
Article Number: 000064892
Article Type: Solution
Last Modified: 21 Nov 2025
Version: 5