ECS: xDoctor RAP014: Fabric Lifecycle Service not Healthy | Lifecycle Jetty server is not up and running on port 9241
Summary: ECS: xDoctor RAP014: Fabric Lifecycle Service not Healthy | Lifecycle Jetty server is not up and running on port 9241.
Symptoms
Issue #1:
After ECS is upgraded from version 3.0.x or earlier to version 3.1 or later, the service-console shows the following output:
20180309 01:49:28.456: | | | PASS (21 min 29 sec)
20180309 01:49:28.462: | | PASS (21 min 29 sec)
20180309 01:49:28.463: | Run Keyword If
20180309 01:49:28.464: | | Node Service Upgrade
Initializing...
Executing Program: NODE_SERVICE_UPGRADE
|-Disable CallHome
| +-[0.0.0.0] SetCallHomeEnabled PASS (1/7, 1 sec)
|-Push Service Image To Registries
| |-Push Service Image to Head Registry
| | |-[169.254.1.1] LoadImage PASS (2/7, 1 sec)
| | +-[169.254.1.1] PushImage PASS (3/7)
| +-Push Service Image to Remote Registries
|-Upgrade Object On Specified Nodes
| +-Initiate Object Upgrade if Required
| +-[0.0.0.0] UpdateApplicationOnNodes PASS (4/7, 1 sec)
|-Update Services Ownership To Lifecycle Manager on Specified Nodes
| +-Update Ownership For Object
| +-[169.254.1.1] UpdateOwnership PASS (5/7)
|-Post-check Services Health
| +-Validate Object Service on Specified Nodes
| +-[169.254.1.1] ServiceHealth PASS (6/7, 21 sec)
+-Enable CallHome
+-[0.0.0.0] SetCallHomeEnabled PASS (7/7, 3 sec)
Elapsed time is 30 sec.
NODE_SERVICE_UPGRADE completed successfully
Collecting data from cluster
Information has been written to the
Information has been written to the
Executing /configure.sh --start action in object-main container which may take up to 600 seconds.
20180309 01:52:51.711: | | | PASS (3 min 23 sec)
20180309 01:52:51.720: | | PASS (3 min 23 sec)
20180309 01:52:51.722: | Run Keyword If
20180309 01:52:51.724: | | Update manifest file
[ERROR] On node 169.254.1.1, Lifecycle Jetty server is not up and running on port 9241!
20180309 01:58:45.068: | | | FAIL (5 min 53 sec)
20180309 01:58:45.071: | | FAIL (5 min 53 sec)
20180309 01:58:45.072: | FAIL (45 min 43 sec)
20180309 01:58:45.075: Service Console Teardown
20180309 01:58:46.973: | PASS (1 sec)
================================================================================
Status: FAIL
Time Elapsed: 45 min 56 sec
Debug log: /
HTML log: /
================================================================================
Messages:
fabric-lifecycle service should be up and running
================================================================================
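The error in this output says only that nothing answered on port 9241 of the node. As a minimal sketch, and assuming that `ss -ltn` or `netstat -ltn` is available on the node and that the listener is visible from where the command is run (neither is stated in this article), a small helper can confirm whether any socket is listening on that port. The function name `port_listening` is hypothetical.

```shell
# Reads `ss -ltn` or `netstat -ltn` output on stdin.
# Exits 0 if a LISTEN socket is bound to the given port, 1 otherwise.
# In both tools the local address is the fourth whitespace-separated field.
port_listening() {
    port=$1
    awk -v p=":$port" '
        /LISTEN/ && substr($4, length($4) - length(p) + 1) == p { found = 1 }
        END { exit !found }
    '
}

# Usage (assumed tooling, run on the affected node):
#   ss -ltn | port_listening 9241 && echo "listening" || echo "not listening"
```

The suffix comparison includes the leading colon so that a port such as 19241 is not mistaken for 9241.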
Issue #2:
xDoctor may report the following:
Timestamp = 2015-09-25_092907
Category = health
Source = fcli
Severity = WARNING
Message = Fabric Lifecycle Service not Healthy
Extra =
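Monitoring the fabric lifecycle service with "sudo docker ps -a" shows that the service is restarting repeatedly. In the two runs below, fabric-lifecycle goes from "Up 3 seconds" to "Exited (1) 2 seconds ago":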
venus2:~ # docker ps -a
CONTAINER ID   IMAGE                                                          COMMAND                CREATED       STATUS                     PORTS   NAMES
7995f18ba27f   ip.ip.ip.ip:5000/emcvipr/object:2.0.1.0-62267.db4d4a8          "/opt/vipr/boot/boot   4 weeks ago   Up 21 hours                        object-main
73f00ed0b6df   ip.ip.ip.ip:5000/caspian/fabric:1.1.1.0-1998.1391e7e           "./boot.sh lifecycle   4 weeks ago   Up 3 seconds                       fabric-lifecycle
ba19a3c95151   ip.ip.ip.ip:5000/caspian/fabric-zookeeper:1.1.0.0-54.54a204e   "./boot.sh 2 1=169.2   4 weeks ago   Up 21 hours                        fabric-zookeeper
venus2:~ # docker ps -a
CONTAINER ID   IMAGE                                                          COMMAND                CREATED       STATUS                     PORTS   NAMES
7995f18ba27f   ip.ip.ip.ip:5000/emcvipr/object:2.0.1.0-62267.db4d4a8          "/opt/vipr/boot/boot   4 weeks ago   Up 21 hours                        object-main
73f00ed0b6df   ip.ip.ip.ip:5000/caspian/fabric:1.1.1.0-1998.1391e7e           "./boot.sh lifecycle   4 weeks ago   Exited (1) 2 seconds ago           fabric-lifecycle
ba19a3c95151   ip.ip.ip.ip:5000/caspian/fabric-zookeeper:1.1.0.0-54.54a204e   "./boot.sh 2 1=169.2   4 weeks ago   Up 21 hours                        fabric-zookeeper
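Reading the STATUS column by eye across repeated runs is error-prone. The following is a minimal sketch of a helper that classifies the fabric-lifecycle row from `docker ps -a` output. The function name `lifecycle_state` and its labels are hypothetical, and treating an uptime measured in seconds as "recently-started" is an illustrative assumption rather than a documented threshold.

```shell
# Reads `docker ps -a` output on stdin and prints one label for the
# fabric-lifecycle container: exited, recently-started, running, or missing.
lifecycle_state() {
    awk '
        $NF == "fabric-lifecycle" {
            if ($0 ~ /Exited \([0-9]+\)/)
                state = "exited"
            else if ($0 ~ /Up (Less than a second|[0-9]+ seconds?)/)
                state = "recently-started"
            else
                state = "running"
        }
        END { print (state == "" ? "missing" : state) }
    '
}

# Usage: sample the state a few times, several seconds apart.
#   sudo docker ps -a | lifecycle_state
```

If successive samples alternate between "recently-started" and "exited" while the other containers stay up for hours, the container is most likely in the restart loop described above.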
Cause
Cause of Issue #1:
The ZooKeeper container cannot start properly because of the size of its snapshot.
Cause of Issue #2:
The ECS node IP address resolves to the wrong hostname.
Resolution
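Resolution for Issue #1:
Resolved in ECS version 3.0: ECS 3.0 improved compression and enabled retention for ZK messages.
Note: This fix works only if this build is already installed on the host. It does not help when a degraded system is being upgraded to this build.
If you encounter this issue, contact ECS Support.
How to confirm this issue:
Run the following command: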
# viprexec 'cat /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log | grep "GC overhead limit exceeded"'
Example output:
admin@:~> viprexec 'cat /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log | grep "GC overhead limit exceeded"'
Output from host : 192.168.219.4
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
Output from host : 192.168.219.5
java.lang.OutOfMemoryError: GC overhead limit exceeded
Output from host : 192.168.219.3
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
Output from host : 192.168.219.7
cat: /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log: No such file or directory
Output from host : 192.168.219.2
Output from host : 192.168.219.8
cat: /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log: No such file or directory
Output from host : 192.168.219.6
cat: /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log: No such file or directory
Output from host : 192.168.219.1
admin@:~>
The following message appears in the ZooKeeper log file:
OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOf(Arrays.java:3236)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
    at java.io.DataOutputStream.writeLong(DataOutputStream.java:224)
    at org.apache.jute.BinaryOutputArchive.writeLong(BinaryOutputArchive.java:59)
    at org.apache.zookeeper.data.Stat.serialize(Stat.java:129)
    at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
    at org.apache.zookeeper.proto.GetDataResponse.serialize(GetDataResponse.java:49)
    at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
    at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1067)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
    at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
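On a larger cluster the per-host viprexec output for Issue #1 can be long. As a rough sketch, the following helper condenses it into one line per affected host. The function name `oom_count_by_host` is hypothetical, and it assumes each host block starts with an "Output from host : <ip>" line, as in the example output in this article.

```shell
# Reads viprexec output on stdin and prints "<host> <count>" for each host
# whose ZooKeeper log contains "GC overhead limit exceeded".
# Hosts with no matches, or with a missing log file, are omitted.
oom_count_by_host() {
    awk '
        /^Output from host :/ { host = $NF; next }
        /GC overhead limit exceeded/ && host != "" { count[host]++ }
        END { for (h in count) print h, count[h] }
    ' | sort
}

# Usage, piping the confirmation command from this article:
#   viprexec 'cat /opt/emc/caspian/fabric/agent/services/fabric/zookeeper/log/zookeeper.log | grep "GC overhead limit exceeded"' | oom_count_by_host
```

Any host listed in the result has hit the out-of-memory condition at least once, which matches the symptom described for Issue #1.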
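Resolution for Issue #2:
Use nslookup to check DNS resolution for the IP address of the ECS node on which the lifecycle service is restarting, and verify that it resolves to the correct hostname.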
# nslookup <ip of ecs node>
If DNS is correct and the lifecycle service still has problems, contact ECS Support.
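The nslookup check for Issue #2 can be scripted when several nodes need to be verified. This is a minimal sketch that assumes the reverse lookup output contains lines of the form "... name = <hostname>." (the usual nslookup format). The function names `ptr_names` and `resolves_to` are hypothetical, and which hostname counts as correct for a node must come from your own site records.

```shell
# Reads `nslookup <ip>` output on stdin and prints each hostname it reports,
# with the trailing dot removed.
ptr_names() {
    awk -F'name = ' 'NF > 1 { sub(/\.$/, "", $2); print $2 }'
}

# Reads `nslookup <ip>` output on stdin.
# Exits 0 if the expected hostname is among the reported names, 1 otherwise.
resolves_to() {
    expected=$1
    ptr_names | grep -Fxq -- "$expected"
}

# Usage (placeholders as in this article):
#   nslookup <ip of ecs node> | resolves_to <expected hostname> \
#       && echo "DNS OK" || echo "DNS mismatch"
```

A mismatch here corresponds to the cause described for Issue #2: the node IP address resolving to the wrong hostname.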
Affected Products
ECS Appliance
Products
ECS Appliance, ECS Appliance Hardware Gen1 U-Series, ECS Appliance Software with Encryption, ECS Appliance Software without Encryption
Article Properties
Article Number: 000064892
Article Type: Solution
Last Modified: 21 Nov 2025
Version: 5