Unsolved
12 Posts
0
1909
unresponsive ECS multinode cluster
I've posted before about this 4-node cluster where the login screen was unresponsive. Today things are different: the system isn't serving requests for S3 or web UI logins at all. The cluster has been running for about three months now, and other than having to kill the login process twice on each node over that time, it's been fine. We can't log into the web UI, and we can't connect over Cyberduck or S3 Browser either.
The quantity of unready DTs in the DTInitStat output below seems to vary between checks. Please advise! Thanks.
:9101/stats/dt/DTInitStat/
256
0
0
32
32
[urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_17_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_30_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_27_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_22_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_6_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_1_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_12_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_14_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_8_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_29_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_3_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_19_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_24_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_5_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_11_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_21_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_31_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_0_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_16_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_26_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_2_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_15_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_7_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_23_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_9_32_0:, 
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_13_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_25_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_28_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_4_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_18_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_10_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_20_32_0:]
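For reference, the same DTInitStat check can be looped across all the nodes from any box that can reach port 9101. This is only a sketch: the node IPs are placeholders (only 10.10.1.11 appears later in this thread), and the XML field names assume the usual DTInitStat output, so verify them against what your endpoint actually returns:

```shell
# Poll DT init stats on each node; on a healthy cluster unready_dt_num
# and unknown_dt_num should both be 0.
# Node IPs are placeholders -- substitute your four VM addresses.
for node in 10.10.1.11 10.10.1.12 10.10.1.13 10.10.1.14; do
  echo "== $node =="
  curl -s --max-time 5 "http://$node:9101/stats/dt/DTInitStat" \
    | grep -oE '<(total_dt_num|unready_dt_num|unknown_dt_num)>[0-9]+' \
    || echo "(no response)"
done
```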
Mike_Phelps
12 Posts
0
February 13th, 2017 14:00
Each of the four VMs has two 500GB volumes attached. They show as full from the OS side, but I'm hoping that's just because the space is preallocated under ECS control?
/dev/vdb1 524030980 513835172 10195808 99% /ecs/uuid-07dd84a3-1a30-430f-b8cc-271dfbd62450
/dev/vdc1 524030980 513835172 10195808 99% /ecs/uuid-bb45e59a-6c10-4e01-abc6-cb1f96b2bd16
JasonCwik
281 Posts
0
February 13th, 2017 14:00
Hi Mike,
How much space did you provision your system with? You may have run out of space.
coneryj
22 Posts
0
February 14th, 2017 06:00
I believe that's just due to ECS's reserved use.
I see the same on a functioning ECS:
ecs-obj-1-1:/ # df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-254:0-671088768-2591951dfe0f5f49fc6fdfc9c9106b50209a5f575a7e6254ab8921d9860e075c 99G 1.6G 92G 2% /
tmpfs 31G 0 31G 0% /dev
tmpfs 31G 0 31G 0% /sys/fs/cgroup
/dev/mapper/ECS-LVRoot 179G 133G 46G 75% /data
tmpfs 31G 0 31G 0% /run/secrets
tmpfs 31G 3.2G 28G 11% /var/run/docker.sock
shm 64M 0 64M 0% /dev/shm
/dev/sde1 5.5T 5.5T 7.1G 100% /dae/uuid-e2a4e90c-9621-4d92-b17e-0236ba0910d8
/dev/sdh1 5.5T 5.5T 7.1G 100% /dae/uuid-34c123a7-bd48-4163-bc99-078965a05d03
/dev/sdl1 5.5T 5.5T 7.1G 100% /dae/uuid-6e66afea-5a20-471e-a0c6-ba89080ca227
/dev/sdn1 5.5T 5.5T 7.1G 100% /dae/uuid-d2be46b9-651f-46c8-a335-cdf3d21914c9
/dev/sdd1 5.5T 5.5T 7.1G 100% /dae/uuid-08915080-a5f8-4a4c-87af-3092d142c29e
/dev/sdp1 5.5T 5.5T 7.1G 100% /dae/uuid-fba72158-a671-4b7b-b516-8ca053338aba
/dev/sdm1 5.5T 5.5T 7.1G 100% /dae/uuid-f2e84e77-c6ab-4a05-92d6-c962452d0407
/dev/sdq1 5.5T 5.5T 7.1G 100% /dae/uuid-7d23e002-0df2-4e2a-91bb-635dc0c03e94
/dev/sdr1 5.5T 5.5T 7.1G 100% /dae/uuid-14a67c53-e841-4bc2-bd4f-a069d66f3964
/dev/sdk1 5.5T 5.5T 7.1G 100% /dae/uuid-c6b3695c-2fba-4756-8b19-3a08a5aafbd0
/dev/sdo1 5.5T 5.5T 7.1G 100% /dae/uuid-0abbe733-ffae-4529-afc0-fe7798548260
/dev/sdg1 5.5T 5.5T 7.1G 100% /dae/uuid-a18f8fad-e97e-4abf-9c2d-088866709b3a
/dev/sdi1 5.5T 5.5T 7.1G 100% /dae/uuid-50434d25-38c9-4e48-92be-f85feaa95ec1
/dev/sdj1 5.5T 5.5T 7.1G 100% /dae/uuid-ebbdb700-208d-4dd4-ba32-283ca9c741ad
/dev/sdf1 5.5T 5.5T 7.1G 100% /dae/uuid-846a8f9e-74c9-4e62-992f-9bfc238957e6
coneryj
22 Posts
1
February 14th, 2017 06:00
Are you able to use the management API?
If so, log in and retrieve an auth token:
curl -s -k -v -u root:ChangeMe https:// /login
The token comes back in the X-SDS-AUTH-TOKEN response header; use it in the X-SDS-AUTH-TOKEN header of subsequent REST requests. If you're on a *nix system, the easiest way is to export a shell variable and use that:
export TOKEN=X-SDS-AUTH-TOKEN:
curl -vvv -sk -H $TOKEN https://10.1.83.51:4443/object/capacity
Also, if you could send the ssm.log file, that would be helpful. It's in /opt/storageos/logs/.
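The two steps can be scripted end to end. This is a sketch, not verified against a live cluster; the IP and the root:ChangeMe default credentials are the ones quoted in this thread:

```shell
# Helper: pull the auth header out of a curl -D (dump-header) response.
extract_token() { grep -i '^X-SDS-AUTH-TOKEN:' | tr -d '\r'; }

# Log in over the management port, capture the X-SDS-AUTH-TOKEN
# response header, then replay it on an API call.
NODE=10.1.83.51   # management node IP from the example above
TOKEN=$(curl -sk --connect-timeout 5 -D - -o /dev/null \
          -u root:ChangeMe "https://$NODE:4443/login" | extract_token)
[ -n "$TOKEN" ] && curl -sk -H "$TOKEN" "https://$NODE:4443/object/capacity" \
  || echo "login failed or no token header returned"
```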
Mike_Phelps
12 Posts
0
February 14th, 2017 07:00
* About to connect() to 10.10.1.11 port 443 (#0)
* Trying 10.10.1.11...
* Connected to 10.10.1.11 (10.10.1.11) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
* Server certificate:
* subject: CN=localhost
* start date: Nov 13 05:28:16 2016 GMT
* expire date: Nov 11 05:28:16 2026 GMT
* common name: localhost
* issuer: CN=localhost
* Server auth using Basic with user 'root'
> GET /login HTTP/1.1
> Authorization: Basic cm9vdDpTdXBlcm1hbjE=
> User-Agent: curl/7.29.0
> Host: 10.10.1.11
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Date: Tue, 14 Feb 2017 15:08:17 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 507
< Connection: keep-alive
< Set-Cookie: ECSUI_ERRORS=;Path=/
<
Oops, an error occured
This exception has been logged with id 7289a4jh3.
* Connection #0 to host 10.10.1.11 left intact
Mike_Phelps
12 Posts
0
February 14th, 2017 08:00
DLP prevents me from attaching the log file:
/var/log/vipr/emcvipr-object/ssm.log
Mike_Phelps
12 Posts
0
February 16th, 2017 05:00
Jason, thanks for the help yesterday. The underlying infrastructure problems causing the ECS cluster communication issues have been resolved. After restarting the containers, we're still seeing:
512
256
0
I was going to build a single node with the latest software, as recommended, to bridge the gap until the ECS appliances are installed and ready to consume. You mentioned an updated 3.x release. Can you or others help me locate the latest Community Edition? The Elastic Cloud Storage (ECS) Free Software Download | EMC page still reflects 3.0.0 from September.
Thanks!! Mike
JasonCwik
281 Posts
0
February 16th, 2017 09:00
Hi Mike,
Please check out the GitHub site, it's always up-to-date: https://community.emc.com/message/968271?et=watches.email.thread#968271
Mike_Phelps
12 Posts
0
February 16th, 2017 13:00
My mistake. The README mentioned v2, so I didn't trust it.
I did need to change --net=host to --network=host in two places in the step1 script, since I'm on Docker 1.13.
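For anyone hitting the same thing, the edit can be scripted. The demo below works on a scratch copy so it is self-contained; in practice you'd run the sed line against the real step1 script from the install kit:

```shell
# Demo on a scratch copy (the real target is the step1 install script).
cd "$(mktemp -d)"
printf 'docker run -d --net=host image-a\ndocker run -d --net=host image-b\n' > step1

# Swap the renamed flag everywhere, keeping a backup as step1.bak.
sed -i.bak 's/--net=host/--network=host/g' step1
grep -c -- '--network=host' step1   # expect 2
```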
Thanks again to those who helped me through this.