Unsolved
12 Posts
0
1909
unresponsive ECS multinode cluster
I've posted before about this 4-node cluster where the login screen was unresponsive. Today things are different: the system isn't serving requests for S3 or web UI logins at all. The cluster has been running for about three months now, and other than having to kill the login process twice on each node over that time, it's been fine. We can't log into the web UI, and we can't connect over Cyberduck or S3 Browser either.
The quantity of unready DTs in the DTInitStat output below seems to vary between checks. Please advise! Thanks.
:9101/stats/dt/DTInitStat/
256
0
0
32
32
[urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_17_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_30_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_27_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_22_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_6_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_1_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_12_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_14_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_8_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_29_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_3_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_19_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_24_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_5_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_11_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_21_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_31_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_0_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_16_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_26_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_2_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_15_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_7_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_23_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_9_32_0:, 
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_13_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_25_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_28_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_4_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_18_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_10_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_20_32_0:]
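For reference, the same DTInitStat check can be looped across all the nodes from any box that can reach port 9101. This is only a sketch: the node IPs are placeholders (only 10.10.1.11 appears later in this thread), and the XML field names assume the usual DTInitStat output, so verify them against what your endpoint actually returns:

```shell
# Poll DT init stats on each node; on a healthy cluster unready_dt_num
# and unknown_dt_num should both be 0.
# Node IPs are placeholders -- substitute your four VM addresses.
for node in 10.10.1.11 10.10.1.12 10.10.1.13 10.10.1.14; do
  echo "== $node =="
  curl -s --max-time 5 "http://$node:9101/stats/dt/DTInitStat" \
    | grep -oE '<(total_dt_num|unready_dt_num|unknown_dt_num)>[0-9]+' \
    || echo "(no response)"
done
```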
Mike_Phelps
12 Posts
0
February 13th, 2017 14:00
Each of the four VMs has two 500GB volumes attached. They show as full from the OS side, but I'm hoping that's just because the space is preallocated under ECS control?
/dev/vdb1 524030980 513835172 10195808 99% /ecs/uuid-07dd84a3-1a30-430f-b8cc-271dfbd62450
/dev/vdc1 524030980 513835172 10195808 99% /ecs/uuid-bb45e59a-6c10-4e01-abc6-cb1f96b2bd16
JasonCwik
281 Posts
0
February 13th, 2017 14:00
Hi Mike,
How much space did you provision your system with? You may have run out of space.
coneryj
22 Posts
0
February 14th, 2017 06:00
I believe that's just due to ECS's reserved use.
I see the same on a functioning ECS:
ecs-obj-1-1:/ # df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-254:0-671088768-2591951dfe0f5f49fc6fdfc9c9106b50209a5f575a7e6254ab8921d9860e075c 99G 1.6G 92G 2% /
tmpfs 31G 0 31G 0% /dev
tmpfs 31G 0 31G 0% /sys/fs/cgroup
/dev/mapper/ECS-LVRoot 179G 133G 46G 75% /data
tmpfs 31G 0 31G 0% /run/secrets
tmpfs 31G 3.2G 28G 11% /var/run/docker.sock
shm 64M 0 64M 0% /dev/shm
/dev/sde1 5.5T 5.5T 7.1G 100% /dae/uuid-e2a4e90c-9621-4d92-b17e-0236ba0910d8
/dev/sdh1 5.5T 5.5T 7.1G 100% /dae/uuid-34c123a7-bd48-4163-bc99-078965a05d03
/dev/sdl1 5.5T 5.5T 7.1G 100% /dae/uuid-6e66afea-5a20-471e-a0c6-ba89080ca227
/dev/sdn1 5.5T 5.5T 7.1G 100% /dae/uuid-d2be46b9-651f-46c8-a335-cdf3d21914c9
/dev/sdd1 5.5T 5.5T 7.1G 100% /dae/uuid-08915080-a5f8-4a4c-87af-3092d142c29e
/dev/sdp1 5.5T 5.5T 7.1G 100% /dae/uuid-fba72158-a671-4b7b-b516-8ca053338aba
/dev/sdm1 5.5T 5.5T 7.1G 100% /dae/uuid-f2e84e77-c6ab-4a05-92d6-c962452d0407
/dev/sdq1 5.5T 5.5T 7.1G 100% /dae/uuid-7d23e002-0df2-4e2a-91bb-635dc0c03e94
/dev/sdr1 5.5T 5.5T 7.1G 100% /dae/uuid-14a67c53-e841-4bc2-bd4f-a069d66f3964
/dev/sdk1 5.5T 5.5T 7.1G 100% /dae/uuid-c6b3695c-2fba-4756-8b19-3a08a5aafbd0
/dev/sdo1 5.5T 5.5T 7.1G 100% /dae/uuid-0abbe733-ffae-4529-afc0-fe7798548260
/dev/sdg1 5.5T 5.5T 7.1G 100% /dae/uuid-a18f8fad-e97e-4abf-9c2d-088866709b3a
/dev/sdi1 5.5T 5.5T 7.1G 100% /dae/uuid-50434d25-38c9-4e48-92be-f85feaa95ec1
/dev/sdj1 5.5T 5.5T 7.1G 100% /dae/uuid-ebbdb700-208d-4dd4-ba32-283ca9c741ad
/dev/sdf1 5.5T 5.5T 7.1G 100% /dae/uuid-846a8f9e-74c9-4e62-992f-9bfc238957e6
coneryj
22 Posts
1
February 14th, 2017 06:00
Are you able to use the management API?
If so, log in and retrieve an auth token:
curl -s -k -v -u root:ChangeMe https:// /login
The token comes back in the X-SDS-AUTH-TOKEN response header; use it in the X-SDS-AUTH-TOKEN header of subsequent REST requests. If you're on a *nix system, the easiest way is to export a shell variable and use that:
export TOKEN=X-SDS-AUTH-TOKEN:
curl -vvv -sk -H $TOKEN https://10.1.83.51:4443/object/capacity
Also, if you could send the ssm.log file, that would be helpful. It's in /opt/storageos/logs/.
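The two steps can be scripted end to end. This is a sketch, not verified against a live cluster; the IP and the root:ChangeMe default credentials are the ones quoted in this thread:

```shell
# Helper: pull the auth header out of a curl -D (dump-header) response.
extract_token() { grep -i '^X-SDS-AUTH-TOKEN:' | tr -d '\r'; }

# Log in over the management port, capture the X-SDS-AUTH-TOKEN
# response header, then replay it on an API call.
NODE=10.1.83.51   # management node IP from the example above
TOKEN=$(curl -sk --connect-timeout 5 -D - -o /dev/null \
          -u root:ChangeMe "https://$NODE:4443/login" | extract_token)
[ -n "$TOKEN" ] && curl -sk -H "$TOKEN" "https://$NODE:4443/object/capacity" \
  || echo "login failed or no token header returned"
```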
Mike_Phelps
12 Posts
0
February 14th, 2017 07:00
* About to connect() to 10.10.1.11 port 443 (#0)
* Trying 10.10.1.11...
* Connected to 10.10.1.11 (10.10.1.11) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
* Server certificate:
* subject: CN=localhost
* start date: Nov 13 05:28:16 2016 GMT
* expire date: Nov 11 05:28:16 2026 GMT
* common name: localhost
* issuer: CN=localhost
* Server auth using Basic with user 'root'
> GET /login HTTP/1.1
> Authorization: Basic cm9vdDpTdXBlcm1hbjE=
> User-Agent: curl/7.29.0
> Host: 10.10.1.11
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Date: Tue, 14 Feb 2017 15:08:17 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 507
< Connection: keep-alive
< Set-Cookie: ECSUI_ERRORS=;Path=/
<
Oops, an error occured
This exception has been logged with id 7289a4jh3.
* Connection #0 to host 10.10.1.11 left intact
Mike_Phelps
12 Posts
0
February 14th, 2017 08:00
DLP prevents me from attaching the log file:
/var/log/vipr/emcvipr-object/ssm.log
Mike_Phelps
12 Posts
0
February 16th, 2017 05:00
Jason, thanks for the help yesterday. The underlying infrastructure problems causing the ECS cluster communication issues have been resolved. After restarting the containers, we're still seeing:
512
256
0
I was going to build a single node with the latest software, as recommended, to bridge the gap until the ECS appliances are installed and ready to consume. You mentioned an updated 3.x release. Can you or others help me locate the latest Community Edition? The Elastic Cloud Storage (ECS) Free Software Download | EMC page still reflects 3.0.0 from September.
Thanks!! Mike
JasonCwik
281 Posts
0
February 16th, 2017 09:00
Hi Mike,
Please check out the GitHub site, it's always up-to-date: https://community.emc.com/message/968271?et=watches.email.thread#968271
Mike_Phelps
12 Posts
0
February 16th, 2017 13:00
My mistake. The README mentioned v2, so I didn't trust it.
I did need to change --net=host to --network=host in two places in the step1 script, since I'm on Docker 1.13.
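For anyone hitting the same thing, the edit can be scripted. The demo below works on a scratch copy so it is self-contained; in practice you'd run the sed line against the real step1 script from the install kit:

```shell
# Demo on a scratch copy (the real target is the step1 install script).
cd "$(mktemp -d)"
printf 'docker run -d --net=host image-a\ndocker run -d --net=host image-b\n' > step1

# Swap the renamed flag everywhere, keeping a backup as step1.bak.
sed -i.bak 's/--net=host/--network=host/g' step1
grep -c -- '--network=host' step1   # expect 2
```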
Thanks again to those who helped me through this.