Unresponsive ECS multinode cluster

February 13th, 2017 10:00

I've posted before about this 4-node cluster where the login screen was unresponsive. Today things are different: the system isn't serving S3 requests or Web UI logins at all. The cluster has been running for about three months now, and other than having to kill the login process twice on each node over that time frame, it's been fine. We can't log into the web UI, and connections over Cyberduck / S3 Browser fail as well.

[Screenshot: ecs1.jpg]

The number of unready DTs seems to vary. Please advise! Thanks.

:9101/stats/dt/DTInitStat/

total_dt_num:   512
unready_dt_num: 256
unknown_dt_num: 0

type: RR   level: 0   total_dt_num: 32   unready_dt_num: 32
unready_dt_ids: [urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_17_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_30_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_27_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_22_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_6_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_1_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_12_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_14_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_8_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_29_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_3_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_19_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_24_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_5_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_11_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_21_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_31_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_0_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_16_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_26_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_2_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_15_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_7_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_23_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_9_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_13_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_25_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_28_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_4_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_18_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_10_32_0:, urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_20_32_0:]
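
For reference, these stats come straight off a node's unauthenticated diagnostics port; a command along these lines reproduces them, with <node_ip> standing in for any node's IP:

curl -s http://<node_ip>:9101/stats/dt/DTInitStat/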



12 Posts

February 13th, 2017 14:00

Each of the four VMs has two 500 GB volumes attached. They show as full from the OS side, but I'm hoping that's just because the space is all under ECS's control for its own use?

Filesystem     1K-blocks      Used Available Use% Mounted on

/dev/vdb1      524030980 513835172  10195808  99% /ecs/uuid-07dd84a3-1a30-430f-b8cc-271dfbd62450

/dev/vdc1      524030980 513835172  10195808  99% /ecs/uuid-bb45e59a-6c10-4e01-abc6-cb1f96b2bd16

281 Posts

February 13th, 2017 14:00

Hi Mike,

How much space did you provision your system with?  You may have run out of space.

22 Posts

February 14th, 2017 06:00

I believe that's just due to ECS preallocating the space for its own use.

I see the same on a functioning ECS:

ecs-obj-1-1:/ # df -h

Filesystem                                                                                           Size  Used Avail Use% Mounted on

/dev/mapper/docker-254:0-671088768-2591951dfe0f5f49fc6fdfc9c9106b50209a5f575a7e6254ab8921d9860e075c   99G  1.6G   92G   2% /

tmpfs                                                                                                 31G     0   31G   0% /dev

tmpfs                                                                                                 31G     0   31G   0% /sys/fs/cgroup

/dev/mapper/ECS-LVRoot                                                                               179G  133G   46G  75% /data

tmpfs                                                                                                 31G     0   31G   0% /run/secrets

tmpfs                                                                                                 31G  3.2G   28G  11% /var/run/docker.sock

shm                                                                                                   64M     0   64M   0% /dev/shm

/dev/sde1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-e2a4e90c-9621-4d92-b17e-0236ba0910d8

/dev/sdh1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-34c123a7-bd48-4163-bc99-078965a05d03

/dev/sdl1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-6e66afea-5a20-471e-a0c6-ba89080ca227

/dev/sdn1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-d2be46b9-651f-46c8-a335-cdf3d21914c9

/dev/sdd1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-08915080-a5f8-4a4c-87af-3092d142c29e

/dev/sdp1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-fba72158-a671-4b7b-b516-8ca053338aba

/dev/sdm1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-f2e84e77-c6ab-4a05-92d6-c962452d0407

/dev/sdq1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-7d23e002-0df2-4e2a-91bb-635dc0c03e94

/dev/sdr1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-14a67c53-e841-4bc2-bd4f-a069d66f3964

/dev/sdk1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-c6b3695c-2fba-4756-8b19-3a08a5aafbd0

/dev/sdo1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-0abbe733-ffae-4529-afc0-fe7798548260

/dev/sdg1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-a18f8fad-e97e-4abf-9c2d-088866709b3a

/dev/sdi1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-50434d25-38c9-4e48-92be-f85feaa95ec1

/dev/sdj1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-ebbdb700-208d-4dd4-ba32-283ca9c741ad

/dev/sdf1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-846a8f9e-74c9-4e62-992f-9bfc238957e6

22 Posts

February 14th, 2017 06:00

Are you able to use the management API?

If so, log in and retrieve an auth token (substitute one of your node IPs for <node_ip>; the management API listens on 4443):

curl -s -k -v -u root:ChangeMe https://<node_ip>:4443/login


The token is returned in the X-SDS-AUTH-TOKEN response header; pass it back as the X-SDS-AUTH-TOKEN header on subsequent REST requests. If you're on a *nix system, the easiest way is to export a shell variable and use that (quoted, since the header value contains a space):

export TOKEN="X-SDS-AUTH-TOKEN: <token_value_from_login_response>"

curl -vvv -sk -H "$TOKEN" https://10.1.83.51:4443/object/capacity
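
The whole round trip can also be scripted; a minimal sketch, assuming the same node address and the default root credentials from above:

# grab the auth token line from the /login response headers (body discarded)
TOKEN=$(curl -sk -u root:ChangeMe -D - -o /dev/null https://10.1.83.51:4443/login | grep -i '^X-SDS-AUTH-TOKEN' | tr -d '\r')

# then query capacity with it
curl -sk -H "$TOKEN" https://10.1.83.51:4443/object/capacity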




Also, if you could send the ssm.log file, that would be helpful.

It's in /opt/storageos/logs/.

12 Posts

February 14th, 2017 07:00

* About to connect() to 10.10.1.11 port 443 (#0)

*   Trying 10.10.1.11...

* Connected to 10.10.1.11 (10.10.1.11) port 443 (#0)

* Initializing NSS with certpath: sql:/etc/pki/nssdb

* skipping SSL peer certificate verification

* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA

* Server certificate:

*       subject: CN=localhost

*       start date: Nov 13 05:28:16 2016 GMT

*       expire date: Nov 11 05:28:16 2026 GMT

*       common name: localhost

*       issuer: CN=localhost

* Server auth using Basic with user 'root'

> GET /login HTTP/1.1

> Authorization: Basic cm9vdDpTdXBlcm1hbjE=

> User-Agent: curl/7.29.0

> Host: 10.10.1.11

> Accept: */*

>

< HTTP/1.1 500 Internal Server Error

< Date: Tue, 14 Feb 2017 15:08:17 GMT

< Content-Type: text/html; charset=utf-8

< Content-Length: 507

< Connection: keep-alive

< Set-Cookie: ECSUI_ERRORS=;Path=/

<

[XHTML error page returned by the portal; readable content only:]

Application error

Oops, an error occured

This exception has been logged with id 7289a4jh3.

* Connection #0 to host 10.10.1.11 left intact

12 Posts

February 14th, 2017 08:00

DLP prevents me from attaching the log file:

/var/log/vipr/emcvipr-object/ssm.log
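
If specific entries would help, I can paste excerpts instead; something along these lines would get past DLP (hypothetical filter — adjust the pattern to whatever you need):

grep -iE 'ERROR|Exception' /var/log/vipr/emcvipr-object/ssm.log | tail -50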

12 Posts

February 16th, 2017 05:00

Jason Cwik 

 

Thanks, Jason, for the help yesterday. The underlying infrastructure issues causing the ECS cluster communication problems have been resolved. After restarting the containers, though, we're still seeing:

total_dt_num:   512
unready_dt_num: 256
unknown_dt_num: 0

I was going to build a single node with the latest software, as recommended, to bridge the gap until the ECS appliances are installed and ready to consume. You mentioned an updated 3.x release. Can you or others help locate the latest Community Edition? Elastic Cloud Storage (ECS) Free Software Download | EMC still reflects 3.0.0 from September.

Thanks! Mike

281 Posts

February 16th, 2017 09:00

Hi Mike,

Please check out the GitHub site; it's always up to date: https://community.emc.com/message/968271?et=watches.email.thread#968271

12 Posts

February 16th, 2017 13:00

My mistake. The README had mentioned V2, so I didn't trust it.

I did need to change --net=host to --network=host in two places in the step1 script, since I'm running Docker 1.13; see the sketch below.
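
For anyone else who hits this, a one-liner along these lines does it (step1 standing in for the actual script file name):

# replace the --net=host flag with the newer --network=host spelling, in place
sed -i 's/--net=host/--network=host/g' step1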

Thanks again to those who helped me through this.
