MikePhelps
1 Nickel

unresponsive ECS multinode cluster

I've posted before about this four-node cluster, where the login screen was unresponsive. Today is different: the system isn't serving S3 requests or web UI logins at all. The cluster has been running for about three months, and apart from having to kill the login process twice on each node during that time, it's been fine. Now we can't log in to the web UI, and we can't connect over S3 with Cyberduck or S3 Browser either.

ecs1.JPG.jpg

The number of unready directory tables (unready_dt_num) seems to vary. Please advise! Thanks.

<hostname>:9101/stats/dt/DTInitStat/ 

<entry>

<total_dt_num>512</total_dt_num>

<unready_dt_num>256</unready_dt_num>

<unknown_dt_num>0</unknown_dt_num>

</entry>

<entry>

<type>RR</type>

<level>0</level>

<total_dt_num>32</total_dt_num>

<unready_dt_num>32</unready_dt_num>

<ObjectControllerException.ERROR_DIRECTORYTABLE_NOT_INITIALIZED>

[urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_17_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_30_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_27_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_22_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_6_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_1_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_12_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_14_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_8_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_29_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_3_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_19_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_24_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_5_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_11_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_21_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_31_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_0_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_16_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_26_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_2_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_15_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_7_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_23_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_9_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_13_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_25_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_28_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_4_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_18_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_10_32_0:,
urn:storageos:OwnershipInfo:8f80e02e-213a-4dd8-bec8-daaaa8c08694__RR_20_32_0:]

</ObjectControllerException.ERROR_DIRECTORYTABLE_NOT_INITIALIZED>

</entry>
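Since the unready count varies between checks, it may help to summarize the page with a script rather than by eye. The following is only a minimal sketch in Python: it assumes the entries sit under a single root element (the `<result>` wrapper here is hypothetical, as are the function and variable names), and the sample data is copied from the fragments quoted above.

```python
import xml.etree.ElementTree as ET

# Sample shaped like the fragments quoted above. The <result> wrapper is a
# hypothetical root element; the real page may wrap its entries differently.
SAMPLE = """<result>
  <entry>
    <total_dt_num>512</total_dt_num>
    <unready_dt_num>256</unready_dt_num>
    <unknown_dt_num>0</unknown_dt_num>
  </entry>
  <entry>
    <type>RR</type>
    <level>0</level>
    <total_dt_num>32</total_dt_num>
    <unready_dt_num>32</unready_dt_num>
  </entry>
</result>"""


def summarize_dt_status(xml_text):
    """Return one dict per <entry> with its total/unready/unknown counts."""
    rows = []
    for entry in ET.fromstring(xml_text).iter("entry"):
        rows.append({
            # Entries without a <type> element are labelled "(none)".
            "type": entry.findtext("type", default="(none)"),
            "total": int(entry.findtext("total_dt_num", default="0")),
            "unready": int(entry.findtext("unready_dt_num", default="0")),
            "unknown": int(entry.findtext("unknown_dt_num", default="0")),
        })
    return rows


if __name__ == "__main__":
    for row in summarize_dt_status(SAMPLE):
        print(f"{row['type']}: {row['unready']} of {row['total']} unready")
```

Running something like this against each node's output at intervals would show whether the unready count is trending down or stuck.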



9 Replies
JasonCwik
2 Iron

Re: unresponsive ECS multinode cluster

Hi Mike,

How much space did you provision your system with?  You may have run out of space.

MikePhelps
1 Nickel

Re: unresponsive ECS multinode cluster

Each of the four VMs has two 500 GB volumes attached. They show as full from the OS side, but I'm hoping that's just because ECS has claimed all of the space for its own use?

/dev/vdb1      524030980 513835172  10195808  99% /ecs/uuid-07dd84a3-1a30-430f-b8cc-271dfbd62450

/dev/vdc1      524030980 513835172  10195808  99% /ecs/uuid-bb45e59a-6c10-4e01-abc6-cb1f96b2bd16
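To make those two lines easier to read, here is a rough Python sketch that splits a `df` data line into its fields and recomputes the usage figure. It assumes the default column order (device, 1K-blocks, used, available, use%, mount point); the function names and the `/ecs/uuid-` prefix check are illustrative only.

```python
def parse_df_line(line):
    """Split one df data line into device, 1K-block counts, and mount point."""
    device, total, used, avail, _pct, mount = line.split(None, 5)
    return {
        "device": device,
        "total_kb": int(total),
        "used_kb": int(used),
        "avail_kb": int(avail),
        "mount": mount.strip(),
    }


def use_percent(row):
    """Used space as a percentage of used + available, rounded up."""
    used, avail = row["used_kb"], row["avail_kb"]
    return -(-100 * used // (used + avail))  # integer ceiling division


def is_ecs_data_mount(row, prefix="/ecs/uuid-"):
    """True for mounts that look like the ECS data volumes shown above."""
    return row["mount"].startswith(prefix)


if __name__ == "__main__":
    line = ("/dev/vdb1      524030980 513835172  10195808  99% "
            "/ecs/uuid-07dd84a3-1a30-430f-b8cc-271dfbd62450")
    row = parse_df_line(line)
    print(row["mount"], use_percent(row), is_ecs_data_mount(row))
```

For the `/dev/vdb1` line, the rounded-up figure comes out at 99, matching what `df` reported. A high number on these mounts says only that the filesystem is allocated; it does not by itself show how much object capacity ECS has left.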

coneryj
1 Nickel

Re: unresponsive ECS multinode cluster

I believe that is just because ECS reserves the space for its own use.

I see the same on a functioning ECS:

ecs-obj-1-1:/ # df -h

Filesystem                                                                                           Size  Used Avail Use% Mounted on

/dev/mapper/docker-254:0-671088768-2591951dfe0f5f49fc6fdfc9c9106b50209a5f575a7e6254ab8921d9860e075c   99G  1.6G   92G   2% /

tmpfs                                                                                                 31G     0   31G   0% /dev

tmpfs                                                                                                 31G     0   31G   0% /sys/fs/cgroup

/dev/mapper/ECS-LVRoot                                                                               179G  133G   46G  75% /data

tmpfs                                                                                                 31G     0   31G   0% /run/secrets

tmpfs                                                                                                 31G  3.2G   28G  11% /var/run/docker.sock

shm                                                                                                   64M     0   64M   0% /dev/shm

/dev/sde1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-e2a4e90c-9621-4d92-b17e-0236ba0910d8

/dev/sdh1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-34c123a7-bd48-4163-bc99-078965a05d03

/dev/sdl1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-6e66afea-5a20-471e-a0c6-ba89080ca227

/dev/sdn1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-d2be46b9-651f-46c8-a335-cdf3d21914c9

/dev/sdd1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-08915080-a5f8-4a4c-87af-3092d142c29e

/dev/sdp1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-fba72158-a671-4b7b-b516-8ca053338aba

/dev/sdm1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-f2e84e77-c6ab-4a05-92d6-c962452d0407

/dev/sdq1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-7d23e002-0df2-4e2a-91bb-635dc0c03e94

/dev/sdr1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-14a67c53-e841-4bc2-bd4f-a069d66f3964

/dev/sdk1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-c6b3695c-2fba-4756-8b19-3a08a5aafbd0

/dev/sdo1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-0abbe733-ffae-4529-afc0-fe7798548260

/dev/sdg1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-a18f8fad-e97e-4abf-9c2d-088866709b3a

/dev/sdi1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-50434d25-38c9-4e48-92be-f85feaa95ec1

/dev/sdj1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-ebbdb700-208d-4dd4-ba32-283ca9c741ad

/dev/sdf1                                                                                            5.5T  5.5T  7.1G 100% /dae/uuid-846a8f9e-74c9-4e62-992f-9bfc238957e6

coneryj
1 Nickel

Re: unresponsive ECS multinode cluster

Are you able to use the management API?

If so, log in and retrieve an auth token:

curl -s -k -v -u root:ChangeMe https://<ecs node ip address>/login


Then pass this token in the X-SDS-AUTH-TOKEN header of the REST request. If you're on a *nix system, the easiest way is to export a shell variable and use that:

export TOKEN="X-SDS-AUTH-TOKEN: <the token>"


curl -vvv -sk -H "$TOKEN" https://10.1.83.51:4443/object/capacity
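If curl isn't handy, the same two steps can be sketched in Python with only the standard library. Treat this as an illustration under assumptions rather than a tested client: it assumes the login endpoint listens on port 4443 like the capacity call, that the token comes back in an X-SDS-AUTH-TOKEN response header, and all function names are made up for the example.

```python
import base64
import ssl
import urllib.request

MGMT_PORT = 4443  # assumption: same port as the capacity call above


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_login_request(host, user, password):
    return urllib.request.Request(
        f"https://{host}:{MGMT_PORT}/login",
        headers={"Authorization": basic_auth_header(user, password)},
    )


def build_capacity_request(host, token):
    return urllib.request.Request(
        f"https://{host}:{MGMT_PORT}/object/capacity",
        headers={"X-SDS-AUTH-TOKEN": token},
    )


def insecure_context():
    """TLS context that skips certificate checks, like curl -k."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_capacity(host, user, password):
    """Log in, read the token from the response headers, query capacity."""
    ctx = insecure_context()
    login = build_login_request(host, user, password)
    with urllib.request.urlopen(login, context=ctx) as resp:
        # Assumption: the token is returned in this response header.
        token = resp.headers["X-SDS-AUTH-TOKEN"]
    with urllib.request.urlopen(build_capacity_request(host, token), context=ctx) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_capacity("<ecs node ip address>", "root", "ChangeMe")` would mirror the curl commands above. Note that `urlopen` raises `urllib.error.HTTPError` on a 4xx or 5xx response, so a failing login shows up as an exception rather than an empty token.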




Also, if you could send the ssm.log file, that would be helpful.

It's in /opt/storageos/logs/

MikePhelps
1 Nickel

Re: Re: unresponsive ECS multinode cluster

* About to connect() to 10.10.1.11 port 443 (#0)

*   Trying 10.10.1.11...

* Connected to 10.10.1.11 (10.10.1.11) port 443 (#0)

* Initializing NSS with certpath: sql:/etc/pki/nssdb

* skipping SSL peer certificate verification

* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA

* Server certificate:

*       subject: CN=localhost

*       start date: Nov 13 05:28:16 2016 GMT

*       expire date: Nov 11 05:28:16 2026 GMT

*       common name: localhost

*       issuer: CN=localhost

* Server auth using Basic with user 'root'

> GET /login HTTP/1.1

> Authorization: Basic cm9vdDpTdXBlcm1hbjE=

> User-Agent: curl/7.29.0

> Host: 10.10.1.11

> Accept: */*

>

< HTTP/1.1 500 Internal Server Error

< Date: Tue, 14 Feb 2017 15:08:17 GMT

< Content-Type: text/html; charset=utf-8

< Content-Length: 507

< Connection: keep-alive

< Set-Cookie: ECSUI_ERRORS=;Path=/

<

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

    <head>

        <title>Application error</title>

        <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

    </head>

    <body>

                                        <h1>Oops, an error occured</h1>

                        <p>

                                This exception has been logged with id <strong>7289a4jh3</strong>.

                        </p>

            </body>

</html>

* Connection #0 to host 10.10.1.11 left intact

MikePhelps
1 Nickel

Re: unresponsive ECS multinode cluster

DLP prevents me from attaching the log file:

/var/log/vipr/emcvipr-object/ssm.log

MikePhelps
1 Nickel

Re: unresponsive ECS multinode cluster

Jason Cwik

Thanks, Jason, for the help yesterday. The underlying infrastructure problems that were causing the ECS cluster communication issues have been resolved. After restarting the containers, we're still seeing:

<total_dt_num>512</total_dt_num>

<unready_dt_num>256</unready_dt_num>

<unknown_dt_num>0</unknown_dt_num>

</entry>


I was going to build a single node with the latest software, as recommended, to bridge the gap until the ECS appliances are installed and ready to use. You mentioned an updated 3.x release. Can you or others help me locate the latest Community Edition? The "Elastic Cloud Storage (ECS) Free Software Download | EMC" page still shows 3.0.0 from September.


Thanks!! Mike

JasonCwik
2 Iron

Re: unresponsive ECS multinode cluster

Hi Mike,

Please check out the GitHub site; it's always up to date: https://community.emc.com/message/968271?et=watches.email.thread#968271

MikePhelps
1 Nickel

Re: unresponsive ECS multinode cluster

My mistake. The README mentioned V2, so I didn't trust it.

Because I was using Docker 1.13, I did need to change --net=host to --network=host in two places in the step1 script.
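For anyone who would rather script that substitution than edit by hand, a minimal Python sketch follows. The function name and the sample `docker run` lines are hypothetical; only the two flag spellings come from this thread.

```python
import re


def use_network_flag(script_text):
    """Replace each --net=host with --network=host; return (text, count)."""
    return re.subn(r"--net=host\b", "--network=host", script_text)


if __name__ == "__main__":
    # Hypothetical lines standing in for the two places in the step1 script.
    sample = (
        "docker run -d --net=host example/image-a\n"
        "docker run -d --net=host example/image-b\n"
    )
    updated, count = use_network_flag(sample)
    print(count)
    print(updated, end="")
```

The returned count makes it easy to confirm that both occurrences were changed before writing the script back to disk.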

Thanks again to those who helped me through this.
