
Unsolved


May 18th, 2016 21:00

ECS Multinode install fails at the authentication step

Hi, I am trying to install an ECS multinode cluster (v2.2.x, virtual edition). I have installed CentOS 7 with the docker service. When I run the Python script for step 1 of the installation, it proceeds up to the authentication step. I have a proxy server configured on the host, which was returning a 403 error. I disabled the proxy and firewalld, but I am still getting the attached error, i.e. connection refused. Can someone help me find a solution? Thanks

1 Attachment

281 Posts

May 19th, 2016 08:00

This "error" is expected.  We're polling the login service as a wait loop until ECS is ready.  Depending on the speed of your system, this can take up to 30 minutes.  Using --use-urandom can sometimes speed this up on certain VM environments with low entropy.
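The installer's wait loop lives inside the step 1 script, but you can watch for the same condition yourself from another shell. The sketch below is a hypothetical helper, not the installer's code. It assumes bash and curl, and the function name, attempt count and delay are illustrative choices. It treats any HTTP status from `/login` as "service up", because the failures in this thread are refused connections, for which curl reports status `000`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the ECS installer): poll a login URL until
# anything answers over HTTPS. Usage: wait_for_login URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_login() {
  local url=$1 attempts=${2:-60} delay=${3:-30} i code
  for ((i = 1; i <= attempts; i++)); do
    # -k skips certificate verification, as in the thread's own curl command.
    # -w prints only the HTTP status; curl prints 000 when no HTTP response arrived.
    code=$(curl -s -k -o /dev/null -w '%{http_code}' --connect-timeout 5 "$url")
    if [[ -n $code && $code != 000 ]]; then
      echo "login service answered with HTTP $code after $i attempt(s)"
      return 0
    fi
    echo "attempt $i/$attempts: no answer from $url" >&2
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `wait_for_login https://10.64.18.108:4443/login 60 30` keeps polling for roughly 30 minutes, in line with the estimate above.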

6 Posts

May 19th, 2016 19:00

Hi, thanks. Where is the --use-urandom option used? I tried passing it as an option when running the Python script, but it was not recognised.

6 Posts

May 20th, 2016 00:00

Hi, thanks. The authentication step took more than 30 minutes and step 1 completed, but I cannot access the website https://ecs-node from the node itself. It comes up with a connection refused error.

Any help?

Thanks

Neela

11 Posts

May 20th, 2016 03:00

When the script finishes successfully, you should see an authentication token. Did you get it?

If not, the installation failed after too many authentication retries, so you just need to restart it and try again.
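To check for the token yourself, you can repeat the `curl -i` login request shown in the installer log and look at the response headers. A minimal sketch, assuming the login response carries the token in an `X-SDS-AUTH-TOKEN` header, as described in the ECS Management REST API documentation; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `curl -i` output on stdin and print the value of the
# X-SDS-AUTH-TOKEN header. Exits non-zero if the header is absent.
extract_auth_token() {
  tr -d '\r' | awk '
    tolower($0) ~ /^x-sds-auth-token:/ {
      sub(/^[^:]*:[ \t]*/, "")   # drop the header name and separator
      print
      found = 1
      exit
    }
    END { exit !found }'
}
```

Usage: `curl -s -i -k https://10.64.18.108:4443/login -u root:ChangeMe | extract_auth_token` prints the token once the service is up, and fails while the connection is still being refused.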

Some preparation steps are needed on all nodes:

# rm -rf /data

# umount /ecs/uuid-1/

# fdisk /dev/sdb  (or disk used in your installation)

Then, inside fdisk, use these commands:

p - check the existing partition

d - delete the partition

p - check if the partition was deleted

w - write and exit
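The preparation steps above can be collected into one script to run on each node. A minimal sketch, assuming the data disk is `/dev/sdb` and the mount point is `/ecs/uuid-1/` as in the post, and that the disk holds a single partition (in that case fdisk's `d` command selects it without asking for a number). The `DISK`, `MOUNT` and `DRY_RUN` variables and the function names are hypothetical; because these commands destroy data, the sketch only prints them unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Hypothetical per-node cleanup before re-running step 1. DESTRUCTIVE with DRY_RUN=0.
DISK=${DISK:-/dev/sdb}          # data disk used by the install (adjust to yours)
MOUNT=${MOUNT:-/ecs/uuid-1/}    # mount point quoted in the post
DRY_RUN=${DRY_RUN:-1}           # 1 = print the commands instead of running them

run() {
  if [[ $DRY_RUN == 1 ]]; then echo "+ $*"; else "$@"; fi
}

cleanup_node() {
  run rm -rf /data
  run umount "$MOUNT"
  run fdisk -l "$DISK"          # like fdisk's "p": show the partition table
  if [[ $DRY_RUN == 1 ]]; then
    echo "+ printf 'd\\nw\\n' | fdisk $DISK"
  else
    # Same keystrokes as the interactive session: d (delete), w (write and exit).
    printf 'd\nw\n' | fdisk "$DISK"
  fi
  run fdisk -l "$DISK"          # check that the partition is gone
}
```

Run `DRY_RUN=1 cleanup_node` first to review the commands, then `DRY_RUN=0 cleanup_node` on each node.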

After that, you can start the step 1 installation on the first node. After a short while, start the installation on the remaining nodes.

Be patient: the "Problem reaching authentication server" messages can continue for some time.

In my environment, that works fine.

281 Posts

May 20th, 2016 07:00

Which version of the install script are you running? The option should be in 2.2.0.3 or later. You can pass it to the step 1 script.
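Whether `--use-urandom` is likely to help depends on how much entropy the VM has available. A minimal sketch, assuming a Linux host, that reads the kernel's estimate from `/proc/sys/kernel/random/entropy_avail`. The 1000-bit threshold and the function name are hypothetical rules of thumb, not values from the ECS documentation, and the heuristic is aimed at older kernels such as CentOS 7's, since recent kernels changed how this counter behaves.

```shell
#!/usr/bin/env bash
# Hypothetical check: report whether the kernel's entropy estimate looks low.
# Args: [FILE (default: kernel counter)] [THRESHOLD_BITS (default: 1000)]
check_entropy() {
  local src=${1:-/proc/sys/kernel/random/entropy_avail} threshold=${2:-1000} bits
  bits=$(cat "$src" 2>/dev/null) || return 2
  [[ $bits =~ ^[0-9]+$ ]] || return 2      # not a number: cannot judge
  if (( bits < threshold )); then
    echo "low entropy: $bits bits available"
    return 1
  fi
  echo "entropy ok: $bits bits available"
}
```

If `check_entropy` reports low entropy on the nodes, that is the situation in which `--use-urandom` was suggested above.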

6 Posts

May 22nd, 2016 17:00

I'm not sure whether step 1 finished successfully, since I did not get the auth token. About 30 minutes after the script started reporting authentication errors, it finished, but without an auth token. Any ideas? I am running the script again. My ECS version is 2.2.0.3.

Thanks

May 23rd, 2016 09:00

Since you're running a multi-node installation, you might be seeing an issue in which the coordinator service does not trust the other nodes in the system. With firewalld enabled, you can resolve this by running the following command on each host, once for each node:

firewall-cmd --permanent --zone=trusted --add-source=<node-ip>/32

followed by firewall-cmd --reload on each host.
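On a cluster with several nodes, this means one `--add-source` rule per node IP on every host. A minimal sketch, assuming firewalld; the addresses in the usage example are placeholders from the 192.0.2.x documentation range, not from the thread, and the `DRY_RUN` variable and function names are hypothetical. Permanent rules only reach the running firewall after `firewall-cmd --reload`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add every ECS node's address to the trusted zone, then reload.
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands instead of running them

run() {
  if [[ $DRY_RUN == 1 ]]; then echo "+ $*"; else "$@"; fi
}

trust_nodes() {
  local ip
  for ip in "$@"; do
    run firewall-cmd --permanent --zone=trusted --add-source="$ip/32"
  done
  # Apply the permanent configuration to the running firewall.
  run firewall-cmd --reload
}
```

Usage: `trust_nodes 192.0.2.11 192.0.2.12 192.0.2.13 192.0.2.14` prints the commands for four placeholder nodes; prefix it with `DRY_RUN=0` and your real node IPs to apply them on each host.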

2 Posts

June 29th, 2016 13:00

Did you ever resolve this issue? I am experiencing the same thing. I have stopped firewalld on all nodes, and I can't even install on the first node. I can, however, install using the single-node script, just not the multi-node script.

24 Posts

August 29th, 2016 03:00

Hi Vasily,

I tried the workaround you specified but that did not work in my instance.  Any other ideas would be appreciated.

[29/Aug/2016 10:23:25] INFO [root:638] Problem reaching authentication server. Retrying shortly.

Executing getAuthToken: curl -i -k https://10.64.18.108:4443/login -u root:ChangeMe

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed connect to 10.64.18.108:4443; Connection refused

[29/Aug/2016 10:23:55] INFO [root:638] Problem reaching authentication server. Retrying shortly.

[29/Aug/2016 10:23:55] CRITICAL [root:641] Authentication service not yet started.

[29/Aug/2016 10:23:55] INFO [root:770] Step 1 Completed.  Navigate to the administrator website that is available from any of the ECS data nodes.         The ECS administrative portal can be accessed from port 443. For example: https://ecs-node-external-ip-address.         The website may take a few minutes to become available.

[root@ecs-n1 ~]#

[root@ecs-n1 ~]# curl -i -k https://10.64.18.108:4443/login -u root:ChangeMe

curl: (7) Failed connect to 10.64.18.108:4443; Connection refused

[root@ecs-n1 ~]#

Regards,

Stephen

2 Posts

August 29th, 2016 06:00

Did you turn off the firewall?  For example:

# systemctl stop firewalld

# systemctl disable firewalld

24 Posts

September 1st, 2016 01:00

Yes I had disabled the firewall before reinstalling per Vasily's workaround above.

September 2nd, 2016 08:00

As you're seeing "connection refused" rather than a generic connection failure, I'd guess it has to do with either your network setup or something else already being bound to port 80, 443 or 4443. Is there any service running on the VM that might already have those ports bound (such as another docker container, ADGDevKit, etc.)?
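The refused-versus-failed distinction can be checked without curl. A minimal sketch, assuming bash (which opens a TCP connection when redirecting to `/dev/tcp/HOST/PORT`) and coreutils `timeout`; the function name and the 3-second default are hypothetical choices. A refusal means the host answered but nothing accepted the connection on that address and port (or a firewall REJECT rule answered for it), while a timeout points more towards routing or a DROP rule.

```shell
#!/usr/bin/env bash
# Hypothetical probe: classify HOST:PORT as open, refused, or no connection.
# Usage: probe_port HOST PORT [TIMEOUT_SECONDS]
probe_port() {
  local host=$1 port=$2 secs=${3:-3} err
  # The child bash tries to open fd 3 on the TCP socket; its error text is captured.
  if err=$(timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>&1); then
    echo "$host:$port open"
  elif [[ $err == *refused* ]]; then
    echo "$host:$port refused (no listener on that address, or a firewall REJECT)"
  else
    echo "$host:$port no connection (timeout, no route, or a firewall DROP)"
  fi
}
```

Probing both `127.0.0.1` and the node's external address for ports 80, 443 and 4443 can also reveal a service that is bound only to the loopback interface.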

24 Posts

September 8th, 2016 08:00

Thanks for getting back to me, Aaron.

I checked for services listening on the 3 ports you mentioned:

[root@ecs-n1 ~]# netstat -nltp | grep 80

tcp6       0      0 :::80                   :::*                    LISTEN      10674/httpd

[root@ecs-n1 ~]# netstat -nltp | grep 443

tcp6       0      0 127.0.0.1:7443          :::*                    LISTEN      19847/authsvc

tcp6       0      0 127.0.0.1:64443         :::*                    LISTEN      19811/ecsportalsvc

[root@ecs-n1 ~]#

I am not using IPv6, however.
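Note that `grep 443` matches any line containing those digits, which is why the 7443 and 64443 listeners appear above while nothing is actually listening on 443 or 4443. A minimal sketch that compares the local port exactly, assuming the `netstat -nltp` column layout (protocol first, local address fourth); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: print `netstat -nltp` lines whose local port is exactly PORT.
listening_on() {
  local port=$1
  # Split the local address on ":" and compare only the last field, which also
  # handles IPv6 forms such as ":::80".
  awk -v p="$port" '$1 ~ /^tcp/ { n = split($4, a, ":"); if (a[n] == p) print }'
}
```

Usage: `netstat -nltp | listening_on 4443` prints nothing when no process is listening on port 4443.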

24 Posts

September 14th, 2016 05:00

Hi Aaron,

I deployed 4 new CentOS nodes (installed as Compute Nodes rather than with the basic install). I have disabled the firewall and also installed and started both the httpd and docker services. I have added the certificate needed for working inside the EMC firewall.

I ran the install script which reported the exact same issues:

Issue #1:  setuptools:

[14/Sep/2016 10:42:33] INFO [root:586] Adding python setuptools to container

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

100 12402  100 12402    0     0   1384      0  0:00:08  0:00:08 --:--:--  3598

Traceback (most recent call last):

  File "ez_setup.py", line 426, in <module>

    sys.exit(main())

  File "ez_setup.py", line 422, in main

    archive = download_setuptools(**_download_args(options))

  File "ez_setup.py", line 337, in download_setuptools

When I check whether python-setuptools is installed, it appears to be installed already:

[root@ecs-n1 ~]# yum list | grep python | grep -i setup

cryptsetup-python.x86_64                   1.6.7-1.el7                 base

python-setuptools.noarch                   0.9.8-4.el7                 base

[root@ecs-n1 ~]#
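One caveat when reading this output: in `yum list`, installed packages normally show their repository prefixed with `@` (or the word `installed`), while a bare repository name such as `base` marks a package that is available but not installed. A minimal sketch for classifying such lines, plus `rpm -q` as the more direct check; the function names are hypothetical and the parsing assumes the three-column `yum list` layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one `yum list` line by its repository column.
yum_line_status() {
  local name version repo
  read -r name version repo <<< "$1"
  case $repo in
    @*)        echo "$name: installed (from ${repo#@})" ;;
    installed) echo "$name: installed" ;;
    "")        echo "$name: unrecognized line" ;;
    *)         echo "$name: available from $repo, not installed" ;;
  esac
}

# Direct check: rpm -q exits 0 only when the package is installed.
pkg_installed() { rpm -q "$1" >/dev/null 2>&1; }
```

For example, `pkg_installed python-setuptools && echo installed` answers the question without relying on the repository column.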

*******************************************

Issue #2:  The authentication server not running and nothing listening on port 4443.

[14/Sep/2016 10:58:29] INFO [root:638] Problem reaching authentication server. Retrying shortly.

Executing getAuthToken: curl -i -k https://10.64.18.108:4443/login -u root:ChangeMe

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed connect to 10.64.18.108:4443; Connection refused

[14/Sep/2016 10:58:59] INFO [root:638] Problem reaching authentication server. Retrying shortly.

Any feedback would be appreciated.

Regards,

Stephen

September 14th, 2016 08:00

Stephen,

The first error comes from installing setuptools inside the docker container, which is only necessary if you intend to use the ECS CLI from within the container. It can always be installed manually later in the usual way.

The second is expected at least a few times at the end of the first step, hence the "Retrying shortly." notice, but it sounds like the nodes aren't coming up even after 10 to 20 minutes? Installing httpd shouldn't be necessary for CE (Community Edition), but it also shouldn't interfere with port 4443 unless it has been specifically bound there.

Please feel free to e-mail me if you'd like to take a deeper look at your system, including sending across some logs, as I think that might be the most effective way to discern the root cause.

- Aaron

