Telstra1

May 18th, 2016 21:00

ECS Multinode install fails at the authentication step

Hi, I am trying to install an ECS multinode cluster (v2.2.x), virtual edition, and I have installed CentOS 7 with the Docker service. When I run the Python script for step 1 of the installation, it proceeds up to the authentication step. I have a proxy server configured on the host, which was throwing a 403 error. I disabled the proxy and firewalld, but I am still getting the attached error, i.e. connection refused. Can someone help me find a solution? Thanks

Attachment: ECS virtual multinode install error.PNG

JasonCwik

May 19th, 2016 08:00

This "error" is expected. We're polling the login service as a wait loop until ECS is ready. Depending on the speed of your system, this can take up to 30 minutes. Using --use-urandom can sometimes speed this up on certain VM environments with low entropy.
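The wait loop described above can be sketched as a generic retry helper. This is a minimal illustration under stated assumptions, not the installer's actual code: `wait_for` is a hypothetical name, `ECS_NODE_IP` is a placeholder for one of your node addresses, and the example schedule of 60 tries 30 s apart simply mirrors the "up to 30 minutes" figure.

```shell
#!/bin/sh
# wait_for CHECK_CMD MAX_TRIES INTERVAL_SECONDS  (hypothetical helper)
# Re-runs CHECK_CMD until it exits 0 or MAX_TRIES attempts have failed.
wait_for() {
  wf_cmd=$1
  wf_max=$2
  wf_interval=$3
  wf_try=1
  while [ "$wf_try" -le "$wf_max" ]; do
    if sh -c "$wf_cmd" >/dev/null 2>&1; then
      echo "ready after $wf_try attempt(s)"
      return 0
    fi
    echo "attempt $wf_try/$wf_max failed, retrying shortly" >&2
    wf_try=$((wf_try + 1))
    # Only sleep if another attempt is still to come.
    if [ "$wf_try" -le "$wf_max" ]; then
      sleep "$wf_interval"
    fi
  done
  echo "gave up after $wf_max attempt(s)" >&2
  return 1
}

# Example: up to 60 tries, 30 s apart (about 30 minutes). ECS_NODE_IP is a
# placeholder; curl exits 0 as soon as anything answers HTTPS on port 4443.
# wait_for "curl -sk -o /dev/null https://ECS_NODE_IP:4443/login -u root:ChangeMe" 60 30
```

Progress messages go to stderr so that only the final "ready" line lands on stdout, which keeps the helper usable inside `$(...)`.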

Telstra1

May 19th, 2016 19:00

Hi, thanks. Where is the --use-urandom option used? I tried passing it as an option when running the Python script, but it was unrecognised.

Telstra1

May 20th, 2016 00:00

Hi, thanks. The authentication step took more than 30 minutes and step 1 completed, but I cannot access the website https://ecs-node from the local node itself. It comes up with a connection refused error.

Any help?

Thanks

Neela

Hen2

May 20th, 2016 03:00

When the script finishes successfully, you should see an authentication token. Did you get it?

If not, that happens when the installation fails after many authentication retries. In that case you just need to restart it and try again.
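To check for the token by hand, you can repeat the installer's getAuthToken login call yourself. A minimal sketch, assuming the ECS management API returns the token in an `X-SDS-AUTH-TOKEN` response header, as the ECS REST API documentation describes; `extract_auth_token` is a hypothetical helper and `ECS_NODE_IP` is a placeholder.

```shell
#!/bin/sh
# extract_auth_token (hypothetical helper): reads `curl -i` output on stdin and
# prints the value of the X-SDS-AUTH-TOKEN response header, if present.
extract_auth_token() {
  # Strip CRs from the CRLF header lines, match the header name
  # case-insensitively and print everything after the first colon.
  tr -d '\r' | awk '
    tolower($0) ~ /^x-sds-auth-token:/ {
      sub(/^[^:]*: */, "")
      print
      exit
    }'
}

# Example (ECS_NODE_IP is a placeholder; credentials as in the installer log):
# TOKEN=$(curl -s -i -k https://ECS_NODE_IP:4443/login -u root:ChangeMe | extract_auth_token)
# [ -n "$TOKEN" ] && echo "got token" || echo "no token yet"
```

An empty result means the login endpoint either did not answer or did not issue a token.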

Some preparation steps are needed on all nodes:

# rm -rf /data

# umount /ecs/uuid-1/

# fdisk /dev/sdb (or whichever disk your installation uses)

Then enter these fdisk commands:

p - print the partition table to check the existing partition

d - delete the partition

p - check if the partition was deleted

w - write and exit
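The preparation steps above can be scripted when you have several nodes to reset. This is a hedged sketch: `reset_node_storage` is a hypothetical helper, `/dev/sdb` and `/ecs/uuid-1` come from the post but may differ on your nodes, and because every step is destructive it only prints the commands unless `RUN=1` is set.

```shell
#!/bin/sh
# reset_node_storage DISK [MOUNTPOINT]  (hypothetical helper)
# Dry run by default: prints the cleanup commands. With RUN=1 it executes them.
# DESTRUCTIVE: deletes /data and a partition on DISK.
reset_node_storage() {
  rns_disk=${1:?usage: reset_node_storage DISK [MOUNTPOINT]}
  rns_mnt=${2:-/ecs/uuid-1}
  # Feeding "d" then "w" to fdisk deletes the partition and writes the table.
  # This assumes DISK holds a single partition; with several, fdisk would
  # prompt for a partition number and the scripted input would not match.
  for rns_step in \
      "rm -rf /data" \
      "umount $rns_mnt" \
      "printf 'd\\nw\\n' | fdisk $rns_disk"; do
    if [ "${RUN:-0}" = 1 ]; then
      sh -c "$rns_step"
    else
      printf '%s\n' "$rns_step"
    fi
  done
}

# Example: review the commands first, then run them for real.
# reset_node_storage /dev/sdb
# RUN=1 reset_node_storage /dev/sdb
```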

After that, start the step 1 installation on the first node. After a short while, start the installation on the remaining nodes.

Be patient: the "Problem reaching authentication server" messages can go on for some time.

In my environment, that works fine.

JasonCwik

May 20th, 2016 07:00

Which version of the install script are you running? The --use-urandom option should be in 2.2.0.3 or later. You can pass it to the step 1 script.

Telstra1

May 22nd, 2016 17:00

I’m not sure whether step 1 finished successfully, since I did not get the auth token. About 30 minutes after the script started giving authentication errors, it finished, but with no auth token. Any ideas? I am running the script again. The ECS version is 2.2.0.3.

Thanks

Aaron_Peterson

May 23rd, 2016 09:00

Since you're running a multinode installation, you might be seeing an issue wherein the coordinator service is distrustful of the other nodes in the system. With firewalld enabled, you can resolve this by running the following command on each host, once for each node's IP address:

firewall-cmd --permanent --zone=trusted --add-source=<node IP>/32

followed by firewall-cmd --reload on each host.
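A minimal sketch of that trusted-zone setup for all nodes, assuming you run it on every host with the full list of node IPs. `trust_nodes` and `run_or_print` are hypothetical helpers, the example IPs are placeholders from the 192.0.2.0/24 documentation range, and the sketch ends with `firewall-cmd --reload`, which applies the permanent rules without a restart. By default it only prints the commands.

```shell
#!/bin/sh
# run_or_print CMD (hypothetical helper): executes CMD when RUN=1,
# otherwise just prints it.
run_or_print() {
  if [ "${RUN:-0}" = 1 ]; then
    sh -c "$1"
  else
    printf '%s\n' "$1"
  fi
}

# trust_nodes IP...  (hypothetical helper)
# Puts every ECS node IP into firewalld's "trusted" zone, then reloads.
trust_nodes() {
  for tn_ip in "$@"; do
    run_or_print "firewall-cmd --permanent --zone=trusted --add-source=$tn_ip/32"
  done
  # --permanent rules only take effect in the running firewall after a reload.
  run_or_print "firewall-cmd --reload"
}

# Example, on every host, with the IPs of all nodes (placeholders):
# RUN=1 trust_nodes 192.0.2.11 192.0.2.12 192.0.2.13 192.0.2.14
```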

dbbyleo

June 29th, 2016 13:00

Did you ever resolve this issue? I am experiencing the same thing. I have stopped firewalld on all nodes. I can't even install on the first node. I can, however, install using the single-node script, but not the multinode script.

shegarty1

August 29th, 2016 03:00

Hi Vasily,

I tried the workaround you specified, but it did not work in my instance. Any other ideas would be appreciated.

[29/Aug/2016 10:23:25] INFO [root:638] Problem reaching authentication server. Retrying shortly.

Executing getAuthToken: curl -i -k https://10.64.18.108:4443/login -u root:ChangeMe

% Total % Received % Xferd Average Speed Time Time Time Current

Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0

curl: (7) Failed connect to 10.64.18.108:4443; Connection refused

[29/Aug/2016 10:23:55] INFO [root:638] Problem reaching authentication server. Retrying shortly.

[29/Aug/2016 10:23:55] CRITICAL [root:641] Authentication service not yet started.

[29/Aug/2016 10:23:55] INFO [root:770] Step 1 Completed. Navigate to the administrator website that is available from any of the ECS data nodes. The ECS administrative portal can be accessed from port 443. For example: https://ecs-node-external-ip-address. The website may take a few minutes to become available.

[root@ecs-n1 ~]#

[root@ecs-n1 ~]# curl -i -k https://10.64.18.108:4443/login -u root:ChangeMe

curl: (7) Failed connect to 10.64.18.108:4443; Connection refused

[root@ecs-n1 ~]#

Regards,

Stephen

dbbyleo

August 29th, 2016 06:00

Did you turn off the firewall? For example:

# systemctl stop firewalld

# systemctl disable firewalld

shegarty1

September 1st, 2016 01:00

Yes, I had disabled the firewall before reinstalling, per Vasily's workaround above.

Aaron_Peterson

September 2nd, 2016 08:00

As you're seeing "connection refused" rather than a failed connection, I'd guess it has to do either with your network setup or with something else already bound to port 80/443/4443. Is there any service running on the VM that might already have those ports bound (such as another Docker container, ADGDevKit, etc.)?
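The distinction Aaron draws shows up in curl's exit status: 7 means the TCP connection could not be made (refused, or no route to host), while 28 means it timed out, which is more typical of a firewall silently dropping packets. A hedged sketch; `describe_curl_exit` and `probe_port` are hypothetical helpers, and only the handful of exit codes listed are interpreted.

```shell
#!/bin/sh
# describe_curl_exit CODE  (hypothetical helper): maps a few curl exit codes
# to what they usually mean when probing a port.
describe_curl_exit() {
  case "$1" in
    0)  echo "answered" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "could not connect (refused or unreachable)" ;;
    28) echo "timed out (packets possibly dropped)" ;;
    35) echo "TLS handshake failed" ;;
    *)  echo "other curl error $1" ;;
  esac
}

# probe_port HOST PORT (hypothetical helper): one HTTPS request with a
# 5 s connect timeout; -k skips certificate checks, as the installer does.
probe_port() {
  curl -sk -o /dev/null --connect-timeout 5 "https://$1:$2/"
  pp_rc=$?
  echo "$1:$2 $(describe_curl_exit "$pp_rc")"
}

# Example:
# for p in 80 443 4443; do probe_port 10.64.18.108 "$p"; done
```

Without `-f`, curl exits 0 even on an HTTP error status, so "answered" only means that something is listening and speaking TLS on the port.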

shegarty1

September 8th, 2016 08:00

Thanks for getting back to me, Aaron.

I checked for services listening on the 3 ports you mentioned:

[root@ecs-n1 ~]# netstat -nltp | grep 80

tcp6 0 0 :::80 :::* LISTEN 10674/httpd

[root@ecs-n1 ~]# netstat -nltp | grep 443

tcp6 0 0 127.0.0.1:7443 :::* LISTEN 19847/authsvc

tcp6 0 0 127.0.0.1:64443 :::* LISTEN 19811/ecsportalsvc

[root@ecs-n1 ~]#

I am not using IPv6, however.
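Note that `grep 443` also matches 7443 and 64443, so the output above actually shows nothing listening on 443 or 4443, while httpd holds port 80. Also, on a default Linux configuration (`net.ipv6.bindv6only=0`) a `tcp6` socket on `:::80` accepts IPv4 connections as well, so httpd occupies port 80 even if IPv6 is unused. A sketch that matches exact ports only; `listeners_on` is a hypothetical helper that reads `netstat -nltp` output, where column 4 is the local address.

```shell
#!/bin/sh
# listeners_on PORT...  (hypothetical helper)
# Reads `netstat -nltp` output on stdin and prints only the sockets whose local
# address (column 4) ends in exactly one of the given ports, so 443 no longer
# matches 7443 or 64443 the way `grep 443` does.
listeners_on() {
  lo_re=$(printf '%s|' "$@")
  lo_re=${lo_re%?}                      # drop the trailing "|"
  awk -v re=":($lo_re)\$" '$4 ~ re'
}

# Example:
# netstat -nltp | listeners_on 80 443 4443
```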

shegarty1

September 14th, 2016 05:00

Hi Aaron,

I deployed 4 new CentOS nodes, using the Compute Node install instead of the basic install. I have disabled the firewall and also installed and started both the httpd and docker services. I have also added the certificate required for working inside the EMC firewall.

I ran the install script, which reported exactly the same issues:

Issue #1: setuptools:

[14/Sep/2016 10:42:33] INFO [root:586] Adding python setuptools to container

% Total % Received % Xferd Average Speed Time Time Time Current

Dload Upload Total Spent Left Speed

100 12402 100 12402 0 0 1384 0 0:00:08 0:00:08 --:--:-- 3598

Traceback (most recent call last):

  File "ez_setup.py", line 426, in <module>

    sys.exit(main())

  File "ez_setup.py", line 422, in main

    archive = download_setuptools(**_download_args(options))

  File "ez_setup.py", line 337, in download_setuptools

When I check whether python-setuptools is installed, it appears to be installed already:

[root@ecs-n1 ~]# yum list | grep python | grep -i setup

cryptsetup-python.x86_64 1.6.7-1.el7 base

python-setuptools.noarch 0.9.8-4.el7 base

[root@ecs-n1 ~]#
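The `yum list | grep` output above does not actually show python-setuptools as installed. In `yum list` output, installed packages carry `installed` or `@<repo>` in the last column, while a plain repository name such as `base` means the package is only available. `rpm -q python-setuptools` or `yum list installed python-setuptools` gives a direct answer. The sketch below makes the column rule explicit; `yum_list_installed` is a hypothetical helper.

```shell
#!/bin/sh
# yum_list_installed PKG  (hypothetical helper)
# Reads `yum list` output on stdin and succeeds only if PKG is listed as
# installed ("installed" or "@<repo>" in the repository column).
yum_list_installed() {
  awk -v pkg="$1" '
    {
      name = $1
      sub(/\.[^.]*$/, "", name)          # strip the ".arch" suffix
      if (name == pkg && ($3 == "installed" || $3 ~ /^@/)) found = 1
    }
    END { exit(found ? 0 : 1) }'
}

# Example (yum may wrap long lines, which this simple parser does not handle):
# yum list 2>/dev/null | yum_list_installed python-setuptools && echo installed
```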

*******************************************

Issue #2: The authentication server is not running and nothing is listening on port 4443.

[14/Sep/2016 10:58:29] INFO [root:638] Problem reaching authentication server. Retrying shortly.

Executing getAuthToken: curl -i -k https://10.64.18.108:4443/login -u root:ChangeMe

% Total % Received % Xferd Average Speed Time Time Time Current

Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0

curl: (7) Failed connect to 10.64.18.108:4443; Connection refused

[14/Sep/2016 10:58:59] INFO [root:638] Problem reaching authentication server. Retrying shortly.

Any feedback would be appreciated.

Regards,

Stephen

Aaron_Peterson

September 14th, 2016 08:00

Stephen,

The first error comes from installing setuptools within the Docker container, which is only necessary if you intend to use the ECS CLI from within the container. It can always be installed manually in the normal fashion.

The second is expected at least a few times at the end of the first step, hence the notice of "Retrying shortly.", but it sounds like the nodes aren't coming up even after 10-20 minutes? Installing httpd shouldn't be necessary for CE, but it also shouldn't be interfering over port 4443 unless it's been specifically bound there...

Please feel free to e-mail me if you'd like me to take a deeper look at your system, including sending across some logs, as I think that might be the most effective way to find the root cause.

- Aaron
