February 28th, 2017 07:00

ECS Community Edition 3.0.0: ECS services on one node fail to start

Hello,

I have installed ECS CE on 6 nodes and put the 6 VMs in a vApp. I stopped and started the vApp successfully, but after 3 restarts one of the nodes is reported as unavailable. When I connect through SSH, the docker service is running and the container is running, but netstat -an does not show the expected TCP ports as active (443, 4443, 3218, ...).
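To compare nodes quickly without reading netstat output by hand, the port check can be scripted. This is only a minimal sketch: the function name listening_ports is hypothetical, the port list is the one quoted above, and it only tests whether a TCP connection is accepted, not whether the ECS service behind the port is healthy.

```python
import socket


def listening_ports(host, ports, timeout=1.0):
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an errno value otherwise,
            # so a closed or filtered port does not raise an exception.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports


if __name__ == "__main__":
    # Ports taken from the post; the address is the failing node's IP from its log.
    print(listening_ports("10.64.226.87", [443, 4443, 3218]))
```

Running the same call against a working node and the failing node shows which of the listed ports are missing on the failing one.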

Since it used to work, I hope that whatever is wrong on this node can be fixed. Which logs should I look at?

I figured out that I can run the "docker exec --interactive=true --privileged=true ecsmultinode /bin/bash" command to get a shell inside the container.

At installation time I created the systemd unit that starts the container automatically.

Here is the output of journalctl -f -u ecsmultinode on the failing node while starting the container (systemctl start docker.ecsmultinode):

févr. 28 16:00:52 ecs-nod1 systemd[1]: Starting ECS Multinode Container...

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: ############################### boot operation begin ############################

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: checking 'which' command

févr. 28 16:00:52 ecs-nod1 docker[20704]: which is /usr/bin/which

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: exporting environment variable VIPR_DEPLOYMENT_TYPE to fabric

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: Parsing /host/data/network.json

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: VERSION: 3.0.0.0.86239.1c9e5ec private iface: ens160 public iface: ens160 START VIPR:

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: expecting data partitions to be mounted under /dae

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: StorageServer - setting partitionroot=/dae in /opt/storageos/conf/storageserver.conf

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: StorageServer JSON configuration file is /data/partitions.json

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: partitions.json already exists - will not generate it

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: validating config.iso and fabric version file

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: config.iso is missing in the /data location

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: fetching network information

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: Data IP address of the machine: 10.64.226.87

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: Management IP address of the machine: 10.64.226.87

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: Geo IP address of the machine: 10.64.226.87

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: NETMASK found: 255.255.255.0

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: Created /opt/storageos/conf/network

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: IPADDR='10.64.226.87'

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: All validations are complete. Going ahead with the configuration steps.

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: populating vipr shared library path

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: /opt/storageos/lib is already populated

févr. 28 16:00:52 ecs-nod1 docker[20704]: 02/28/17 15:00:52: creating scsi device nodes

févr. 28 16:01:02 ecs-nod1 docker[20704]: 02/28/17 15:01:02: creating volumes skeleton containers

févr. 28 16:01:02 ecs-nod1 docker[20704]: 02/28/17 15:01:02: populating vipr network info in fabric identifier

févr. 28 16:01:02 ecs-nod1 docker[20704]: 02/28/17 15:01:02: fixing svcuser account permissions

févr. 28 16:01:02 ecs-nod1 docker[20704]: chown: invalid user: 'svcuser:users'

févr. 28 16:01:02 ecs-nod1 docker[20704]: 02/28/17 15:01:02: updating storageos property to datanode

févr. 28 16:01:02 ecs-nod1 docker[20704]: 02/28/17 15:01:02: fixing file permissions

févr. 28 16:01:02 ecs-nod1 docker[20704]: 02/28/17 15:01:02: fixing storageos and data ownership

févr. 28 16:01:03 ecs-nod1 docker[20704]: 02/28/17 15:01:03: creating place holders file

févr. 28 16:01:03 ecs-nod1 docker[20704]: 02/28/17 15:01:03: preparing vipr configuration information

févr. 28 16:01:03 ecs-nod1 docker[20704]: cp: cannot stat '/data/config.iso': No such file or directory

févr. 28 16:01:03 ecs-nod1 docker[20704]: 02/28/17 15:01:03: Starting the rpcbind service

févr. 28 16:01:03 ecs-nod1 docker[20704]: 02/28/17 15:01:03: Initializing configuration properties

févr. 28 16:01:03 ecs-nod1 docker[20704]: 02/28/17 15:01:03: Generating configuration files

févr. 28 16:01:03 ecs-nod1 docker[20704]: /etc/systool: Warning: Fabric deployment. Skipping _bootfs_mount_ro()

févr. 28 16:01:08 ecs-nod1 docker[20704]: 02/28/17 15:01:08: Generation failed

févr. 28 16:01:08 ecs-nod1 docker[20704]: Starting nginx service

févr. 28 16:01:08 ecs-nod1 docker[20704]: ..done

févr. 28 16:01:08 ecs-nod1 docker[20704]: mkfifo: cannot create fifo '/var/run/.boot_sh_pipe': File exists

Here is the output from a working node:

févr. 28 16:05:23 ecs-nod2 systemd[1]: Starting ECS Multinode Container...

févr. 28 16:05:24 ecs-nod2 docker[3746]: 02/28/17 15:05:24: ############################### boot operation begin ############################

févr. 28 16:05:24 ecs-nod2 docker[3746]: 02/28/17 15:05:24: checking 'which' command

févr. 28 16:05:24 ecs-nod2 docker[3746]: which is /usr/bin/which

févr. 28 16:05:24 ecs-nod2 docker[3746]: 02/28/17 15:05:24: exporting environment variable VIPR_DEPLOYMENT_TYPE to fabric

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: Parsing /host/data/network.json

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: VERSION: 3.0.0.0.86239.1c9e5ec private iface: ens160 public iface: ens160 START VIPR:

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: expecting data partitions to be mounted under /dae

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: StorageServer - setting partitionroot=/dae in /opt/storageos/conf/storageserver.conf

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: StorageServer JSON configuration file is /data/partitions.json

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: partitions.json already exists - will not generate it

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: validating config.iso and fabric version file

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: config.iso is missing in the /data location

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: fetching network information

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: Data IP address of the machine: 10.64.226.54

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: Management IP address of the machine: 10.64.226.54

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: Geo IP address of the machine: 10.64.226.54

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: NETMASK found: 255.255.255.0

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: Created /opt/storageos/conf/network

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: IPADDR='10.64.226.54'

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: All validations are complete. Going ahead with the configuration steps.

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: populating vipr shared library path

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: /opt/storageos/lib is already populated

févr. 28 16:05:25 ecs-nod2 docker[3746]: 02/28/17 15:05:25: creating scsi device nodes

févr. 28 16:05:36 ecs-nod2 docker[3746]: 02/28/17 15:05:36: creating volumes skeleton containers

févr. 28 16:05:36 ecs-nod2 docker[3746]: 02/28/17 15:05:36: populating vipr network info in fabric identifier

févr. 28 16:05:37 ecs-nod2 docker[3746]: 02/28/17 15:05:37: fixing svcuser account permissions

févr. 28 16:05:37 ecs-nod2 docker[3746]: chown: invalid user: 'svcuser:users'

févr. 28 16:05:37 ecs-nod2 docker[3746]: 02/28/17 15:05:37: updating storageos property to datanode

févr. 28 16:05:37 ecs-nod2 docker[3746]: 02/28/17 15:05:37: fixing file permissions

févr. 28 16:05:37 ecs-nod2 docker[3746]: 02/28/17 15:05:37: fixing storageos and data ownership

févr. 28 16:05:38 ecs-nod2 docker[3746]: 02/28/17 15:05:38: creating place holders file

févr. 28 16:05:38 ecs-nod2 docker[3746]: 02/28/17 15:05:38: preparing vipr configuration information

févr. 28 16:05:38 ecs-nod2 docker[3746]: cp: cannot stat '/data/config.iso': No such file or directory

févr. 28 16:05:38 ecs-nod2 docker[3746]: 02/28/17 15:05:38: Starting the rpcbind service

févr. 28 16:05:39 ecs-nod2 docker[3746]: 02/28/17 15:05:39: Initializing configuration properties

févr. 28 16:05:39 ecs-nod2 docker[3746]: 02/28/17 15:05:39: Generating configuration files

févr. 28 16:05:39 ecs-nod2 docker[3746]: /etc/systool: Warning: Fabric deployment. Skipping _bootfs_mount_ro()

févr. 28 16:05:45 ecs-nod2 docker[3746]: 02/28/17 15:05:45: Generation failed

févr. 28 16:05:46 ecs-nod2 docker[3746]: /etc/init.d/storageos-dataservice: line 100: /proc/sys/kernel/core_pattern: Read-only file system

févr. 28 16:05:46 ecs-nod2 docker[3746]: Warning: Found stale pidfile(s) (unclean shutdown?)

févr. 28 16:05:49 ecs-nod2 docker[3746]: Setting up SSL certificates  ...done

févr. 28 16:05:49 ecs-nod2 docker[3746]: Starting ViPR services..done

févr. 28 16:05:51 ecs-nod2 docker[3746]: service: no such service cron

févr. 28 16:05:51 ecs-nod2 docker[3746]: Starting nginx service

févr. 28 16:05:51 ecs-nod2 docker[3746]: ..done

févr. 28 16:05:52 ecs-nod2 docker[3746]: mkfifo: cannot create fifo '/var/run/.boot_sh_pipe': File exists

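The difference is that the five lines between "Generation failed" and "Starting nginx service" in the working node's output (from the /proc/sys/kernel/core_pattern message through "service: no such service cron", including "Starting ViPR services..done") do not appear on the failing node.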
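A comparison like this can be automated by stripping the parts of each line that always differ between nodes (journald prefix, boot-script timestamp, IP addresses) and then listing what the working node printed that the failing node did not. The following is a minimal sketch, not an ECS tool: the helper names normalize and missing_lines are hypothetical, and the regular expressions are assumptions based only on the log format shown in this post.

```python
import re

# journald prefix: "<month> <day> <time> <host> <unit>[<pid>]: "
PREFIX = re.compile(r"^\S+ \d+ [\d:]+ \S+ \S+\[\d+\]: ")
# Timestamp written by the boot script itself, e.g. "02/28/17 15:00:52: "
STAMP = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}: ")
# Dotted-quad addresses, without matching inside longer version strings
IPV4 = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])")


def normalize(line):
    """Remove per-node noise so lines from different nodes compare equal."""
    line = PREFIX.sub("", line.strip())
    line = STAMP.sub("", line)
    return IPV4.sub("<ip>", line)


def missing_lines(failing, working):
    """Return normalized lines found in `working` but not in `failing`."""
    seen = {normalize(l) for l in failing if l.strip()}
    return [n for n in (normalize(l) for l in working if l.strip())
            if n not in seen]


if __name__ == "__main__":
    failing = [
        "févr. 28 16:01:08 ecs-nod1 docker[20704]: 02/28/17 15:01:08: Generation failed",
        "févr. 28 16:01:08 ecs-nod1 docker[20704]: Starting nginx service",
    ]
    working = [
        "févr. 28 16:05:45 ecs-nod2 docker[3746]: 02/28/17 15:05:45: Generation failed",
        "févr. 28 16:05:49 ecs-nod2 docker[3746]: Starting ViPR services..done",
        "févr. 28 16:05:51 ecs-nod2 docker[3746]: Starting nginx service",
    ]
    for line in missing_lines(failing, working):
        print(line)  # prints: Starting ViPR services..done
```

Feeding it the two full journalctl captures would list every message the failing node skipped, which narrows down the step of the boot script where startup stops.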

Any ideas?

Thanks for your help

Regards

Laurent

July 21st, 2017 12:00

The node likely became corrupted when it did not shut down properly. The previous installer did not have good support for starting and stopping the ECS Docker container. Please try the new installer, which has better support for multinode installations, systemd, and firewalld.

GitHub - EMCECS/ECS-CommunityEdition: ECS Community Edition "Free & Frictionless"
