When running the single node docker container script step1, the docker container starts only briefly (about 3 seconds) before failing. The script completes but the container is not running. See the attached logs.
thakkd1
13 Posts
1
June 25th, 2015 11:00
Hi Jeremy,
The issue here seems to be that the interface name in your environment is not eth0, as the script assumes. You can find what it is with ifconfig -a. E.g., in my setup when I saw this problem it was ens32. We are addressing this in the script, but for now you can edit network.json and change all references from eth0 to match the interface name you see in your environment. Once this is done, just rerun the docker run command to finish step 1.
docker run -d --name ecsstandalone -e SS_GENCONFIG=1 -v /ecs:/disks -v /host:/host -v /var/log/vipr/emcvipr-object:/opt/storageos/logs -v /data:/data:rw --net=host emccorp/ecs-software:latest
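For anyone who wants to script the substitution instead of editing by hand, here is a minimal sketch. The JSON keys below are made up for illustration; your network.json will have its own layout, and ens32 stands in for whatever interface name ifconfig -a shows in your environment.

```shell
# Sample network.json fragment with the script's default eth0 references
# (the keys here are illustrative, not the real ECS schema).
cat > network.json <<'EOF'
{
  "private_interface_name": "eth0",
  "public_interface_name": "eth0"
}
EOF

# Suppose ifconfig -a showed ens32 in your environment:
IFACE=ens32
sed -i "s/eth0/$IFACE/g" network.json

grep -c "$IFACE" network.json   # prints 2: both references now point at ens32
```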
jeremykeen
25 Posts
0
June 25th, 2015 12:00
What about the errors:
rm: cannot remove `/opt/storageos/logs': Device or resource busy
and
sed: can't read /opt/storageos/conf/storageserver.conf: No such file or directory
This seems like an issue with the underlying disks. I re-ran after updating /host/data/network.json. The container is running but there is no response, and I'm seeing more errors in addition to those above when running docker logs.
1 Attachment
morelogs.txt
msalem_cfd6ea
19 Posts
0
June 26th, 2015 11:00
Hello Jeremy,
The container log is normal; none of the errors there are related to the issue you are experiencing. Looking at the original log, I see there is a problem picking the public IP address. Could you let me know what the network interface is? Run ifconfig -a and post the result. Also, what is your VM specification (RAM, cores, etc.)?
Also, please run docker rm -f $(docker ps -a -q) to make sure you delete all the old containers before restarting the process.
msalem_cfd6ea
19 Posts
0
June 26th, 2015 12:00
The container logs are expected to show some errors while the ECS software itself starts; those errors are not part of the deployment or the scripts, and they are acceptable. When you run docker ps, does it show a running instance or nothing?
bacu_artur
5 Posts
0
June 26th, 2015 12:00
I'm having the same issue, with the same container logs. I also have a CentOS-generated NIC name (eno16777984 in this case), which I had to update in the local network.json in the install directory; I then commented out the network_file_func function in the step 1 script. After wiping the data disk and rerunning the script, it said it was successful and the docker container is up, but nothing happened afterwards. Note that even curl directly against localhost fails, so I don't think it's a network issue. I had the same container logs as posted earlier.
I tried running that docker rm -f command but it didn't help. Note that the script's docker_cleanup_old_images function already does that.
Jeremy, you mention the AWS hosted CentOS image worked fine. I'm using a minimal install of CentOS 7.1 from the official CentOS site. Do you have the link to the AWS version? Any way to get a copy of that?
jeremykeen
25 Posts
0
June 26th, 2015 12:00
Modifying network.json with the correct NIC got the container running but then I could not connect to the management page. I blew it away and started from scratch. I will let you know what happens now that the step1 script has been updated with the NIC configuration flag. BTW, it all worked flawlessly in an AWS hosted CentOS image.
bacu_artur
5 Posts
0
June 26th, 2015 12:00
Yes, it's up:
[root@ecsve1 ecs-multi-node]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a326e61934d0 emccorp/ecs-software:latest "/opt/vipr/boot/boot 49 minutes ago Up 49 minutes ecsmultinode
[root@ecsve1 ecs-multi-node]#
jeremykeen
25 Posts
0
June 26th, 2015 17:00
I browsed to EC2 -> launch instance -> AWS marketplace -> search for CentOS -> CentOS 7 (x86_64) with Updates HVM
msalem_cfd6ea
19 Posts
0
June 26th, 2015 18:00
Bacu,
Good, so the container is up and running; now we need to check whether ECS is running too. From the host, run ls /data/vnest/vnest-main/configuration
How many files do you see? It should be 5 files; otherwise something is wrong. You can reach me on Lync and I can help look into the machine.
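The file count above can be scripted as a rough readiness check. This sketch recreates the partial state in a scratch directory rather than touching the real /data path; on a real node you would point CONF_DIR at /data/vnest/vnest-main/configuration instead.

```shell
# Demo: simulate a vnest configuration directory that has only 3 of the
# expected 5 files, using a scratch directory instead of the real /data path.
CONF_DIR=$(mktemp -d)
touch "$CONF_DIR/bootstrap" "$CONF_DIR/groupmembership.config" "$CONF_DIR/node-count"

COUNT=$(ls -1 "$CONF_DIR" | wc -l | tr -d ' ')
if [ "$COUNT" -eq 5 ]; then
  echo "OK: all 5 expected files present"
else
  echo "WARN: only $COUNT files; ECS may not have finished initializing"
fi
# prints: WARN: only 3 files; ECS may not have finished initializing
```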
bacu_artur
5 Posts
0
June 29th, 2015 07:00
On the first node, there are only 3 files:
[root@ecsve1 ecs-multi-node]# ls /data/vnest/vnest-main/configuration/
bootstrap groupmembership.config node-count
[root@ecsve1 ecs-multi-node]#
On the other three nodes, the directory is empty. I also noticed in the vnest.log that it can't send a heartbeat to the other three containers.
Just in case, I did the docker rm command to all nodes and reran the script, but it didn't change. The containers are still up on all nodes, vnest config files are still as above, and vnest.log still has the heartbeat error. I'll contact you on Lync.
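The heartbeat failures suggest the nodes cannot reach each other over the network. A quick bash-only probe can confirm whether a peer port answers at all; HOST and PORT below are placeholders to replace with a peer node's IP and a port the ECS container should be listening on.

```shell
# Probe a TCP port on a peer node using bash's /dev/tcp (no nc required).
# HOST and PORT are placeholders for this sketch; substitute real values.
HOST=127.0.0.1
PORT=1
if timeout 2 bash -c "exec 3<>/dev/tcp/$HOST/$PORT" 2>/dev/null; then
  RESULT=reachable
else
  RESULT=unreachable
fi
echo "$HOST:$PORT is $RESULT"
```

If every peer port shows up as unreachable, look at the host firewall before suspecting the containers themselves.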
msalem_cfd6ea
19 Posts
0
June 29th, 2015 09:00
Bacu,
This thread was about single-node deployment. You are doing a multi-node deployment, which is a little different; there are more issues to cover in the multi-node case. We can debug the deployment over Lync.
msalem_cfd6ea
19 Posts
0
June 29th, 2015 16:00
Artur,
I think the firewall is the main cause of the issue; after debugging the environment, everything else looks correct. By disabling the firewall, I was able to launch and access the UI.
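Since this is CentOS 7, the firewall in question is firewalld. A guarded sketch of the stop/disable step is below; note this opens everything, so for production it would be safer to open only the ECS ports with firewall-cmd instead. Run as root on each node.

```shell
# Stop and disable firewalld if it is active (the CentOS 7 default firewall).
# This removes all host filtering: fine for a lab, not advisable in production.
if systemctl is-active --quiet firewalld 2>/dev/null; then
  systemctl stop firewalld
  systemctl disable firewalld
  FW_STATUS="stopped and disabled"
else
  FW_STATUS="not active (nothing to do)"
fi
echo "firewalld: $FW_STATUS"
```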
Pandamonium1
1 Message
0
October 6th, 2015 22:00
Hello Jeremy, please show the output of the command ss -tlpn.