1 Rookie
•
9 Posts
1
4412
November 16th, 2015 10:00
Can't create Virtual Data Center in ECS Community Edition
Hi all,
I'm having a problem with my recently built ECS Single-Node Community Edition.
I'm running CentOS 7.1.1503, Docker version 1.4.1, and ECS version 2.1.0.0.
When I try to create a Virtual Data Center via the GUI, the error is:
"Error 7000 (http: 500): An error occurred in the API Service. An error occurred in the API Service. Cause: error insertVdcInfo. Virtual Data Center creation failure may occur when Data Services has not completed initialization. Please retry after a few minutes."
When I try this, the virtual machine's CPU usage hits 100%, and stays there.
I have also tried via the command line, using the "step2" python script, with a similar result (I didn't record the error, but will recreate if it's important).
Has anyone else had the same difficulty? More importantly, does anyone know how to fix this, or at least where I can look to see what the actual problem is?
Cheers,
Ben


HackySack
1 Rookie
•
9 Posts
0
November 18th, 2015 08:00
Solved!
This is mainly for the Googlers that hit this page with the same errors.
How I solved my problem with ECS 2.1 single node deployment:
In the step1 script, change
docker_image_name = "{}:{}".format(imagename, imagetag)
to
docker_image_name = "{}:{}".format(args.imagename, args.imagetag)
In the step2 script, lengthen the wait by changing
time.sleep(20 * 60)
to
time.sleep(180 * 60)
or,
That's it. Now, no doubt the good folk at EMC will sort out the problems with the step1 script soon, so you may not need any of this, other than the wait bit in step2. That is related closely to the size of your data disk.
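Since the right wait depends on the size of the data disk, a fixed sleep is always a guess. Below is a minimal sketch of polling instead, assuming you have some readiness check to call. The name wait_until and its parameters are my own and are not part of the step2 script.

```python
import time


def wait_until(check, timeout_s, interval_s=60,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() every interval_s seconds until it returns True
    or timeout_s seconds have passed. Returns True on success and
    False on timeout.

    sleep and clock are injectable so the loop can be tested
    without actually waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, wait_until(data_services_ready, 180 * 60) would give up after the same three hours as the longer sleep, but return as soon as the check passes. Here data_services_ready is a hypothetical function you would have to write yourself, for instance one that attempts the VDC creation call and returns False on the "being initialized" error.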
Another tip that I found along the way. If you're playing around/suffering with failed deployments and find you are removing & re-deploying the container multiple times, you can achieve this without having to continually re-download the Docker package & ECS image. To do this, in the Step1 script, simply comment out the following lines:
Good luck
Ben
1 Attachment
step1_ecs_singlenode_install.py
HackySack
1 Rookie
•
9 Posts
0
November 17th, 2015 02:00
OK, I fixed this problem temporarily, but now I'm back to square one.
The VDC error went away once I fixed a DNS issue for this host. Once the host was able to resolve itself from the upstream DNS, I was able to create the VDC. At least, I think the two issues were related, or I may have just gotten lucky with the timing. I think this may have been caused by me using a hyphen in the hostname, as the Step1 script was adding some weird entries into the hosts file. Hyphen removed, no more weird entries.
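For anyone wanting to rule out the same thing before deploying, here is a rough sketch of a pre-flight check: does the host's own name resolve, and does it contain a hyphen? The function name hostname_report is my own invention. Note that socket.gethostbyname also consults the hosts file on most systems, so a pass here does not prove the upstream DNS record exists.

```python
import socket


def hostname_report(hostname=None, resolver=socket.gethostbyname):
    """Report whether a hostname resolves and whether it has a hyphen.

    hostname defaults to this machine's name. resolver is injectable
    so the function can be exercised without touching real DNS.
    """
    name = hostname or socket.gethostname()
    report = {"hostname": name, "has_hyphen": "-" in name}
    try:
        report["address"] = resolver(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        report["address"] = None
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    print(hostname_report())
```

Whether the hyphen was really the cause is only my guess, so treat has_hyphen as a hint rather than a diagnosis.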
Anyway, I got my ECS up & running, but then foolishly rebooted the host. After that, I could no longer log into the GUI. API login seems to work OK (curl -i -k https://blahblah:4443 -u root:ChangeMe), but takes ~5 minutes to return. That doesn't feel right.
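To put a number on "takes ~5 minutes", something like the following Python 3 sketch could time the same call that curl makes. It assumes basic authentication and a self-signed certificate (the equivalent of curl's -k and -u options). The function names are my own, and the URL is whatever you were passing to curl.

```python
import base64
import ssl
import time
import urllib.request


def build_request(url, user, password):
    """Build a request carrying HTTP basic auth, like curl -u."""
    raw = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        url, headers={"Authorization": "Basic " + token})


def timed_login(url, user, password, timeout_s=600):
    """Return (HTTP status, elapsed seconds) for one login attempt.

    Certificate verification is disabled, like curl -k, so use this
    only against a test box. urlopen raises HTTPError for 4xx and
    5xx responses; that is not handled here.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    request = build_request(url, user, password)
    with urllib.request.urlopen(request, context=ctx,
                                timeout=timeout_s) as resp:
        status = resp.status
    return status, time.monotonic() - start
```

Running timed_login right after deployment and again after creating the Storage Pool would show how far the response time moves.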
If I try to create the VDC from the command line, I get this: "The Data Services system is being initialized. Please try again later." I've waited over an hour since the Storage Pool create, which is 3x longer than suggested in the doco.
So far, I'm not having much luck with ECS.
Aaron_Peterson
1 Rookie
•
26 Posts
1
November 17th, 2015 18:00
We've seen this issue in the single-node as a result of a directory table setting not being correctly scaled for fewer than 3 nodes. Try these steps to see if it resolves the authentication problem:
1. Enter the container using nsenter -i -m -p -u -t `pidof blobsvc`
2. Clear all previous VNest files using sudo rm -rf /data/vnest/vnest-main/*
3. Edit the value of "object.NumDirectoriesPerCoSForSystemDT" and "object.NumDirectoriesPerCoSForUserDT" from 128 to 32 in the file
/opt/storageos/conf/common.object.properties
4. Restart the portalsvc service or just exit the container and reboot the node entirely.
5. When the node restarts, log in and start the Docker services:
service docker start
docker start ecsstandalone
6. Wait a couple of minutes for the container to initialize, then attempt to log in with curl again to ensure the authsvc service is started. If authentication completes successfully, you should be able to log into the GUI and run the provisioning script (step2_object_provisioning.py).
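The edit in step 3 can also be scripted rather than done by hand. What follows is only a sketch, assuming the file holds plain key=value lines; the function name scale_directory_tables is made up for illustration and is not part of any ECS tooling.

```python
import re

# The two properties named in step 3.
KEYS = (
    "object.NumDirectoriesPerCoSForSystemDT",
    "object.NumDirectoriesPerCoSForUserDT",
)


def scale_directory_tables(text, old="128", new="32"):
    """Return text with both properties changed from old to new.

    Only lines that set one of KEYS to exactly the old value are
    touched, so running it a second time changes nothing.
    """
    for key in KEYS:
        pattern = (r"^([ \t]*" + re.escape(key) + r"[ \t]*=[ \t]*)"
                   + re.escape(old) + r"[ \t]*$")
        text = re.sub(pattern, lambda m: m.group(1) + new, text,
                      flags=re.MULTILINE)
    return text
```

You would read /opt/storageos/conf/common.object.properties inside the container, pass its contents through the function, and write the result back, keeping a copy of the original first.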
HackySack
1 Rookie
•
9 Posts
0
November 18th, 2015 00:00
Hi Aaron,
Thanks for looking at this for me. I've just tried that & received the following response:
HTTP/1.1 503 Service Temporarily Unavailable
Date: Wed, 18 Nov 2015 08:22:55 GMT
Content-Type: text/xml
Content-Length: 373
Connection: keep-alive
ETag: "5637b05b-175"
6503
I've not seen that error before, but I'll do as it asks & try again later. As a positive, it responds much quicker.
I should note that authentication works OK immediately following the deployment of the container. Once I create the Storage Pool though, it then gets really, really slow. The Curl call still returns a token, but it takes ~5 minutes for that to happen.
I did see that change to the NumDirectories property in the step1 python deployment script, however I think I've spotted a problem. The current version on GitHub has this line to copy the existing file from the container (line 470):
logger.info("Copy object properties files to host")
os.system(
"docker exec -t ecsstandalone cp /opt/storageos/conf/cm.object.properties /host/cm.object.properties1")
The script then tries to alter the file, but it's using a different name (Line 482):
logger.info("Modify Directory Table config for single node")
os.system(
"sed --expression='s/object.NumDirectoriesPerCoSForSystemDT=128/object.NumDirectoriesPerCoSForSystemDT=32/' --expression='s/object.NumDirectoriesPerCoSForUserDT=128/object.NumDirectoriesPerCoSForUserDT=32/' < /host/common.object.properties1 > /host/common.object.properties")
Does that look like intended behaviour?
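To show what I mean, here is a sketch of how the two commands could be built from a single file name so they cannot drift apart. I'm assuming common.object.properties is the intended file, since that is the one named in the manual steps earlier in this thread. The function build_commands is hypothetical and not taken from the step1 script.

```python
def build_commands(basename="common.object.properties"):
    """Build the copy and sed commands from one base file name.

    The staged copy gets a trailing "1", following the naming the
    script already uses, and sed reads exactly that staged file.
    """
    staged = "/host/{}1".format(basename)
    final = "/host/{}".format(basename)
    copy_cmd = (
        "docker exec -t ecsstandalone cp /opt/storageos/conf/{} {}"
    ).format(basename, staged)
    sed_cmd = (
        "sed"
        " --expression='s/object.NumDirectoriesPerCoSForSystemDT=128"
        "/object.NumDirectoriesPerCoSForSystemDT=32/'"
        " --expression='s/object.NumDirectoriesPerCoSForUserDT=128"
        "/object.NumDirectoriesPerCoSForUserDT=32/'"
        " < {} > {}"
    ).format(staged, final)
    return copy_cmd, sed_cmd
```

Each returned string could then be handed to os.system as the script does today.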
One more thing. The step1 script does a good job of partitioning the data disk, but doesn't add it to fstab. If you reboot the host, that's going to cause some issues for the container (I found this out the hard way). It might seem obvious to seasoned Linux people, but not so much for the rest of us. Maybe something to mention in the instructions, if it's not going to be included in the script?
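For the non-Linux people like me, a quick way to see whether the data disk would survive a reboot is to look for its mount point in /etc/fstab. The sketch below does that and formats a candidate line to add. The device, mount point and filesystem type are placeholders you must fill in from your own system, and both function names are my own.

```python
def has_fstab_entry(fstab_text, mount_point):
    """Return True if an uncommented fstab line uses mount_point."""
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


def fstab_line(device, mount_point, fs_type, options="defaults"):
    """Format an fstab line: device, mount point, type, options,
    then the dump and fsck-pass fields (both 0 here)."""
    return "{} {} {} {} 0 0".format(
        device, mount_point, fs_type, options)
```

Check the values against the output of the mount command before adding anything, because a wrong fstab line can stop the host from booting cleanly.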
Cheers,
Ben
HackySack
1 Rookie
•
9 Posts
0
November 18th, 2015 04:00
After hitting the same problems over & over again, I tried something a little different. I rebuilt the CentOS host from scratch, and then re-installed the ECS stuff. Same problem after the re-build.
This entry, from the ssm.log, looks relevant:
2015-11-18 10:19:46,471 [TaskScheduler-SSManager-BFW-ParallelExecutor-000] ERROR ChunkClientImpl.java (line 167) chunk creation failed 2092b6bb-2a84-4892-be0a-06219b9b9dbc, write context: [writerId: 10.0.0.12-1095-DIRECTORY_TABLE-BFW-0-writer-0, cos: urn:storageos:VirtualArray:4ceee845-9a9f-45a9-8866-13d0bb875557, level: 1, type: DIRECTORY_TABLE, skipChunkSequenceCounter: false], error com.emc.storageos.data.object.exception.ObjectControllerException: create chunk 2092b6bb-2a84-4892-be0a-06219b9b9dbc failed, request id 0a00000c:15119e687d1:18f:458, serverIp 10.0.0.12, status ERROR_NO_STORAGE_DEVICE_FOUND sleep 5000 millis before retry
That error shows up every 5 seconds or so, which is what you'd expect from the 5000 millis back-off.
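To check how often it repeats without eyeballing the file, a small sketch like this could count the retry messages per status code. The regular expression is based only on the one log line above, so other message shapes will simply be ignored; summarize_retries is a name I made up.

```python
import re

# Matches the tail of the retry message seen in ssm.log.
RETRY = re.compile(
    r"status (?P<status>[A-Z_]+) "
    r"sleep (?P<millis>\d+) millis before retry"
)


def summarize_retries(lines):
    """Count retry messages per status code.

    Lines that do not match the retry pattern are skipped.
    """
    counts = {}
    for line in lines:
        match = RETRY.search(line)
        if match:
            status = match.group("status")
            counts[status] = counts.get(status, 0) + 1
    return counts
```

Passing an open ssm.log file object to summarize_retries works, since iterating over a file yields its lines.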
I've checked the mount, and my data disk is partitioned, formatted & mounted. How do I check to see if the container can find it?
JasonCwik
281 Posts
0
November 19th, 2015 13:00
Thanks Ben. We'll make sure to update the scripts with your changes.