12 Posts
0
3457
ECS 3 CE single node services unavailable / root file system full
I have seen mention of out of space issues as well as unresponsive web UI and services. I exec'd into the container to find that root was full.
ecs0:/tmp # ls -lsa
total 52
0 drwxrwxrwt 5 root root 192 Mar 20 19:45 .
4 drwxr-xr-x 28 root root 4096 Feb 16 19:14 ..
0 -rw-r--r-- 1 storageos storageos 0 Mar 20 19:17 FPVMhealthcheck.heartbeat
0 -rw-r--r-- 1 root root 0 Mar 20 19:17 certtool.lock
0 drwxr-xr-x 2 root root 19 Mar 20 19:52 hsperfdata_root
0 drwxr-xr-x 2 storageos storageos 32 Mar 20 19:52 hsperfdata_storageos
0 drwx------ 2 root root 94 Feb 16 19:30 run-crons.6yRf5Z
48 -rwxr-xr-x 1 storageos storageos 48432 Feb 16 19:55 snappy-1.0.5-libsnappyjava.so
0 -rw-r--r-- 1 root root 0 Mar 20 19:17 systool.lock
ecs0:/tmp # cd run-crons.6yRf5Z/
ecs0:/tmp/run-crons.6yRf5Z # ls -la
total 8898796
drwx------ 2 root root 94 Feb 16 19:30 .
drwxrwxrwt 5 root root 192 Mar 20 19:45 ..
-rw-r--r-- 1 root root 9112358912 Mar 20 19:17 run-crons.hourly.16288
-rw-r--r-- 1 root root 32 Feb 16 19:30 run-crons_mail.16288
-rw-r--r-- 1 root root 0 Feb 16 19:30 run-crons_output.16288
ecs0:/tmp/run-crons.6yRf5Z # file run-crons.hourly.16288
run-crons.hourly.16288: ASCII text
ecs0:/tmp/run-crons.6yRf5Z # head run-crons.hourly.16288
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
ecs0:/tmp/run-crons.6yRf5Z # tail run-crons.hourly.16288
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'tim
ecs0:/tmp/run-crons.6yRf5Z # rm run-crons.hourly.16288
Right or wrong, I removed the file rather than truncating it, but I question whether this is a bug to address, expected behavior, or a combination thereof. After exiting the container, I issued a docker restart on it, and all the services began to function normally.
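For anyone hitting the same thing, here is a rough sketch of how the oversized file could be located and emptied in place. The function names find_big_files and empty_file are hypothetical helpers, not part of ECS, and the size threshold is arbitrary. One general point on remove vs. truncate: if a process still has the file open, rm only unlinks the name and the space is not released until that process exits or closes the file, whereas truncating releases the space right away.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of ECS.

# List regular files under a directory that are larger than min_bytes.
# -xdev keeps find on the same file system as the starting directory.
find_big_files() {
    local dir=$1 min_bytes=$2
    find "$dir" -xdev -type f -size +"${min_bytes}c" -print
}

# Empty a file in place. The inode is kept, so a process that still has
# the file open keeps its handle while the used space is released.
empty_file() {
    : > "$1"
}
```

Something like `find_big_files /tmp 1073741824` would list files over 1 GiB under /tmp, and `empty_file` could then be run on whatever turns up.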
Thoughts?
Thanks!! Mike
Anonymous
5 Practitioner
274.2K Posts
0
April 7th, 2017 21:00
I ran into this also.
Inside the docker container, in file /etc/cron.hourly/btree_gc, here are lines relating to the error with line numbers:
26 ECS_ROOT=/opt/storageos
27 VNEST_PROP_FILE=$ECS_ROOT/conf/vnest.object.properties
51 SCRIPT_TIMEOUT_SECS=`grep 'BTreeGCScriptTimeoutSecs' $VNEST_PROP_FILE | awk -F\= '{print $2}'`
133 timeout ${SCRIPT_TIMEOUT_SECS} find -L ${RECYCLE_BIN} -type f -not -name '*.gz' -exec gzip -q -f {} \;
The problem is that the file referenced by the variable, /opt/storageos/conf/vnest.object.properties, has no entry named 'BTreeGCScriptTimeoutSecs'.
I added it as below but have no idea what a good value would be.
echo 'object.BTreeGCScriptTimeoutSecs=5' >> /opt/storageos/conf/vnest.object.properties
I killed the following that was running for a long time, probably since the first time it ran after building the system, and restarted crond via systemctl.
root 2759 24.8 0.0 12168 2212 ? RN Apr02 1786:40 /bin/bash /etc/cron.hourly/btree_gc
Afterwards I killed everything cron-related, but I am not sure if this was necessary.
This should prevent this issue.
Again, not sure what a good value should be - I was just looking to not fill up / in container.
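To spell out the failure: when the grep on line 51 finds nothing, SCRIPT_TIMEOUT_SECS is empty, and because it is unquoted on line 133 the command expands to `timeout find -L ...`, so timeout treats 'find' as its duration. That matches the "invalid time interval 'find'" message in the log. Below is a minimal sketch of a more defensive lookup, assuming a simple key=value properties file. The function name read_timeout_secs and the idea of a fallback default are my own illustration, not something the ECS script does.

```shell
#!/bin/bash
# Sketch only: read_timeout_secs is a hypothetical helper, not part of ECS.

# Print the value of the first line matching "key" in a key=value file.
# Fall back to "default" when the file or key is missing, or when the
# value is empty or contains anything other than digits.
read_timeout_secs() {
    local prop_file=$1 key=$2 default=$3 value
    value=$(grep -- "$key" "$prop_file" 2>/dev/null | awk -F= 'NR==1 {print $2}')
    case $value in
        ''|*[!0-9]*) printf '%s\n' "$default" ;;
        *)           printf '%s\n' "$value" ;;
    esac
}
```

Line 51 could then become something like `SCRIPT_TIMEOUT_SECS=$(read_timeout_secs "$VNEST_PROP_FILE" 'BTreeGCScriptTimeoutSecs' "$SOME_DEFAULT")`, where SOME_DEFAULT is a placeholder for whatever value turns out to be sensible, and quoting "${SCRIPT_TIMEOUT_SECS}" on line 133 would make an empty value fail once with a clear error.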
JasonCwik
281 Posts
0
March 21st, 2017 08:00
That's an interesting one. How long was your system up? What base OS are you installed on?
Mike_Phelps
12 Posts
0
March 21st, 2017 12:00
CentOS 7.3.1611 created on 2017 Feb 16
Docker Engine 1.13.1
emccorp/ecs-software-3.0.0:latest image id e3022d56bf25
4 vCPU, 32GB RAM, 80GB OS
4 1TB LUNs for ECS
Anonymous
5 Practitioner
274.2K Posts
0
April 11th, 2017 08:00
I agree that this is a bug... referencing a non-existent variable seems to be a silly oversight. I just implemented the same "fix" as you did, Chris, but I went crazy and set it to 10 seconds.
Anonymous
5 Practitioner
274.2K Posts
0
April 11th, 2017 09:00
https://github.com/EMCECS/ECS-CommunityEdition/tree/master/patches/3.0.0.1
It turns out there is a patch at the above URL - the setting no longer exists in real ECS.
Thanks to Jason for the pointer.
JasonCwik
281 Posts
1
April 11th, 2017 10:00
Chris, actually the opposite. The setting does exist in real ECS (7200) and our patch inadvertently removed it.
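If you want to put the value back by hand inside the container, a sketch like the following keeps the file from collecting duplicate entries when it is run more than once (a plain `echo ... >>` appends a new line every time). set_property is a hypothetical helper, not an ECS tool, and the key name with the 'object.' prefix is simply the one used earlier in this thread, so treat it as an assumption.

```shell
#!/bin/bash
# Sketch only: set_property is a hypothetical helper, not part of ECS.

# Ensure "key=value" appears exactly once in a key=value properties file.
# Existing lines that start with "key=" are dropped (literal prefix match),
# then the desired line is appended at the end.
set_property() {
    local file=$1 key=$2 value=$3 tmp
    tmp=$(mktemp) || return 1
    awk -v k="${key}=" 'index($0, k) != 1' "$file" > "$tmp" 2>/dev/null
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through the original path so ownership and mode are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example: `set_property /opt/storageos/conf/vnest.object.properties object.BTreeGCScriptTimeoutSecs 7200`.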
Anonymous
5 Practitioner
274.2K Posts
0
April 11th, 2017 12:00
Hi Jason, are you saying the timeout in real ECS is two hours? (7200s)
travis_wichert
16 Posts
0
July 21st, 2017 12:00
Yes, that time in real ECS is two hours (7200s).
Also, this issue should be resolved in ECS CE since we are no longer overwriting vnest.object.properties when building the CE image from the upstream ECS release artifact.