MikePhelps

ECS 3 CE single node services unavailable / root file system full

I have seen mentions of out-of-space issues, as well as an unresponsive web UI and services. I exec'd into the container and found that the root file system was full.

ecs0:/tmp # ls -lsa
total 52
0 drwxrwxrwt  5 root      root        192 Mar 20 19:45 .
4 drwxr-xr-x 28 root      root       4096 Feb 16 19:14 ..
0 -rw-r--r--  1 storageos storageos     0 Mar 20 19:17 FPVMhealthcheck.heartbeat
0 -rw-r--r--  1 root      root          0 Mar 20 19:17 certtool.lock
0 drwxr-xr-x  2 root      root         19 Mar 20 19:52 hsperfdata_root
0 drwxr-xr-x  2 storageos storageos    32 Mar 20 19:52 hsperfdata_storageos
0 drwx------  2 root      root         94 Feb 16 19:30 run-crons.6yRf5Z
48 -rwxr-xr-x  1 storageos storageos 48432 Feb 16 19:55 snappy-1.0.5-libsnappyjava.so
0 -rw-r--r--  1 root      root          0 Mar 20 19:17 systool.lock
ecs0:/tmp # cd run-crons.6yRf5Z/
ecs0:/tmp/run-crons.6yRf5Z # ls -la
total 8898796
drwx------ 2 root root         94 Feb 16 19:30 .
drwxrwxrwt 5 root root        192 Mar 20 19:45 ..
-rw-r--r-- 1 root root 9112358912 Mar 20 19:17 run-crons.hourly.16288
-rw-r--r-- 1 root root         32 Feb 16 19:30 run-crons_mail.16288
-rw-r--r-- 1 root root          0 Feb 16 19:30 run-crons_output.16288
ecs0:/tmp/run-crons.6yRf5Z # file run-crons.hourly.16288
run-crons.hourly.16288: ASCII text
ecs0:/tmp/run-crons.6yRf5Z # head run-crons.hourly.16288
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
ecs0:/tmp/run-crons.6yRf5Z # tail run-crons.hourly.16288
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'tim
ecs0:/tmp/run-crons.6yRf5Z # rm run-crons.hourly.16288

Right or wrong, I removed the file rather than truncating it, but I wonder whether this is a bug to address, expected behavior, or a combination of both. After exiting the container, I ran docker restart on it, and all the services began to function normally.
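On the truncate-versus-remove question: if a process (such as a still-running cron job) has the log open, rm only removes the directory entry, and the disk blocks are not released until that process closes the file or exits. That may be part of why the docker restart was needed. A minimal sketch of emptying the file in place instead; `empty_in_place` is a hypothetical helper name, not something shipped with ECS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: empty a runaway log in place instead of deleting it.
# Truncation keeps the inode, so a process that still has the file open
# can keep writing to it, but the blocks already used are freed immediately.
empty_in_place() {
  local f=$1
  if [ ! -f "$f" ]; then
    echo "not a regular file: $f" >&2
    return 1
  fi
  : > "$f"    # truncate to 0 bytes (same effect as: truncate -s 0 "$f")
}

# Usage with the file from this thread:
#   empty_in_place /tmp/run-crons.6yRf5Z/run-crons.hourly.16288
#
# If a file was already removed with rm while still open, its space stays
# allocated. This lists open descriptors that point at deleted files, which
# shows the PID holding the space:
#   find /proc/[0-9]*/fd -lname '*(deleted)' 2>/dev/null
```

One caveat: a writer that opened the file without append mode continues at its old offset after truncation, so the file's apparent size jumps back up, although the skipped range is normally a sparse hole that uses no disk blocks.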

Thoughts?

Thanks!! Mike

Accepted Solutions
chris_kraft1

I ran into this also.

Inside the Docker container, these are the lines of /etc/cron.hourly/btree_gc that relate to the error, with their line numbers:

26 ECS_ROOT=/opt/storageos
27 VNEST_PROP_FILE=$ECS_ROOT/conf/vnest.object.properties
51 SCRIPT_TIMEOUT_SECS=`grep 'BTreeGCScriptTimeoutSecs' $VNEST_PROP_FILE | awk -F\= '{print $2}'`
133 timeout ${SCRIPT_TIMEOUT_SECS} find -L ${RECYCLE_BIN} -type f -not -name '*.gz' -exec gzip -q -f {} \;

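The problem is that the file referenced by VNEST_PROP_FILE, /opt/storageos/conf/vnest.object.properties, has no entry named 'BTreeGCScriptTimeoutSecs'. As a result, SCRIPT_TIMEOUT_SECS is empty, line 133 effectively runs "timeout find -L ...", and timeout rejects 'find' as a time interval, which is the error repeated throughout the log.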
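The failure is easy to confirm outside ECS, and the property can be read more defensively. A minimal sketch follows; `get_gc_timeout` is a hypothetical helper, not part of the shipped btree_gc script, and the 7200-second fallback is simply the production ECS value mentioned later in this thread.

```shell
#!/usr/bin/env bash
# Reproduce the error: with the property missing, SCRIPT_TIMEOUT_SECS is
# empty, the unquoted expansion disappears, and timeout treats "find" as
# its DURATION argument ("invalid time interval").
SCRIPT_TIMEOUT_SECS=""
timeout ${SCRIPT_TIMEOUT_SECS} find /tmp -maxdepth 0 2>&1 | head -n 1

# Hypothetical hardening: take the first matching property, strip spaces,
# tabs and a stray CR, and fall back to 7200 seconds when the value is
# missing, non-numeric, or 0 (a duration of 0 disables the timeout).
get_gc_timeout() {
  local prop_file=$1 val
  val=$(awk -F= '/BTreeGCScriptTimeoutSecs/ { gsub(/[ \t\r]/, "", $2); print $2; exit }' \
        "$prop_file" 2>/dev/null) || val=''   # unreadable file -> empty value
  case $val in
    '' | *[!0-9]*) val=7200 ;;
  esac
  [ "$val" -gt 0 ] 2>/dev/null || val=7200
  printf '%s\n' "$val"
}

# Usage inside btree_gc (quoting the variable means an empty value can no
# longer shift "find" into the DURATION position):
#   SCRIPT_TIMEOUT_SECS=$(get_gc_timeout "$VNEST_PROP_FILE")
#   timeout "${SCRIPT_TIMEOUT_SECS}" find -L "${RECYCLE_BIN}" -type f -not -name '*.gz' -exec gzip -q -f {} \;
```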

I added it as shown below, but I have no idea what a good value would be.

echo 'object.BTreeGCScriptTimeoutSecs=5' >> /opt/storageos/conf/vnest.object.properties
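Running that echo more than once appends duplicate lines. grep then returns every match, SCRIPT_TIMEOUT_SECS ends up holding several numbers, and the unquoted expansion hands the second number to timeout as the command to run. A minimal idempotent sketch, assuming a hypothetical helper named `ensure_gc_timeout`; the value is your choice (this thread mentions 5, 10, and 7200 seconds).

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the BTreeGCScriptTimeoutSecs property only if
# it is not already present, so re-running the workaround is harmless.
ensure_gc_timeout() {
  local prop_file=$1 secs=$2
  if grep -q 'BTreeGCScriptTimeoutSecs' "$prop_file" 2>/dev/null; then
    return 0    # already set; leave the existing value alone
  fi
  # If the file does not end with a newline, add one first so the new
  # property is not glued onto the last existing line.
  if [ -s "$prop_file" ] && [ -n "$(tail -c 1 "$prop_file")" ]; then
    printf '\n' >> "$prop_file"
  fi
  printf 'object.BTreeGCScriptTimeoutSecs=%s\n' "$secs" >> "$prop_file"
}

# Usage inside the container (value in seconds):
#   ensure_gc_timeout /opt/storageos/conf/vnest.object.properties 7200
```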

I killed the following process, which had been running for a long time (probably since its first run after the system was built), and restarted crond via systemctl.

root      2759 24.8  0.0  12168  2212 ?        RN   Apr02 1786:40 /bin/bash /etc/cron.hourly/btree_gc

Afterward, I also killed all other cron-related processes, but I am not sure that was necessary.

This should keep the issue from coming back.

Again, I'm not sure what a good value is; I just wanted to keep / in the container from filling up.

8 Replies
JasonCwik

That's an interesting one. How long was your system up? What base OS is it installed on?

MikePhelps

CentOS 7.3.1611, created on 2017 Feb 16
Docker Engine 1.13.1
emccorp/ecs-software-3.0.0:latest (image ID e3022d56bf25)
4 vCPU, 32 GB RAM, 80 GB OS
4 x 1 TB LUNs for ECS

camattin

I agree that this is a bug... referencing a non-existent property seems like a silly oversight. I just implemented the same "fix" as you did, Chris, but I went crazy and set it to 10 seconds.

chris_kraft1

https://github.com/EMCECS/ECS-CommunityEdition/tree/master/patches/3.0.0.1

It turns out there is a patch at the URL above; the setting no longer exists in real ECS.

Thanks to Jason for the pointer.

JasonCwik

Chris, it's actually the opposite. The setting does exist in real ECS (with a value of 7200), and our patch inadvertently removed it.

camattin

Hi Jason, are you saying the timeout in real ECS is two hours (7200 s)?

travis_wichert

Yes, that timeout in real ECS is two hours (7200 s).

Also, this issue should be resolved in ECS CE, since we no longer overwrite vnest.object.properties when building the CE image from the upstream ECS release artifact.
