OpenShift Event Code: 1038NODE0007
Summary: Filesystem is predicted to run out of space within the next 4 hours.
Symptoms
As a file system starts to get low on space, system performance usually degrades gradually.
If a file system fills up and runs out of space, processes that need to write to the file system can no longer do so, which can result in lost data and system instability.
Cause
The NodeFilesystemSpaceFillingUp alert triggers when two conditions are met:
- The current file system usage exceeds a certain threshold.
- An extrapolation algorithm predicts that the file system will run out of space within a certain amount of time. If the time period is less than 24 hours, this is a Warning alert. If the time is less than 4 hours, this is a Critical alert.
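To make the second condition concrete, the following is a minimal sketch of a linear extrapolation over free-space samples. It is not the alert's actual rule: the helper name hours_until_full, the input format (one "seconds available_bytes" pair per line), and the choice of a least-squares fit are assumptions made for illustration.

```shell
# Hypothetical helper: reads "seconds available_bytes" pairs on stdin,
# fits a least-squares line, and prints the hours from the last sample
# until available space reaches zero, or "never" if space is not shrinking.
hours_until_full() {
  awk '
    NR == 1 { t0 = $1 }                    # shift times to keep sums small
    { t = $1 - t0; n++; sx += t; sy += $2; sxx += t * t; sxy += t * $2; last_t = t }
    END {
      if (n < 2) { print "never"; exit }
      denom = n * sxx - sx * sx
      if (denom == 0) { print "never"; exit }
      slope = (n * sxy - sx * sy) / denom  # bytes per second
      intercept = (sy - slope * sx) / n    # fitted bytes at the first sample
      if (slope >= 0) { print "never"; exit }
      t_zero = -intercept / slope          # time at which the line hits zero
      printf "%.1f\n", (t_zero - last_t) / 3600
    }'
}
```

A result below 24 would correspond to the Warning window described above, and a result below 4 to the Critical window.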
Resolution
Diagnosis
- Study recent trends of file system usage on a dashboard. Sometimes, a periodic pattern of writing and cleaning up in the file system can cause the linear prediction algorithm to trigger a false alert.
- Use the Linux operating system tools and utilities to investigate what directories are using the most space in the file system. Is the issue an irregular condition, such as a process failing to clean up behind itself and using a large amount of space? Or does the issue seem to be related to organic growth?
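One way to find the directories using the most space is to combine du, sort, and head. The sketch below assumes GNU coreutils on the node; the helper name top_dirs and its default of 10 lines are illustrative choices, not part of the original procedure.

```shell
# Hypothetical helper: lists the largest entries directly under a directory.
# Arguments: $1 = directory to inspect, $2 = number of lines to show (default 10).
# The first line of output is the total for the directory itself.
top_dirs() {
  dir=$1
  count=${2:-10}
  # -x stays on one file system, -k reports sizes in KiB, -d 1 goes one level deep.
  du -xk -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, running top_dirs /var 10 and then repeating the call on the largest subdirectory narrows down where the space is going.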
To assist in your diagnosis, watch the following metric in PromQL (execute the query on the OCP web console: Observe → Metrics → Run queries):
node_filesystem_free_bytes
Then, check the mountpoint label for the alert.
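The bare metric returns a series for every node and mount point, so it can help to narrow the query to the labels carried by the alert. The sketch below builds such a selector; the helper name fs_free_query and the sample node name are hypothetical, and it assumes the metric exposes instance and mountpoint labels that match those on the alert.

```shell
# Hypothetical helper: prints a PromQL selector for one node and mount point.
# Arguments: $1 = instance label from the alert, $2 = mountpoint label.
fs_free_query() {
  printf 'node_filesystem_free_bytes{instance="%s",mountpoint="%s"}\n' "$1" "$2"
}

# Example with a placeholder node name; paste the printed line into the query box.
fs_free_query "worker-0.example.com" "/var"
```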
Mitigation
If the mountpoint label is /, /sysroot or /var, remove unused images to resolve the issue:
- Debug the node by accessing the node file system:
  $ NODE_NAME=<instance label from alert>
  $ oc -n default debug node/$NODE_NAME
  $ chroot /host
- Remove dangling images:
  $ podman images -q -f dangling=true | xargs --no-run-if-empty podman rmi
- Remove unused images:
  $ podman images | grep -v -e registry.redhat.io -e "quay.io/openshift" -e registry.access.redhat.com -e docker-registry.usersys.redhat.com -e docker-registry.ops.rhcloud.com -e rhmap | xargs --no-run-if-empty podman rmi 2>/dev/null
- Exit debug:
  $ exit
  $ exit
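Before removing anything, it may be useful to preview which images fall outside the registries excluded by the grep filter in the unused-images step. The following is a sketch under assumptions: the helper name unprotected_image_ids is hypothetical, and it expects one "REPOSITORY IMAGE_ID" pair per line on standard input.

```shell
# Hypothetical helper: reads "REPOSITORY IMAGE_ID" pairs on stdin and prints the
# IDs of images whose repository does not match the exclusion list from the
# unused-images step. It only prints IDs; it removes nothing.
unprotected_image_ids() {
  grep -v \
    -e registry.redhat.io \
    -e "quay.io/openshift" \
    -e registry.access.redhat.com \
    -e docker-registry.usersys.redhat.com \
    -e docker-registry.ops.rhcloud.com \
    -e rhmap |
    awk '{ print $2 }'
}
```

Inside the debug shell, the input could be produced with podman images --format '{{.Repository}} {{.ID}}' and piped into the helper. Reviewing the printed IDs first keeps the cleanup a deliberate choice.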
Support
If the steps above do not resolve the issue, contact Dell EMC technical support for further investigation.