OneFS: Event notification: Node reached 95 percent or greater used capacity on the root file system
Summary: On older Isilon IQ X and NL-series nodes, the root (/) directory has a maximum size of 500 MB and shows 95% or greater used capacity, even when no extra user files or firmware packages are installed. Newer nodes have a 1 GB root directory and typically show 49% used capacity. ...
Symptoms
Event
Node reached 95% used capacity on the root file system.
-Or-
The root partition is near capacity.
Event ID: 100010003
Cause
Details
This event indicates that the root partition on one or more nodes is approaching full capacity.
This event might occur for several reasons. The two most common reasons are:
- A user has stored a file in a directory on the root partition instead of in the /ifs directory.
- Node firmware was upgraded, but the firmware package was not removed.
Resolution
Response
Troubleshoot the cause of this alert using one or more of the following procedures.
IMPORTANT!
Do not remove or install any software patches while the root partition is full or near capacity. Attempting to install/remove a patch while the root partition is full could cause the patch installation or removal process to fail. If the installation or removal process fails on a cluster with a full root partition, this may prevent rolling back to the previous system configuration. This situation could leave the cluster in an unstable or inaccessible state.
For more information about maintaining sufficient free space on an Isilon cluster, see the Cluster Capacity Management Guide on the Dell Online Support site.
Phase 1: Remove files that do not belong in the root (/) directory.
- Examine the alert message to determine the affected file system. The message identifies the cluster and the nodes (by node number) that generated the alert. The message also identifies the affected file system as one of the following: ifs, var, crash, or root.
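- On the node that generated the alert, run the following command to list all files on the root file system that are larger than about 5 MB. The -x option keeps find from crossing into other file systems, and -size +10000 counts 512-byte blocks: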
find -x / -type f -size +10000 -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'
Output similar to the following appears:
/boot/kernel.amd64/efs.ko: 10M
/usr/libexec/cc1: 6.6M
/usr/libexec/cc1plus: 7.2M
/usr/libexec/cc1obj: 6.6M
/usr/local/lib/libxerces-c-3.1.so: 5.6M
/usr/local/lib/libxerces-c.so: 5.6M
/usr/local/sbin/nmbd: 9.9M
/usr/local/connectemc/connectemc: 15M
/usr/local/aspera/sbin/asperacentral: 5.0M
The example output above lists files which are typically found in the root directory. These files should not be removed.
- In the output, look for any files that do not typically belong in the root directory, such as a OneFS installer file, a log gather, or a user-created file. (See the example output in the previous step for files that belong in the root directory and should not be removed.)
- Remove the files or move them to the /ifs directory. If you are unsure about what files to remove, contact Isilon Technical Support for assistance.
- Run the following command to confirm that the root (/) directory is below the alert threshold:
isi_for_array -s 'df -h /'
Output similar to the following appears:
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/imdd0a   496M    445M     11M    94%    /
- Clear all existing alerts:
isi event bulk --resolved=true (OneFS 8.0 or newer)
-or-
isi events cancel all (OneFS 7.x or older)
- If the space is not reclaimed after removing any large files, look for a process that still has the file open. That process must be stopped to close the file handle holding the space. See How to use the fstat command to list the open files on a node, article 322712.
- If the issue is not resolved, go to Phase 2.
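The screening step in Phase 1 (comparing the find listing against files that normally live on the root file system) can be sketched as a small shell filter. This is only an illustration: the function name unexpected_root_files is hypothetical, and the allowlist contains just the nine paths from the example output above, so it is an assumption rather than a complete inventory of system files. Anything it prints still needs to be reviewed by a person before it is moved or removed.

```shell
#!/bin/sh
# Hypothetical helper (not part of OneFS): reads lines of the form
# "<path>: <size>" on stdin, as produced by the find/awk command in Phase 1,
# and prints only the entries whose path is NOT on the allowlist below.
unexpected_root_files() {
    while IFS= read -r line; do
        # Strip the trailing ": <size>" to recover the path.
        path=${line%: *}
        case "$path" in
            # Allowlist copied from the article's example output.
            # Assumption: extend this list to match your own nodes.
            /boot/kernel.amd64/efs.ko | \
            /usr/libexec/cc1 | /usr/libexec/cc1plus | /usr/libexec/cc1obj | \
            /usr/local/lib/libxerces-c-3.1.so | /usr/local/lib/libxerces-c.so | \
            /usr/local/sbin/nmbd | \
            /usr/local/connectemc/connectemc | \
            /usr/local/aspera/sbin/asperacentral)
                ;;                          # expected file, print nothing
            *)
                printf '%s\n' "$line"       # candidate for review
                ;;
        esac
    done
}
```

One possible use is to pipe the Phase 1 command into the filter, for example `find -x / -type f -size +10000 -exec ls -lh {} \; | awk '{ print $9 ": " $5 }' | unexpected_root_files`. An empty result suggests that no large files outside the allowlist were found.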
Phase 2: Remove Isilon node firmware packages.
Instructions included with Isilon node firmware packages include a step for removing the firmware package after the firmware is installed. If that step is not completed, it may cause the root directory to exceed capacity.
- Run the following command to confirm that a firmware package is installed on the cluster:
isi upgrade patches list (OneFS 8.0 or newer)
-or-
isi pkg info (OneFS 7.x or older)
Output similar to the following appears:
IsiFw_Package_v8.2: Isilon firmware packages contain firmware images that may be used to update certain devices in your cluster. To install this firmware package, use the 'isi pkg install <package-filename>' command. Note that the act of installing the package will not automatically update your devices. Once installed, please refer to 'isi firmware --help' or the firmware section in 'man isi' for more information.
- Remove the firmware upgrade package, where <patch> or <package-filename> is the name of the firmware package:
isi upgrade patches uninstall --patch=<patch> (OneFS 8.0 or newer)
-or-
isi pkg delete <package-filename> (OneFS 7.x or older)
- Run the following command to confirm that the root directory is below the alert threshold:
isi_for_array -s 'df -h /'
Output similar to the following appears:
mycluster-1: Filesystem    Size    Used   Avail Capacity  Mounted on
mycluster-1: /dev/imdd0a   496M    445M     11M    97%    /
mycluster-2: Filesystem    Size    Used   Avail Capacity  Mounted on
mycluster-2: /dev/imdd0a   496M    445M     12M    97%    /
mycluster-3: Filesystem    Size    Used   Avail Capacity  Mounted on
mycluster-3: /dev/imdd0a   496M    445M     12M    97%    /
- Clear all existing alerts:
isi event bulk --resolved=true (OneFS 8.0 or newer)
-or-
isi events cancel all (OneFS 7.x or older)
- If the issue is not resolved, go to Phase 3.
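Reading the Capacity column for every node by eye is error-prone on larger clusters, so the df check used in Phases 1 and 2 could be screened with a short filter like the one below. It is a minimal sketch under stated assumptions: the function name root_over_threshold is hypothetical, the default threshold of 95 is taken from the event text, and the input is assumed to have the df column layout shown in the sample output above (with or without the "<node>:" prefix that isi_for_array adds).

```shell
#!/bin/sh
# Hypothetical helper (not part of OneFS): reads `df -h /` output on stdin
# and prints one line for each root file system whose Capacity value is at
# or above the threshold (first argument, default 95).
root_over_threshold() {
    threshold=${1:-95}
    awk -v limit="$threshold" '
        # Only data rows end in "/"; header rows end in "on".
        $NF == "/" {
            pct = $(NF - 1)          # Capacity column, e.g. "97%"
            sub(/%$/, "", pct)       # drop the percent sign
            if (pct + 0 >= limit + 0) {
                # Rows from isi_for_array carry a "<node>:" prefix (7 fields);
                # plain df output has 6 fields and is labeled "local:".
                node = (NF > 6) ? $1 : "local:"
                print node, pct "%"
            }
        }'
}
```

For example, `isi_for_array -s 'df -h /' | root_over_threshold 95` would list only the nodes that are still at or above 95%. With the sample output shown in Phase 2, all three nodes would be listed at 97%; with the 94% sample from Phase 1, nothing would be printed.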
Phase 3: Contact Isilon Technical Support
If you are unable to determine why the root (/) directory is near or above capacity, do the following:
- Gather system logs by running the following command:
isi_gather_info -s 'ls -lhat /' -s 'du -axh / | sort -rn' -s 'du -xhd1 /'
The logs are automatically uploaded to Dell Technical Support.
- Contact Dell Technical Support to assist with troubleshooting the issue.
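The directory-size listing that the gather command collects can also be viewed locally while waiting for support. The sketch below is an illustration only: the function name largest_dirs is hypothetical, and it reports sizes in kilobytes (du -k) rather than the human-readable -h form, because a numeric sort compares only the leading number and would otherwise rank 9.9M above 15G.

```shell
#!/bin/sh
# Hypothetical helper (not part of OneFS): lists a directory and its
# immediate subdirectories, largest first, with sizes in kilobytes.
#   -x    stay on the same file system as the starting directory
#   -k    report sizes in 1 KB units so that sort -rn orders them correctly
#   -d 1  descend only one level
largest_dirs() {
    dir=${1:-/}
    du -x -k -d 1 "$dir" 2>/dev/null | sort -rn
}
```

Running `largest_dirs /` prints the total for the root file system on the first line, followed by its top-level directories in descending order of size.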