OneFS: The /var/crash Partition of a Node in the Cluster has Reached 90 Percent Capacity
Summary: Event ID 100010002 alerts you that the /var/crash partition of a node in the cluster has reached 90% capacity.
Symptoms
Running with a full /var/crash partition is unlikely to cause problems, but it might prevent some debugging information from being saved if another issue arises. This alert could indicate that processes are dumping core, and this behavior should be investigated.
Introduction
You might receive an alert warning that the crash partition of a node in the cluster, /var/crash, has reached 90% capacity. The /var/crash partition is used to store core files from running services that have become unresponsive. This alert could indicate that processes are dumping core, and this behavior should be investigated.
Cause
The /var/crash partition stores core files from running services that have become unresponsive. It might also contain log gather data or temporary files created and stored by technical support or the user.
Resolution
Details
To see what is taking up space in the /var/crash partition, run the following commands on any node in the cluster:
isi_for_array -s 'df -h'
isi_for_array -s 'find /var/crash -type f -size +10000 -exec ls -lh {} \;'
If you want assistance from Dell Support to investigate the alert, include the results of the above commands in your initial email. It is also helpful to make the cluster logs available for review.
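Note that find interprets -size +10000 in 512-byte blocks, so the command above lists files larger than about 5 MB. To rank files by disk usage on a single node, something like the following sketch could be used. The function name largest_files and its default count of 10 are assumptions made for illustration; they are not part of OneFS.

```shell
# Hypothetical helper (not a OneFS command): list the N largest regular
# files under a directory, largest first. du -k reports disk usage in KB.
largest_files() {
    dir="${1:-/var/crash}"    # directory to scan; defaults to /var/crash
    count="${2:-10}"          # number of files to show; defaults to 10
    find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$count"
}
```

For example, largest_files /var/crash 5 would print the five largest files on the local node, one per line, with the size in kilobytes followed by the path.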
If a core file is filling the /var/crash partition, save the file temporarily to a location under /ifs/data/Isilon_Support and investigate the offending service to determine the cause.
A patch might already exist for your OneFS version, the problem might be fixed in a later version, or a new tracking issue might need to be opened.
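Saving a core file out of /var/crash, as described above, could be sketched as follows. This is a minimal illustration that assumes the destination is /ifs/data/Isilon_Support itself unless another directory is given; the function name stash_core is hypothetical and is not a OneFS command. The sketch copies the file, verifies the copy, and only then removes the original.

```shell
# Hypothetical helper (not a OneFS command): copy a core file to a support
# directory, verify the copy, then delete the original to free space.
stash_core() {
    src="$1"
    dest_dir="${2:-/ifs/data/Isilon_Support}"   # assumed default destination
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/$(basename "$src")"
    # Refuse to overwrite a file that is already saved under the same name.
    if [ -e "$dest" ]; then
        echo "already exists: $dest" >&2
        return 1
    fi
    cp -p "$src" "$dest" || return 1
    # cmp -s exits 0 only when the two files are byte-identical.
    if cmp -s "$src" "$dest"; then
        rm -f "$src"
        echo "$dest"
    else
        echo "copy verification failed: $dest" >&2
        rm -f "$dest"
        return 1
    fi
}
```

On success the function prints the new location of the file, which can then be passed on to whoever investigates the offending service.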
Additional Information
Related article:
- Event notification: Node reached 95% used capacity on the root file system, 000016965.