Isilon: Event notification: The var partition is near capacity, Event ID: 100010001
Summary: This article describes how to free space in the /var partition when it nears capacity.
Symptoms
Event
One of the following event notifications is issued:
- The /var partition is near capacity (95% used)
- The /var partition is near capacity (85% used)
- The /var partition is near capacity (75% used)
Details
When the /var partition reaches 75%, 85%, or 95% of capacity, an event is logged and an alert is sent.
Cause
The /var directory contains logs, diagnostic files, configuration data, and temporary files used by various cluster functions. Over time, extra files can accumulate in /var and fill the partition.
For example, the /var/log/wtmp file and its rollover files (/var/log/wtmp.0, /var/log/wtmp.1, and so on) can grow beyond 10 MB, sometimes reaching 150 MB. The /var/log/wtmp file is a binary log file that records login and logoff data. Its default entry in /etc/newsyslog.conf, the configuration file for the newsyslog log manager, does not rotate it by size or compress it the way other log files are handled, so /var/log/wtmp can grow and fill the /var partition.
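To check whether /var/log/wtmp and its rollover files are the cause on a given node, their sizes can be totaled. The following is a minimal POSIX shell sketch, not a supported tool: the function name wtmp_total_bytes is hypothetical, and it assumes the rollover files use the wtmp.N naming described above (compressed archives would also match the wtmp.* pattern).

```shell
# Hypothetical helper (not part of OneFS): print the total apparent size,
# in bytes, of wtmp and its rollover files in a log directory.
# Assumes rollover files are named wtmp.0, wtmp.1, ... (default dir /var/log).
wtmp_total_bytes() {
    logdir=${1:-/var/log}
    total=0
    for f in "$logdir"/wtmp "$logdir"/wtmp.*; do
        # Skip names that do not exist, such as an unmatched wtmp.* pattern
        [ -f "$f" ] || continue
        # wc -c may pad its output with spaces, so strip them
        size=$(wc -c < "$f" | tr -d ' ')
        total=$((total + size))
    done
    echo "$total"
}
```

For example, running wtmp_total_bytes /var/log on the node that raised the alert prints a byte count; a result in the tens of megabytes or more suggests wtmp growth as the cause.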
Resolution
The following is the default content of the /var partition, with a brief description of each subdirectory. Unless otherwise stated, do not alter or remove the content of /var or its subdirectories.
ps9500x3-2# cd /var
ps9500x3-2# ls
.snap    at       backups  db       ifs          lib      patch    spool
account  audit    cache    empty    journal      log      preserve tmp
agentx   authpf   crash    games    journal-peer mail     run      unbound
apache2  backup   cron     heimdal  krb5kdc      msgs     rwho     yp
- .snap: Snapshots. Do not touch.
- account: Account information. Do not touch.
- agentx: Empty, but preserved for the Agent Extensibility (AgentX) protocol.
- apache2: Apache files. Do not touch.
- at: Variable data. Do not touch.
- audit: Audit files. Do not touch.
- authpf: Authentication gateway. Do not touch.
- backup: System configuration backup files. Do not touch.
- backups: Group configuration backups. Do not touch.
- cache: System cache. Do not touch.
- crash: Crash files. Older files can be deleted if needed.
- cron: Cron jobs. Do not touch.
- db: Database files. Do not touch.
- empty: Do not touch.
- games: Empty, but preserved.
- heimdal: Kerberos 5 protocol. Do not touch.
- ifs: Do not touch unless directed by support.
- journal: System journal database.
- journal-peer: System journal-peer database.
- krb5kdc: Kerberos KDC (Key Distribution Center).
- lib: Likewise database files. Do not touch.
- log: Various system log files. Can be cleared, but doing so zeroes out the system logs.
- mail: Mail subsystem files.
- msgs: Message logs.
- patch: System patch database. Do not touch.
- preserve: Do not touch.
- run: Do not touch.
- rwho: Do not touch.
- spool: System spool files. Do not touch.
- tmp: Healthcheck items and vi recovery files. Do not touch.
- unbound: Do not touch.
- yp: Do not touch.
The two directories to focus on are /var/crash and /var/log, as these can grow and consume most of the disk space in the /var partition.
Older crash files in /var/crash can be removed if they are no longer needed.
The logs in /var/log can be zeroed out and reset if they become too large. Keep in mind that once the logs are reset, it is no longer possible to troubleshoot and research past issues.
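Where a log in /var/log has to be cleared, one way to zero it out without losing its contents entirely is to copy it elsewhere first and then truncate it in place. The following POSIX shell sketch shows this; the function name zero_log and its arguments are hypothetical, and the backup directory is up to you (for example, a directory under /ifs/data/Isilon_Support/ as in the manual cleanup steps below).

```shell
# Hypothetical helper (not part of OneFS): keep a copy of a log file, then
# zero it out in place. Truncating keeps the file itself, and its space is
# freed even if a process still has the file open.
zero_log() {
    log=$1
    backup_dir=$2
    [ -f "$log" ] || { echo "not a regular file: $log" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1
    cp -p "$log" "$backup_dir/" || return 1
    : > "$log"    # truncate the log to 0 bytes without removing it
}
```

Usage example:
zero_log /var/log/<logfile> /ifs/data/Isilon_Support/<dest>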
Review the df output for the /var partition. Depending on the output, perform one or more of the following tasks:
ps9500x3-2# df
Filesystem            1K-blocks     Used     Avail Capacity  Mounted on
/dev/mirror/root0       1957292   871082    929628      48%  /
devfs                         1        1         0     100%  /dev
/dev/mirror/var0         978604    51394    848922       6%  /var
/dev/mirror/var-crash   2946284       10   2710572       0%  /var/crash
/dev/mirror/keystore      61228       46     56284       0%  /keystore
/dev/md0                  61166     2158     54116       4%  /tmp/ufp
/dev/md1.uzip            435751   406426     -5535     101%  /base
OneFS                 246327840  2362592 173903776       1%  /ifs
ps9500x3-2#
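The Capacity value for /var can be compared with the 75%, 85%, and 95% alert thresholds directly from the df output. The following POSIX shell and awk sketch assumes the column layout shown above (Capacity in field 5, mount point in field 6); the function name var_capacity_level is hypothetical.

```shell
# Hypothetical helper (not part of OneFS): read df output on stdin and print
# the highest /var alert threshold crossed (95, 85, or 75), or "ok".
# Assumes the df layout shown above: Capacity is field 5, mount point field 6.
var_capacity_level() {
    awk '
        $6 == "/var" {
            cap = $5
            sub(/%$/, "", cap)   # "6%" -> "6"
            cap += 0             # compare as a number, not a string
            if (cap >= 95)      print 95
            else if (cap >= 85) print 85
            else if (cap >= 75) print 75
            else                print "ok"
            found = 1
        }
        END { if (!found) print "no /var line found" }
    '
}
```

For example, df /var | var_capacity_level prints ok, 75, 85, or 95. With the sample output above, /var is at 6%, so it prints ok.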
Rotate logs:
Detailed instructions on how to rotate logs are in KB article 20315, Isilon: OneFS-How to rotate system logs for a node. To rotate the logs, run:
newsyslog -F
Rotation can resolve a full partition by compressing large logs and removing old ones, which reduces partition usage. If the /var partition returns to a normal usage level after rotation, review the list of recently written logs to determine whether a specific log is rotating frequently.
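One way to review recently written logs is to list the newest entries in /var/log by modification time; several fresh archives of the same log near the top of the list suggest that it is rotating frequently. A minimal POSIX shell sketch, using the hypothetical function name recent_logs:

```shell
# Hypothetical helper (not part of OneFS): list the N most recently
# modified entries in a log directory, newest first (defaults: /var/log, 10).
recent_logs() {
    logdir=${1:-/var/log}
    n=${2:-10}
    # ls -t sorts by modification time, newest first
    ls -t "$logdir" | head -n "$n"
}
```

Usage example:
recent_logs /var/log 20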
Check the percentage of inodes in use:
Open an SSH connection to the node that reported the error and log in using the "root" account.
Run the following command:
df -i | grep var | grep -v crash
Output similar to the following appears (the column header is shown for reference; the grep filters remove it from the actual output):
Filesystem       1K-blocks  Used  Avail Capacity iused  ifree %iused  Mounted on
/dev/mirror/var0   1013068 49160 882864       5%  1650 139276     1%  /var
If the %iused value is 90% or higher, reduce the number of files in the /var partition using one of the methods described below.
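The %iused check can also be done with a short awk filter. This POSIX shell sketch assumes the FreeBSD-style df -i layout shown above (%iused in field 8, mount point in field 9), which differs from the Linux df -i layout; the function name var_inode_check and its optional threshold argument (default 90) are hypothetical.

```shell
# Hypothetical helper (not part of OneFS): read FreeBSD-style "df -i" output
# on stdin and report whether %iused for /var is at or above a threshold.
# Assumes %iused is field 8 and the mount point is field 9, as shown above.
# Optional argument: threshold percentage (default 90).
var_inode_check() {
    awk -v t="${1:-90}" '
        $9 == "/var" {
            pct = $8
            sub(/%$/, "", pct)
            pct += 0
            if (pct >= t + 0) print "reduce files (" pct "% of inodes used)"
            else              print "ok (" pct "% of inodes used)"
            found = 1
        }
        END { if (!found) print "no /var line found" }
    '
}
```

Usage example:
df -i /var | var_inode_check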
Identify files that do not belong in the /var partition:
- On the node that generated the alert, run the following command to list files in the /var partition that are larger than about 5 MB (find measures -size in 512-byte blocks, so +10000 is roughly 5 MB):
find -x /var -type f -size +10000 -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'
- In the output, look for files that do not typically belong in the /var partition, for example, OneFS installer files, log gathers, or user-created files.
- Remove the files or move them to the /ifs directory.
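A variation of the find command above sorts the results largest first and handles file names that contain spaces, which the awk field positions in the original pipeline do not. This POSIX shell sketch uses find -xdev (the portable equivalent of find -x) and du -k; the function name list_large_files is hypothetical, and du reports allocated space in kilobytes, which can differ slightly from a file's apparent size.

```shell
# Hypothetical helper (not part of OneFS): list regular files larger than
# about 5 MB under a directory (default /var), largest first.
# -xdev keeps find on one file system, like find -x.
# -size +10000 counts 512-byte blocks (10000 x 512 bytes, about 5 MB).
list_large_files() {
    dir=${1:-/var}
    # du -k prints "<KB><TAB><path>"; sort -rn puts the largest first
    find "$dir" -xdev -type f -size +10000 -exec du -k {} + | sort -rn
}
```

Usage example:
list_large_files /var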
Manually remove files from the /var Partition:
Once the extra files are identified, cleaning up the /var directory usually involves the mkdir (make directory), cp (copy), mv (move), and rm (remove) commands. Be familiar with these basic UNIX/Linux commands before proceeding.
Always make a backup copy of files prior to deleting or moving them from their original location.
Create a directory to hold the backup data, where <dest> is the name of the destination directory. Copy every file that is to be deleted into this directory first.
# mkdir /ifs/data/Isilon_Support/<dest>
Copy, move, or delete files as appropriate:
To copy a file or directory:
# cp <file> /ifs/data/Isilon_Support/<dest>
To recursively copy a directory:
# cp -R <directory> /ifs/data/Isilon_Support/<dest>
To move a file or directory:
# mv <file> /ifs/data/Isilon_Support/<dest>
# mv <directory> /ifs/data/Isilon_Support/<dest>
To remove/delete a file:
# rm <file>
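The backup-then-delete rule can be enforced with a small wrapper that removes a file only after its copy has been verified. This is a POSIX shell sketch under stated assumptions: the function name backup_then_remove is hypothetical, it handles single regular files only (use cp -R for directories as described above), and it uses cmp to confirm that the copy matches the original before running rm.

```shell
# Hypothetical helper (not part of OneFS): copy a file into a backup
# directory, confirm the copy is identical, and only then delete the original.
backup_then_remove() {
    src=$1
    dest=$2
    [ -f "$src" ] || { echo "not a regular file: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    cp -p "$src" "$dest/" || return 1
    # cmp -s is silent and returns 0 only when both files are identical
    if cmp -s "$src" "$dest/$(basename "$src")"; then
        rm -f "$src"
    else
        echo "backup of $src does not match; original kept" >&2
        return 1
    fi
}
```

Usage example:
backup_then_remove /var/crash/<file> /ifs/data/Isilon_Support/<dest>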
Determine if a process is holding a large file open.
Use the fstat command to list the open files on a node or directory, or to list the files that a process has opened. A list of the open files can help you identify processes that are writing large files. A file that is deleted while a process still holds it open continues to consume space until the process closes it, so removing such a file does not immediately free space in /var. For details, see KB article 21402, Isilon: How to use the fstat command to list the open files on a node.
If none of the above tasks resolves the issue, use the following solution:
Limit the rollover file size and compress the file
- Open an SSH connection on any node in the cluster and log in using the "root" account.
- Run the following commands to create a backup of the /etc/newsyslog.conf file:
cp /etc/newsyslog.conf /ifs/newsyslog.conf
cp /etc/newsyslog.conf /etc/newsyslog.bak
- Open the /ifs/newsyslog.conf file in a text editor.
- Locate the following line:
/var/log/wtmp 644 3 * @01T05 B
- Change the line to:
/var/log/wtmp 644 3 10000 @01T05 ZB
These changes instruct the system to roll over the /var/log/wtmp file when it reaches 10 MB (the size field is in kilobytes, so 10000 is about 10 MB) and to compress the rotated file with gzip (the Z flag).
- Save and close the /ifs/newsyslog.conf file.
- Run the following command to copy the updated file to all nodes on the cluster:
isi_for_array 'cp /ifs/newsyslog.conf /etc/newsyslog.conf'
- Log files are rotated automatically, when necessary, by a cron job that runs newsyslog on the hour and on the half hour, as defined in /etc/crontab:
#minute  hour  mday  month  wday  who   command
#
# rotate log files every half-hour, if necessary
0,30     *     *     *      *     root  newsyslog
If other logs are rotating frequently, or if the preceding steps do not resolve the issue, contact Dell Technical Support for assistance.
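The manual edit of /ifs/newsyslog.conf in the steps above can also be scripted. The following POSIX shell sketch uses sed to rewrite only the default wtmp entry; the function name update_wtmp_entry is hypothetical, and it assumes the entry matches the fields shown above (/var/log/wtmp 644 3 * @01T05 B). If the line on your cluster differs, the sketch leaves it unchanged, so compare the result with diff before copying it to the nodes.

```shell
# Hypothetical helper (not part of OneFS): copy newsyslog.conf from $1 to $2,
# changing only the default wtmp entry
#   /var/log/wtmp 644 3 * @01T05 B
# to
#   /var/log/wtmp 644 3 10000 @01T05 ZB
# Whitespace between fields is preserved; all other lines are copied as-is.
update_wtmp_entry() {
    src=$1
    dst=$2
    # Writing to the input file would truncate it before sed reads it
    [ "$src" != "$dst" ] || { echo "source and destination must differ" >&2; return 1; }
    # \1, \2, \3 are the captured groups; "\110000" is \1 followed by the
    # literal text 10000, because sed back-references are single digits.
    sed -E 's#^(/var/log/wtmp[[:space:]]+644[[:space:]]+3[[:space:]]+)\*([[:space:]]+@01T05[[:space:]]+)B([[:space:]]*)$#\110000\2ZB\3#' \
        "$src" > "$dst"
}
```

Usage example:
update_wtmp_entry /etc/newsyslog.conf /ifs/newsyslog.conf
diff /etc/newsyslog.conf /ifs/newsyslog.conf
If diff reports no differences, the wtmp line did not match the expected default and should be edited by hand as described above.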