
February 5th, 2017 21:00

How can I reduce the size of isi_gather_info?

Hi,

Our customer has an Isilon cluster running OneFS 8.0.0.3.

The isi_gather_info log set has grown very large, almost 3 GB.

I'd like to know how to reduce the size of isi_gather_info.

I found that the "minidump.tar" file on one node was 2.5 GB.

I also found these command options for isi_gather_info when I read the CLI Administration Guide for OneFS 8.0.0:

--clean-cores

--clean-dumps

--clean-all

Which option deletes the minidump files and reduces the size of the isi_gather_info log set?

107 Posts

February 6th, 2017 00:00

Hi sa.taku,

if you don't have issues with your cluster and DellEMC Support does not need a complete log set with all dumps, you can delete the dump files and core files with --clean-all, or of course manually from the /var/crash directory. Memory dumps are created, for example, when there are failures or exceptions while a node boots. If you delete the files and new dump files appear after the next reboot, you should open an SR to find the cause of the issue. Maybe there is something going wrong with the cluster...

If you want to keep all dumps and core files stored on the Isilon but exclude them from a new log set, you can use the parameters --no-dumps and --no-cores.
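For reference, the invocations would look like this (worth confirming the exact switches against the CLI guide for your OneFS version):

Gather logs but leave existing cores and dumps out of the log set:

# isi_gather_info --no-cores --no-dumps

Gather logs, then delete cores and hangdumps from the cluster afterwards:

# isi_gather_info --clean-all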

Phil

252 Posts

February 6th, 2017 06:00

Hello sa.taku,

To add to what has already been said, if you don't need a full log set, you can also run an incremental gather, which only collects information that has changed since the last log set was run. Support may need the full log set, including cores and dumps, so it is best to check with the technical support engineer assigned to the service request as to what is needed.

The command to run an incremental gather is:

# isi_gather_info --incremental

The incremental flag is discussed on page 105 of the OneFS 8.0.0 CLI Admin Guide
https://support.emc.com/docu65065
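If you also want dumps and cores left out of an incremental gather, combining the switches should work (an assumption on my part; verify on your OneFS version):

# isi_gather_info --incremental --no-dumps --no-cores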

February 6th, 2017 11:00

You can also clean up the stuff that Isilon doesn't clean and tech support doesn't want, like old /ifs/.ifsvar/modules/jobengine/cp/ files. There can be tens of thousands of these, and the log extractor that the logs go through doesn't even extract them. So we tar them up, compress them, send them across the Internet to EMC, where they throw them away...

May 15th, 2017 03:00

Is it known to be safe to remove the old directories in /ifs/.ifsvar/modules/jobengine/cp? A quick look shows that we have the whole sequence back to job No. 1.

On one cluster that is 14,000 directories and on another it is 25,000 files.  For the latter, the isi_gather_info takes 2-3 hours to complete, and if removing these made it faster I'd be very pleased.
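For anyone wanting to size this up before deleting anything, a read-only check along these lines should do (paths as discussed in this thread):

# find /ifs/.ifsvar/modules/jobengine/cp/ -maxdepth 1 -type d | wc -l

# du -sh /ifs/.ifsvar/modules/jobengine/cp/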

May 15th, 2017 07:00

I regularly do my own cleanup.

#!/bin/bash

# /var/crash is node-local, so clean old hwcheck output on every node:
isi_for_array "find /var/crash/ -maxdepth 1 -type d -name 'hwcheck*' -mtime +30 -exec rm -Rf {} \;"

# /ifs is clusterwide, so the rest only needs to run from one node.
# Old job engine checkpoint directories:
find /ifs/.ifsvar/modules/jobengine/cp/ -maxdepth 1 -type d -mtime +30 -exec rm -Rf {} \;

# Old support packages:
find /ifs/data/Isilon_Support/pkg/ -maxdepth 1 -type f -mtime +30 -exec rm -Rf {} \;

This has not caused me a problem yet on any of my clusters.

8 Posts

January 25th, 2018 06:00

To add to Ed's comment and answer William, I asked Support about removing old info from /ifs/.ifsvar/modules/jobengine/cp/ and was told that it's ok to do.

Our isi_gathers were running 3+ hours on a 12 node NL400 cluster. The /ifs/.ifsvar/modules/jobengine/cp/ directory was ~128000 folders and ~209000 files. Imagine manually tarring and gzipping that and you can see why it was taking so long.

After whittling this directory down from 1,445 days' worth of logs to 30 days' worth, isi_gather_info now takes only ~30 minutes!

Apologies for reviving such an old post, but there isn't much out there regarding this, so I'm hoping this will help anyone that comes across this post.

41 Posts

February 16th, 2018 06:00

Please refer to Article Number 000459277

isi_gather_info gathers unexpectedly large logset

https://emcservice.force.com/CustomersPartners/kA2j0000000R3uoCAC

11 Posts

March 20th, 2020 09:00

/ifs/.ifsvar/modules/tsm/sched/reports looks to be just as bad...

11 Posts

March 20th, 2020 09:00

Also replying on a very old thread - our space usage looked to be somewhere entirely different, namely:

-rw-rw---- 1 root wheel 452M Mar 20 15:50 celog_ifsvar_db.tar
-rw-rw---- 1 root wheel 1.6G Mar 20 16:35 ifsvar_modules.tar

The top one is the event log DB, which is OK, as it's only a few files. The second one is worse, as it consists of:

# du -sh /ifs/.ifsvar/modules/tsm/*
174M /ifs/.ifsvar/modules/tsm/backup
4.5G /ifs/.ifsvar/modules/tsm/bwt
48M /ifs/.ifsvar/modules/tsm/config
64K /ifs/.ifsvar/modules/tsm/conflict_logs
64K /ifs/.ifsvar/modules/tsm/failpoint_files
8.0M /ifs/.ifsvar/modules/tsm/running
27M /ifs/.ifsvar/modules/tsm/samba-bak
9.1G /ifs/.ifsvar/modules/tsm/sched
73M /ifs/.ifsvar/modules/tsm/sched_edit
45M /ifs/.ifsvar/modules/tsm/target_work_locks
3.1M /ifs/.ifsvar/modules/tsm/work_locks

/ifs/.ifsvar/modules/tsm/bwt is 4.5 GiB of low-level logs going back 6 years!

11 Posts

March 20th, 2020 10:00

Looking at our SyncIQ policies, the following 2 settings seem to be the key:

Report Max Age: 4W2D
Report Max Count: 2000

So, the age policy isn't overriding the count policy, and we end up with 2000 SyncIQ reports per policy, e.g.:

# ls /ifs/.ifsvar/modules/tsm/sched/reports/0007430a77cc3fc78755150883d43aa1|wc -l
2010

# du -sh /ifs/.ifsvar/modules/tsm/sched/reports/0007430a77cc3fc78755150883d43aa1
190M /ifs/.ifsvar/modules/tsm/sched/reports/0007430a77cc3fc78755150883d43aa1
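To double-check these settings on a given policy, something like this should work (the grep is just to filter the output):

# isi sync policies view <policy-name> | grep -i report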


Moderator • 6.9K Posts

March 20th, 2020 14:00

Hello JohnB999,

There are a few options for reducing the size of log gathers. We detail them below.

Option 1: Remove common sources of extra data when gathering logs

OneFS 7.1.1.2 offers the ability to exclude node reboot cores, process cores, and hangdumps when running isi_gather_info. When running isi_gather_info, add one or both of these switches to the command to limit what is gathered:

--no-dumps
--no-cores

Option 2: Remove common sources of extra data from the cluster permanently with isi_gather_info

Instruct isi_gather_info to remove cores and hangdumps from the cluster after they have been gathered

OneFS 7.1.1.2 offers the ability to clean node reboot cores, process cores, and hangdumps after they have been gathered with isi_gather_info.  When running isi_gather_info, add this switch to the command to clean those cores and hangdumps from the cluster:

--clean-all

If you wish to clean only cores, or only hangdumps, you can add one of the following switches to the isi_gather_info command:

--clean-dumps
--clean-cores

Option 3: Manually remove sources of data

vmcore files in /var/crash from node reboot events

When a node reboots, it creates a file in /var/crash with a filename that starts with vmcore. If those vmcore files have already been uploaded to EMC support for analysis, you can remove them with this command:

isi_for_array "rm -rf /var/crash/vmcore*"

hangdump files in /var/crash from cluster hangdump events

During file locking activity, the isi_hangdump daemon will sometimes trigger a hangdump on all nodes in the cluster, which creates a diagnostic file in /var/crash with a filename that starts with isi_hangdump. If those hangdump files have already been uploaded to EMC support for analysis, you can remove them with this command:

isi_for_array "rm -rf /var/crash/isi_hangdump*"

Process core files in /var/crash

When a process terminates abnormally, it usually writes the contents of its memory to a core file in /var/crash with a filename that ends with .core.gz. If those core files have already been uploaded to EMC support for analysis, you can remove them with this command:

isi_for_array "rm -rf /var/crash/*.core.gz"

rst* logfiles in /ifs/.ifsvar/tmp from previous isi_vol_copy or isi_vol_copy_vnx migrations

During isi_vol_copy or isi_vol_copy_vnx migrations, those processes create files in /ifs/.ifsvar/tmp whose filenames begin with rst. If you no longer need those files, you can remove them from that directory with this command:

rm -rf /ifs/.ifsvar/tmp/rst*

An issue in 7.1.0.0 which gathers FSAnalyze reports databases (fixed in 7.1.0.1)

OneFS 7.1.0.0 had a problem with its isi_gather_info script that caused the FSAnalyze reports databases to be collected, driving up the size of the log gather. This issue was resolved in OneFS 7.1.0.1.

SyncIQ report files

SyncIQ report files can take up more space than expected and can increase the time it takes isi_gather_info to complete its log gather. You can adjust SyncIQ report retention to keep only a certain number of reports, or to limit how long reports are kept, via these command-line options. Choose whichever command suits your needs and apply it to existing policies:

isi sync policies modify <policy> --report-max-age <duration>

isi sync policies modify <policy> --report-max-count <count>

Once you have made this change to your policies, you can change the defaults for new policies with the following commands:

isi sync settings modify --report-max-age <duration>

isi sync settings modify --report-max-count <count>
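As a concrete sketch (the policy name and values below are illustrative placeholders, not from this thread; the duration format follows the 4W2D style shown earlier):

# isi sync policies modify my-dr-policy --report-max-age 30D --report-max-count 100

# isi sync settings modify --report-max-age 30D --report-max-count 100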

Please let us know if you have any other questions.

June 8th, 2022 06:00

Hello Sam L,

How much time would it take to collect a gather info log on a cluster of 3 or more NL410 nodes?
I am asking because I would not want to interrupt processes that are running on the cluster, and it would also help me plan an appropriate time to run the gather info command.

thank you. 

Moderator • 6.9K Posts

June 8th, 2022 09:00

Hello jessieobioma,

It can take 10-20 minutes per node to gather the logs; if the cluster is really busy, expect it to be at the upper end of that range.

June 9th, 2022 00:00

Thank you for responding. 
One other quick question, please.
What is the industry-standard procedure for running both hardware and software health checks on Isilon nodes?
If there's a document, please point me to it.

Thank you. 

Moderator • 6.9K Posts

June 9th, 2022 08:00

Hello jessieobioma,

Here is a link to the Dell EMC Isilon HealthCheck - Isilon Info Hub, which has additional information about HealthChecks.

https://dell.to/3MDjvKb
