
December 8th, 2014 10:00

How to clean up an over-capacity OneFS 7.1.1 VM?

I set up a OneFS 7.1.1 VM 3 months ago.  The entire 17 GB of OneFS is filling up even though I only have 8.5 MB of data in it.  How can I clean up the VM to free up some space?

This is an event I see in the OneFS Web UI:

2014-12-07 21:11:46 There is at least one smartpool at or over capacity.

Checking the disk usage and directory capacity, OneFS is 103% full, but I only have 8.5 MB of data in it (/ifs/data).

isilon-vm-1# df -h

Filesystem               Size    Used   Avail Capacity  Mounted on

/dev/mirror/root1        989M    450M    460M    49%    /

devfs                    1.0K    1.0K      0B   100%    /dev

/dev/mirror/var1         989M     70M    840M     8%    /var

/dev/mirror/var-crash    1.9G    1.8M    1.8G     0%    /var/crash

/dev/mirror/keystore      31M     10K     28M     0%    /keystore

/dev/md0                  62M    684K     56M     1%    /tmp/ufp

OneFS                     34G     17G   -453M   103%    /ifs

isilon-vm-1# du -ch /ifs/data | grep -i total

8.5M    total

Note that there's only 17 GB in the VM, because I set up only 3 nodes instead of 6.

Not sure if this matters: I once upgraded the VM from OneFS 7.1 to 7.1.1.  I have already deleted the tar.gz, but I'm not sure if there's anything else I need to delete to free up space.

1.2K Posts

December 12th, 2014 03:00

Just for clarity: root / includes the OS and /ifs, so you might want to du /ifs.

But it won't help to find those 17 GB...

One important aspect for "dark matter" issues: when did the most recent MultiScan or Collect jobs run successfully? These jobs free orphaned blocks, which accumulate while individual nodes are down during /ifs file system activity.
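
A quick way to check (just a sketch -- the exact isi job subcommands vary by OneFS release, so treat these as assumptions):

isilon-vm-1# isi job status

isilon-vm-1# isi job events list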

hth

-- Peter

57 Posts

December 12th, 2014 09:00

Thanks Peter.

The MultiScan and Collect jobs are disabled and cannot be enabled, I guess because this is a VM ...

/ifs uses 4.7 GB, while /ifs/data is considerably smaller:

isilon-vm-1# du -sh /ifs

4.7G    /ifs

isilon-vm-1# du -sh /ifs/data

9.1M    /ifs/data

isilon-vm-1# du -sh /ifs/.ifsvar/modules/stats

4.5G    /ifs/.ifsvar/modules/stats
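
In case it helps, something like this (plain du and sort, nothing OneFS-specific) shows which subdirectories under /ifs/.ifsvar are the biggest:

isilon-vm-1# du -sk /ifs/.ifsvar/* | sort -n

isilon-vm-1# du -sk /ifs/.ifsvar/modules/* | sort -n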

567 Posts

December 12th, 2014 10:00

Time for a new OneFS 7.2 VM?

Phil Lam

1.2K Posts

December 13th, 2014 03:00

The jobs usually do run on virtual nodes!

Can you add three more nodes and try again?

(Did you double-check for snapshots? SyncIQ and NDMP create snapshots, too.)
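
A sketch, assuming the OneFS 7.x snapshot CLI -- something like this should list any snapshots that are still holding space:

isilon-vm-1# isi snapshot snapshots list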

-- Peter

99 Posts

December 13th, 2014 05:00

The jobs don't "usually" run on virtual nodes, they _do_ run.

This cluster is best destroyed and a new virtual cluster created.  You'll save far more time and energy starting over than trying to get this cluster operating normally.  Notice your /var/crash partition is full, too, meaning you've crashed virtual nodes (often)  - and just like their real counterparts, running a cluster with /var/crash at 0% available is poor practice.

Not 100% sure how you ended up in this state - but again the VM is designed for quick buildup/teardown, so I believe your best bet is to create a new cluster, since you have so little data on it anyway.

BTW I believe it makes zero sense to upgrade a virtual node.  Just destroy the cluster and spin up the new version.  I do this in my homelab every release.

1.2K Posts

December 13th, 2014 06:00

Hi Rob

"Capacity 0%" is "used", not "left" amount -- I certainly dislike the df header line

As for upgrading virtual nodes, I respectfully disagree. In my experience, training and testing upgrades with complex custom configs make great use cases.

Yes, the virtual nodes are not supported, so the outcome is not guaranteed.

But in lieu of a physical test cluster, virtual nodes are better than nothing.


For rigorous testing of specific OneFS versions, fresh installs are best suited, of course!


I also believe that successfully maneuvering out of an overloaded /ifs situation will be highly instructive, though of course getting there should be avoided by all means on a production cluster.

Cheers

-- Peter

99 Posts

December 13th, 2014 08:00

Thanks Peter, I did indeed invert the columns, so /var/crash is not an issue.  Please forgive me.

Anyway, I totally agree that using the VM for dev/test is appropriate - that's why we designed, built, and released it in the first place.

But upgrading a virtual node is not at all like upgrading a physical cluster - since there is no IB fabric, and IB plays such an important role in intra-cluster communication.  It's useful to test upgrades using VMs, but only to a point.

But the original question was why /ifs is so full.  Ed Wilts nailed it - clusters (real or virtual) accumulate data in /ifs for internal purposes as well as persisting user data.  In this case, the statistics database is quite large, which tells me this virtual cluster has been running for quite some time with a workload.

Since this is a virtual cluster, there is no harm in either a) scaling out (adding nodes to increase /ifs capacity) or b) tearing it down and starting again with a clean cluster.

So while it's handy to 'test' upgrades (say, 7.1 to 7.1.1) by upgrading the VM, the disks in the VM are so small that this scenario, where the stats DB consumes nearly all the space, can easily happen, whereas it is extremely rare to see this on a real cluster with real drives.

Besides, in a VM scenario, if you really want to remove the persistence information, be careful what you ask for - go ahead, remove it, then kill the isi_stats_hist_d daemon. It's just a VM, after all :-)
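
For example (a rough sketch only, for a throwaway VM, using the path and daemon name mentioned above -- never on a real cluster):

isilon-vm-1# rm -rf /ifs/.ifsvar/modules/stats

isilon-vm-1# pkill isi_stats_hist_d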

57 Posts

December 15th, 2014 10:00

The cluster is only for development and testing, and I haven't created any snapshots, replication, etc.  I was hoping for a quick way to free up some space, but it looks like it won't end that way.

I will try to add 3 more nodes (there are only 3 nodes on this cluster now).  But I will likely ask the same question again ~3 months from now if the cluster keeps filling up at the same speed (hope not)...

Thanks for all the help you have offered!

10 Posts

February 24th, 2015 05:00

I had the same issue on my VM and couldn't figure out how to clear the space.  What did it for me was running the Collect job.  I'm running a 7.2 VM.

isi job jobs start collect

Here's what the CLI guide says about collect.

Reclaims free space that previously could not be freed because the node or drive was unavailable. Run as part of MultiScan, or automatically by the system if MultiScan is disabled.
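
To watch it finish and confirm the space comes back, something like this should do (assuming the same isi job syntax as above):

isilon-vm-1# isi job status

isilon-vm-1# df -h /ifs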

Greg
