March 16th, 2010 06:00

How to delete millions of small files from Celerra?

Hello All,

I'm curious if anyone has had this issue in the past. Due to some software we use, we have about 10 million files located in one folder. As you can imagine, a simple ls can hang there for days (not hours, days). I've been trying to clean this folder up with the usual "find . -mtime +5 -delete" from an NFS client, but it takes ages and I don't see the number of files dropping in the Celerra GUI.

Is there a faster way to delete these files than from the NFS client? From the Celerra Control Station, maybe?

Thx, Serge.

March 16th, 2010 07:00

no - the Control Station is just an NFS client as well, and typically slower than your real NFS clients

unless you want to delete the whole file system, there is no other way than an rm

of course deleting the directory itself with something like rm -rf is going to be faster than a find | rm

deleting a file is an expensive operation - it needs to

- update the directory

- delete the inode

- unlink the data block(s)

- update the block and inode free lists

Rainer

March 16th, 2010 07:00

To second Rainer's comments, I'd suggest using an NFS client other than the Control Station, something that talks to the Celerra over its public network interfaces.  I do this from an external desktop and it's much, much faster.  Instead of doing the delete in the same command, I'd run a short script with mtime to populate a file with the names of all the old files you wish to delete.  Then I'd run several commands in parallel to match the names and delete the files, as in the sketch below.  It should be much, much faster.  The key is not to try to do it over the back-end network - it's just too painful.
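Something along these lines - just a sketch, the mount point /mnt/bigdir, the 5-day cutoff and the four parallel streams are only examples, adjust for your environment:

    # build the list of old files once (NUL-separated, so odd names survive)
    find /mnt/bigdir -maxdepth 1 -type f -mtime +5 -print0 > /tmp/oldfiles.lst

    # then delete in parallel: up to 4 rm processes, ~500 names per invocation
    xargs -0 -P 4 -n 500 rm -f < /tmp/oldfiles.lst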

Thanks!

Karl

March 16th, 2010 07:00

Hello Rainer,

Thanks for the suggestion. Unfortunately I cannot delete the whole directory - I need to delete files which are 5 days or older. Simply running rm -rf doesn't work - it cannot handle that many files. That's why I have to use find with either -delete (fastest), xargs rm (faster), or -exec rm {} \; (slow).
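For reference, the three forms I've been comparing look roughly like this (the mount path is just an example):

    # -delete: find unlinks the files itself, no extra processes at all
    find /mnt/bigdir -type f -mtime +5 -delete

    # xargs: batches many names into each rm, so only a handful of forks
    find /mnt/bigdir -type f -mtime +5 -print0 | xargs -0 rm -f

    # -exec ... \; : forks one rm per file - millions of forks, hence slowest
    find /mnt/bigdir -type f -mtime +5 -exec rm -f {} \;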


Thx, Serge.

March 16th, 2010 10:00

sounds like a good plan

you want to keep both the client work and the network traffic to a minimum

I'm not surprised that -exec is the slowest - forking a new rm for each delete is an expensive process

try multiple processes, good NFS client settings, see if maybe NFSv4 helps

of course it would be even better if you could teach your app to use multiple directories in the first place :-)
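just to illustrate what I mean by multiple directories - a toy sketch of how a wrapper around the app could bucket files into, say, 256 subdirectories by a hash of the name (all names and paths here are made up):

    name="some_output_file.dat"
    bucket=$(printf '%s' "$name" | md5sum | cut -c1-2)   # first two hex chars -> 256 buckets
    mkdir -p "/mnt/bigdir/$bucket"
    mv "$name" "/mnt/bigdir/$bucket/$name"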

March 17th, 2010 03:00

Guys,

I wish the application knew how to use multiple directories, but I'm not the one who developed it ;-).

In the end, we decided to do an mv on the directory, create another directory with the same name, and set up a script to clean up files older than 5 days in the newly created directory.

As for the old one, after much consideration, we decided to remove it completely, so it is running under ionice -c 3 rm -rf .... This will probably take a really long while, but at least the system won't be hogged.
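In outline the plan looks something like this (paths and ownership are illustrative; the rename itself is quick because it doesn't touch the individual files):

    # swap in a fresh, empty directory for the application
    mv /mnt/bigdir /mnt/bigdir.old
    mkdir /mnt/bigdir
    chown appuser:appgroup /mnt/bigdir

    # daily cron job against the new directory
    find /mnt/bigdir -type f -mtime +5 -delete

    # one-off background removal of the old tree at idle I/O priority
    ionice -c 3 rm -rf /mnt/bigdir.old &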

Thanks for all the suggestions.

Thx, Serge.
