November 29th, 2016 11:00

It depends on the approach and how many LINs (logical inodes) you have on the node.

If you're refreshing multiple nodes, create a separate storage pool from them and use SmartPools to offload the data.  When the data is almost all gone, SmartFail the node - this can take 5-10 minutes.
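
A rough sketch of that flow on the CLI (OneFS 8.x-style syntax; the pool and policy names here are hypothetical, and exact options vary by release):

    # Group the nodes being retired into their own node pool:
    isi storagepool nodepools create old_nodes --lnns 1,2,3
    # Steer all data toward the remaining pool with a file pool policy:
    isi filepool policies create evacuate \
        --begin-filter --path=/ifs --end-filter \
        --data-storage-target new_pool
    # Kick off the SmartPools job to actually move the data:
    isi job jobs start SmartPools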

For a large node holding lots and lots of little files, a SmartFail can take many days.
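
For reference, the SmartFail itself is started and monitored along these lines (OneFS 8.x syntax; 7.x used the older "isi devices -a smartfail -d <lnn>" form, and the node number here is hypothetical):

    # Smartfail the node with logical node number 4:
    isi devices node smartfail --node-lnn 4
    # FlexProtect does the actual re-protection; watch it run:
    isi job jobs list
    isi status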

2 Intern • 356 Posts

November 29th, 2016 11:00

Ed,

This would be for a node that is not responding.  Basically dead.

2 Intern • 205 Posts

November 29th, 2016 16:00

Take the amount of time you need/expect it to take and multiply that by 20. That'll be your minimum.

But seriously, like Ed said, it depends on a number of things. I would also add the node type, metadata strategy, and used capacity of the other nodes in its pool.

I have seen an S class node fail out in a matter of hours. I have also seen an NL node in a pool that was >90% full take 2 weeks.
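
If it helps, a quick capacity check before starting (output format varies a bit by OneFS version):

    # Cluster-wide and per-node used capacity:
    isi status
    # Per-node-pool breakdown:
    isi storagepool nodepools list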

4 Operator • 1.2K Posts

November 29th, 2016 22:00

It could be even faster to have the node hardware repaired, if the disks' contents are still intact. It is even possible to replace the whole chassis and swap the old disks in, again with the original content on them.

In both cases the cluster is protected the moment the node is back online, and what is left to do is just a less critical MultiScan (= AutoBalance + "Collect" jobs).

However, this is not possible AFTER the node has already been smartfailed or stopfailed (the latter is what you do with a "dead" node).
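
Once the repaired node rejoins, you can sanity-check that only the less critical work is queued by looking at the job engine (again OneFS 8.x-style syntax):

    # MultiScan (AutoBalance + Collect) should appear here,
    # rather than a full FlexProtect:
    isi job jobs list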

2 Intern • 356 Posts

November 30th, 2016 07:00

Peter,

That is what we are doing (swapping out the hardware).  It will most likely take less time to swap out the chassis than to wait on the SmartFail.
