Unsolved


2 Posts


November 13th, 2016 23:00

OneFS capacity after smartfail

The OneFS 7.1 user guide states: "After all data migration is complete, OneFS logically removes the device from the cluster, the cluster logically changes its width to the new configuration"

Does it mean the usable capacity of OneFS will be decreased accordingly after a disk was smart-failed and removed from the node?

If no new disk is added back to the node afterwards, are the existing data in the cluster protected?

Provided that we keep OneFS at most 70% full, can we keep the cluster running protected with a few disks removed via smartfail?

Your advice and comments will be very much appreciated.

Thanks

6 Operator


1.2K Posts

November 14th, 2016 04:00

> Does it mean the usable capacity of OneFS will be decreased accordingly after a disk was smart-failed and removed from the node?

Yes. Check the output of isi status or df /ifs.

> If no new disk is added back to the node afterwards, are the existing data in the cluster protected?

Yes, and it's even better: the data never drops below full protection during a smartfail.

Smartfail is the graceful decommissioning of a drive. It happens in two situations: a) deliberate removal or b) automatic proactive removal.

In both cases, the content of the drive is still accessible and considered 'good'.

This is in contrast to an outage of a faulty drive, or a deliberate 'stopfail', where the cluster is underprotected until the FlexProtect job finishes.

> Provided that we keep the OneFS at most 70% full, can we keep the cluster running protected with a few disks removed via smart-fail?

Absolutely. You will have to do the exact math for your own cluster, but 70% would usually leave headroom for 'a few drives', and in many cases even for smartfailing a full node.
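As a rough illustration of that math, here is a minimal sketch. It assumes equal-size drives, looks at raw capacity only, and ignores the per-node imbalance discussed further down the thread. The function name and the 6 nodes x 12 drives layout are hypothetical and not taken from any OneFS tool.

```python
def utilization_after_removal(total_drives, used_fraction, drives_removed):
    """Estimate the fill level once some drives are gone.

    Assumptions (not from OneFS itself): all drives have the same size,
    the amount of stored data stays constant, and only raw capacity counts.
    """
    remaining = total_drives - drives_removed
    if remaining <= 0:
        raise ValueError("no drives left in the cluster")
    # Same amount of data, spread over fewer drives.
    return used_fraction * total_drives / remaining


# Hypothetical cluster: 6 nodes x 12 drives = 72 drives, 70% full.
for removed in (1, 3, 12):
    level = utilization_after_removal(72, 0.70, removed)
    print(f"{removed:2d} drives removed -> {level:.1%} full")
```

In this hypothetical layout, losing a full node (12 of 72 drives) moves the fill level from 70% to 84%, so the data still fits, but the real limit depends on the protection level and on how evenly the remaining drives are spread across nodes.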

You have just uncovered what largely makes the magic of OneFS.

And it all works identically with a virtual cluster aka simulator:

seeing is believing.

Cheers

-- Peter

2 Posts

November 14th, 2016 20:00

Thank you for your expert advice.

I have some follow-up questions.

Suppose we let any failed disks be removed from our 6-node OneFS cluster via FlexProtect/smartfail and do not add any new disks afterwards. The number of disks may then differ across nodes. Will such an imbalance reduce stability, for example through a lower protection level or lower performance?

Thanks.

6 Operator


1.2K Posts

November 15th, 2016 04:00

Protection always comes first! Unless FlexProtect fails, files are protected.

When new files can't be protected, writes are denied.


At the same time, OneFS tries to keep all data balanced, and this leads to 'interesting' effects if node capacities are unequal due to removed disks.

Drives in affected nodes become full early, and using up the last free

blocks leads to fragmentation which usually impacts performance.

Another effect, thinking of an extreme case, is wasting capacity when a single

node has more remaining disks than the other nodes. That single node's

excess capacity cannot be fully used up, because no other nodes

provide matching space for striping data according to the requested protection level.

Data writes would be denied, despite OneFS still reporting *less* than 100%

raw capacity used. Makes sense (without a picture)?
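In place of a picture, the stranded-capacity effect can be sketched with a toy model. It assumes every stripe needs one free block on each of `width` different nodes, which is a strong simplification of the real OneFS layout. The function names and the numbers are hypothetical.

```python
def max_stripes(free_blocks, width):
    """Count how many stripes fit when each stripe needs one free block
    on `width` distinct nodes (toy model, not the real OneFS allocator)."""
    free = sorted(free_blocks, reverse=True)
    stripes = 0
    # Keep placing stripes on the `width` emptiest nodes while enough
    # nodes still have at least one free block.
    while len(free) >= width and free[width - 1] > 0:
        for i in range(width):
            free[i] -= 1
        stripes += 1
        free.sort(reverse=True)
    return stripes


def stranded_blocks(free_blocks, width):
    """Free blocks that can never be used because no matching space
    exists on enough other nodes."""
    return sum(free_blocks) - width * max_stripes(free_blocks, width)


# One node kept all its disks (10 free blocks); the other five lost some.
free = [10, 4, 4, 4, 4, 4]
print(max_stripes(free, 6), stranded_blocks(free, 6))
```

With stripes spanning all six nodes, only 4 stripes fit and 6 of the 30 free blocks stay unusable on the larger node, so writes stop well before the reported raw capacity reaches 100%.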

-- Peter
