Isilon OneFS: How to smartfail out a node pool
Summary: This article describes how to properly smartfail out a node pool that is no longer needed in a cluster.
Instructions
Follow these steps to properly smartfail out a node pool that is no longer needed in the cluster.
- Move most of the data using File Pool Policies
- Using the CLI or WebUI, edit the File Pool Policies so that data currently targeted at the pool being decommissioned is pointed to another pool in the cluster. For help configuring this, see the Administration Guide for your OneFS version.
Once the File Pool Policies have been changed, start a SmartPools job to apply the changes. If the File Pool Policies were configured correctly, this moves most of the data (an example CLI sequence is shown after the note below).
Note: It is normal for some space to remain in use on the node pool (generally under 5%, but it can be more). This is expected and will not cause any issues.
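For example, on OneFS 8.x the retargeting and job start might look like the following from the CLI. The policy name (MoveOldPoolData) and target pool (h500_30tb_800gb-ssd_64gb) are placeholders; confirm the exact options against the CLI Administration Guide for your release:
# isi filepool policies list
# isi filepool policies modify MoveOldPoolData --data-storage-target h500_30tb_800gb-ssd_64gb
# isi job jobs start SmartPools
# isi job jobs list
The last command can be used to watch the SmartPools job until it completes.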
- Ensure that Global Spillover is enabled so the last of the data on the nodes is allowed to move to other node pools.
CLI: # isi storagepool settings view
WebUI:
File System -> Storage Pools -> SmartPools Settings
If Global Spillover is not enabled, enable it before continuing (see the example below).
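A quick way to check from the CLI is to filter the settings output for spillover. The field name shown below is illustrative and may differ slightly between OneFS versions:
# isi storagepool settings view | grep -i spillover
     Spillover Target: anywhere
If spillover is disabled, it can be changed with isi storagepool settings modify; check isi storagepool settings modify --help for the spillover options available in your release.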
- Start the Smartfail process.
Smartfail one node at a time with the following command:
OneFS 7.x: # isi devices -a smartfail -d <node LNN>
OneFS 8.x: # isi devices node smartfail --node-lnn=<node LNN>
When the Smartfail process completes (the FlexProtect or FlexProtectLin job finishes), move on to the next node.
Smartfail the nodes one at a time until only two nodes remain.
Start the Smartfail process on the final two nodes together so that the node pool keeps a quorum of at least 51% of its devices online.
Smartfailing only one of the final two nodes breaks quorum, and the Smartfail process will be unable to complete.
Putting both nodes into a Smartfail state at the same time keeps quorum while the data is striped to the other node pools. A sample sequence for a single node is shown below.
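As an illustration, a typical OneFS 8.x pass for one node might look like the following (LNN 5 is a placeholder; verify job names and node states in your environment):
# isi devices node smartfail --node-lnn=5
# isi job jobs list
# isi devices node list
Wait until the FlexProtect or FlexProtectLin job reported by isi job jobs list has finished and the node has been removed from the cluster before starting the next node.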
Additional Information
Always check that the system flag also resides on node pools that will NOT be smartfailed out of the cluster.
KB article for the system flag: PowerScale 9.x performance impact on mixed node clusters with archive nodes
To check which pool IDs have the system flag, use the command below:
sysctl efs.bam.disk_pool_db | grep -B2 -A10 system_group
In the output, look for the pool_and_group_ids line, for example:
pool_and_group_ids={ 3, 4, 5 }
Use the command below to confirm which disk pools match the IDs in the above output:
isi storagepool health
In the output, look for the number after the disk pool name. The example below shows :3, meaning disk pool ID 3:
s210_6.9tb_800gb-ssd_32gb:3  UM---  HDD  +2d:1n  2:bay4,9,11,13,18,23, 1:bay4,9,13,18,23  Nodes: Drives:  Nodes: Drives:
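As an illustration of tying the two outputs together (the pool name and IDs are taken from the examples above and are for illustration only):
# sysctl efs.bam.disk_pool_db | grep -B2 -A10 system_group
...
pool_and_group_ids={ 3, 4, 5 }
...
# isi storagepool health | grep s210_6.9tb
s210_6.9tb_800gb-ssd_32gb:3 ...
Because ID 3 appears in the pool_and_group_ids list for the system_group, this S210 node pool carries the system flag and should not be the pool that is smartfailed out of the cluster.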