PowerScale: Using AutoBalanceLin to quickly move data off of a full node pool
Summary: This article describes how to use the AutoBalanceLin job to quickly free space if a single node pool is full or almost at 100% capacity. This procedure should only be used if all other methods of freeing disk space on a node pool have been exhausted. ...
Instructions
Note: The following procedure requires removing existing file pool policies and striping data across all nodes, regardless of the workflow that data belongs to. Without the file pool policies, no management of data between the pools or tiers occurs. Be sure that the impact of this procedure is fully understood, as it may lead to performance degradation. Only perform this as a last-ditch effort after all other options for resolving capacity issues have been attempted.
It is widely believed that AutoBalance and AutoBalanceLin only balance data within node pools and not across node pools, and that only SmartPools or SmartPoolsTree can move data between two node pools.
Testing on OneFS 8.0 and later shows that this is not entirely true. If the cluster only has the default File Pool policy of anywhere:anywhere, AutoBalanceLin and AutoBalance move data across multiple node pools.
This should only be used as an emergency workaround for clusters that have one full node pool. This process moves data quickly off the full node pool.
Question: When would one want to use the following procedure?
Answer: This procedure would be used when the following conditions exist:
- The cluster contains multiple node pools, and one or more of the node pools is 100% full.
- There is an immediate requirement to free up disk space on a full node pool.
- The exact organization of the data is not an immediate concern.
Steps:
1. Make a note of all file pool policies configured on the cluster, and then delete all of them except the default 'any:any' file pool policy.
Before proceeding with this step:
- Record the current file-pool policy configuration before removing the policies. If time permits, a full log gather is recommended.
- By default, Isilon clusters are configured with the Default File Pool Policy set to write data to 'anywhere:anywhere'. Verify that the Default File Pool Policy is reverted to these default settings before proceeding.
Example: Default File Pool Policy. Observe that the Storage Targets are set to 'anywhere'.
# isi filepool default-policy view
Set Requested Protection: default
Data Access Pattern: concurrency
Enable Coalescer: Yes
Enable Packing: No
Data Storage Target: anywhere
Data SSD Strategy: metadata
Snapshot Storage Target: anywhere
Snapshot SSD Strategy: metadata
Cloud Pool: -
Cloud Compression Enabled: -
Cloud Encryption Enabled: -
Cloud Data Retention: -
Cloud Incremental Backup Retention: -
Cloud Full Backup Retention: -
Cloud Accessibility: -
Cloud Read Ahead: -
Cloud Cache Expiration: -
Cloud Writeback Frequency: -
Cloud Archive Snapshot Files: -
ID: -
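The two preparation tasks above (recording the current configuration and confirming the default targets) can be sketched as shell helpers. This is a minimal sketch, not a supported tool: the function names are hypothetical, the backup directory is whatever path you pass in, and `isi filepool policies list --format json` is assumed to be accepted by the installed OneFS release.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only.

# Save the current file pool policy configuration under a timestamped
# subdirectory of the directory given as $1, and print that subdirectory.
# Assumption: `isi filepool policies list --format json` is accepted by the
# installed OneFS release.
backup_filepool_policies() {
    if [ -z "$1" ]; then
        echo "usage: backup_filepool_policies <backup directory>" >&2
        return 2
    fi
    dest="$1/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    isi filepool policies list --format json > "$dest/policies.json" || return 1
    isi filepool default-policy view > "$dest/default-policy.txt" || return 1
    echo "$dest"
}

# Read `isi filepool default-policy view` output on stdin and succeed only if
# both the data and snapshot storage targets are "anywhere".
default_targets_are_anywhere() {
    awk -F': *' '
        { gsub(/[ \t\r]+$/, "", $2) }                 # drop trailing whitespace
        /Data Storage Target:/     { data = $2 }
        /Snapshot Storage Target:/ { snap = $2 }
        END { exit !(data == "anywhere" && snap == "anywhere") }
    '
}
```

On a cluster, the check could be run as `isi filepool default-policy view | default_targets_are_anywhere && echo "default policy OK"`.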
2. Run a SmartPools job to apply new directory markings:
# isi job start smartpools -p 1 --policy medium
Note: Expect the SmartPools job to complete faster than usual with only the anywhere:anywhere Default File Pool Policy in place.
Note: Due to a design change in later versions of OneFS, the following error may occur when you attempt to run the SmartPools job because the node pool is too full:
# isi job jobs start SmartPools
Job operation failed: Job 'SmartPools' cannot start because the cluster's free disk space percentage is below 2 (isi_gconfig -t job-config core.free_blocks_pct_threshold_lo threshold) and this job does not free disk space. Free up some space (e.g. run TreeDelete, SnapshotDelete) then try again.: No space left on device
If you DO NOT SEE the error message, above, go to Step 3 below.
If you DO SEE this error message, proceed with step 2a below.
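The branch described above can be sketched as a small wrapper that starts the job and reports which step applies next. The function name is hypothetical, and the match string is taken from the error text quoted in this article; other OneFS releases may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper: start SmartPools and report the next step to follow.
smartpools_next_step() {
    # Capture stdout and stderr together, since the error may arrive on either.
    if output=$(isi job start smartpools -p 1 --policy medium 2>&1); then
        # The job was accepted; continue with step 3 once it has run.
        echo "step 3"
        return 0
    fi
    case "$output" in
        *"free disk space percentage is below"*)
            echo "step 2a" ;;
        *)
            echo "unexpected error: $output" ;;
    esac
    return 1
}
```

Any failure other than the low-free-space error is reported as unexpected rather than mapped to a step, so that unrelated problems are not silently skipped.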
2a. Check again to see if there is any data you can delete to free up some space on the full node pool. This includes checking for any large snapshots and for any large system or audit files with the following commands:
Isilon-28# du -sh /ifs/.ifsvar/audit/logs
Isilon-28# du -sh /ifs/.ifsvar
If you can delete enough data, try running the SmartPools job again.
If there is absolutely no data which can be deleted, the recommended mitigation step would be as follows:
- Modify the Default-File Pool Policy, above, to write to the less full node pool.
- Identify a data path on the full node pool which includes most of the data.
- In a screen session, run the following command to manually move the data under a certain path:
# isi filepool apply -r <data path>
For example:
# isi filepool apply -r /ifs/data/win_data/test_data
Verify that the job is running:
# ps auwx | grep apply
root 45237 98.1 0.0 102268 61176 0 R+ 13:34 0:35.04 /usr/libexec/isilon/isi /usr/bin/isi filepool apply -r /ifs/
- Monitor the capacity. Once the full node pool is under 96%, start over from step 2 above.
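The mitigation steps above could be combined into one helper, sketched below. The function name and the screen session name are hypothetical, and `isi filepool default-policy modify --data-storage-target` is an assumption about the installed OneFS release that should be checked against its `--help` output before use.

```shell
#!/bin/sh
# Hypothetical helper: point the default policy at a less full node pool, then
# apply it recursively to one path inside a detached screen session.
# Assumption: `isi filepool default-policy modify --data-storage-target`
# exists on the installed OneFS release.
redirect_and_apply() {
    target_pool=$1
    data_path=$2
    if [ -z "$target_pool" ] || [ -z "$data_path" ]; then
        echo "usage: redirect_and_apply <less full node pool> <data path>" >&2
        return 2
    fi
    isi filepool default-policy modify --data-storage-target "$target_pool" || return 1
    # -dmS starts a named session detached, so the apply keeps running if the
    # SSH connection drops. Reattach later with: screen -r filepool_apply
    screen -dmS filepool_apply isi filepool apply -r "$data_path"
}
```

For example, `redirect_and_apply <less full node pool> /ifs/data/win_data/test_data` would start the apply from the example above in the background.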
3. Run an AutoBalanceLin job for a few hours, and monitor space. (Unlike AutoBalance, which does a full tree walk before moving any data, AutoBalanceLin restripes data immediately.)
# isi job start autobalancelin -p 1 --policy medium
Almost immediately, data begins to shift between the node pools, and disk space on the full node pool should start to free up.
Note: For this step, AutoBalanceLin does not have to run to completion. Monitor the AutoBalanceLin job until the goal of cleaning up the full node pool is achieved, and then cancel the job. For example, you can cancel the job once the full node pool is down to 85% of capacity.
Note: Monitor cluster utilization, and confirm that other jobs are canceled or paused if free space is being adversely affected.
Continue to monitor space every hour or so until space is at a sufficient level using the following command:
# isi stat -p -v
Note: Again, it is not advisable to let AutoBalanceLin run to completion. At a certain point, the job may shift data in a new direction and start to produce undesirable results. For example, it may reverse the data movement, possibly leading to the other node pool nearing full capacity as the previous pool empties. Only run AutoBalanceLin for a few hours, or until the space is cleaned out, and then cancel it once the goal is achieved.
Once the cluster has achieved relief in space on the full node pool, cancel the AutoBalanceLin job:
# isi job cancel autobalancelin
Note: For optimal results, the protection levels of the node pools should be the same.
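The monitor-then-cancel loop in step 3 can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, the 85% default comes from the example target above, and the parsing assumes that `isi stat -p -v` prints a `Node Pool Name:` line and a `Used:` line per pool, as in the lab output in this article.

```shell
#!/bin/sh
# Hypothetical helpers; names and defaults are illustrative, not part of OneFS.

# Read `isi stat -p -v` output on stdin; print "<pool name> <HDD used %>" per pool.
pool_used_pct() {
    awk '
        /Node Pool Name:/ {
            name = $0
            sub(/.*Node Pool Name:[ \t]*/, "", name)
            sub(/[ \t]*Protection:.*/, "", name)   # long names can run into this label
        }
        /^[ \t]*Used:/ && match($0, /\([^)]*\)/) {
            pct = substr($0, RSTART, RLENGTH)      # first (...) group is the HDD figure
            gsub(/[^0-9]/, "", pct)
            print name, pct
        }
    '
}

# Poll until the named pool is at or below the target percentage, then cancel
# the AutoBalanceLin job. Arguments: pool name, target % (default 85),
# polling interval in seconds (default 3600, that is, hourly).
cancel_when_relieved() {
    pool=$1
    target=${2:-85}
    interval=${3:-3600}
    while :; do
        used=$(isi stat -p -v | pool_used_pct | awk -v p="$pool" '$1 == p { print $2 }')
        echo "$(date) $pool HDD used: ${used:-unknown}%"
        if [ -n "$used" ] && [ "$used" -le "$target" ]; then
            isi job cancel autobalancelin
            return
        fi
        sleep "$interval"
    done
}
```

Run inside a screen session, `cancel_when_relieved <full node pool> 85` would check hourly and cancel the job at the first reading of 85% or lower, which avoids leaving AutoBalanceLin running to completion.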
Additional Information
Lab Testing Results:
Two node pools, equal protection levels
This is before:
Node Pool Name: x410_archive            Protection:        +2d:1n
Pool Storage:     HDD                   SSD Storage
Size:             85.2T (94.6T Raw)     2.2T (2.2T Raw)
VHS Size:         9.4T
Used:             29.9T (35%)           35.2G (2%)
Avail:            55.3T (65%)           2.1T (98%)

                          Throughput (bps)  HDD Storage       SSD Storage
Name               Health|   In   Out Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  5|10.5.80.190    | OK  |881.6|    0|881.6|10.0T/31.5T( 32%)|11.7G/ 738G(  2%)
  6|10.5.80.191    |-A-- |    0|    0|    0|10.0T/31.5T( 32%)|11.7G/ 738G(  2%)
  7|10.5.80.192    | OK  |    0|    0|    0|10.0T/31.5T( 32%)|11.7G/ 738G(  2%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
x410_archive       | OK  |110.2|    0|110.2|29.9T/85.2T( 35%)|35.2G/ 2.2T(  2%)

Node Pool Name: x410_35tb_800gb-ssd_64gbProtection:        +2d:1n
Pool Storage:     HDD                   SSD Storage
Size:             112.8T (125.3T Raw)   2.9T (2.9T Raw)
VHS Size:         12.5T
Used:             5.6T (5%)             7.9G (< 1%)
Avail:            107.2T (95%)          2.9T (> 99%)

                          Throughput (bps)  HDD Storage       SSD Storage
Name               Health|   In   Out Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  1|10.5.80.186    | OK  | 2.9M|82.8M|85.6M| 1.4T/31.5T(  4%)| 2.0G/ 738G(< 1%)
  2|10.5.80.187    |-A-- | 104k|38.8k| 143k| 1.4T/30.6T(  5%)| 1.9G/ 738G(< 1%)
  3|10.5.80.188    | OK  |881.6|    0|881.6| 1.4T/31.5T(  4%)| 2.0G/ 738G(< 1%)
  4|10.5.80.189    | OK  |    0|25.8k|25.8k| 1.4T/31.5T(  4%)| 2.0G/ 738G(< 1%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
x410_35tb_800gb-ssd|-M---| 371k|10.4M|10.7M| 5.6T/112.8T(  5%)| 7.9G/ 2.9T(< 1%)
_64gb              |     |     |     |     |                 |

X410-2# date
Thu Jun 14 16:53:29 CDT 2018

Only one file pool policy is configured, set to the default any:any.

X410-2# isi job start autobalancelin -p 1 --policy medium
Started job [7159]

In as little as 30 minutes, data shifts between the two pools. For example, the nodes in the first node pool, below, dropped from 32% full to 29%:

X410-2# date
Thu Jun 14 17:24:20 CDT 2018

Node Pool Name: x410_archive            Protection:        +2d:1n
Pool Storage:     HDD                   SSD Storage
Size:             85.2T (94.6T Raw)     2.2T (2.2T Raw)
VHS Size:         9.4T
Used:             27.7T (33%)           34.3G (2%)
Avail:            57.5T (67%)           2.1T (98%)

                          Throughput (bps)  HDD Storage       SSD Storage
Name               Health|   In   Out Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  5|10.5.80.190    | OK  | 1.1k|25.8k|26.9k| 9.2T/31.5T( 29%)|11.4G/ 738G(  2%)
  6|10.5.80.191    |-A-- | 1.1k| 1.2M| 1.2M| 9.2T/31.5T( 29%)|11.4G/ 738G(  2%)
  7|10.5.80.192    | OK  |28.6k| 5.2k|33.7k| 9.2T/31.5T( 29%)|11.4G/ 738G(  2%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
x410_archive       | OK  | 3.8k| 152k| 156k|27.7T/85.2T( 33%)|34.3G/ 2.2T(  2%)

Node Pool Name: x410_35tb_800gb-ssd_64gbProtection:        +2d:1n
Pool Storage:     HDD                   SSD Storage
Size:             112.8T (125.3T Raw)   2.9T (2.9T Raw)
VHS Size:         12.5T
Used:             7.6T (7%)             8.8G (< 1%)
Avail:            105.2T (93%)          2.9T (> 99%)

                          Throughput (bps)  HDD Storage       SSD Storage
Name               Health|   In   Out Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  1|10.5.80.186    | OK  |37.9k| 279k| 316k| 1.9T/31.5T(  6%)| 2.2G/ 738G(< 1%)
  2|10.5.80.187    |-A-- | 1.4M|34.8M|36.2M| 1.9T/30.6T(  6%)| 2.2G/ 738G(< 1%)
  3|10.5.80.188    | OK  | 130k|30.9k| 161k| 1.9T/31.5T(  6%)| 2.2G/ 738G(< 1%)
  4|10.5.80.189    | OK  |    0|    0|    0| 1.9T/31.5T(  6%)| 2.2G/ 738G(< 1%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
x410_35tb_800gb-ssd|-M---| 198k| 4.4M| 4.6M| 7.6T/112.8T(  7%)| 8.8G/ 2.9T(< 1%)
_64gb              |     |     |     |     |                 |
Continue to monitor every hour or so until the disk space is at a sufficient level.
Note: Do not let AutoBalanceLin run to completion. The job can eventually shift data in the opposite direction, which can start to produce undesirable results. The disk space consumed can reverse, possibly leading to a near-full node pool again. Only run the job for a few hours.
One hour point:
X410-2# date
Thu Jun 14 17:54:30 CDT 2018

Node Pool Name: x410_archive            Protection:        +2d:1n
Pool Storage:     HDD                   SSD Storage
Size:             85.2T (94.6T Raw)     2.2T (2.2T Raw)
VHS Size:         9.4T
Used:             25.2T (30%)           33.9G (2%)
Avail:            60.0T (70%)           2.1T (98%)

                          Throughput (bps)  HDD Storage       SSD Storage
Name               Health|   In   Out Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  5|10.5.80.190    | OK  |881.6|20.6k|21.5k| 8.4T/31.5T( 27%)|11.3G/ 738G(  2%)
  6|10.5.80.191    |-A-- |    0|    0|    0| 8.4T/31.5T( 27%)|11.3G/ 738G(  2%)
  7|10.5.80.192    | OK  | 2.2k| 216k| 218k| 8.4T/31.5T( 27%)|11.3G/ 738G(  2%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
x410_archive       | OK  |385.7|29.5k|29.9k|25.2T/85.2T( 30%)|33.9G/ 2.2T(  2%)

Two hour point:
X410-2# date
Thu Jun 14 18:54:43 CDT 2018

Node Pool Name: x410_archive            Protection:        +2d:1n
Pool Storage:     HDD                   SSD Storage
Size:             85.2T (94.6T Raw)     2.2T (2.2T Raw)
VHS Size:         9.4T
Used:             21.6T (25%)           26.8G (1%)
Avail:            63.6T (75%)           2.1T (99%)

                          Throughput (bps)  HDD Storage       SSD Storage
Name               Health|   In   Out Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  5|10.5.80.190    | OK  |22.9k| 1.4M| 1.5M| 7.2T/31.5T( 23%)| 8.9G/ 738G(  1%)
  6|10.5.80.191    |-A-- |881.6| 231k| 232k| 7.2T/31.5T( 23%)| 8.9G/ 738G(  1%)
  7|10.5.80.192    | OK  |    0|    0|    0| 7.2T/31.5T( 23%)| 8.9G/ 738G(  1%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
x410_archive       | OK  | 3.0k| 210k| 213k|21.6T/85.2T( 25%)|26.8G/ 2.2T(  1%)
Space is sufficiently cleaned up. The AutoBalanceLin job can be canceled because the desired results have been achieved.
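To put the lab numbers in perspective, the drain rate of the x410_archive pool can be worked out from its four "Used" readings and their timestamps. The helper below is a hypothetical sketch that does this arithmetic with awk; the rate applies only to this lab cluster and is not a prediction for other clusters.

```shell
#!/bin/sh
# Hypothetical helper: read "HH:MM:SS used_T" samples (same day, ascending
# order) on stdin and print the drain rate in T per hour between consecutive
# samples, followed by the overall rate.
drain_rate() {
    awk '
        {
            split($1, t, ":")
            secs = t[1] * 3600 + t[2] * 60 + t[3]
            if (NR == 1) {
                first_secs = secs
                first_used = $2
            } else if (secs > prev_secs) {
                rate = (prev_used - $2) / (secs - prev_secs) * 3600
                printf "%s -> %s: %.1f T/h\n", prev_time, $1, rate
            }
            prev_time = $1; prev_secs = secs; prev_used = $2
        }
        END {
            if (NR > 1 && prev_secs > first_secs)
                printf "overall: %.1f T/h\n", (first_used - prev_used) / (prev_secs - first_secs) * 3600
        }
    '
}

# Samples: x410_archive HDD "Used" values from the lab output above.
drain_rate <<'EOF'
16:53:29 29.9
17:24:20 27.7
17:54:30 25.2
18:54:43 21.6
EOF
```

For these samples the overall rate works out to about 4.1 T per hour (8.3T moved in roughly two hours), with the rate varying between intervals.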
Affected Products
Isilon X400
Article Properties
Article Number: 000009283
Article Type: How To
Last Modified: 23 Jun 2026
Version: 9