PowerScale: Using AutoBalanceLin to quickly move data off of a full node pool

Summary: This article describes how to use the AutoBalanceLin job to quickly free space if a single node pool is full or almost at 100% capacity. This procedure should only be used if all other methods of freeing disk space on a node pool have been exhausted. ...

Affected Products

This article is not tied to any specific product. Not all product versions are identified in this article.

Instructions

Note: The following procedure requires removing the existing file pool policies and restriping data across all nodes, regardless of the workflow the data belongs to. Without file pool policies, data is not managed between pools or tiers. Make sure the impact of this procedure is fully understood, because it may degrade performance. Perform it only as a last resort, after all other options for resolving the capacity issue have been attempted.

It is widely believed that AutoBalance and AutoBalanceLin balance data only within a node pool, not across node pools, and that only the SmartPools and SmartPoolsTree jobs can move data between node pools.

Testing on OneFS 8.0 and later shows that this is not entirely true. If the only file pool policy on the cluster is the default anywhere:anywhere policy, AutoBalanceLin and AutoBalance move data across multiple node pools.

Use this only as an emergency workaround for clusters that have one full node pool. The process moves data off the full node pool quickly.

Question: When would one want to use the following procedure?
Answer: This procedure would be used when the following conditions exist:

The cluster contains multiple node pools, and one or more of the node pools is 100% full.
There is an immediate requirement to free up disk space on a full node pool.
The exact organization of the data is not an immediate concern.

Steps:

1. Record, and then delete, all file pool policies configured on the cluster except the default 'any:any' file pool policy.

Before proceeding with this step:

Record the current file pool policy configuration before removing the policies. If time permits, a full log gather is recommended.
By default, Isilon clusters are configured with the Default File Pool Policy set to write data to 'anywhere:anywhere'. Verify that the Default File Pool Policy has been reverted to these default settings before proceeding.
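Recording the configuration can be scripted so nothing is lost when the policies are deleted. The following is a minimal sketch, not a supported tool: the function name save_filepool_config is hypothetical, and it assumes the `isi filepool policies list` and `isi filepool default-policy view` subcommands available on OneFS 8.x. Because the full node pool may not accept new writes, write the file to a location with free space or copy it off the cluster.

```shell
# Hypothetical helper: write the current file pool configuration to a
# timestamped text file and print the file's path.
save_filepool_config() {
    local outdir outfile
    outdir="${1:-.}"    # destination directory; choose one with free space
    outfile="$outdir/filepool_config_$(date +%Y%m%d_%H%M%S).txt"
    {
        echo "=== isi filepool policies list ==="
        isi filepool policies list
        echo
        echo "=== isi filepool default-policy view ==="
        isi filepool default-policy view
    } > "$outfile" || return 1
    echo "$outfile"
}
```

For example, `save_filepool_config /path/with/free/space` prints the path of the saved file. To capture every setting of each custom policy as well, run `isi filepool policies view <policy name>` for each policy in the list before deleting it.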

Example: Default File Pool Policy. Observe that both the Data Storage Target and the Snapshot Storage Target are set to 'anywhere'.

# isi filepool default-policy view
          Set Requested Protection: default
               Data Access Pattern: concurrency
                  Enable Coalescer: Yes
                    Enable Packing: No
               Data Storage Target: anywhere
                 Data SSD Strategy: metadata
           Snapshot Storage Target: anywhere
             Snapshot SSD Strategy: metadata
                        Cloud Pool: -
         Cloud Compression Enabled: -
          Cloud Encryption Enabled: -
              Cloud Data Retention: -
Cloud Incremental Backup Retention: -
       Cloud Full Backup Retention: -
               Cloud Accessibility: -
                  Cloud Read Ahead: -
            Cloud Cache Expiration: -
         Cloud Writeback Frequency: -
      Cloud Archive Snapshot Files: -
                                ID: -
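The check above can also be done non-interactively. This sketch (the function name default_policy_is_anywhere is hypothetical) reads `isi filepool default-policy view` output on standard input and succeeds only if both storage targets are 'anywhere'; it assumes the field labels match the example output shown above.

```shell
# Hypothetical check: read `isi filepool default-policy view` output on stdin
# and succeed only if both storage targets are "anywhere".
default_policy_is_anywhere() {
    awk -F: '
        /Data Storage Target/     { gsub(/[ \t]/, "", $2); data = $2 }
        /Snapshot Storage Target/ { gsub(/[ \t]/, "", $2); snap = $2 }
        END { exit !(data == "anywhere" && snap == "anywhere") }'
}
```

Usage: `isi filepool default-policy view | default_policy_is_anywhere && echo "default policy targets anywhere"`.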

2. Run a SmartPools job to apply the new directory markings:

# isi job start smartpools -p 1 --policy medium

Note: With only the anywhere:anywhere Default File Pool Policy in place, expect the SmartPools job to complete faster than usual.

Note: Because of a design change in later versions of OneFS, the following error may occur when you attempt to run the SmartPools job while the node pool is too full:

# isi job jobs start SmartPools

Job operation failed: Job 'SmartPools' cannot start because the cluster's free disk space percentage is below 2 (isi_gconfig -t job-config core.free_blocks_pct_threshold_lo threshold) and this job does not free disk space. Free up some space (e.g. run TreeDelete, SnapshotDelete) then try again.: No space left on device

If you DO NOT see the error message above, go to step 3.

If you DO see this error message, proceed with step 2a.
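The choice between step 3 and step 2a can be sketched as a small wrapper around the SmartPools start command. The function name start_smartpools and the return code 2 are hypothetical conventions, and the match relies on the wording of the error message quoted above, which may differ between OneFS versions.

```shell
# Hypothetical wrapper: start SmartPools and report whether the
# low-free-space error blocked it.
# Returns 2 if that error appears (go to step 2a), otherwise isi's exit status.
start_smartpools() {
    local out rc
    out=$(isi job start smartpools -p 1 --policy medium 2>&1)
    rc=$?
    printf '%s\n' "$out"
    case "$out" in
        *"free disk space percentage is below"*)
            echo "SmartPools blocked by low free space; continue with step 2a." >&2
            return 2 ;;
    esac
    return "$rc"
}
```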

2a. Check again for any data that can be deleted to free space on the full node pool. This includes checking for large snapshots and for large system or audit files, for example with the following commands:

Isilon-28# du -sh /ifs/.ifsvar/audit/logs

Isilon-28# du -sh /ifs/.ifsvar
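To see which entries under a path account for the most space, a sorted listing is often quicker to read than separate `du -sh` calls. A minimal sketch using only standard `du`, `sort`, and `head` (the function name largest_entries is hypothetical); walking a large tree can take a while on a busy cluster.

```shell
# Hypothetical helper: list the largest entries directly under a directory,
# biggest first. Sizes are in kilobytes (du -k).
largest_entries() {
    local dir count
    dir="${1:?usage: largest_entries <dir> [count]}"
    count="${2:-10}"
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, `largest_entries /ifs/.ifsvar 10` lists the ten largest entries directly under /ifs/.ifsvar.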

If you can delete enough data, try running the SmartPools job again.

If no data at all can be deleted, the recommended mitigation is as follows:

Modify the Default File Pool Policy shown above so that it writes to the less full node pool.
Identify a data path on the full node pool that holds most of the data.
In a screen session, run the following command to manually move the data under that path:

# isi filepool apply -r <data path>

For example:

# isi filepool apply -r /ifs/data/win_data/test_data

Verify that the command is running:

# ps auwx | grep apply
root   45237   98.1  0.0 102268  61176  0  R+   13:34          0:35.04 /usr/libexec/isilon/isi /usr/bin/isi filepool apply -r /ifs/
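A plain `grep apply` can also match the grep process itself. A common shell idiom avoids this by bracketing one character of the pattern, so grep's own command line no longer contains the literal text it searches for. A sketch (the function name apply_running is hypothetical):

```shell
# Hypothetical check: succeed if an `isi filepool apply` process is running.
# The [f] bracket keeps grep from matching its own command line.
apply_running() {
    ps auwx | grep '[f]ilepool apply'
}
```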

Monitor the capacity. Once the full node pool is below 96% used, start over at step 2.

3. Run an AutoBalanceLin job for a few hours and monitor space. (Unlike AutoBalance, which does a full tree walk before moving any data, AutoBalanceLin begins restriping data immediately.)

# isi job start autobalancelin -p 1 --policy medium

Almost immediately, data should begin to shift between the node pools, and free space on the full node pool should increase.

Note: For this step, AutoBalanceLin does not have to run to completion. Monitor the AutoBalanceLin job until the goal of cleaning up the full node pool is achieved and then cancel the job. For example, you can cancel the job once the full node pool is down to 85% of capacity.

Note: Monitor cluster utilization, and make sure other jobs are canceled or paused if they are adversely affecting free space.

Continue to check space every hour or so, using the following command, until it reaches a sufficient level:

# isi stat -p -v
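The hourly check and the cancellation can be sketched as a polling loop. This is a minimal, hypothetical helper rather than a supported tool: pool_used_pct extracts the first (HDD) percentage from the "Used:" line that follows the matching "Node Pool Name:" line in output like the lab results below, and watch_and_cancel runs `isi job cancel autobalancelin` once the pool reaches the target percentage or after a maximum number of checks. The one-hour interval and four-check limit are assumptions based on the "every hour or so" and "a few hours" guidance in this article. The loop tracks only one pool, so keep an eye on the others as well.

```shell
# Hypothetical helpers for monitoring AutoBalanceLin; not a supported tool.

# Read `isi stat -p -v` output on stdin and print the HDD "Used" percentage
# for the named node pool (the first "(NN%)" on its "Used:" line).
pool_used_pct() {
    awk -v pool="$1" '
        index($0, "Node Pool Name: " pool) { found = 1; next }
        found && /^ *Used:/ {
            split($0, parts, "(")   # parts[2] begins with the HDD percentage
            pct = parts[2]
            sub(/%.*/, "", pct)     # keep only the text before "%"
            gsub(/[^0-9]/, "", pct) # strip "< " from values such as "(< 1%)"
            print pct
            exit
        }'
}

# Check POOL every INTERVAL seconds; cancel AutoBalanceLin once POOL is at or
# below TARGET percent (return 0) or after MAX_CHECKS checks (return 1).
watch_and_cancel() {
    local pool target interval max_checks checks pct
    pool="$1"; target="$2"
    interval="${3:-3600}"   # assumed: one check per hour
    max_checks="${4:-4}"    # assumed: "a few hours" of running time
    checks=0
    while [ "$checks" -lt "$max_checks" ]; do
        pct=$(isi stat -p -v | pool_used_pct "$pool")
        checks=$((checks + 1))
        echo "$(date '+%H:%M:%S') $pool used: ${pct:-unknown}%"
        if [ -n "$pct" ] && [ "$pct" -le "$target" ]; then
            isi job cancel autobalancelin
            return 0
        fi
        if [ "$checks" -lt "$max_checks" ]; then
            sleep "$interval"
        fi
    done
    echo "Check limit reached before $pool fell to $target%; cancelling." >&2
    isi job cancel autobalancelin
    return 1
}
```

For example, `watch_and_cancel x410_archive 85` run inside a screen session checks x410_archive hourly and cancels AutoBalanceLin once the pool is at or below 85% used. Pass the pool name as printed; long names can run into the "Protection:" text, as in the lab output, and prefix matching still finds them.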

Note: Again, do not let AutoBalanceLin run to completion. At a certain point, the job may shift data in a new direction and produce undesirable results. For example, it may reverse the data movement, possibly leaving the other node pool near full capacity as the original pool empties. Run AutoBalanceLin only for a few hours, or until the space is freed, and cancel it once the goal is achieved.

Once enough space has been freed on the full node pool, cancel the AutoBalanceLin job:

# isi job cancel autobalancelin

Note: For best results, the node pools should have equal protection levels.

Additional Information

Lab Testing Results:

Two node pools, equal protection levels
Before starting the job:

Node Pool Name: x410_archive          Protection:        +2d:1n
Pool Storage:     HDD                 SSD Storage
Size:             85.2T (94.6T Raw)   2.2T (2.2T Raw)
VHS Size:         9.4T
Used:             29.9T (35%)         35.2G (2%)
Avail:            55.3T (65%)         2.1T (98%)

                           Throughput (bps)  HDD Storage      SSD Storage
Name               Health|  In   Out  Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  5|10.5.80.190    | OK  |881.6|    0|881.6|10.0T/31.5T( 32%)|11.7G/ 738G(  2%)
  6|10.5.80.191    |-A-- |    0|    0|    0|10.0T/31.5T( 32%)|11.7G/ 738G(  2%)
  7|10.5.80.192    | OK  |    0|    0|    0|10.0T/31.5T( 32%)|11.7G/ 738G(  2%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
x410_archive       |  OK |110.2|    0|110.2|29.9T/85.2T( 35%)|35.2G/ 2.2T(  2%)

Node Pool Name: x410_35tb_800gb-ssd_64gbProtection:        +2d:1n
Pool Storage:     HDD                 SSD Storage
Size:             112.8T (125.3T Raw) 2.9T (2.9T Raw)
VHS Size:         12.5T
Used:             5.6T (5%)           7.9G (< 1%)
Avail:            107.2T (95%)        2.9T (> 99%)

                           Throughput (bps)  HDD Storage      SSD Storage
Name               Health|  In   Out  Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  1|10.5.80.186    | OK  | 2.9M|82.8M|85.6M| 1.4T/31.5T(  4%)| 2.0G/ 738G(< 1%)
  2|10.5.80.187    |-A-- | 104k|38.8k| 143k| 1.4T/30.6T(  5%)| 1.9G/ 738G(< 1%)
  3|10.5.80.188    | OK  |881.6|    0|881.6| 1.4T/31.5T(  4%)| 2.0G/ 738G(< 1%)
  4|10.5.80.189    | OK  |    0|25.8k|25.8k| 1.4T/31.5T(  4%)| 2.0G/ 738G(< 1%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
x410_35tb_800gb-ssd|-M---| 371k|10.4M|10.7M| 5.6T/112.8T(  5%)| 7.9G/ 2.9T(< 1%)
  _64gb            |     |     |     |     |                 |




X410-2# date
Thu Jun 14 16:53:29 CDT 2018


One file pool policy is configured: the default any:any policy.


X410-2# isi job start autobalancelin -p 1 --policy medium
Started job [7159]


Within as little as 30 minutes, data starts shifting between the two pools. For example, each node in the first node pool below (x410_archive) dropped from 32% full to 29%.


X410-2# date
Thu Jun 14 17:24:20 CDT 2018



Node Pool Name: x410_archive          Protection:        +2d:1n
Pool Storage:     HDD                 SSD Storage
Size:             85.2T (94.6T Raw)   2.2T (2.2T Raw)
VHS Size:         9.4T
Used:             27.7T (33%)         34.3G (2%)
Avail:            57.5T (67%)         2.1T (98%)

                           Throughput (bps)  HDD Storage      SSD Storage
Name               Health|  In   Out  Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  5|10.5.80.190    | OK  | 1.1k|25.8k|26.9k| 9.2T/31.5T( 29%)|11.4G/ 738G(  2%)
  6|10.5.80.191    |-A-- | 1.1k| 1.2M| 1.2M| 9.2T/31.5T( 29%)|11.4G/ 738G(  2%)
  7|10.5.80.192    | OK  |28.6k| 5.2k|33.7k| 9.2T/31.5T( 29%)|11.4G/ 738G(  2%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
x410_archive       |  OK | 3.8k| 152k| 156k|27.7T/85.2T( 33%)|34.3G/ 2.2T(  2%)

Node Pool Name: x410_35tb_800gb-ssd_64gbProtection:        +2d:1n
Pool Storage:     HDD                 SSD Storage
Size:             112.8T (125.3T Raw) 2.9T (2.9T Raw)
VHS Size:         12.5T
Used:             7.6T (7%)           8.8G (< 1%)
Avail:            105.2T (93%)        2.9T (> 99%)

                           Throughput (bps)  HDD Storage      SSD Storage
Name               Health|  In   Out  Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  1|10.5.80.186    | OK  |37.9k| 279k| 316k| 1.9T/31.5T(  6%)| 2.2G/ 738G(< 1%)
  2|10.5.80.187    |-A-- | 1.4M|34.8M|36.2M| 1.9T/30.6T(  6%)| 2.2G/ 738G(< 1%)
  3|10.5.80.188    | OK  | 130k|30.9k| 161k| 1.9T/31.5T(  6%)| 2.2G/ 738G(< 1%)
  4|10.5.80.189    | OK  |    0|    0|    0| 1.9T/31.5T(  6%)| 2.2G/ 738G(< 1%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
x410_35tb_800gb-ssd|-M---| 198k| 4.4M| 4.6M| 7.6T/112.8T(  7%)| 8.8G/ 2.9T(< 1%)
  _64gb            |     |     |     |     |                 |

Continue to monitor every hour or so until disk space is at a sufficient level.

Note: Do not let AutoBalanceLin run to completion. Eventually the job can shift data in the opposite direction, which can produce undesirable results: disk usage can reverse, possibly leading to a near-full node pool again. Run the job for only a few hours.

One-hour mark:

X410-2# date
Thu Jun 14 17:54:30 CDT 2018



Node Pool Name: x410_archive          Protection:        +2d:1n
Pool Storage:     HDD                 SSD Storage
Size:             85.2T (94.6T Raw)   2.2T (2.2T Raw)
VHS Size:         9.4T
Used:             25.2T (30%)         33.9G (2%)
Avail:            60.0T (70%)         2.1T (98%)

                           Throughput (bps)  HDD Storage      SSD Storage
Name               Health|  In   Out  Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  5|10.5.80.190    | OK  |881.6|20.6k|21.5k| 8.4T/31.5T( 27%)|11.3G/ 738G(  2%)
  6|10.5.80.191    |-A-- |    0|    0|    0| 8.4T/31.5T( 27%)|11.3G/ 738G(  2%)
  7|10.5.80.192    | OK  | 2.2k| 216k| 218k| 8.4T/31.5T( 27%)|11.3G/ 738G(  2%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
x410_archive       |  OK |385.7|29.5k|29.9k|25.2T/85.2T( 30%)|33.9G/ 2.2T(  2%)

Two-hour mark:


X410-2# date
Thu Jun 14 18:54:43 CDT 2018




Node Pool Name: x410_archive          Protection:        +2d:1n
Pool Storage:     HDD                 SSD Storage
Size:             85.2T (94.6T Raw)   2.2T (2.2T Raw)
VHS Size:         9.4T
Used:             21.6T (25%)         26.8G (1%)
Avail:            63.6T (75%)         2.1T (99%)

                           Throughput (bps)  HDD Storage      SSD Storage
Name               Health|  In   Out  Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  5|10.5.80.190    | OK  |22.9k| 1.4M| 1.5M| 7.2T/31.5T( 23%)| 8.9G/ 738G(  1%)
  6|10.5.80.191    |-A-- |881.6| 231k| 232k| 7.2T/31.5T( 23%)| 8.9G/ 738G(  1%)
  7|10.5.80.192    | OK  |    0|    0|    0| 7.2T/31.5T( 23%)| 8.9G/ 738G(  1%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
x410_archive       |  OK | 3.0k| 210k| 213k|21.6T/85.2T( 25%)|26.8G/ 2.2T(  1%)

Enough space has been freed. Because the desired result has been achieved, the AutoBalanceLin job can be canceled.
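The lab readings also give a rough sense of how fast AutoBalanceLin drained a pool. Between 16:53 and 18:54 (about 121 minutes), x410_archive went from 29.9T to 21.6T used, roughly 8.3T, or about 4.1T per hour. The sketch below does this arithmetic and estimates the hours needed to reach a target fill level. The function names are hypothetical, and a single lab run on a cluster with plenty of free space is not a prediction for a full production pool.

```shell
# Hypothetical helpers for rough capacity arithmetic (use the same units for
# all sizes, e.g. the "T" values printed by isi stat).

# Units per hour drained between two "Used:" readings taken MINUTES apart.
drain_rate() {      # usage: drain_rate <used_before> <used_after> <minutes>
    awk -v a="$1" -v b="$2" -v m="$3" 'BEGIN { printf "%.1f\n", (a - b) / m * 60 }'
}

# Hours until a pool of SIZE holding USED falls to TARGET percent at RATE per hour.
hours_to_target() { # usage: hours_to_target <used> <size> <target_pct> <rate>
    awk -v u="$1" -v s="$2" -v t="$3" -v r="$4" 'BEGIN {
        h = (u - s * t / 100) / r
        if (h < 0) h = 0    # already at or below the target
        printf "%.1f\n", h
    }'
}
```

For example, `drain_rate 29.9 21.6 121` prints 4.1, and for a hypothetical 85.2T pool holding 83.5T, `hours_to_target 83.5 85.2 85 4.1` prints 2.7.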

Affected Products

Isilon X400

Article Number: 000009283

Article Type: How To

Last Modified: 23 Jun 2026

Version: 9
