PowerScale: Using AutoBalanceLin to quickly move data off of a full node pool
Summary: This article describes how to use the AutoBalanceLin job to quickly free space when a single node pool is full or nearly at 100% capacity. Use this procedure only after all other methods of freeing disk space on a node pool have been exhausted.
Instructions
Note: The following procedure requires removing all existing file pool policies, and it stripes data across all nodes regardless of the workflow the data belongs to.
Without file pool policies, no management of data between pools or tiers occurs.
Be sure that the impact of this procedure is fully understood, as it may lead to performance degradation.
Perform this only as a last-ditch effort after all other options for resolving capacity issues have been attempted.
It is widely believed that AutoBalance and AutoBalanceLin only balance data within node pools, not across them, and that only the SmartPools and SmartPoolsTree jobs can move data between two node pools.
Testing on OneFS 8.0 and later proves this is not entirely true: if the cluster has only the default 'anywhere:anywhere' file pool policy, AutoBalanceLin and AutoBalance move data across multiple node pools.
Use this only as an emergency workaround for clusters that have one full node pool. This process quickly moves data off the full node pool.
Question: When would one want to use the following procedure?
Answer: This procedure would be used when the following conditions exist:
a. The cluster contains multiple node pools, and one or more of the node pools is 100% full.
b. There is an immediate requirement to free up disk space on a full node pool.
c. The exact organization of the data is not an immediate concern.
Steps:
1) Make note of, then delete, all existing file pool policies except the default 'anywhere:anywhere' file pool policy that ships with the cluster.
Before proceeding with this step:
a) Record the current file pool policy configuration before removing the policies; if time permits, a full log gather is also recommended (see the example commands after the policy output below).
b) By default, Isilon clusters are configured with the Default File Pool Policy set to write data to 'anywhere:anywhere'. Verify that the Default File Pool Policy is reverted to these default settings before proceeding.
Example: Default File Pool Policy. Observe that the Storage Targets are set to 'anywhere'.
# isi filepool default-policy view
Set Requested Protection: default
Data Access Pattern: concurrency
Enable Coalescer: Yes
Enable Packing: No
Data Storage Target: anywhere
Data SSD Strategy: metadata
Snapshot Storage Target: anywhere
Snapshot SSD Strategy: metadata
Cloud Pool: -
Cloud Compression Enabled: -
Cloud Encryption Enabled: -
Cloud Data Retention: -
Cloud Incremental Backup Retention: -
Cloud Full Backup Retention: -
Cloud Accessibility: -
Cloud Read Ahead: -
Cloud Cache Expiration: -
Cloud Writeback Frequency: -
Cloud Archive Snapshot Files: -
ID: -
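To record and then remove the existing policies, the standard file pool policy commands can be used. A minimal sketch, assuming OneFS 8.x CLI syntax; <policy-name> is a placeholder for each policy name returned by the list command:
# isi filepool policies list
# isi filepool policies view <policy-name>
# isi filepool policies delete <policy-name>
If time permits, a full log gather also preserves the configuration:
# isi_gather_info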
2) Run a SmartPools job to apply new directory markings:
# isi job start smartpools -p 1 --policy medium
Note: Expect the Smartpools job to complete faster than usual with only the anywhere:anywhere Default File Pool Policy in place.
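To confirm that the SmartPools job has finished before moving on, check the job engine status. A sketch, assuming OneFS 8.x syntax:
# isi job status
# isi job jobs list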
3) Run an AutoBalanceLin job for a few hours, and monitor space. (Note: Unlike AutoBalance, which performs a full tree walk before moving any data, AutoBalanceLin restripes data immediately.)
# isi job start autobalancelin -p 1 --policy medium
Almost immediately, data shifts between the node pools, and disk space on the full node pool should begin to free up.
Note: For this step, AutoBalanceLin does not have to run to completion. Monitor the AutoBalanceLin job until the goal of cleaning up the full node pool is achieved, and then cancel the job. For example, you can cancel the job once the full node pool is down to 85% of capacity.
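Progress of the running job can be checked against the job ID that the start command returns. A sketch, assuming OneFS 8.x syntax; <job-id> is a placeholder:
# isi job jobs view <job-id>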
Note: Monitor cluster utilization and confirm that other jobs are cancelled or paused if space is being adversely affected.
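Other running jobs can be paused or cancelled with the standard job engine commands. A sketch, assuming OneFS 8.x syntax; <job-id> is a placeholder:
# isi job jobs list
# isi job jobs pause <job-id>
# isi job jobs cancel <job-id>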
Continue to monitor disk space every hour or so, until it is at a sufficient level, using the following command:
# isi stat -p -v
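Per-pool capacity can also be checked from the storage pool view. A sketch, assuming OneFS 8.x syntax:
# isi storagepool list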
Note: Again, it is not advisable to let AutoBalanceLin run to completion. Past a certain point, the job may begin to shift data in the opposite direction and produce undesirable results: the data movement reverses, possibly driving the other node pool toward full capacity as the previously full pool empties. Run AutoBalanceLin for only a few hours, or until the space is cleaned up, then cancel the job once the goal is achieved.
Once the cluster has achieved relief in space on the full node pool, cancel the AutoBalanceLin job:
# isi job cancel autobalancelin
Note: For optimal results, the protection levels of the node pools should be equal.
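The protection level of each node pool can be compared with the node pool listing. A sketch, assuming OneFS 8.x syntax:
# isi storagepool nodepools list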
Additional Information
Lab Testing Results
Two node pools, equal protection levels
Before:

Node Pool Name: x410_archive
Protection: +2d:1n
Pool Storage:        HDD                  SSD Storage
       Size:   85.2T (94.6T Raw)    2.2T (2.2T Raw)
   VHS Size:   9.4T
       Used:   29.9T (35%)          35.2G (2%)
      Avail:   55.3T (65%)          2.1T (98%)

                          Throughput (bps)      HDD Storage       SSD Storage
Name               Health|   In|  Out|Total|   Used / Size    |  Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
 5|10.5.80.190     | OK  |881.6|    0|881.6|10.0T/31.5T( 32%)|11.7G/ 738G( 2%)
 6|10.5.80.191     |-A-- |    0|    0|    0|10.0T/31.5T( 32%)|11.7G/ 738G( 2%)
 7|10.5.80.192     | OK  |    0|    0|    0|10.0T/31.5T( 32%)|11.7G/ 738G( 2%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
 x410_archive      | OK  |110.2|    0|110.2|29.9T/85.2T( 35%)|35.2G/ 2.2T( 2%)

Node Pool Name: x410_35tb_800gb-ssd_64gb
Protection: +2d:1n
Pool Storage:        HDD                   SSD Storage
       Size:   112.8T (125.3T Raw)   2.9T (2.9T Raw)
   VHS Size:   12.5T
       Used:   5.6T (5%)             7.9G (< 1%)
      Avail:   107.2T (95%)          2.9T (> 99%)

                          Throughput (bps)      HDD Storage       SSD Storage
Name               Health|   In|  Out|Total|   Used / Size    |  Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
 1|10.5.80.186     | OK  | 2.9M|82.8M|85.6M| 1.4T/31.5T(  4%)| 2.0G/ 738G(< 1%)
 2|10.5.80.187     |-A-- | 104k|38.8k| 143k| 1.4T/30.6T(  5%)| 1.9G/ 738G(< 1%)
 3|10.5.80.188     | OK  |881.6|    0|881.6| 1.4T/31.5T(  4%)| 2.0G/ 738G(< 1%)
 4|10.5.80.189     | OK  |    0|25.8k|25.8k| 1.4T/31.5T(  4%)| 2.0G/ 738G(< 1%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
 x410_35tb_800gb-ssd|-M---| 371k|10.4M|10.7M| 5.6T/112.8T( 5%)| 7.9G/ 2.9T(< 1%)
 _64gb              |     |     |     |     |                 |

X410-2# date
Thu Jun 14 16:53:29 CDT 2018

With one file pool policy set to the default anywhere:anywhere:

X410-2# isi job start autobalancelin -p 1 --policy medium
Started job [7159]

In as little as 30 minutes, data shifts between the two pools; the first node pool, below, dropped from 32% full to 29%.

X410-2# date
Thu Jun 14 17:24:20 CDT 2018

Node Pool Name: x410_archive
Protection: +2d:1n
Pool Storage:        HDD                  SSD Storage
       Size:   85.2T (94.6T Raw)    2.2T (2.2T Raw)
   VHS Size:   9.4T
       Used:   27.7T (33%)          34.3G (2%)
      Avail:   57.5T (67%)          2.1T (98%)

                          Throughput (bps)      HDD Storage       SSD Storage
Name               Health|   In|  Out|Total|   Used / Size    |  Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
 5|10.5.80.190     | OK  | 1.1k|25.8k|26.9k| 9.2T/31.5T( 29%)|11.4G/ 738G( 2%)
 6|10.5.80.191     |-A-- | 1.1k| 1.2M| 1.2M| 9.2T/31.5T( 29%)|11.4G/ 738G( 2%)
 7|10.5.80.192     | OK  |28.6k| 5.2k|33.7k| 9.2T/31.5T( 29%)|11.4G/ 738G( 2%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
 x410_archive      | OK  | 3.8k| 152k| 156k|27.7T/85.2T( 33%)|34.3G/ 2.2T( 2%)

Node Pool Name: x410_35tb_800gb-ssd_64gb
Protection: +2d:1n
Pool Storage:        HDD                   SSD Storage
       Size:   112.8T (125.3T Raw)   2.9T (2.9T Raw)
   VHS Size:   12.5T
       Used:   7.6T (7%)             8.8G (< 1%)
      Avail:   105.2T (93%)          2.9T (> 99%)

                          Throughput (bps)      HDD Storage       SSD Storage
Name               Health|   In|  Out|Total|   Used / Size    |  Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
 1|10.5.80.186     | OK  |37.9k| 279k| 316k| 1.9T/31.5T(  6%)| 2.2G/ 738G(< 1%)
 2|10.5.80.187     |-A-- | 1.4M|34.8M|36.2M| 1.9T/30.6T(  6%)| 2.2G/ 738G(< 1%)
 3|10.5.80.188     | OK  | 130k|30.9k| 161k| 1.9T/31.5T(  6%)| 2.2G/ 738G(< 1%)
 4|10.5.80.189     | OK  |    0|    0|    0| 1.9T/31.5T(  6%)| 2.2G/ 738G(< 1%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
 x410_35tb_800gb-ssd|-M---| 198k| 4.4M| 4.6M| 7.6T/112.8T( 7%)| 8.8G/ 2.9T(< 1%)
 _64gb              |     |     |     |     |                 |
Continue to monitor every hour or so until the disk space is at a sufficient level.
Note: Do not let AutoBalanceLin run to completion. Past a certain point, the job shifts data in the opposite direction, which can produce undesirable results: the disk space consumption can reverse, possibly leading to a near-full node pool again. Only run the job for a few hours.
One hour point:

X410-2# date
Thu Jun 14 17:54:30 CDT 2018

Node Pool Name: x410_archive
Protection: +2d:1n
Pool Storage:        HDD                  SSD Storage
       Size:   85.2T (94.6T Raw)    2.2T (2.2T Raw)
   VHS Size:   9.4T
       Used:   25.2T (30%)          33.9G (2%)
      Avail:   60.0T (70%)          2.1T (98%)

                          Throughput (bps)      HDD Storage       SSD Storage
Name               Health|   In|  Out|Total|   Used / Size    |  Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
 5|10.5.80.190     | OK  |881.6|20.6k|21.5k| 8.4T/31.5T( 27%)|11.3G/ 738G( 2%)
 6|10.5.80.191     |-A-- |    0|    0|    0| 8.4T/31.5T( 27%)|11.3G/ 738G( 2%)
 7|10.5.80.192     | OK  | 2.2k| 216k| 218k| 8.4T/31.5T( 27%)|11.3G/ 738G( 2%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
 x410_archive      | OK  |385.7|29.5k|29.9k|25.2T/85.2T( 30%)|33.9G/ 2.2T( 2%)

X410-2# date
Thu Jun 14 18:54:43 CDT 2018

Node Pool Name: x410_archive
Protection: +2d:1n
Pool Storage:        HDD                  SSD Storage
       Size:   85.2T (94.6T Raw)    2.2T (2.2T Raw)
   VHS Size:   9.4T
       Used:   21.6T (25%)          26.8G (1%)
      Avail:   63.6T (75%)          2.1T (99%)

                          Throughput (bps)      HDD Storage       SSD Storage
Name               Health|   In|  Out|Total|   Used / Size    |  Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
 5|10.5.80.190     | OK  |22.9k| 1.4M| 1.5M| 7.2T/31.5T( 23%)| 8.9G/ 738G( 1%)
 6|10.5.80.191     |-A-- |881.6| 231k| 232k| 7.2T/31.5T( 23%)| 8.9G/ 738G( 1%)
 7|10.5.80.192     | OK  |    0|    0|    0| 7.2T/31.5T( 23%)| 8.9G/ 738G( 1%)
-------------------+-----+-----+-----+-----+-----------------+-----------------
 x410_archive      | OK  | 3.0k| 210k| 213k|21.6T/85.2T( 25%)|26.8G/ 2.2T( 1%)
Space is sufficiently cleaned up. The AutoBalanceLin job can be cancelled since the desired results have been achieved.