PowerScale OneFS 9.10: Rare Performance Issues when Running a SnapshotDelete job

Summary: On clusters that were upgraded to OneFS 9.10 or 9.11 and contain multiple node pools, running a SnapshotDelete job may cause performance issues.

Symptoms

Clusters with two or more node pools that were upgraded to OneFS 9.10 or later may experience performance issues whenever a SnapshotDelete job is running. Pausing the SnapshotDelete job brings immediate relief, but the issue returns once the job is resumed. 

On clusters whose snapshots have long expiration periods, the issue may not become apparent until several weeks or months after the OneFS upgrade was completed.

Logs and hangdumps show the job engine (isi_job_d) SnapshotDelete job thread holding a LIN lock, with a stack trace similar to this example:

77886 isi_job_d:
...
  thread 100637: je_worker_main at 0xfffffe8b55ea95c0 in state "running":
    On cpu 5 for 3 ticks
    Stack: --------------------------------------------------
    kernel:btree_leaf_check_prefetch+0xde
    kernel:btree_leaf_get_entry+0x349
    kernel:stf_is_fake_entry+0x41
    kernel:stf_iterate_block+0x66
    kernel:ifs_snap_get_lins_helper+0xac
    kernel:_sys_ifs_snap_get_lins+0x279
    kernel:amd64_syscall+0x7b0
    --------------------------------------------------
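When a hangdump is large, a short script can help locate threads whose stack resembles the example above. The following is a minimal sketch, not a Dell-provided tool. It assumes the dump uses the same "thread <id>:" and "kernel:<function>+0x<offset>" layout as the example, and the choice of the two frames used as a signature is an assumption made for illustration. The names SIGNATURE_FRAMES and find_matching_threads are hypothetical.

```python
import re

# Frames taken from the example stack trace in this article. Treating these
# two frames as a signature is an assumption of this sketch.
SIGNATURE_FRAMES = (
    "ifs_snap_get_lins_helper",
    "stf_iterate_block",
)

# Line layouts assumed from the example: "thread 100637: ..." and
# "kernel:stf_iterate_block+0x66".
THREAD_HEADER = re.compile(r"^\s*thread\s+(\d+):")
FRAME = re.compile(r"^\s*kernel:([A-Za-z0-9_]+)\+0x[0-9a-fA-F]+")


def find_matching_threads(dump_text):
    """Return the IDs of threads whose stack contains every signature frame."""
    stacks = {}
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group(1)
            stacks[current] = set()
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            stacks[current].add(frame.group(1))
    return [
        thread_id
        for thread_id, frames in stacks.items()
        if all(name in frames for name in SIGNATURE_FRAMES)
    ]
```

A match only indicates that a thread was inside the snapshot LIN iteration path when the dump was taken; it does not confirm the issue on its own.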

Cause

OneFS 9.10 introduces illogical LINs (Logical Inodes) to the Snapshot Tracking Files (STFs). This was added to support a new feature, MetadataIQ. An STF is a special file type with several unique characteristics and is involved in the full snapshot life cycle, including the creation, storing, changing, and deletion of snapshots.

When data is migrated between different pools, illogical LINs are added to the STF and can gradually build up. Performance issues occur when expired snapshots are being deleted and the STF of a snapshot contains too many illogical LINs.

How to determine whether a cluster is at risk for this issue
Clusters that meet the following criteria are at higher risk of experiencing this issue if they are upgraded to OneFS 9.10 or 9.11:

• SnapshotIQ is licensed and enabled, and snapshots are being created and expired on the cluster.
• The cluster contains multiple node pools.
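The criteria above can be expressed as a simple check, which may be convenient when reviewing several clusters. This is a minimal sketch under stated assumptions: ClusterFacts is a hypothetical container whose values are collected by hand or with your own tooling, "multiple node pools" is interpreted as two or more, and the check does not consider whether a release already contains the fix.

```python
from dataclasses import dataclass

# Release families named in this article as affected.
AFFECTED_FAMILIES = ("9.10", "9.11")


@dataclass
class ClusterFacts:
    """Hypothetical record of the facts needed for the risk criteria."""
    onefs_release: str       # dotted release string, for example "9.10.1.2"
    node_pool_count: int     # number of node pools in the cluster
    snapshotiq_active: bool  # licensed, enabled, snapshots created and expired


def is_at_higher_risk(facts):
    """Return True when all criteria listed in the article are met."""
    family = ".".join(facts.onefs_release.split(".")[:2])
    return (
        family in AFFECTED_FAMILIES
        and facts.node_pool_count >= 2
        and facts.snapshotiq_active
    )
```

For example, is_at_higher_risk(ClusterFacts("9.10.1.2", 2, True)) evaluates to True, while the same cluster with a single node pool evaluates to False.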

Resolution

Permanent solution:
Upgrade to one of the following OneFS versions, or later, which include the fix:

• OneFS 9.10.1.4 PSP-4686 MR:[9.10.1.4_GA-MR][Multiple Userspace and Kernel Fixes] (October 2025)
• OneFS 9.11.0.5 PSP-4681 MR:[9.11.0.5_GA-MR][Multiple Userspace and Kernel Fixes] (September 2025)
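To compare a cluster's release against the fixed versions listed above, a version string can be parsed into a tuple and compared numerically. This is a minimal sketch that assumes the release is given as a four-part dotted string such as "9.10.1.3". The names FIXED_RELEASES, parse_release, and includes_fix are hypothetical. Release families other than 9.10 and 9.11 return None because this article does not describe them.

```python
# Minimum fixed release for each release family, taken from the list above.
FIXED_RELEASES = {
    (9, 10): (9, 10, 1, 4),
    (9, 11): (9, 11, 0, 5),
}


def parse_release(release):
    """Turn a dotted release string such as '9.10.1.4' into a tuple of ints."""
    return tuple(int(part) for part in release.split("."))


def includes_fix(release):
    """Return True or False for 9.10 and 9.11 releases, None for other families."""
    version = parse_release(release)
    minimum = FIXED_RELEASES.get(version[:2])
    if minimum is None:
        return None  # family not covered by this article
    # Tuples compare element by element, so 9.10.1.10 sorts after 9.10.1.4.
    return version >= minimum
```

Comparing tuples of integers avoids the pitfall of string comparison, where "9.10.1.10" would sort before "9.10.1.4".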

Workaround:
Until a permanent solution is applied, apply the following setting change to disable illogical LINs cluster-wide:

    isi_sysctl_cluster efs.snapshot.stf_populate_illogical_lin_enabled=0

NOTE: With illogical LINs disabled, the node pool analytics provided by MetadataIQ become stale over time. The rest of the information provided by MetadataIQ remains usable. On clusters with illogical LINs disabled, a manual resync can be done if the node pool information requires updating.

On clusters that have upgraded to OneFS 9.10 and are experiencing performance issues:
Cancel and disable the SnapshotDelete job to avoid a Data Unavailability (DU) situation. Then contact Dell Technical Support for assistance with removing the snapshots that contain illogical LINs.

To cancel a running SnapshotDelete job:

    isi job cancel snapshotdelete

To disable the SnapshotDelete job:

    isi job types modify snapshotdelete --enabled=false

NOTE: Leaving the SnapshotDelete job disabled for too long can cause the cluster to run low on disk space. Contact Dell Technical Support as soon as possible for assistance with manually removing the snapshots that contain illogical LINs before the SnapshotDelete job is re-enabled.
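Where the cancel and disable steps need to be applied in a repeatable way, they can be wrapped in a small script. This is a minimal sketch, not a supported procedure: it assumes it is run on a cluster node by a user permitted to run the isi commands shown above, and the names WORKAROUND_COMMANDS and stop_snapshot_delete are hypothetical. The run parameter exists only so the function can be exercised without a cluster.

```python
import subprocess

# Commands copied from this article, in the order they are listed.
WORKAROUND_COMMANDS = (
    ["isi", "job", "cancel", "snapshotdelete"],
    ["isi", "job", "types", "modify", "snapshotdelete", "--enabled=false"],
)


def stop_snapshot_delete(run=subprocess.run):
    """Cancel, then disable, the SnapshotDelete job and return the results."""
    results = []
    for command in WORKAROUND_COMMANDS:
        # check=False so that a failed cancel (assumed to be possible when no
        # SnapshotDelete job is running) does not prevent the disable step.
        results.append(run(command, check=False))
    return results
```

Inspect the returned results (for example, each returncode) before assuming that both steps succeeded.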

Article Properties
Article Number: 000337012
Article Type: Solution
Last Modified: 07 Nov 2025
Version: 6