Nodes may panic with FAILED ASSERTION when they fail to decode quota or snapshot attributes from extension blocks after upgrading to OneFS 9.2.0.0 or 9.2.1.0

Summary: Nodes may panic with FAILED ASSERTION when they fail to decode quota or snapshot attributes from extension blocks after upgrading to OneFS 9.2.0.0 or 9.2.1.0.


Symptoms

Nodes may panic with a stack trace identical or similar to one of the following:

panic @ time 1622753124.594, thread 0xfffffe86de1deb00: Assertion Failure
time = 1622753124
cpuid = 3, TSC = 0x114d3cfdc35dc80
Panic occurred in module kernel loaded at 0xffffffff80200000:
        
Stack: --------------------------------------------------
kernel:isi_assert_halt+0x2e
kernel:ifm_get_quota_gov+0x188
kernel:quota_gov_need_update+0x141
kernel:quota_scan_file+0x188
kernel:_sys_ifs_quota_scan_file+0x1fd
kernel:amd64_syscall+0x380
--------------------------------------------------
*** FAILED ASSERTION decode @ /b/mnt/src/sys/ifs/ifm/ifm_dinode.c:3537: failed to decode quota governance attribute for lin 1:1797:9acf. On-disk format change?

 

Or  

panic @ time 1625264072.183, thread 0xfffffe8688a2c080: Assertion Failure
time = 1625264072
cpuid = 0, TSC = 0x395c9046b59119
Panic occurred in module kernel loaded at 0xffffffff80200000:

Stack: --------------------------------------------------
kernel:isi_assert_halt+0x2e
kernel:ifm_getsnapids+0xdd
kernel:validate_inode_snapid+0x4ad
kernel:revalidate_inode_contents+0x2cb
kernel:bam_update_inode_hint+0x14fd
kernel:bam_vget_stream_valid_pref_hint+0x11b
kernel:bam_vget_valid+0x21
kernel:bam_getparents+0x1ae
kernel:_sys_pctl2_lin_get_path_plus+0x6c3
kernel:amd64_syscall+0x380
--------------------------------------------------
*** FAILED ASSERTION num_snapids * sizeof(snapids->snapids[0]) == inattr_size(attr) @ /b/mnt/src/sys/ifs/ifm/ifm_dinode.c:2598:


The panic may happen when a QuotaScan, SnapshotDelete, SmartPools, or SmartPoolsTree job runs.

Input/output errors may also be returned when reading or modifying the attributes of some files that have extension blocks.

 

Cause

After an upgrade to OneFS 9.2.0.0 (Build B_9_2_0_002) or 9.2.1.0 (Build B_9_2_1_002), the inode version is upgraded to version 8, and the extension block version should be upgraded along with it.

However, because of defect PSCALE-107686, some extension blocks are not upgraded and therefore cannot be decoded correctly.

Resolution

Important update (June 21, 2021):
Upgrading to OneFS 9.2.x is available, with the following roll-up patches (RUPs) released:
9.2.1.1_GA-RUP_2021-06_PSP-1313.tgz
9.2.0.1_GA-RUP_2021-06_PSP-1322.tgz

Customers planning an upgrade to 9.2.x should upgrade to 9.2.1.1 or 9.2.0.1 bundled with the corresponding RUP above.

Customers who have already upgraded to OneFS 9.2.0.0 or 9.2.1.0 should disable the following jobs and keep them disabled until IntegrityScan can be run:
    - AutoBalance
    - AutoBalanceLin
    - FilePolicy
    - FlexProtect
    - FlexProtectLin
    - MultiScan
    - MediaScan
    - SetProtectPlus
    - ShadowStoreProtect
    - SmartPools
    - SmartPoolsTree
    - QuotaScan
    - Upgrade
    - Collect
    
Before disabling these jobs, record the current status of all job types. Some jobs may already have been disabled for other reasons, and this record ensures they are not re-enabled later by mistake:

# isi job types list


Disable the above jobs:

# for jobName in {AutoBalance,AutoBalanceLin,FilePolicy,FlexProtect,FlexProtectLin,MultiScan,MediaScan,SetProtectPlus,ShadowStoreProtect,SmartPools,SmartPoolsTree,QuotaScan,Upgrade,Collect}; do isi job types modify $jobName --enabled false --force; done
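The two steps above (recording the job status, then disabling the jobs) can be combined into one shell function. This is a minimal sketch, assuming it is run as root on a cluster node: the function name disable_affected_jobs is hypothetical, the snapshot file path is chosen by the caller, and the only commands used are the isi commands already shown in this article.

```shell
#!/bin/sh
# Sketch only: wraps the two commands from this article in one function.
# The function name is hypothetical; the caller chooses the snapshot file.
disable_affected_jobs() {
    snapshot_file="$1"
    # Save the current status of all job types first, so jobs that were
    # already disabled for other reasons are not re-enabled later by mistake.
    isi job types list > "$snapshot_file" || return 1

    failed=0
    # Same job list as in the article, written as plain words so the loop
    # works in sh, bash, and zsh without relying on brace expansion.
    for jobName in AutoBalance AutoBalanceLin FilePolicy FlexProtect \
        FlexProtectLin MultiScan MediaScan SetProtectPlus ShadowStoreProtect \
        SmartPools SmartPoolsTree QuotaScan Upgrade Collect; do
        if ! isi job types modify "$jobName" --enabled false --force; then
            # Keep going so every failure is reported, not just the first.
            echo "failed to disable $jobName" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example, disable_affected_jobs /ifs/job_types_before_disable.txt (the path is only an illustration; any location on /ifs keeps the file across node reboots). The function returns non-zero if any job could not be disabled, so check its exit status and the messages it prints before proceeding.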
    

If nodes are panicking, the job engine can be disabled until the RUP is installed:

# isi services -a isi_job_d disable

 


Once the RUP is installed, the job engine can be enabled again:

# isi services -a isi_job_d enable


The jobs listed above must remain disabled even after the job engine is enabled again. Engage Isilon Support for further assistance.


Important Notes:
Failed devices cannot be repaired while FlexProtect and FlexProtectLin are disabled. If any device is down (check with the 'isi_group_info' command), contact Isilon Support as soon as possible.

Data cannot be migrated between tiers or node pools while the SmartPools job is disabled, so monitor the capacity usage of each node pool closely.

If any node pool is approaching full capacity, contact Isilon Support as soon as possible.

Article Properties
Article Number: 000188137
Article Type: Solution
Last Modified: 27 Aug 2022
Version:  15