Nodes may panic with FAILED ASSERTION when they fail to decode quota or snapshot attributes from extension blocks after upgrading to OneFS 9.2.0.0 or 9.2.1.0
Summary: Nodes may panic with FAILED ASSERTION when they fail to decode quota or snapshot attributes from extension blocks after upgrading to OneFS 9.2.0.0 or 9.2.1.0.
Symptoms
Nodes may panic with a stack that is the same as, or similar to, one of the following:
panic @ time 1622753124.594, thread 0xfffffe86de1deb00: Assertion Failure
time = 1622753124
cpuid = 3, TSC = 0x114d3cfdc35dc80
Panic occurred in module kernel loaded at 0xffffffff80200000:
Stack: --------------------------------------------------
kernel:isi_assert_halt+0x2e
kernel:ifm_get_quota_gov+0x188
kernel:quota_gov_need_update+0x141
kernel:quota_scan_file+0x188
kernel:_sys_ifs_quota_scan_file+0x1fd
kernel:amd64_syscall+0x380
--------------------------------------------------
*** FAILED ASSERTION decode @ /b/mnt/src/sys/ifs/ifm/ifm_dinode.c:3537: failed to decode quota governance attribute for lin 1:1797:9acf. On-disk format change?
Or
panic @ time 1625264072.183, thread 0xfffffe8688a2c080: Assertion Failure
time = 1625264072
cpuid = 0, TSC = 0x395c9046b59119
Panic occurred in module kernel loaded at 0xffffffff80200000:
Stack: --------------------------------------------------
kernel:isi_assert_halt+0x2e
kernel:ifm_getsnapids+0xdd
kernel:validate_inode_snapid+0x4ad
kernel:revalidate_inode_contents+0x2cb
kernel:bam_update_inode_hint+0x14fd
kernel:bam_vget_stream_valid_pref_hint+0x11b
kernel:bam_vget_valid+0x21
kernel:bam_getparents+0x1ae
kernel:_sys_pctl2_lin_get_path_plus+0x6c3
kernel:amd64_syscall+0x380
--------------------------------------------------
*** FAILED ASSERTION num_snapids * sizeof(snapids->snapids[0]) == inattr_size(attr) @ /b/mnt/src/sys/ifs/ifm/ifm_dinode.c:2598:
The panic may happen when a QuotaScan, SnapshotDelete, SmartPools, or SmartPoolsTree job runs.
We may also see input/output errors when reading or modifying attributes of files that have extension blocks.
Cause
After upgrading to OneFS 9.2.0.0 (Build B_9_2_0_002) or 9.2.1.0 (Build B_9_2_1_002), the inode format is upgraded to Version 8, and the extension block version should be upgraded as well.
However, due to defect PSCALE-107686, some of the extension blocks are not upgraded and cannot be decoded correctly.
Resolution
! Important update on Jun 21 2021 !
Upgrades to OneFS 9.2.x are live again, with the RUPs below released:
9.2.1.1_GA-RUP_2021-06_PSP-1313.tgz
9.2.0.1_GA-RUP_2021-06_PSP-1322.tgz
For customers who are planning an upgrade to 9.2.x, upgrade to 9.2.1.1 or 9.2.0.1 bundled with the above RUPs.
For customers who have already upgraded to OneFS 9.2.0.0 or 9.2.1.0, we suggest disabling the following jobs and keeping them disabled until IntegrityScan can be run:
- AutoBalance
- AutoBalanceLin
- FilePolicy
- FlexProtect
- FlexProtectLin
- MultiScan
- MediaScan
- SetProtectPlus
- ShadowStoreProtect
- SmartPools
- SmartPoolsTree
- QuotaScan
- Upgrade
- Collect
Before disabling these jobs, get a copy of the current job status (in case some jobs were previously disabled for other reasons, we do not want to re-enable them later):
# isi job types list
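To make the saved status easy to act on later, the names of the currently enabled jobs can be extracted from the captured output. A minimal sketch, assuming a simple name/enabled column layout; the sample table below is illustrative, not verbatim `isi job types list` output:

```shell
# On the cluster, first save the real table (path is illustrative):
#   isi job types list > /tmp/job_status_before_disable.txt
# The sample below stands in for that output; the column layout is assumed.
cat > /tmp/job_status_before_disable.txt <<'EOF'
Name            Enabled  Policy
--------------- -------- ------
AutoBalance     Yes      LOW
QuotaScan       Yes      LOW
MediaScan       No       LOW
--------------- -------- ------
EOF

# Keep only the names of jobs whose Enabled column reads "Yes",
# so only those jobs are re-enabled after remediation.
awk '$2 == "Yes" {print $1}' /tmp/job_status_before_disable.txt \
  > /tmp/previously_enabled_jobs.txt
cat /tmp/previously_enabled_jobs.txt
```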
Disable the above jobs:
# for jobName in {AutoBalance,AutoBalanceLin,FilePolicy,FlexProtect,FlexProtectLin,MultiScan,MediaScan,SetProtectPlus,ShadowStoreProtect,SmartPools,SmartPoolsTree,QuotaScan,Upgrade,Collect}; do isi job types modify $jobName --enabled false --force; done
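After running the loop, it is worth confirming that none of the listed jobs is still enabled. A minimal sketch, again checked against an illustrative (not verbatim) copy of the `isi job types list` output:

```shell
# Sample post-disable output; on the cluster, capture the real table with:
#   isi job types list > /tmp/job_status_after.txt
cat > /tmp/job_status_after.txt <<'EOF'
Name            Enabled  Policy
AutoBalance     No       LOW
QuotaScan       No       LOW
MediaScan       No       LOW
EOF

# Print any advisory-listed job still marked "Yes"; no output means done.
for jobName in AutoBalance AutoBalanceLin FilePolicy FlexProtect FlexProtectLin \
               MultiScan MediaScan SetProtectPlus ShadowStoreProtect SmartPools \
               SmartPoolsTree QuotaScan Upgrade Collect; do
  awk -v j="$jobName" '$1 == j && $2 == "Yes" {print j " is still enabled"}' \
    /tmp/job_status_after.txt
done > /tmp/still_enabled.txt
cat /tmp/still_enabled.txt
```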
If nodes are panicking, the job engine can be disabled until the RUP is installed:
# isi services -a isi_job_d disable
Once the RUP is installed, then the job engine can be enabled again:
# isi services -a isi_job_d enable
The jobs listed above must remain disabled for now. Engage Isilon Support for further assistance.
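Once Isilon Support confirms that the jobs can be turned back on, only the jobs that were enabled beforehand should be re-enabled. A minimal sketch; the job names below are a sample stand-in for the list recorded from your own cluster, and the `isi` command is printed for review rather than executed:

```shell
# Sample stand-in for the list of jobs that were enabled before the
# advisory (replace with the names recorded from your own cluster).
cat > /tmp/jobs_to_reenable.txt <<'EOF'
AutoBalance
QuotaScan
SmartPools
EOF

# Print the re-enable commands for review; on the cluster, drop the
# "echo" to run them for real.
while read -r jobName; do
  echo isi job types modify "$jobName" --enabled true --force
done < /tmp/jobs_to_reenable.txt > /tmp/reenable_cmds.txt
cat /tmp/reenable_cmds.txt
```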
Important Notes:
We cannot repair failed devices while FlexProtect and FlexProtectLin are disabled. If any down device is found (check with the command 'isi_group_info'), contact Isilon Support as soon as possible.
We cannot migrate data between tiers or nodepools while the SmartPools job is disabled, so the capacity usage of each nodepool must be monitored.
If any nodepool is approaching full, contact Isilon Support as soon as possible.
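Capacity monitoring can be scripted against whatever per-pool usage figures are available (for example, from the cluster's storage pool status output; the figures and format below are illustrative assumptions, not real OneFS output). A minimal sketch that flags any nodepool above an 85% used threshold:

```shell
# Illustrative usage figures: nodepool name, used units, total units.
# On a cluster these numbers would come from the storage pool status.
cat > /tmp/pool_usage.txt <<'EOF'
h500_pool 90 100
a200_pool 40 100
EOF

# Flag pools above 85% used so Isilon Support can be engaged before
# a pool fills while SmartPools is disabled.
awk '{ pct = 100 * $2 / $3
       if (pct > 85) printf "%s is %.0f%% full\n", $1, pct }' \
  /tmp/pool_usage.txt > /tmp/pool_alerts.txt
cat /tmp/pool_alerts.txt
```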