PowerScale OneFS: NFS Hitting EIO when Writing with Software Journal Mirroring (SJM) Enabled
Summary: Network File System (NFS) clients writing to PowerScale OneFS version 9.11 may experience rare write failures that manifest as input/output errors (EIO).
Symptoms
The issue occurs when Software Journal Mirroring (SJM), a new OneFS 9.11 feature, is used in combination with Direct Write (NCIO), a OneFS 9.7 feature.
For more information about SJM, see: https://infohub.delltechnologies.com/en-uk/l/powerscale-onefs-smartflash/software-journal-mirroring/
For more information about NCIO, see: https://infohub.delltechnologies.com/en-au/l/powerscale-best-practices-for-semiconductor-eda-design-environments/direct-read-and-direct-write/
The issue is expected to be rare (10 in a billion operations). The following error in the PowerScale node's messages log confirms that the issue occurred at the logged timestamp. The cluster-side timestamp of the error must match the client-side timestamp of the EIO error:
2026-03-03T17:00:28.344642-05:00 isilon-2(id2) /boot/kernel.amd64/kernel: [txn_participant.c:5806](pid 82988="kt: dxt17")(tid=106414) txn_p_dl_block_budrecs() txn(0x2:0x133bcad0d) requested from devid:2 not found
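The timestamp comparison described above can be sketched as a small shell helper. This is a minimal illustration, not a Dell-provided tool: the function name txn_errors_at is hypothetical, and it assumes the client-side error time has already been converted to the cluster's time zone and written in the same format as the log (for example, 2026-03-03T17:00 to match a whole minute).

```shell
# Hypothetical helper: print txn_p_dl_* messages whose log line starts with
# the given timestamp prefix.
#   $1: timestamp prefix in the log's own format, e.g. 2026-03-03T17:00
#   $2: path to a messages log (on a node, /var/log/messages)
txn_errors_at() {
    # index(...) == 1 is a literal "starts with" test, so the dots and
    # dashes in the timestamp are not interpreted as regex characters.
    awk -v p="$1" 'index($0, p) == 1 && /txn_p_dl_/' "$2"
}
```

For example, txn_errors_at 2026-03-03T17:00 /var/log/messages would list the matching messages logged during that minute. No output means no txn_p_dl_ message was logged with that prefix.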
Also determine whether the EIO errors began after upgrading to OneFS 9.11, and whether errors such as the following began appearing in the log files at the same time:
# grep txn_p_dl_ /var/log/messages | cut -d":" -f 2- | cut -d" " -f 8 | sort | uniq -c
5112 txn_p_dl_block_budrecs()
3672 txn_p_dl_deltas_budrecs()
# grep txn_p_dl_ /var/log/messages | cut -d":" -f 2- | cut -d" " -f 8 | sort | uniq -c
5200 txn_p_dl_block_budrecs()
3712 txn_p_dl_deltas_budrecs()
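The pipeline above picks the function name by field position, which depends on the exact layout of each log line. As a hedged alternative, the sketch below extracts the name by pattern instead. The function name count_txn_p_dl is hypothetical, and the pattern assumes the function names consist only of lowercase letters and underscores, as in the two names shown above.

```shell
# Hypothetical helper: count txn_p_dl_* messages per function name.
#   $1: path to a messages log (on a node, /var/log/messages)
count_txn_p_dl() {
    # grep -o prints only the matched text, one match per output line.
    # The final awk normalizes the leading padding that uniq -c adds.
    grep -o 'txn_p_dl_[a-z_]*()' "$1" | sort | uniq -c | awk '{print $1, $2}'
}
```

Running count_txn_p_dl /var/log/messages twice, some time apart, shows whether the counts are still increasing, as in the two sets of output above.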
Cause
For the issue to apply, both SJM and NCIO must be enabled on the affected cluster. With both features enabled, a race condition can occur in which a thread loses access to a necessary resource.
To determine if SJM is enabled on a cluster, run the following command:
isi storagepool nodepools list -v | egrep "Name|SJM"
To determine if NCIO is enabled, run the following commands. Both commands return '1' on all nodes when NCIO is enabled.
isi_for_array -s sysctl efs.lbm.ncio.write.enable
isi_for_array -s sysctl efs.bam.bsw_send_direct
Examples:
(Confirms SJM is enabled)
Isilon-1# isi storagepool nodepools list -v | egrep "Name|SJM"
Name: f200_3.8tb-ssd_48gb
SJM Enabled: Yes
(Confirms NCIO is enabled. Both sets of output return '1' for all nodes, indicating NCIO is enabled)
Isilon-1# isi_for_array -s sysctl efs.lbm.ncio.write.enable
Isilon-1: efs.lbm.ncio.write.enable: 1
Isilon-2: efs.lbm.ncio.write.enable: 1
Isilon-3: efs.lbm.ncio.write.enable: 1
Isilon-1# isi_for_array -s sysctl efs.bam.bsw_send_direct
Isilon-1: efs.bam.bsw_send_direct: 1
Isilon-2: efs.bam.bsw_send_direct: 1
Isilon-3: efs.bam.bsw_send_direct: 1
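On clusters with many nodes, checking every output line by eye is error-prone. The sketch below summarizes the sysctl output instead. It is an illustration only: the function name ncio_state and its three result words are hypothetical, and it assumes each output line has the "node: sysctl name: value" form shown in the example above.

```shell
# Hypothetical helper: read isi_for_array sysctl output on stdin and print
#   enabled            if every line reports the value 1
#   disabled-or-mixed  if at least one line reports another value
#   unknown            if no line in the expected form was seen
ncio_state() {
    awk -F': ' '
        NF >= 3 { seen++; if ($NF != 1) off++ }
        END {
            if (seen == 0)    print "unknown"
            else if (off > 0) print "disabled-or-mixed"
            else              print "enabled"
        }'
}
```

On a cluster, the output of both commands would be piped in together, for example: { isi_for_array -s sysctl efs.lbm.ncio.write.enable; isi_for_array -s sysctl efs.bam.bsw_send_direct; } | ncio_state. Piping in only one of the two commands checks only that sysctl, so a result of "enabled" is meaningful for NCIO only when both are included.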
Resolution
The workaround is to disable either SJM or NCIO. It is not necessary to disable both.
Disabling SJM reduces data protection and redundancy of the File System Journal, as follows:
Every node in an SJM-enabled node pool is dynamically assigned a unique Buddy, and the backend network connection between nodes is optimized for low latency bulk data flow. SJM’s automatic recovery scheme can use a Buddy journal’s mirrored contents to re-form the Primary node’s journal in the event of a failure, avoiding the costly process of SmartFailing the node. This recovery scheme, known as SyncBack, can also be applied manually if a failing journal device must be physically replaced.
Source: https://infohub.delltechnologies.com/en-uk/l/powerscale-onefs-smartflash/software-journal-mirroring/
Disabling NCIO may reduce write performance by roughly 20% on flash nodes.
To disable SJM (this must be done for each node pool):
isi storagepool nodepools modify <nodepool name> --sjm-enabled=false
To disable NCIO:
isi_sysctl_cluster efs.lbm.ncio.write.enable=0
isi_sysctl_cluster efs.bam.bsw_send_direct=0
Disabling these features may impact the cluster. Apply the workaround only if the cluster workflow is being impacted by this issue. Contact Dell Technical Support with any questions or concerns.