PowerScale: Isilon: Gen6: F810 nodes with outdated Mellanox5-EN0 firmware may unexpectedly fail to retain their file system journal during reboots
Summary: An issue with earlier versions of the firmware on the Mellanox5-EN0 device, which is found exclusively in F810 nodes, can force these nodes to unexpectedly cold-reboot when a warm reboot is requested or triggered by a panic or other error. If the peer node is also cold rebooted before the affected node can retrieve its journal, the journal on both nodes may be lost. ...
Symptoms
During reboot, the node may log an IERR on the backend compression NIC device in the BMC SEL log, followed by a cold reboot message. When the node comes back up, you may see messages indicating that its journal is invalid and had to be retrieved from the peer node. If retrieval fails (because the peer node also cold rebooted), boot stops with an error message indicating an invalid journal. You may also see messages similar to the following in the node's messages log:
2022-12-16T23:25:53-05:00 <3.4> CLUSTER-3(id3) isi_hwmon[2347]: (mlx5_health_v1) WARNING: MLX5 device mce0 firmware unhealthy.
2022-12-16T23:25:53-05:00 <3.3> CLUSTER-3(id3) isi_hwmon[2347]: (mlx5_health_v1) ERROR: MLX5 device unhealthy: mce0 -- {'irisc_index': 0, 'assert_return_address': 9413196L, 'assert_exit_pointer': 8455212L, 'firmware_version': 270012340L, 'miss_counter': 3L, 'syndrome': 7, 'hardware_device_id': 525L, 'assert_var4': 0L, 'assert_var3': 0L, 'assert_var2': 0L, 'assert_var1': 10338568L, 'assert_var0': 1L, 'extended_syndrome': '\x90@'}
2022-12-16T23:25:53-05:00 <3.4> CLUSTER-3(id3) isi_hwmon[2347]: (mlx5_health_v1) WARNING: MLX5 device mce1 firmware unhealthy.
2022-12-16T23:25:53-05:00 <3.3> CLUSTER-3(id3) isi_hwmon[2347]: (mlx5_health_v1) ERROR: MLX5 device unhealthy: mce1 -- {'irisc_index': 0, 'assert_return_address': 9413196L, 'assert_exit_pointer': 8455212L, 'firmware_version': 270012340L, 'miss_counter': 3L, 'syndrome': 7, 'hardware_device_id': 525L, 'assert_var4': 0L, 'assert_var3': 0L, 'assert_var2': 0L, 'assert_var1': 10338568L, 'assert_var0': 1L, 'extended_syndrome': '\x90@'}
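The log lines above carry a numeric firmware_version field. As a rough triage aid, the following Python sketch pulls the device name and that field out of such a line. The decoding step is an assumption that this article does not confirm: it treats the value as a packed integer with an 8-bit major, 8-bit minor, and 16-bit sub-minor number. The names parse_unhealthy and decode_fw are hypothetical helpers, not part of OneFS. Treat the result as a hint only and confirm the installed version with the cluster's own firmware status reporting.

```python
import re

# Matches the isi_hwmon "MLX5 device unhealthy" line format shown above.
UNHEALTHY_RE = re.compile(
    r"MLX5 device unhealthy: (?P<dev>\w+) -- \{(?P<fields>.*)\}"
)
# The field value is printed as a Python 2 long, for example 270012340L.
FW_RE = re.compile(r"'firmware_version': (\d+)L?")


def decode_fw(raw):
    """Split a packed firmware value into (major, minor, sub-minor).

    ASSUMPTION: 8-bit major, 8-bit minor, 16-bit sub-minor layout.
    This layout is not documented in the article.
    """
    return (raw >> 24) & 0xFF, (raw >> 16) & 0xFF, raw & 0xFFFF


def parse_unhealthy(line):
    """Return (device, (major, minor, sub-minor)) or None if no match."""
    match = UNHEALTHY_RE.search(line)
    if match is None:
        return None
    fw = FW_RE.search(match.group("fields"))
    if fw is None:
        return None
    return match.group("dev"), decode_fw(int(fw.group(1)))
```

Under that assumed layout, the value 270012340 from the sample messages decodes to 16.24.4020, which would be older than the fixed firmware base version 16.28.1002.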
Cause
Mellanox5-EN0 firmware versions older than 16.28.1002+EMC0000000017 have an issue that can force a node to cold-reboot unexpectedly when a warm reboot is requested or triggered by a panic or other error. The Mellanox5-EN0 device is found exclusively in F810 nodes. Under normal circumstances, a node that cold-reboots recovers its journal from the copy held by its peer node. However, if the peer node is also unable to retain journal integrity, the journal may be lost.
Resolution
This issue was fixed in Mellanox5-EN0 firmware 16.28.1002+EMC0000000017, which was released in April 2021 as part of Node Firmware Package versions 11.1.3 and 10.3.6. Customers running a Mellanox5-EN0 firmware version older than 16.28.1002+EMC0000000017 on any F810 node should install the latest Node Firmware Package and schedule a node firmware update on the cluster as soon as possible.
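When auditing several nodes, the comparison against the fixed version can be scripted. The following Python sketch is one possible approach, not a Dell tool. It assumes that version strings follow the form shown in this article (dotted numbers, then "+EMC" and a build number) and that ordering is decided by the dotted numbers first and the EMC build number second. The function names are hypothetical.

```python
import re

# Fixed firmware version named in this article.
FIXED = "16.28.1002+EMC0000000017"


def version_key(version):
    """Turn '16.28.1002+EMC0000000017' into ((16, 28, 1002), 17).

    ASSUMPTION: dotted numeric base, then an optional '+EMC<build>' suffix.
    A missing suffix is treated as build 0.
    """
    base, _, build = version.partition("+")
    numbers = tuple(int(part) for part in base.split("."))
    match = re.search(r"(\d+)$", build)
    return numbers, (int(match.group(1)) if match else 0)


def needs_update(installed, fixed=FIXED):
    """Return True if the installed version sorts before the fixed one."""
    return version_key(installed) < version_key(fixed)
```

For example, needs_update("16.28.1002+EMC0000000017") returns False, while a string with a lower base version or a lower EMC build number returns True.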
Affected Products
Isilon, Isilon F810
Article Properties
Article Number: 000207184
Article Type: Solution
Last Modified: 19 Feb 2026
Version: 4