Event: Node has Recovered from a Panic. Information about Panic is Recorded in File: /var/tmp/panic

Summary: This article explains how users and Technical Support should manage the "Node has recovered from a panic" event or messages. Review all portions of the article before acting.

This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

EVENT
You received an event notification indicating one or more nodes recovered from a panic. Information about the panic is recorded in a file located under /var/tmp/ on the affected nodes.
Example:
4.3394   03/12 18:02 W    4    53125     Node 4 has recovered from a panic. Info about panic is recorded in file: /var/tmp/panic.1615590175
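To locate the panic file referenced in the event, you can list the panic files on the affected node. This is an illustrative example; the timestamp suffix in the file name varies with each event:
# ls -l /var/tmp/panic.*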

Cause

The exact causes of a node panic can vary, but typical causes include:
  • Hardware failure
  • Software code failure
  • Misconfiguration
Analysis of the cluster logs must be performed with PowerScale Technical Support to determine the exact cause of the panic.

Resolution

To begin troubleshooting the issue, first confirm that the node has recovered from the panic event and is not down or offline.*

To troubleshoot, open an SSH connection to the node and log in using the "root" account.
Run the following command to confirm the node rejoined the cluster:
# isi status
The isi status command returns output similar to the following. If the node successfully rejoined the cluster, the Health column does not display "D" (down):
                   Health  Throughput (bps)  HDD Storage      SSD Storage
ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size
---+---------------+-----+-----+-----+-----+-----------------+-----------------
  1|10.16.141.226  | OK  | 553M| 3.2M| 557M|61.9T/ 106T( 59%)|        L3:  1.5T
  2|10.16.141.227  | OK  | 481M| 96.0| 481M|62.2T/ 106T( 59%)|        L3:  1.5T
  3|10.16.141.228  | OK  | 372k| 332k| 704k|62.3T/ 106T( 59%)|        L3:  1.5T
  4|10.16.141.229  | OK  |10.8M| 941k|11.7M|62.6T/ 106T( 59%)|        L3:  1.5T
  5|10.16.141.230  | OK  | 9.4M| 393k| 9.8M|62.6T/ 106T( 59%)|        L3:  1.5T
  6|10.16.141.231  | OK  | 7.3M|256.0| 7.3M|63.4T/ 106T( 60%)|        L3:  1.5T
---+---------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals:          | 1.1G| 4.9M| 1.1G| 375T/ 634T( 59%)|        L3:  8.7T

     Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
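If you only need the status of the affected node rather than the whole cluster, isi status also accepts a logical node number. For example, for node 4 from the event message above (option syntax may vary slightly between OneFS versions):
# isi status -n 4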
Gather logs by running the following command and provide the log set to Isilon Technical Support for analysis of the panic:
# isi_gather_info -f /var/tmp/
 
Note: Panic data under /var/tmp/ is not collected in a default log gather; you must run isi_gather_info -f /var/tmp/ to collect the panic information.
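Before or after gathering logs, you can preview the recorded panic details directly on the node. The following is an illustrative example using the file name from the event message above; substitute the file name reported on your cluster:
# head -n 20 /var/tmp/panic.1615590175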

Once the logs are received, Technical Support reviews and analyzes the panic stack details and determines whether the panic stack corresponds to a known issue or Knowledge Base article. If the panic stack details do not match a known issue or existing KB article, the issue is escalated for further assessment. Technical Support then determines what actions are needed, such as a hardware replacement, code fix, firmware update, or other mitigation.

* If the node is still down, additional troubleshooting must be performed to bring it back online. Contact Isilon Technical Support if assistance is needed.

For more information, see article 55936: Isilon OneFS: Event notification: Node Offline - Event ID: 200010001, 300010003, 399990001, 900160001, 910100006, 400150007

Additional Information

Note:
  • This new event notification is enabled as part of the March 2021 Roll-Up Patches for OneFS v8.1.2.0, v8.2.2.0, and v9.1.0.5. Clusters running OneFS versions prior to the March 2021 Roll-Up Patch (RUP) do not post this event notification.
  • OneFS 9.2, 9.3, 9.4, and later releases include this feature.
  • The update triggers an event if OneFS detects a reboot due to a node panic.
  • The event may include basic information such as core dump headers to help with understanding and troubleshooting the issue.
  • A resulting dial-home SR can include the additional information in a readable format for triage and analysis.

Note: This feature may cause false alarms due to panic events, cores, or minidumps that were already on the cluster before the installation of the March 2021 RUP. OneFS may continue to alert on old panic files until they are removed. Perform the following steps to avoid false alarms.
  1. Clean up the messages log by force-rotating the logs. This feature does not check for panic information in any rotated logs within a gz archive, such as messages.0.gz, messages.1.gz, and so on. Run the following command to rotate /var/log/messages whenever any panics are found:
isi_for_array 'grep "panic @ time" /var/log/messages && newsyslog -F /var/log/messages'
  2. Check the /var/crash directory of each node for any core or minidump files. Either remove them or back them up after getting approval.
  3. Run the following command to clean up vmcore files located in /var/crash on all nodes:
isi_for_array 'find /var/crash/ -name "vmcore*" -delete'
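As an optional verification step (not part of the original procedure), you can confirm that no vmcore files remain after the cleanup; every node should report a count of 0:
isi_for_array 'find /var/crash/ -name "vmcore*" | wc -l'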

Affected Products

PowerScale OneFS

Products

Isilon
Article Properties
Article Number: 000184828
Article Type: Solution
Last Modified: 18 Sept 2025
Version:  15