PowerStore: Unexpected node reboot or kernel panic

Summary: To fully identify the cause of a reboot or to provide a full Root Cause Analysis (RCA), various logs are needed.


Symptoms

The most likely event or error code for this issue is: 0x00304404
Description: Node has been physically removed or shut down.


Other possible event codes:
  • 0x00307701: XENV is not active.
  • 0x00304203: Node has stopped.
  • 0x00302b04: The Node has stopped.
  • 0x00300D06: The cluster service has stopped. 
  • 0x0030c601: The appliance has stopped servicing IOs.
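If you copy alert or event codes out of PowerStore Manager for offline review, a small lookup can flag the codes listed above. This is a minimal, hypothetical helper (not a PowerStore tool); the codes and descriptions come only from this article. Parsing each code as a hexadecimal integer makes the match case-insensitive, so 0x00300D06 and 0x00300d06 are treated as the same code.

```python
# Event codes that this article lists as indicators of an unexpected node reboot.
REBOOT_EVENT_CODES = {
    0x00304404: "Node has been physically removed or shut down.",
    0x00307701: "XENV is not active.",
    0x00304203: "Node has stopped.",
    0x00302B04: "The Node has stopped.",
    0x00300D06: "The cluster service has stopped.",
    0x0030C601: "The appliance has stopped servicing IOs.",
}


def is_reboot_indicator(code: str) -> bool:
    """Return True if a hex event code string (for example '0x00300d06')
    is one of the reboot-related codes above.

    int(..., 16) accepts the '0x' prefix and ignores letter case.
    Strings that are not valid hex return False instead of raising.
    """
    try:
        return int(code.strip(), 16) in REBOOT_EVENT_CODES
    except ValueError:
        return False
```

For example, `is_reboot_indicator("0x00300d06")` returns True even though the article writes that code with an uppercase D.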



  A node reboot can trigger other secondary alerts or dial homes, such as:
 

Cause

A PowerStore node may reboot unexpectedly for various reasons.
Investigate each unexpected reboot separately.
See the Additional Information section below for what this investigation requires.

Resolution

A few options exist to check for unexpected node reboots.
 

Checking alerts and events from the PowerStore Manager (GUI)

Check the events and alerts that could indicate an unexpected node reboot:
  • Within PowerStore Manager, check the Monitoring section, and look at the details under the ALERTS and EVENTS tabs.
  • Look for timestamps, error or event codes, messages, and so on. To narrow your search, use the filter options within the ALERTS and EVENTS tabs.
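One reboot can raise several of the codes listed above at nearly the same time, and each unexpected reboot should be investigated separately. Grouping alert timestamps into separate incidents can therefore help before you review them. This is a minimal sketch that assumes you have already copied the timestamps out of the ALERTS or EVENTS tab. The function name and the 30-minute gap are illustrative assumptions, not PowerStore values.

```python
from datetime import datetime, timedelta


def group_into_incidents(timestamps, gap=timedelta(minutes=30)):
    """Group event timestamps into incidents.

    Timestamps are sorted first. A new incident starts whenever the time
    since the previous event is longer than `gap` (an assumed threshold).
    Returns a list of lists, one inner list per incident.
    """
    incidents = []
    for ts in sorted(timestamps):
        if incidents and ts - incidents[-1][-1] <= gap:
            incidents[-1].append(ts)  # close to the previous event: same incident
        else:
            incidents.append([ts])  # first event, or a long quiet period: new incident
    return incidents
```

Each inner list can then be treated as one candidate reboot and checked separately against dump files and uptime.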
 
 

Checking for dump files

Check for the existence of system dump files around the time of the errors. Kernel dumps are not included in Data Collects.

Log in to the cluster over SSH and run svc_dc list_dumps.
You can also look for dump files from PowerStore Manager. For more details, see PowerStore: How to generate and collect various logs from PowerStore.

To log in to the nodes over SSH, find the cluster or node IP in PowerStore Manager under Settings > Network IPs. Log in with your preferred SSH client using the service user account and its password (defined during the setup of your system).
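To run the same checks (svc_dc list_dumps, uptime) on several nodes from a script, you can wrap the OpenSSH client. This is a minimal sketch that assumes an OpenSSH `ssh` binary is on your PATH and that the service user is named `service`. The function names and the placeholder address are hypothetical. The sketch returns raw output and does not parse it, because this article does not describe the output format of svc_dc list_dumps.

```python
import subprocess


def build_ssh_command(host: str, remote_cmd: str, user: str = "service") -> list[str]:
    """Build an OpenSSH command line that runs `remote_cmd` on `host`.

    The default user name 'service' is an assumption for the PowerStore
    service user account; change it if yours differs.
    """
    return ["ssh", f"{user}@{host}", remote_cmd]


def run_on_node(host: str, remote_cmd: str, user: str = "service", timeout: int = 120) -> str:
    """Run one command on a node over SSH and return its standard output.

    ssh prompts for the service password on the terminal, not through the
    captured streams. check=True raises CalledProcessError if ssh or the
    remote command exits with a non-zero status.
    """
    completed = subprocess.run(
        build_ssh_command(host, remote_cmd, user),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return completed.stdout
```

For example, `run_on_node("192.0.2.10", "svc_dc list_dumps")`, where 192.0.2.10 is a documentation placeholder that you replace with the cluster or node IP from Settings > Network IPs.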


 

Checking the uptime on both nodes

Run the uptime command on both nodes. It shows how long each node has been running, which helps confirm a possible reboot.
This check is also useful because some unexpected reboots may not produce a dump file.
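If you capture the uptime output from both nodes, a small parser can turn it into a duration and an approximate boot time (to the minute), which you can compare with alert timestamps. A node with a much shorter uptime than its peer is a likely candidate for a recent reboot. This is a minimal sketch that assumes the nodes print the classic procps `uptime` format shown in the comments. This article does not show the actual output on a PowerStore node, so treat the regular expression as an assumption and adjust it if your output differs. The function names and node labels are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches the "up ..." part of classic procps `uptime` output (assumed format), e.g.:
#   " 10:14:32 up 3 days,  4:05,  1 user,  load average: ..."
#   " 10:14:32 up  4:05,  2 users,  load average: ..."
#   " 10:14:32 up 1 day, 23 min,  1 user,  load average: ..."
_UP_RE = re.compile(
    r"up\s+"
    r"(?:(?P<days>\d+)\s+days?,\s*)?"
    r"(?:(?P<hours>\d+):(?P<minutes>\d+)|(?P<mins>\d+)\s+min)"
)


def parse_uptime(line: str) -> timedelta:
    """Convert one line of uptime output into a timedelta."""
    m = _UP_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized uptime output: {line!r}")
    days = int(m.group("days") or 0)
    if m.group("mins") is not None:  # "N min" form
        hours, minutes = 0, int(m.group("mins"))
    else:  # "H:MM" form
        hours, minutes = int(m.group("hours")), int(m.group("minutes"))
    return timedelta(days=days, hours=hours, minutes=minutes)


def estimated_boot_time(line: str, now: datetime) -> datetime:
    """Approximate boot time: when the command ran minus the reported uptime."""
    return now - parse_uptime(line)


def shorter_uptime_node(uptimes: dict) -> str:
    """Return the node label whose uptime output reports the shortest uptime."""
    return min(uptimes, key=lambda node: parse_uptime(uptimes[node]))
```

Pass `now` as the local time at which you ran uptime on that node, so the estimated boot time is in the node's own clock.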


 

Other indicators

A gap in the performance graphs in PowerStore Manager may also indicate a node reboot. Use this for guidance only, and confirm it with the other evidence described above. Performance graphs are available from either Dashboard > PERFORMANCE or Hardware > Appliance X > Performance.
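If you have the sample timestamps behind a performance graph, you can find gaps programmatically instead of by eye. This is a minimal sketch. The expected sampling interval and the factor of 3 are assumptions that you should tune, not PowerStore settings. As noted above, a gap found this way is only a hint to confirm with alerts, dump files, and uptime.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval: timedelta, factor: int = 3):
    """Return (start, end) pairs where consecutive samples are further apart
    than `factor` times the expected sampling interval."""
    ordered = sorted(timestamps)
    threshold = expected_interval * factor
    return [
        (prev, curr)
        for prev, curr in zip(ordered, ordered[1:])  # consecutive sample pairs
        if curr - prev > threshold
    ]
```

Each returned pair brackets a period with no samples. Compare it with the alert timestamps and the boot time estimated from uptime.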

Additional Information

What is needed for Root Cause Analysis (RCA)?

  • Support Materials from all the appliances in the cluster. These should be gathered as close to the reboot as possible.
  • Dump file
See PowerStore: How to generate and collect various logs from PowerStore

Affected Products

PowerStore
Article Properties
Article Number: 000130141
Article Type: Solution
Last Modified: 16 Aug 2023
Version: 14