Node Rebooted automatically

Question

One of our Isilon nodes went offline for 15 minutes and then came back up automatically. I can't find any relevant information in /var/log/messages. Could someone suggest which log files I can check to find the root cause?

Vanitha13 · Answer

Hi, please let me know which version of Isilon OneFS you are running. Thanks, Vanitha

Paconet1 · Answer

Hi, if you connect a serial cable to the console port, do you get any messages? I have had this issue before, and the solution depended on what appeared on the screen.

khkris · Answer

We are on Isilon OneFS 7.0.3. I haven't checked with a serial cable yet. Do you want me to test it that way? So far we have only received node offline and node online event alerts.

Paconet1 · Answer

Yes, you can get more information by connecting a serial cable to the node. I'm adding an example. Depending on the message, you may need to replace hardware.

Jeffey1 · Answer

Hi Khkris,

Based on your description, here are a few suggestions:

First, you need to know exactly when the node went offline or rebooted. You can find this by checking for the "node offline" event in the isi_alerts log. Always start by reviewing the messages log of the node that panicked to see what was happening on that node before it panicked.

Second, go to the node's /var/log/messages log and look at what happened during that timeframe and before it. Usually, the root cause of a node reboot is hard to determine from the logs alone. Often we need to escalate to engineering for minidump analysis, which can usually reveal the exact reason for the reboot. However, not every node panic generates a minidump, and the root cause cannot be determined for every node panic.
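The time-window review described above can be sketched in Python. This is a minimal, hypothetical helper, not an Isilon or OneFS tool: it assumes the node's /var/log/messages uses the classic BSD syslog prefix ("Mon DD HH:MM:SS host ...") with no year, and that you already know the offline time from the "node offline" alert. The function names and sample log lines are made up for illustration, and logs that span a year boundary are not handled.

```python
from datetime import datetime, timedelta

# Assumption: each line starts with a BSD syslog-style timestamp
# ("Mon DD HH:MM:SS", day space-padded) that omits the year,
# so the caller supplies the year explicitly.
SYSLOG_TS_FORMAT = "%Y %b %d %H:%M:%S"


def parse_timestamp(line, year):
    """Return the datetime at the start of a log line, or None if unparseable."""
    # The first 15 characters hold "Mon DD HH:MM:SS".
    prefix = line[:15]
    try:
        return datetime.strptime(f"{year} {prefix}", SYSLOG_TS_FORMAT)
    except ValueError:
        return None


def lines_before_event(lines, event_time, lookback_minutes=30, year=None):
    """Yield lines stamped between (event_time - lookback) and event_time, inclusive.

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    if year is None:
        year = event_time.year
    window_start = event_time - timedelta(minutes=lookback_minutes)
    for line in lines:
        ts = parse_timestamp(line, year)
        if ts is not None and window_start <= ts <= event_time:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical sample lines; real messages on a node will differ.
    sample = [
        "Jan 15 10:01:12 node-3 kernel: hypothetical earlier message",
        "Jan 15 10:40:02 node-3 kernel: hypothetical message before the event",
        "Jan 15 10:55:30 node-3 kernel: hypothetical message after the event",
    ]
    # Hypothetical offline time taken from the "node offline" alert.
    offline_at = datetime(2014, 1, 15, 10, 45, 0)
    for entry in lines_before_event(sample, offline_at, lookback_minutes=30):
        print(entry)
```

On a node you would pass an open handle for /var/log/messages instead of the sample list. Widening `lookback_minutes` helps catch early warning signs, such as the interface errors that turned out to matter later in this thread.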

Jeffey1 · Answer

Seeing that over two weeks have passed since you posted, maybe you have already found the answer (possibly by opening a ticket with support)? If so, once the question is relocated, consider sharing the resolution and marking the question as 'Answered' so that it can help others with the same question. If not, once relocated, it will have more visibility to the community of dedicated customers, partners, and EMC employees who are eager to assist.

khkris · Answer

Thanks, Jeffey. Yes, I opened an SR with support. They found errors in the log messages related to the node's external 10GigE cable and replaced the NIC.

No alerts were generated. Are any SNMP traps sent for these kinds of issues?
