Vanitha13
14 Posts
0
February 20th, 2014 06:00
Hi,
Please let me know which version of Isilon OneFS you are running.
Thanks
Vanitha
Paconet1
1 Rookie
•
68 Posts
0
February 20th, 2014 12:00
Hi
If you connect a serial cable to the console port, do you get any messages?
I have had this issue before, but the solution depends on what the screen shows.
khkris
2 Intern
•
178 Posts
0
February 21st, 2014 08:00
We are on Isilon OneFS 7.0.3.
I haven't checked with a serial cable yet. Do you want me to test that way? So far we have only received node offline and online event alerts.
Paconet1
1 Rookie
•
68 Posts
0
February 21st, 2014 12:00
Yes, you can get more information with the serial cable connected to the node. I am adding an example.
You may need a hardware change, depending on the message.
Jeffey1
4 Operator
•
2.8K Posts
0
February 26th, 2014 23:00
Hi Khkris,
Based on your description, here are some suggestions:
First, you need to know exactly when the node went offline or rebooted. You can check the "node offline" event in "isi_alerts" in the log. We should always start by reviewing the messages log of the panicked node to see what was happening on that node before it panicked.
Second, go to the /var/log/messages log of the node and see what happened during that timeframe and before it. The root cause of a node reboot is usually hard to find by reviewing the logs alone. Many times we need to escalate to engineering for minidump analysis, which can usually reveal the exact reason for the reboot. However, not every node panic generates a minidump, and we cannot find the root cause of every node panic.
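To narrow the messages log down to the period just before the offline event, a small filter can help. This is only a rough sketch, not an Isilon tool: it assumes the log lines start with a BSD-style syslog timestamp such as "Feb 20 05:58:12" (which carries no year, so the year is borrowed from the event time), and the function name, the 30-minute window, and the sample lines are all made up for illustration.

```python
from datetime import datetime, timedelta


def lines_before_event(lines, event_time, window_minutes=30):
    """Return log lines stamped within window_minutes before event_time (inclusive)."""
    start = event_time - timedelta(minutes=window_minutes)
    selected = []
    for line in lines:
        # Assumption: each entry begins with a 15-character stamp like "Feb 20 05:58:12".
        # The stamp has no year, so the event's year is used (wrong across a New Year).
        stamp_text = f"{event_time.year} {line[:15]}"
        try:
            stamp = datetime.strptime(stamp_text, "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip continuation lines or lines in another format
        if start <= stamp <= event_time:
            selected.append(line.rstrip("\n"))
    return selected


# Invented sample lines, only to show the call; real log content will differ.
sample = [
    "Feb 20 05:20:00 node-3 kernel: early message",
    "Feb 20 05:45:10 node-3 kernel: link down",
    "  continuation line without a timestamp",
    "Feb 20 06:05:00 node-3 kernel: boot",
]
recent = lines_before_event(sample, datetime(2014, 2, 20, 6, 0, 0))
for entry in recent:
    print(entry)
```

With a real log you would pass the open file instead of the sample list, for example `lines_before_event(open("/var/log/messages"), offline_time)`, where `offline_time` is the time taken from the node offline event.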
Jeffey1
4 Operator
•
2.8K Posts
0
March 3rd, 2014 00:00
Since more than two weeks have passed since you posted, perhaps you have already found the answer (possibly by opening a ticket with support)? If so, once the thread is relocated, please consider sharing the resolution and marking the question as "Answered" so that it can help others who have the same question. If not, once relocated, the thread will have more visibility to the community of dedicated customers, partners, and EMC employees who are eager to assist.
khkris
2 Intern
•
178 Posts
0
March 11th, 2014 12:00
Thanks Jeffey. Yes, I opened an SR with support. They found some errors in the log messages relating to the node's external 10 Gb cable and replaced the NIC.
No alerts were generated. Are any SNMP traps sent out for this kind of issue?