So this is kind of driving me crazy. Whenever something triggers a critical alert, my front panels go amber, maybe on the particular node or for all of them. I clear whatever was causing the issue (yesterday it was planned firmware and patch updates to all nodes) but the front panels never clear the amber light. My WebGUI is back to green, but my front panels are amber. The biggest problem is this is in a datacenter on the other side of my campus, and the night crew does a walk-through and will often alert me that "something's wrong with the Isilon". We're on x410 nodes. I clear the amber light by reseating the front panel.
I am using the WebGUI to clear alerts, and haven't tried the CLI yet. We're on OneFS 188.8.131.52 and I'm still finding quirks between WebGUI and CLI so I suppose that could be contributing to it. Does anyone else experience this? Is there any way for me to clear the front panel (or even know it is amber) if my WebGUI is green and isi status is all OK?
Yes, I did see it too, exactly the same story. Still the same after going to OneFS 184.108.40.206.
My suspicion is that it's related to the Baseboard Management Controller (BMC) firmware.
Anyway, did you manage to find a resolution? Otherwise I'd contact support for this. But last time I did, they wanted to update the mentioned firmware with what seemed like a somewhat convoluted procedure, and I opted to install a work-around patch. It should be easier since 220.127.116.11, I believe, and the firmware should now be upgradable with normal upgrade procedures. Not sure if the firmware package is already on the main site for this. But I noticed your message and I thought I'd check how it went for you.
Thanks Deeb, appreciate the feedback and knowing it's happening to someone else. So are you saying that you received a work-around by opening a call? If so I'll try that route. I was waiting for it to happen again so I could have it occurring when I made the call.
The firmware I mentioned WAS the BMC firmware, and yes, starting with 18.104.22.168 you can now update it in a similar fashion as you can with other firmware, but it was not part of the main firmware package as of a few weeks ago. Also, the field techs are more comfortable performing the USB method of updating it at this time it seems. Since we had a tech scheduled to come on-site, I just let him do it.
It is possible the BMC firmware did the trick, but I suspect something may still be a little off, as over the weekend I had a warning alert "BMC Watchdog: BMC firmware stack or CPU have been reset" generated by one of the nodes. However, it didn't appear to knock the panel amber (I didn't get an alert from the night crew anyway). EMC reached out proactively but didn't say what might have happened, only that the BMC firmware was up to date and the cluster is healthy and responsive.
Thanks for the info!
We are having the same issue with OneFS 22.214.171.124 and X210 nodes. Have you tried "isi_lcd_d start" through CLI? This command restarts the LCD service daemon.
I did try to reset the LCD service daemon on one occasion in the past, but it didn't seem to have any effect. So far it only happens right after a node reboot and in no other circumstance apart from an actual issue (like a shutdown). But after a reboot OneFS reports healthy with no open events. It's like the boot process does not clear the old (boot-time) status on the display. The lcd_d does not seem to be able to clear this; it only clears when the display itself is powered off and on again.
Next time it happens I'll let EMC support work on it a bit more, possibly they'll want to get BMC firmware updated manually although I had thought 126.96.36.199 and latest fw update packages would have done that by now.
Yesterday I did a patch install that required a rolling reboot. The issue happened again: the GUI and isi status were all back to green, but the front panels were all amber. I attempted "isi_lcd_d restart" on each node with no change in the panel lights. I waited until this morning to re-seat the panels, but that was the only way to get them back to blue.
We are at the latest BMC firmware (1.25.9722) running v188.8.131.52. All other firmware is up to date as well.
Ryan, how about your CMC firmware? These are the minimal versions:
S210 / X410 01.02
This was in the initial instructions for the BMC fw upgrade so I wondered if it's linked, although it should have been done before updating the BMC firmware, so I expect it to be fine. But just in case.
Did you use the FTP package for the BMC update? It's not part of the normal updates yet AFAIK.
Yes, our X410 nodes were already at CMC 01.02 before the BMC firmware was updated. The BMC firmware update itself was performed on-site by an EMC tech using the USB flash method on each node in turn (I assume the .tar you show but I believe he brought it with him). After he left, I performed the FW package 9.3.4 upgrade, but the only thing that was out of date at that point was the 10GbE.
A couple weeks ago EMC checked and confirmed that everything is as up to date as is available when it comes to firmware. For now I'm just making sure to take a walk to the other side of campus and re-seat the panels after I do a node reboot and being thankful our datacenter isn't in another city, state or country!
This is sort of related but in a different color. Not sure how to clear the amber LEDs, but does anyone know why the main BLUE LEDs slowly flash on some nodes and not on all the nodes in the same cluster? We've had our X210 nodes for about 7 months and never really worked out why some are flashing and others are not. Web interface shows no alarms or errors. On 184.108.40.206. I think we're on the latest BMC code, 1.25.9722.
Can't seem to find any documentation. Searched for "x210 blue leds" and got this post.
Any help will be appreciated. Thanks
Just to add a comment. I upgraded from 220.127.116.11 to 18.104.22.168 last week. After the upgrade, several of my NL410 nodes had their LCD panels turned off. I reset the CMC and BMC on those nodes via CLI and the LCDs turned on. The upgrade included Node firmware 9.3.4. However, we are not having the amber light issue.
# /usr/bin/isi_hwtools/isi_ipmicmc -c -a cmc