Yesterday our single-node Avamar system (a Gen 4 appliance) crashed with the following error: "HARDWARE: Aug 18 05:41:20 CPSAV001 hwfaultd: Expander Unrecoverable Error". The system dialed home via ESRS; however, when Support tried to connect, it was not accessible. They suggested we dispatch a CE to power drain the system. This morning, while I was checking the system before the CE arrived, I popped a console cable into it and noticed that the system was trying to do a memory dump. As soon as I plugged the console cable in, it started writing. I started logging in PuTTY so that I had a record of what was happening. As soon as the dump finished, the system rebooted and came up; however, there was corruption. I reached back out to Support, and shortly afterwards the system was back up and running after restoring the last validated checkpoint.
Three hours later, I logged into the system to look for a deleted file and noticed that MCS wouldn't connect. It just spun. I tried SSH next; it connected, but nothing happened. I plugged a console cable back in and saw the following:
That was 90 minutes ago, and it's still going and I still cannot get in. I am waiting for the Engineer who assisted earlier to become available for a Webex, but in the meantime, does anyone have any ideas as to what is happening?