I just had an XIO1 array lose both BBUs four hours apart, so the system decided to do a safe shutdown to protect me rofl. It's now stuck in a cluster state of attempting_to_start, and both storage controllers are stuck in a degraded health state with an ha_failure stop reason. I'm happy EMC is protecting me from myself, but I really don't want to wait for replacement BBUs to arrive, as we now have a production environment down.
Are there XMS commands that can be run to force the SCs back online without a communicating BBU telling them it's happy?
There is no option to force the array to start service without enough BBU protection for the journals. The array's software logic is designed to protect the data. The BBUs are built into the array to provide enough time to destage the journals to non-volatile storage in the event of grid power loss. When the BBUs are failed or in a disconnected state, we cannot guarantee that the storage controllers will be able to destage the journals if grid power is lost. As such, the array performs an Emergency Shutdown and destages the journals to protect the data.
No command was built in to start the cluster when we cannot guarantee that the journals can be destaged from volatile storage in the event of grid power loss.
Here is the reasoning for why the system was not designed with a method to force a start when High Availability mechanisms are undermined.
If an array were running without a BBU in a healthy state and we lost power, we would lose the journals responsible for tracking the metadata for the end data. This would result in data loss.
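To make the reasoning concrete, here is a rough Python sketch of the kind of gating described above. It is purely illustrative and is not XtremIO or XMS code: the names `BbuState`, `journals_protected`, `next_action`, and the `min_healthy` threshold are all assumptions made up for this example, and the real threshold for "enough BBU protection" is not stated in this thread.

```python
from enum import Enum


class BbuState(Enum):
    # Hypothetical states, loosely named after the ones mentioned above.
    HEALTHY = "healthy"
    FAILED = "failed"
    DISCONNECTED = "disconnected"


def journals_protected(bbu_states, min_healthy=1):
    """Return True if enough BBUs are healthy to cover a journal destage.

    min_healthy is an assumed threshold for illustration only.
    """
    healthy = sum(1 for state in bbu_states if state is BbuState.HEALTHY)
    return healthy >= min_healthy


def next_action(bbu_states, min_healthy=1):
    """Decide what the array does given the current BBU states.

    There is deliberately no 'force' argument: without battery backing,
    a power loss would drop the journals still held in volatile memory.
    """
    if journals_protected(bbu_states, min_healthy):
        return "serve_io"
    return "emergency_shutdown"
```

With both BBUs failed, as in the question, `next_action([BbuState.FAILED, BbuState.FAILED])` returns `"emergency_shutdown"`. The point of the sketch is that there is no override path, which matches the answer that no force-start command exists.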