November 24th, 2017 11:00

VxRAIL and Power Outage

Hello VxRAIL Experts:

Two weeks ago, the data center of one of our customers suffered a power outage. They have a VxRAIL with two appliances and four Quanta nodes. Of course, the entire VxRAIL solution (nodes, appliances and switches) was abruptly powered off due to the outage.

When electric power was restored, the nodes booted up fine, but once in VMware, the vSAN was completely broken. After an SR was opened and a week of support assistance, the only option in the end was to re-install the VxRAIL from scratch.

Could you please tell me if there is any procedure or document about how to handle power outages with VxRAIL? Or is this definitely something that could happen again?

Thank you.

4 Posts

November 27th, 2017 04:00

This is a really scary scenario. Were you able to recover the data on the VSAN?

Rebuilding the cluster seems to be the default for any issues experienced on the VXRails.

We bought a 2-appliance, 3-node Quanta system which literally took 10 months to get physically working. This included replacing the entire hardware bill of materials and re-imaging the hardware almost 4 times, then rebuilding from scratch.

Eventually the configuration got to a point where the setup was complete and everything showed healthy. When we tried to integrate Active Directory, we were told the entire system had to be reset from scratch as AD needed to be set up from the initialization page and could not be changed later on. This was done, bringing the entire build up to 12 months now.

We have not migrated any systems to the VXRail yet, but once we do, the scenario above would not be acceptable and is a huge concern.

I would really appreciate it if you could let me know what was recovered and how this was done. If you do not want to post here, please let me know and I will give you an email address to send a private message.

Thank you

20 Posts

November 27th, 2017 06:00

Hi Kheldar9:

Unfortunately, nothing could be recovered. Fortunately, the VxRAIL at our customer is the recovery site for a VMware production site with RP4VM. Therefore, we lost all the replication done (approx. 22 TB across 35 VMs). These days we are replicating again in order to have everything as it was before the power outage. But the bad thing is that our customer is very concerned about the "stability" of the VxRAIL, and so are we.

1 Message

November 28th, 2017 09:00

Wow, this could really be a huge problem...

We just had our VxRAILs delivered and they are not installed yet; I'm already worried.

8 Posts

November 28th, 2017 17:00

@OctavioGM - I can speak from planned experience about loss of power: recovery for HCI is going to be painful, and I would not be surprised if it is fatal.

Prior to my VxRail deployment I led a proof of concept and bake-off between Nutanix and VxRail - full installations with test VMs. One of my critical POC tests was full loss of power: simply pulling the power cords out of the Nutanix and VxRail appliances.

Both brands suffered tremendously! Nutanix did not recover at all, total loss of use. VxRail (or vSAN) eventually recovered but with damage - it took hours to become functional again without intervention. My POC conclusion to management was "Plan for the worst, hope for the best". These data-center-worthy systems need more catastrophe-proof engineering.

Converged infrastructure technology has many years to go before it is fully mature and resilient to the unexpected, like an uncontrolled loss of power.

You are not alone, my friend; plan well for the unexpected.

November 29th, 2017 08:00

As an IT professional who is in the process of migrating into a 7 node VXRail E460F cluster as my primary data center, this is extremely concerning.

https://www.vskilled.com/2017/01/vsan-all-hosts-down-scenario/

Would those troubleshooting steps apply 100% to a VXRail cluster in an all-hosts-down scenario?

Kind Regards

8 Posts

November 29th, 2017 09:00

Power conditioning and redundancy are very important for VxRail, and they are just a few of the many risks that should be assessed.

I don't know how many appliances you are deploying, but one is too risky. I have 2: one is production, the other is backup or failover. I have RecoverPoint for VMs deployed; it is helpful but not the end game. A better option is to learn more about vSAN Stretched Clusters:

Introduction to Stretched Clusters

As many going with HCI do, piling dozens of VMs onto one appliance without considering the risk of a complete failure, and without a live secondary solution, is a huge mistake - data centers do fail, planned and unplanned.

If you deploy at least 3 VxRails as 1 cluster, some of the anxiety is relieved. Having all the Rails in the same physical space calls for a business continuity review.

20 Posts

November 30th, 2017 08:00

Thank you Keith for your comments. I must say that a second VxRAIL as a recovery option should be considered. This week I have a meeting with our presales personnel to evaluate the best recovery options and to avoid having only one solution at the customer for production environments. Thank you everyone for all your comments and answers.
