simultaneous storage node failure

Question

Hi all, what is the maximum number of simultaneous storage node failures that will cause the Avamar grid to shut down? TA

Avamar Exorcist · Accepted Answer

Q1) If a node fails and needs replacement, on the new node can the data not be built by putting in the disks from the faulted node?

A1: If the physical node has failed then, in certain circumstances, it is possible to transplant disks from the failed node chassis to a new node chassis, which avoids rebuilding the logical node on new hardware. There are various caveats associated with performing a 'transplant' of disks from a bad node to a new node, and the procedure varies from 'simple' to 'involved' depending on the hardware generation and type of the node. We strongly recommend engaging the support team in the event of a node failure. Support can help diagnose the root cause and determine the appropriate corrective action.

Q2) If two nodes are completely damaged i.e. disk drives cannot be used from the faulted nodes, is it still possible to recover the grid using checkpoint or do we use the replicated data for recovery?

A2: Correct. Checkpoints provide 'point in time' redundancy. Hardware redundancy is provided by RAIN and replication. If disks belonging to multiple nodes were damaged in such a way that their RAID groups' virtual disks failed, then RAIN redundancy would have been lost and we would need to make use of a replication partner to recover the system.

Avamar Exorcist · Answer

When configured as a multi-node RAIN system, Avamar can tolerate the loss of one node and continue to perform backups, restores and maintenance with no loss of data. Once the node which was unavailable is brought back online, it will synchronise its parity and data stripes with those on the nodes which remained online.

If a second node were to become unavailable whilst one node was already down, the system would fail and would need to be rolled back to an earlier point in time (i.e. an earlier checkpoint).

You can think of RAIN systems in a similar way to a regular RAID-5 disk array.
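
To make the RAID-5 analogy concrete, here is a minimal Python sketch of single-parity (XOR) protection. It illustrates the general RAID-5 idea only and is not a description of how Avamar lays out its stripes internally; the stripe contents and the names `xor_blocks`, `data` and `parity` are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data stripes (one per node) plus one parity stripe.
data = [b"node", b"fail", b"test"]
parity = xor_blocks(data)

# One stripe lost: XOR of the survivors and the parity rebuilds it.
lost_index = 1
survivors = [blk for i, blk in enumerate(data) if i != lost_index]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data[lost_index]

# Two stripes lost: the single parity equation only yields the XOR of the
# two missing stripes, which is not enough to recover either one.
combined = xor_blocks([data[2], parity])
assert combined == xor_blocks([data[0], data[1]])
assert combined not in (data[0], data[1])
```

With one parity equation there is only enough information to solve for one missing stripe, which is why a second simultaneous loss cannot be repaired from parity alone.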

Note that a 1x2 system (2 data nodes) is a non-RAIN configuration. 1x2 non-RAIN systems cannot tolerate the loss of a single node. In this sense, a 1x2 system resembles a RAID-0 stripe set.
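
The rules above can be summarised as a small decision function. This is a toy model of what is stated in this thread, not an Avamar API or tool; the function name `grid_outcome` and its return strings are hypothetical.

```python
def grid_outcome(rain: bool, nodes_down: int) -> str:
    """Toy model of the failure tolerance described in this thread."""
    if nodes_down < 0:
        raise ValueError("nodes_down cannot be negative")
    if nodes_down == 0:
        return "healthy"
    if rain and nodes_down == 1:
        # RAIN tolerates a single node loss with no loss of data.
        return "degraded: backups, restores and maintenance continue"
    if rain:
        # A second node down while the first is still offline.
        return "failed: roll back to an earlier checkpoint"
    # Non-RAIN (e.g. 1x2) cannot tolerate the loss of even one node.
    return "failed: non-RAIN system cannot tolerate a node loss"

print(grid_outcome(rain=True, nodes_down=1))
print(grid_outcome(rain=True, nodes_down=2))
print(grid_outcome(rain=False, nodes_down=1))
```

The model only covers nodes being unavailable. If the disks themselves are destroyed on multiple nodes, a checkpoint rollback is not enough and a replication partner is needed.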

1x2 systems are intended to be deployed with a replication partner to provide a further level of redundancy. The RAID-01 disk configuration (a mirror of stripes) may come to mind at this point.

singh_kuldeep · Answer

Hi Nicholas

Thank you for your response. So the recovery of a failed node is just like the rebuild of a failed drive in a RAID 5 disk array.

I need to understand the following:

1) If a node fails and needs replacement, on the new node can the data not be built by putting in the disks from the faulted node?

2) If two nodes are completely damaged i.e. disk drives cannot be used from the faulted nodes, is it still possible to recover the grid using checkpoint or do we use the replicated data for recovery?

TA
