VxRail: What Are FTT and Erasure Coding
Summary: What are Failures to Tolerate (FTT) and erasure coding?
Instructions
Erasure coding (EC), as defined by TechTarget, is a method of data protection in which data is broken into fragments, expanded, and encoded with redundant data pieces, and then stored across different locations or storage media.
The goal of erasure coding is to enable data that becomes corrupted at some point in the disk storage process to be reconstructed by using information about the data that is stored elsewhere in the array. Erasure codes are often used instead of traditional RAID because of their ability to reduce the time and overhead required to reconstruct data. The drawback of erasure coding is that it can be more CPU-intensive, which can translate into increased latency.
Number of Failures to Tolerate
The FTT option defines the number of host and device failures that a virtual machine object can tolerate. To tolerate n failures, n+1 copies of the virtual machine object are created and 2n+1 hosts with storage are required. The default value is 1. The maximum value is 3.
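The n+1 and 2n+1 rules above can be tabulated with a short sketch. The function name `mirroring_requirements` is hypothetical and used for illustration only; it is not part of any vSAN or VxRail tool. Accepting a value of 0 is an assumption, because the text states only the default and the maximum.

```python
def mirroring_requirements(ftt: int) -> dict[str, int]:
    """Return the object copies and minimum hosts needed for a given FTT.

    Applies the rules from the text: n failures tolerated means n+1 copies
    and 2n+1 hosts with storage. Accepting 0 is an assumption.
    """
    if not 0 <= ftt <= 3:
        raise ValueError("FTT must be between 0 and 3")
    return {"copies": ftt + 1, "min_hosts": 2 * ftt + 1}


# Print the requirements for each value from the default (1) to the maximum (3).
for n in (1, 2, 3):
    print(n, mirroring_requirements(n))
```

With the default FTT of 1, this gives two copies and a three-host minimum.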
vSAN supports two specific configurations when erasure coding is enabled. The first, RAID 5, applies when the number of failures to tolerate is set to 1. The second, RAID 6, applies when the number of failures to tolerate is set to 2. A vSAN cluster must have at least four hosts for RAID 5 and at least six hosts for RAID 6.
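As a rough illustration, the mapping from FTT to RAID level and minimum cluster size could be checked as follows. The table and function names are hypothetical and do not correspond to a vSAN API.

```python
# FTT value -> (RAID level, minimum hosts), as stated in the text above.
ERASURE_CODING = {
    1: ("RAID 5", 4),
    2: ("RAID 6", 6),
}


def erasure_coding_level(ftt: int, host_count: int) -> str:
    """Return the RAID level for an erasure-coding policy, or raise if invalid."""
    if ftt not in ERASURE_CODING:
        raise ValueError("Erasure coding supports only FTT = 1 or FTT = 2")
    level, min_hosts = ERASURE_CODING[ftt]
    if host_count < min_hosts:
        raise ValueError(f"{level} needs at least {min_hosts} hosts, got {host_count}")
    return level


print(erasure_coding_level(1, 4))
print(erasure_coding_level(2, 6))
```

A five-host cluster with FTT set to 2 is rejected in this sketch because RAID 6 needs six hosts.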
Fault Tolerance Method
The Fault Tolerance Method specifies whether the data-replication method optimizes for performance or capacity. RAID 1 mirroring is the performance option. It uses more disk space to place the object components but consumes less CPU and network resources. RAID 5/6 erasure coding is the capacity option. It uses less disk space but consumes more CPU and network resources.
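To make the capacity trade-off concrete, the sketch below estimates the raw capacity consumed by an object under each method. The multipliers for erasure coding are assumptions not stated in this article: they assume a 3 data + 1 parity layout for RAID 5 and a 4 data + 2 parity layout for RAID 6, which is consistent with the four-host and six-host minimums. The RAID 1 multipliers follow the n+1 copies rule and ignore small witness components. All names are hypothetical.

```python
from fractions import Fraction

# (method, FTT) -> raw capacity consumed per unit of usable data.
# RAID 1 follows the n+1 copies rule; RAID 5/6 values are assumed layouts.
CAPACITY_MULTIPLIER = {
    ("RAID 1", 1): Fraction(2),      # 2 full copies
    ("RAID 1", 2): Fraction(3),      # 3 full copies
    ("RAID 5", 1): Fraction(4, 3),   # assumed 3 data + 1 parity
    ("RAID 6", 2): Fraction(3, 2),   # assumed 4 data + 2 parity
}


def raw_capacity_gb(usable_gb: int, method: str, ftt: int) -> float:
    """Estimate raw capacity in GB consumed by an object of usable_gb."""
    return float(usable_gb * CAPACITY_MULTIPLIER[(method, ftt)])


for (method, ftt) in CAPACITY_MULTIPLIER:
    print(method, f"FTT={ftt}", round(raw_capacity_gb(100, method, ftt), 1), "GB")
```

Under these assumptions, a 100 GB object with FTT set to 1 consumes about 200 GB with mirroring and about 133 GB with RAID 5.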
Managing Fault Domains in vSAN Clusters
If your vSAN cluster spans across multiple racks or blade server chassis in a data center and you want to ensure that your hosts are protected against rack or chassis failure, you can create fault domains and add one or more hosts to each fault domain.
A fault domain consists of one or more vSAN hosts grouped together according to their physical location in the data center. Fault domains enable vSAN to tolerate failures of entire physical racks and failures of a single host, capacity device, network link, or a network switch dedicated to a fault domain.
The Number of failures to tolerate policy for the cluster depends on the number of failures a virtual machine is provisioned to tolerate. For example, when a virtual machine is configured with the Number of failures to tolerate set to 1 (FTT = 1) and using multiple fault domains, vSAN can tolerate a single failure of any kind and of any component in a fault domain, including the failure of an entire rack.
When you configure fault domains on a rack and provision a new virtual machine, vSAN ensures that protection objects, such as replicas and witnesses, are placed in different fault domains. For example, if a virtual machine's storage policy has the Number of failures to tolerate set to n (FTT = n), vSAN requires a minimum of 2n+1 fault domains in the cluster. When virtual machines are provisioned in a cluster with fault domains using this policy, the copies of the associated virtual machine objects are stored across separate racks.
A minimum of three fault domains are required. For best results, configure four or more fault domains in the cluster. A cluster with three fault domains has the same restrictions that a three-host cluster has, such as the inability to reprotect data after a failure and the inability to use the Full data migration mode. For information about designing and sizing fault domains, see Designing and Sizing vSAN Fault Domains.
Consider a scenario where you have a vSAN cluster with 16 hosts. The hosts are spread across four racks, that is, four hosts per rack. To tolerate an entire rack failure, create a fault domain for each rack. With four fault domains, the cluster can support virtual machines with the Number of failures to tolerate set to 1. To allow virtual machines with the Number of failures to tolerate set to 2, you must configure five fault domains in the cluster.
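The arithmetic behind this scenario follows from the 2n+1 rule. A minimal sketch, with hypothetical function names, assuming the mirroring rule applies:

```python
def min_fault_domains(ftt: int) -> int:
    """Minimum fault domains needed for FTT = n, using the 2n+1 rule."""
    return 2 * ftt + 1


def max_ftt_for(fault_domains: int) -> int:
    """Largest FTT that a given number of fault domains can support."""
    return (fault_domains - 1) // 2


racks = 4  # one fault domain per rack, as in the 16-host scenario
print("Max FTT with", racks, "fault domains:", max_ftt_for(racks))
print("Fault domains needed for FTT=2:", min_fault_domains(2))
```

Four racks satisfy the three-domain minimum for FTT = 1 but fall short of the five domains that FTT = 2 requires.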
When a rack fails, all resources in the rack, including CPU and memory, become unavailable to the cluster. To reduce the impact of a potential rack failure, configure smaller fault domains. This increases the total amount of resources that remain available in the cluster after a rack failure.
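The effect of fault domain size can be estimated with a simple calculation. This sketch assumes identical hosts and equally sized fault domains, and it measures only the share of hosts that survive the loss of one fault domain. The function name is hypothetical.

```python
def remaining_fraction(hosts: int, fault_domains: int) -> float:
    """Fraction of cluster hosts still available after one fault domain fails.

    Assumes identical hosts split evenly across fault domains.
    """
    if fault_domains < 1 or hosts % fault_domains != 0:
        raise ValueError("hosts must divide evenly across fault domains")
    hosts_per_domain = hosts // fault_domains
    return (hosts - hosts_per_domain) / hosts


# Compare larger and smaller fault domains for a 16-host cluster.
for domains in (4, 8, 16):
    print(domains, "fault domains:", remaining_fraction(16, domains))
```

For 16 hosts, four fault domains leave 75% of hosts available after a failure, while eight fault domains leave 87.5%.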
When working with fault domains, follow these best practices:
- Configure a minimum of three fault domains in the vSAN cluster. For best results, configure four or more fault domains.
- A host not in any fault domain is considered to reside in its own single-host fault domain.
- You do not have to assign every vSAN host to a fault domain. If you decide to use fault domains to protect the vSAN environment, create equally sized fault domains with a uniform number of hosts.
- When moved to another cluster, vSAN hosts retain their fault domain assignments.
- For guidelines about designing fault domains, see Designing and Sizing vSAN Fault Domains.
- You can add any number of hosts to a fault domain. Each fault domain must contain at least one host.