ECS: Storage Efficiency
Summary: This article explains how ECS reduces storage overhead, particularly in multi-site (geo-replicated) deployments.
Instructions
ECS uses erasure coding for data protection.
Although erasure coding is more storage efficient than other forms of protection, such as mirroring, it does incur some storage overhead.
ECS provides a mechanism that increases storage efficiency when three or more sites are linked.
In a geo-replicated setup with multiple sites, or virtual data centers (VDCs), ECS replicates chunks from the primary VDC to a remote site to provide high availability.
However, this simple replication can consume a large amount of additional disk space.
To alleviate this, ECS uses an innovative technique to reduce overhead while preserving high availability features.
This can be illustrated with a simple example.
Consider three VDCs in a multi-site environment (VDC1, VDC2, and VDC3), where VDC1 holds chunk C1 and VDC2 holds chunk C2.
With simple replication, a secondary copy of C1 and a secondary copy of C2 may be placed in VDC3. Since all chunks are the same size (128 MB), this results in a total of 4 x 128 MB of space being used to store 2 x 128 MB of objects.
In this situation, ECS can perform an XOR operation on C1 and C2 (written mathematically as C1 ⊕ C2), place the result in VDC3, and remove the individual secondary copies of C1 and C2.
Rather than using 2 x 128 MB of space in VDC3, ECS now uses only 128 MB (the XOR operation results in a new chunk of the same size).
In this case, if VDC1 goes down, ECS can reconstruct C1 by using C2 from VDC2 and the (C1 ⊕ C2) data from VDC3. Similarly, if VDC2 goes down, ECS can reconstruct C2 by using C1 from VDC1 and the (C1 ⊕ C2) data from VDC3.
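The reconstruction works because XOR is its own inverse: C2 ⊕ (C1 ⊕ C2) = C1. The following Python sketch illustrates the idea only and is not ECS code. The function name xor_chunks is hypothetical, and 1 KB random buffers stand in for the 128 MB chunks so that the example runs quickly.

```python
import os

def xor_chunks(a: bytes, b: bytes) -> bytes:
    """Return the byte-wise XOR of two equally sized chunks (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("chunks must be the same size")
    return bytes(x ^ y for x, y in zip(a, b))

# Small stand-ins for the 128 MB chunks held by VDC1 and VDC2.
c1 = os.urandom(1024)
c2 = os.urandom(1024)

# VDC3 keeps only the XOR of the two chunks, not a copy of each.
xor_chunk = xor_chunks(c1, c2)

# If VDC1 is lost, C1 is rebuilt from C2 (VDC2) and the XOR chunk (VDC3).
recovered_c1 = xor_chunks(c2, xor_chunk)

# If VDC2 is lost, C2 is rebuilt from C1 (VDC1) and the XOR chunk (VDC3).
recovered_c2 = xor_chunks(c1, xor_chunk)

print(recovered_c1 == c1, recovered_c2 == c2)
```

Note that the XOR chunk is the same length as each input, which is why VDC3 needs space for only one chunk instead of two.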
As the number of linked sites increases, the ECS algorithm becomes more efficient at reducing the overhead.
Table 10 provides information about the storage overhead based on the number of sites for normal erasure coding of 12+4 and cold archive erasure coding of 10+2, and it illustrates how ECS becomes more storage efficient as more sites are linked. To obtain the lower overhead, the same amount of data must be written at each site.
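The trend can be estimated with a short calculation. The sketch below is an approximation based on assumptions, not the ECS sizing algorithm, and it is not a substitute for the published table. It assumes that the erasure coding overhead is (data + coding) / data fragments, that two sites keep a full second copy, and that with N sites (N of 3 or more) every N - 1 primary chunks share one XOR chunk, which generalizes the three-site example above. It also assumes that the same amount of data is written at each site. The function names are hypothetical.

```python
def ec_overhead(data_fragments: int, coding_fragments: int) -> float:
    """Local erasure coding overhead, for example 12+4 gives 16/12."""
    return (data_fragments + coding_fragments) / data_fragments

def estimated_geo_overhead(sites: int, data_fragments: int, coding_fragments: int) -> float:
    """Estimate total storage overhead for a number of linked sites.

    Assumption: with three or more sites, N - 1 primary chunks are
    protected by one XOR chunk, so N chunks are stored per N - 1 chunks
    of data, and every stored chunk is also erasure coded locally.
    """
    if sites < 1:
        raise ValueError("sites must be at least 1")
    ec = ec_overhead(data_fragments, coding_fragments)
    if sites == 1:
        return ec                 # no remote copy
    if sites == 2:
        return ec * 2             # full secondary copy, XOR not possible
    return ec * sites / (sites - 1)

for n in range(1, 9):
    normal = estimated_geo_overhead(n, 12, 4)   # normal erasure coding, 12+4
    cold = estimated_geo_overhead(n, 10, 2)     # cold archive erasure coding, 10+2
    print(f"{n} site(s): 12+4 = {normal:.2f}, 10+2 = {cold:.2f}")
```

Under these assumptions, the estimate falls toward the single-site erasure coding overhead as sites are added, because one XOR chunk is shared by a growing number of primary chunks.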
In some scenarios, replication to all sites may be desired for increased data protection and enhanced read performance. Enabling this feature disables the XOR capability for storage efficiency described above. Replication to all sites is available in ECS 2.2 and later.