No single mechanism can ensure the data integrity of a storage system. Only by combining a variety of mechanisms into successive lines of defense can a system guard against all sources of error and ensure that data remains recoverable. The Data Domain operating system (DD OS) is designed to ensure data integrity. It does so in two key areas:
·Continuous Fault Detection and Healing
·File System Recovery
This article discusses both areas.
1.Continuous Fault Detection and Healing
Continuous fault detection and healing provides an extra level of protection within the Data Domain operating system. DD OS detects faults and recovers from them continuously, which helps ensure that data restore operations succeed.
The Data Domain system uses RAID system redundancy to heal faults. RAID 6 is the foundation for the continuous fault detection and healing of Data Domain systems. Its dual-parity architecture offers advantages over conventional architectures, including RAID 1 (mirroring) and the single-parity approaches of RAID 3, RAID 4, and RAID 5. RAID 6:
·Protects against two disk failures.
·Protects against disk read errors during reconstruction.
·Protects against the operator pulling the wrong disk.
·Guarantees RAID stripe consistency even during power failure without reliance on NVRAM or an uninterruptible power supply (UPS).
·Verifies data integrity and stripe coherency after writes.
By comparison, after a single disk fails in other RAID architectures, any further simultaneous disk errors cause data loss. A system whose focus is data protection must include the extra level of protection that RAID 6 provides.
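To make the dual-parity idea concrete, here is a minimal sketch of the commonly described P+Q scheme, in which P is a plain XOR of the data blocks and Q is a weighted sum over GF(2^8). It is a toy model and an assumption for illustration, not the DD OS implementation; the function names (`compute_pq`, `recover_two`) and the polynomial 0x11d are choices made for this example.

```python
# Toy P+Q dual parity over GF(2^8). Illustrative only; not DD OS code.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a: int) -> int:
    # For nonzero a, a^255 == 1, so a^254 is the multiplicative inverse.
    return gf_pow(a, 254)

def compute_pq(blocks):
    """Return (P, Q) parity for a list of equally sized data blocks."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)  # per-disk weight used for Q
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def recover_two(blocks, x, y, p, q):
    """Rebuild lost data blocks x and y from the survivors plus P and Q."""
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        pxy, qxy = p[j], q[j]
        for i, block in enumerate(blocks):
            if i in (x, y):
                continue  # lost blocks contribute nothing we can read
            pxy ^= block[j]
            qxy ^= gf_mul(gf_pow(2, i), block[j])
        # Now pxy = Dx ^ Dy and qxy = gx*Dx ^ gy*Dy, so
        # qxy ^ gy*pxy = (gx ^ gy) * Dx.
        dx[j] = gf_mul(denom_inv, qxy ^ gf_mul(gy, pxy))
        dy[j] = pxy ^ dx[j]
    return bytes(dx), bytes(dy)

data = [b"abcd", b"efgh", b"ijkl", b"mnop"]
p, q = compute_pq(data)
damaged = [data[0], None, data[2], None]  # disks 1 and 3 are lost
rebuilt = recover_two(damaged, 1, 3, p, q)
```

With only the P block (single parity), the two unknowns in this example could not be separated; the second, independently weighted Q block is what makes the two-disk case solvable.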
Continuous error detection works well for data being read, but it does not address issues with data that may be unread for weeks or months before being needed for a recovery. For this reason, Data Domain systems actively re-verify the integrity of all data every week in an ongoing background process. This scrub process finds and repairs defects on the disk before they can become a problem.
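The scrub idea can be sketched as a loop that re-reads every block, compares it with a stored checksum, and rebuilds any block that fails. The sketch below is a simplified model under stated assumptions: it uses CRC32 checksums and a single XOR parity block for brevity (rather than RAID 6 dual parity), and the names `scrub` and `xor_blocks` are hypothetical, not DD OS interfaces.

```python
import zlib

def xor_blocks(blocks):
    """XOR equally sized blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for j, byte in enumerate(block):
            out[j] ^= byte
    return bytes(out)

def scrub(blocks, checksums, parity):
    """Verify each block against its checksum and repair mismatches in place.

    Returns the indices of the blocks that were repaired.
    """
    repaired = []
    for i, block in enumerate(blocks):
        if zlib.crc32(block) == checksums[i]:
            continue  # block is healthy
        # Rebuild the bad block from the other blocks plus parity.
        others = [b for k, b in enumerate(blocks) if k != i]
        candidate = xor_blocks(others + [parity])
        if zlib.crc32(candidate) != checksums[i]:
            raise RuntimeError(f"block {i} is unrecoverable")
        blocks[i] = candidate
        repaired.append(i)
    return repaired

blocks = [b"aaaa", b"bbbb", b"cccc"]
checksums = [zlib.crc32(b) for b in blocks]
parity = xor_blocks(blocks)

blocks[1] = b"bXbb"  # simulate silent corruption of rarely read data
repaired = scrub(blocks, checksums, parity)
```

The point of running such a pass on a schedule is that a latent defect is found while the redundancy needed to repair it is still intact, rather than during a restore.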
2.File System Recovery
File system recovery in the EMC Data Domain Data Invulnerability Architecture (DIA) is a feature that reconstructs lost or corrupted file system metadata. It includes file system check tools.
If a Data Domain system does have a problem, DIA file system recovery ensures that the system is brought back online quickly.
This picture shows DIA file system recovery:
·Data is written in a self-describing format.
·The file system can be recreated by scanning the logs and rebuilding it from metadata stored with the data.
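The two points above can be illustrated with a small append-only log in which every record carries its own name, length, and checksum, so that an index can be rebuilt purely by scanning. The record layout (`HEADER`, `MAGIC`) and the function names are assumptions made for this sketch; the actual on-disk format of the Data Domain file system is not described in this article.

```python
import struct
import zlib

# Hypothetical record header: magic, name length, payload length, CRC32.
HEADER = struct.Struct("<4sHII")
MAGIC = b"REC1"

def append_record(log: bytearray, name: str, payload: bytes) -> None:
    """Append one self-describing record to the end of the log."""
    name_bytes = name.encode("utf-8")
    body = name_bytes + payload
    log += HEADER.pack(MAGIC, len(name_bytes), len(payload), zlib.crc32(body))
    log += body

def rebuild_index(log) -> dict:
    """Scan the log and rebuild a name -> (payload offset, length) index."""
    index = {}
    offset = 0
    while offset + HEADER.size <= len(log):
        magic, name_len, payload_len, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        end = start + name_len + payload_len
        if magic != MAGIC or end > len(log):
            break  # not a complete record: stop scanning
        body = bytes(log[start:end])
        if zlib.crc32(body) != crc:
            break  # damaged record: stop scanning
        name = body[:name_len].decode("utf-8")
        index[name] = (start + name_len, payload_len)  # newest record wins
        offset = end
    return index

log = bytearray()
append_record(log, "a.txt", b"hello")
append_record(log, "b.txt", b"world!")

index = rebuild_index(log)  # no separate metadata was consulted
offset, length = index["b.txt"]
payload = bytes(log[offset:offset + length])
```

Because each record describes itself, the index here is derived data: losing it costs a scan, not the files.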
In a traditional file system, consistency is typically not checked as data is written. Data Domain systems, by contrast, perform an initial verification after each backup to ensure consistency for all new writes. The usable size of a traditional file system is often limited by the time it takes to recover the file system after corruption.
Imagine running fsck on a traditional file system with more than 80 TB of data. The checking process can take so long because the file system needs to sort out the locations of the free blocks so that new writes do not accidentally overwrite existing data. Typically, this entails checking all references to rebuild free block maps and reference counts. The more data in the system, the longer this takes.
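The cost described above can be seen in a toy model of the rebuild step: every block reference held by every file has to be visited before the free map can be trusted. This is a simplified illustration, not the algorithm of any particular fsck; the function name and the in-memory `files` mapping are hypothetical.

```python
def rebuild_block_maps(total_blocks, files):
    """Rebuild reference counts and the free-block map from file references.

    files maps a file name to the list of block numbers it references.
    The work grows with the total number of references in the system.
    """
    refcounts = [0] * total_blocks
    for block_list in files.values():
        for block in block_list:
            refcounts[block] += 1
    free_map = [count == 0 for count in refcounts]
    return refcounts, free_map

files = {"a": [0, 1, 2], "b": [2, 3]}  # block 2 is shared by both files
refcounts, free_map = rebuild_block_maps(6, files)
```

Only after the full walk is it known that blocks 4 and 5 are safe to reuse, which is why recovery time scales with the amount of stored data.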
In contrast, since the Data Domain file system never overwrites existing data and doesn’t have block maps and reference counts to rebuild, it has to verify only the location of the head of the log to safely bring the system back online and restore critical data.
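One way to picture "verifying only the head of the log" is a log of fixed-size records that is scanned forward from a recent checkpoint until the first slot that does not hold a valid next record. The sketch below rests on assumptions made for illustration (fixed 16-byte payloads, sequence numbers, CRC32, a saved checkpoint slot); it is not the DD OS recovery procedure.

```python
import struct
import zlib

# Hypothetical fixed-size record: sequence number, CRC32, 16-byte payload.
RECORD = struct.Struct("<QI16s")

def make_record(seq: int, payload: bytes) -> bytes:
    payload = payload.ljust(16, b"\0")[:16]
    return RECORD.pack(seq, zlib.crc32(payload), payload)

def find_head(log, checkpoint_slot: int) -> int:
    """Return the first slot at or after the checkpoint that is not valid.

    Records before the checkpoint are never read, so the cost depends on
    how much was written since the checkpoint, not on total log size.
    """
    slot = checkpoint_slot
    expected_seq = None
    while (slot + 1) * RECORD.size <= len(log):
        seq, crc, payload = RECORD.unpack_from(log, slot * RECORD.size)
        if zlib.crc32(payload) != crc:
            break  # torn or corrupted write
        if expected_seq is not None and seq != expected_seq:
            break  # stale data from an earlier pass over this slot
        expected_seq = seq + 1
        slot += 1
    return slot

records = b"".join(make_record(i, b"data%d" % i) for i in range(5))
log = records + b"\xff" * 10  # a partially written sixth record
head = find_head(log, checkpoint_slot=3)
```

Since existing records are never overwritten in this model, everything before the head can be trusted as it was when written, and new writes simply resume at the head.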