Data Domain: Versions 7.7.1.0, 7.7.2.0, 7.8.0.0, and 7.8.0.10. Potential data loss when backups are written while cleaning cycle is running

Summary: Data Domain systems (DDRs, DDVE, DDVE in cloud) running DDOS versions 7.7.1.0, 7.7.2.0, 7.8.0.0, and 7.8.0.10 may experience data loss when backups are written while Cleaning is running, due to a rare race condition. ...


Symptoms


Data may be incorrectly deleted from the system during the cleaning cycle. The backup can no longer be fully read back. When this issue occurs the Data Domain Restorer (DDR) may exhibit one or more of the following:
  • Alert posted indicating that corruption has been found on disk:  
Id      Post Time                  Severity   Class        Object        Message
-----   ------------------------   --------   ----------   -----------   ------------------------------------------------------------------
m0-32   Wed Jun 29 05:19:16 2022   CRITICAL   Filesystem   Tier=Active   EVT-FILESYS-00020: Corruption has been detected in the filesystem.
  • Unplanned restarts of the Data Domain File System (DDFS) when the impacted file is read by restore, replication, data-movement to cloud, or the cleaning cycle.

SCOPE OF IMPACT
  • Potentially Affected Systems: DDR, DDVE, DDVE-in-Cloud systems running DDOS Versions 7.7.1.0, 7.7.2.0, 7.8.0.0, 7.8.0.10. Systems running any other version of DDOS are not impacted.
  • Existing backups that are written in any prior release of DDOS are not impacted.
  • Backups Tiered to the cloud using data-movement are not impacted.
  • If File Verification has verified backups beyond the last Cleaning completion time and no "Corruption Detected" alert is posted, this validates no backups are affected.
  • If all backups have been successfully replicated, this implies that the Replication Source Data Domain is not impacted.
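
As an illustration of the version scope above, the following minimal sketch checks a DDOS version string against the affected releases listed in this article. The function name is hypothetical, and it assumes the caller already has a bare dotted version string (for example "7.7.2.0"); obtaining that string from a system is not covered here.

```python
# Versions this article lists as affected; every other DDOS version
# is stated to be not impacted.
AFFECTED_DDOS_VERSIONS = {"7.7.1.0", "7.7.2.0", "7.8.0.0", "7.8.0.10"}


def is_potentially_affected(ddos_version: str) -> bool:
    """Return True if the version is one of the affected releases.

    Hypothetical helper: expects a bare dotted version such as "7.7.2.0".
    """
    return ddos_version.strip() in AFFECTED_DDOS_VERSIONS


if __name__ == "__main__":
    for version in ("7.7.1.0", "7.7.1.10", "7.8.0.10"):
        print(version, is_potentially_affected(version))
```

A match only means the system runs an affected release. Whether any backup is actually impacted is determined by the verification steps in section (II).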

Cause

The issue is caused by a rare race condition that can occur when backups are written while the cleaning cycle is running.
 

Resolution

This issue is fixed in DDOS Releases 7.7.1.10, 7.7.2.10, 7.8.0.20 and later.  
Choose from the currently available versions on the download portal to also pick up additional fixes and the latest security vulnerability fixes.
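
For planning purposes, the sketch below maps each affected release to the earliest fixed release named in this article. Matching on the first three version components is an assumption made for illustration, not something the article states, and the function name is hypothetical. A newer release from the download portal may be the better choice.

```python
from typing import Optional

# Affected releases and fixed releases as listed in this article.
AFFECTED = {"7.7.1.0", "7.7.2.0", "7.8.0.0", "7.8.0.10"}
# Assumption: the fixed release for a system is the one sharing its
# first three version components.
FIXED_BY_FAMILY = {"7.7.1": "7.7.1.10", "7.7.2": "7.7.2.10", "7.8.0": "7.8.0.20"}


def earliest_fixed_release(ddos_version: str) -> Optional[str]:
    """Return the earliest listed fixed release for an affected version.

    Returns None when the version is not in the affected list.
    """
    version = ddos_version.strip()
    if version not in AFFECTED:
        return None
    family = version.rsplit(".", 1)[0]  # "7.8.0.10" -> "7.8.0"
    return FIXED_BY_FAMILY[family]


if __name__ == "__main__":
    print(earliest_fixed_release("7.8.0.10"))
```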

Follow both sections below:
(I) How to prevent the possibility of new backups being impacted
(II) Data Integrity Verification and Remediation.   

 
(I) How to prevent the possibility of new backups being impacted 

The recommended action is to install a DDOS Release containing the fix for this issue. Once a DDOS Release with the fix is installed, no further action in this section (I) is needed.

OR
If a DDOS Release with the fix cannot be installed immediately, follow the steps below:

 
1. Disable Cleaning schedule.
 
# filesys clean set schedule never
Filesystem cleaning is scheduled to run "never".
 
Stop the current Cleaning cycle if one is running:
 
# filesys clean status
Cleaning started at 2022/06/27 12:32:03: phase 4 of 6 (pre-select)
  8.7% complete,   438 GiB free; time: phase  0:00:01, total  0:10:35
 
# filesys clean stop
 
The 'filesys clean stop' command stops the filesystem cleaning.
        Are you sure? (yes|no) [no]: yes
 
ok, proceeding.
 
2. If Cleaning must be run to alleviate a capacity issue, disable all ingest before starting Cleaning.
 
2a. Disable Backups/Cloning from Backup Application.
 
2b. Disable Protocols on Data Domain.
# replication disable all
# nfs disable
# cifs disable
# ddboost disable
# vtl disable
 
2c. Run Cleaning and monitor as follows:
 
# filesys clean start
Active tier cleaning started.  Use 'filesys clean watch' to monitor progress.
 
# filesys clean watch
Beginning 'filesys clean' monitoring.  Use Control-C to stop monitoring.
 
Cleaning: phase 1 of 6 (pre-merge)
  100.0% complete,   438 GiB free; time: phase  0:00:42, total  0:00:42-
 
Cleaning: phase 2 of 6 (pre-analysis)
    3.3% complete,   438 GiB free; time: phase  0:00:16, total  0:00:59
 
2d. Once Cleaning starts phase 2 (pre-analysis), re-enable Backup scheduling and Protocols.
# replication enable all
# nfs enable
# cifs enable
# ddboost enable
# vtl enable
NOTE: Once the DDOS release with fix is installed, be sure to reset the Cleaning schedule.

Example:
# filesys clean set schedule Tue,0600
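
The condition in step 2d (Cleaning has reached phase 2) can be read from saved 'filesys clean watch' or 'filesys clean status' output. The following is a minimal sketch, assuming the output looks like the samples shown in this section. The function names are hypothetical, and the sketch only interprets output from a cleaning run that is in progress.

```python
import re
from typing import Optional

# Matches "phase 2 of 6" but not the elapsed-time field "phase  0:00:42".
PHASE_RE = re.compile(r"phase\s+(\d+)\s+of\s+(\d+)")


def latest_phase(output: str) -> Optional[int]:
    """Return the most recent cleaning phase number in the output, if any."""
    matches = PHASE_RE.findall(output)
    if not matches:
        return None
    return int(matches[-1][0])


def ingest_can_resume(output: str) -> bool:
    """True once an in-progress cleaning run has reached phase 2 or later."""
    phase = latest_phase(output)
    return phase is not None and phase >= 2


if __name__ == "__main__":
    sample = (
        "Cleaning: phase 1 of 6 (pre-merge)\n"
        "  100.0% complete,   438 GiB free; time: phase  0:00:42, total  0:00:42\n"
        "\n"
        "Cleaning: phase 2 of 6 (pre-analysis)\n"
        "    3.3% complete,   438 GiB free; time: phase  0:00:16, total  0:00:59\n"
    )
    print(latest_phase(sample), ingest_can_resume(sample))
```

The last phase line in the output is used, so the helper can be re-run on a growing capture of the monitoring output.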

(II) Data Integrity Verification and Remediation

Data Domain Architecture ensures all backups that are written are verified for integrity.  If any backup fails verification, an alert is posted.  The Last Verified file timestamp indicates the date/time up to which all files have been verified.  If the "Last Verified Timestamp" is later than the last Cleaning cycle AND there is no "Corruption Detected" Alert, this validates no backups are impacted.
 
1. Check if there is a Data Integrity Alert Posted:
The alert looks like the following:
CRITICAL Filesystem Tier=Active EVT-FILESYS-00020: Corruption has been detected in the filesystem.
Check for current alerts with the command below:
# alerts show current

If a Data Integrity Alert is seen, contact Dell Technologies Customer Support immediately.
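
As a sketch of step 1, the snippet below scans saved 'alerts show current' output for the event ID shown in the alert above. It assumes the output has been captured as text and that each alert occupies one line, as in the example in the Symptoms section. The function name is hypothetical.

```python
from typing import List

# Event ID of the corruption alert described in this article.
CORRUPTION_EVENT_ID = "EVT-FILESYS-00020"


def find_corruption_alerts(alerts_output: str) -> List[str]:
    """Return the alert lines that contain the corruption event ID."""
    return [
        line.strip()
        for line in alerts_output.splitlines()
        if CORRUPTION_EVENT_ID in line
    ]


if __name__ == "__main__":
    sample = (
        "m0-32   Wed Jun 29 05:19:16 2022   CRITICAL   Filesystem   Tier=Active   "
        "EVT-FILESYS-00020: Corruption has been detected in the filesystem.\n"
    )
    hits = find_corruption_alerts(sample)
    if hits:
        print("Data Integrity Alert found; contact Dell Technologies Customer Support.")
```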
 
2. If no Data Integrity Alert is posted, check Last Cleaning completion time and File Verification Timestamp.
 
2a. Check Last Cleaning Run Time:
# filesys clean status
Cleaning finished at 2022/06/24 16:36:01.
 
2b. Check File Verification (FV) Timestamp:
Contact Dell Technologies Customer Support to determine the FV timestamp.
 
  • If File Verification (FV) Timestamp is later than the date and time of the last Clean cycle, this validates that no backups are impacted.  
  • If the FV Timestamp is behind the date and time of the last Clean cycle by a week or less, wait for it to catch up and check again.
NOTE: The FV Timestamp being behind does not indicate a Data Integrity issue. It only means that files written later than the FV Timestamp have yet to be verified.
  • For any system running an affected DDOS version (including any system recently upgraded to a fixed release), if the FV Timestamp is behind by more than a week (or cannot be determined), contact Dell Technologies Customer Support to assist with data validation.
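
The decision rules in section (II) can be summarized in the following minimal sketch. It assumes the last cleaning completion time is taken from a 'Cleaning finished at ...' line as shown in step 2a, and that the FV timestamp (obtained through Dell Technologies Customer Support) is supplied as a Python datetime. The function names and returned labels are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

FINISHED_RE = re.compile(r"Cleaning finished at (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")


def parse_clean_finished(status_output: str) -> Optional[datetime]:
    """Extract the completion time from 'filesys clean status' output."""
    match = FINISHED_RE.search(status_output)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")


def assess(alert_present: bool,
           last_clean: datetime,
           fv_timestamp: Optional[datetime]) -> str:
    """Map the article's checks to a next action (labels are illustrative)."""
    if alert_present:
        return "contact_support_immediately"
    if fv_timestamp is None:
        return "contact_support"  # FV timestamp cannot be determined
    if fv_timestamp > last_clean:
        return "no_impact"  # verification has passed the last clean cycle
    if last_clean - fv_timestamp <= timedelta(days=7):
        return "wait"  # let File Verification catch up, then re-check
    return "contact_support"  # behind by more than a week


if __name__ == "__main__":
    clean = parse_clean_finished("Cleaning finished at 2022/06/24 16:36:01.")
    print(assess(False, clean, datetime(2022, 6, 25, 9, 0, 0)))
```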

Article Properties
Article Number: 000200905
Article Type: Solution
Last Modified: 11 Dec 2023
Version:  10