Data Domain: Upgrade Instructions for PCR/PCM enabled DD

Summary: Physical Capacity Reporting (PCR) is a feature available since DD OS 5.7, but some versions have a defect, corrected in later releases, which may call for manual intervention from Support in some DD OS upgrade scenarios. ...


Symptoms



DD OS 5.7 introduced Physical Capacity Reporting / Measurement (PCR/PCM). The feature estimates the amount of real (post-compression) disk space used by data in the file system (FS) in a reliable and reproducible way, for charge-back and other business scenarios.

More details about the feature are in the FAQ (Frequently Asked Questions) KB article 465216:
Data Domain Operating System (DDOS) physical capacity measurement/reporting (PCM/PCR) 

To generate PCR reports, the Data Domain file system (DDFS) Directory Manager (DM) creates PCR btrees on the system.

Dell EMC Data Domain Engineering has identified a code defect in which DM PCR btree pages may become corrupt, and DDFS may PANIC when a PCR job is started on the system. Even when corrupted PCR btree pages exist, PCR jobs may run to completion several times without any FS issue, yet the FS may still crash on a later run.

This defect was present in DD OS releases 5.7 through 6.2 and is fixed in some later DD OS versions (listed under Resolution). On a fixed release, running PCR does not risk an FS PANIC, provided no corrupted PCR btree pages already exist in the system.

Because it is not possible to know up front whether any such corrupt pages exist (which could lead to FS PANICs), it is mandatory to follow the steps described below when upgrading to a fixed release. A fixed release does not create corrupt PCR btree pages, but the FS could still PANIC on corrupted pages created earlier. Note that the activity requires the DD FS to be down, so downtime must be planned in advance. A precise estimate of the downtime is not possible, but 2 hours should be enough to complete the task.

Note that, depending on the current and target DD OS releases and other factors, the task may be done either immediately before upgrading to a fixed DD OS release or immediately after completing the upgrade, before any PCR jobs are scheduled to run. In either case, the goal is to have a fixed DD OS running with none of the PCR btrees carried over from the earlier release, so that there is no chance of crashing on corrupted PCR btree pages.

Cause

NA

Resolution

There is no fix for DD OS 5.7.x versions, as DD OS 5.7.x is already end of life. The defect causing corrupted PCR btree pages is fixed in the following DD OS releases:
  • DD OS 6.0.2.50 and later releases in the DD OS 6.0 family have the defect fixed.
  • DD OS 6.1.2.40 and later releases in the DD OS 6.1 family have the defect fixed. (NOTE: DD OS 6.1.2.40 is NOT to be used with DD3300 due to an unrelated defect.)
  • DD OS 6.2.0.20 and later releases in the DD OS 6.2 family have the defect fixed.
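
The list above can be turned into a quick check of whether a given release contains the fix. The following is a minimal sketch, not a Support tool: the function names and the lookup table layout are hypothetical, it assumes version strings are four dot-separated numbers (such as 6.1.2.40), and it returns None for any release family this article does not mention, since nothing can be concluded about those from the list.

```python
# Minimum fixed release per DD OS family, copied from the list in this article.
FIXED_MINIMUMS = {
    (6, 0): (6, 0, 2, 50),
    (6, 1): (6, 1, 2, 40),  # not to be used with DD3300 (unrelated defect)
    (6, 2): (6, 2, 0, 20),
}

# Families the article states will never receive the fix (end of life).
UNFIXED_FAMILIES = {(5, 7)}


def parse_version(text):
    """Convert '6.1.2.40' to (6, 1, 2, 40). The dotted-numeric format is assumed."""
    return tuple(int(part) for part in text.strip().split("."))


def has_pcr_fix(version_text):
    """Return True/False for families covered by the article, otherwise None."""
    version = parse_version(version_text)
    family = version[:2]
    if family in UNFIXED_FAMILIES:
        return False
    minimum = FIXED_MINIMUMS.get(family)
    if minimum is None:
        return None  # family not listed in the article: no conclusion
    # Tuples compare element by element, which matches release ordering.
    return version >= minimum


if __name__ == "__main__":
    for release in ("5.7.5.0", "6.0.2.40", "6.0.2.50", "6.1.2.40", "6.2.0.10"):
        print(release, has_pcr_fix(release))
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where "6.0.2.9" would sort after "6.0.2.50".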

If a user has used PCR in the past and wants to upgrade to any of the fixed releases listed above, Support must carry out some manual steps (either immediately before or immediately after the upgrade) to make sure no corrupted PCR btree pages remain in the system after the DD OS upgrade. If this recommendation is not followed, the DD would be running a fixed DD OS release that could still crash in the same way because of pre-existing PCR btree page corruption.

Users who have never used PCR, or who are upgrading to any DD OS release which does not have the code fix, do not need to go through the mentioned steps.
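
The two conditions above combine into a simple rule: the manual steps apply only when PCR has been used before and the target release contains the fix. The sketch below only restates that rule for upgrade planning; the function name and its two boolean inputs are hypothetical, and how each input is determined on a real system is outside the scope of this example.

```python
def support_steps_required(pcr_used_before: bool, target_has_fix: bool) -> bool:
    """Manual Support steps apply only when both conditions hold."""
    return pcr_used_before and target_has_fix


if __name__ == "__main__":
    # Print the outcome for all four combinations of the two conditions.
    for used in (True, False):
        for fixed in (True, False):
            required = support_steps_required(used, fixed)
            print(f"PCR used before: {used!s:5}  target has fix: {fixed!s:5}  "
                  f"steps required: {required}")
```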

Important note for DD HA clusters:
The steps described below to address and resolve the issue with potentially corrupted PCR btrees are to be run from the DD HA active node only. Running the steps (script) from the standby node can cause problems: low-level DMCK operations may succeed there only because the standby node does not have the FS enabled, while the same data is being accessed concurrently from the active node through the FS process enabled there.

If in doubt, or if you are planning for such an upgrade and need assistance with the process, contact your contracted support provider and reference this KB article.

Affected Products

Data Domain

Article Properties
Article Number: 000058816
Article Type: Solution
Last Modified: 22 Apr 2021
Version:  3