Data Domain: Upgrade Instructions for PCR/PCM enabled DD

Summary: Physical Capacity Reporting (PCR) is a feature available since DD OS 5.7, but some versions have a defect, corrected in later releases, which may call for manual intervention from Support in some DD OS upgrade scenarios. ...

Symptoms

DD OS 5.7 introduced Physical Capacity Reporting / Measurement (PCR/PCM). This is a reliable and reproducible way to estimate the amount of real (post-compression) disk space used by data in the file system (FS), for charge-back and other business scenarios.

You can find more details about the feature in the FAQ (Frequently Asked Questions) KB article 465216, "Data Domain Operating System (DDOS) physical capacity measurement/reporting (PCM/PCR)".

To generate PCR reports, the Data Domain file system (DDFS) Directory Manager (DM) creates PCR Btrees on the system.

Dell EMC Data Domain Engineering has identified a code defect where DM PCR Btree pages may become corrupt and DDFS may PANIC when a PCR job is started on the system. Even when corrupted PCR Btree pages exist, PCR jobs may run to completion several times without any FS issues, while at other times the FS may crash.

This defect was present in DD OS releases from 5.7 through 6.2 and is fixed in some later DD OS versions (listed below). On a fixed version, running PCR does not risk FS PANICs, as long as no corrupted PCR Btree pages are already on the system.

Because it is not possible to know up front whether any such corrupt pages exist (which could lead to FS PANICs), it is mandatory to follow the steps described below when upgrading to a fixed release. A fixed release will not create corrupt PCR Btree pages, but these steps ensure the FS will not PANIC on corrupted PCR Btree pages created earlier. Note that the activity requires the DD FS to be down, so downtime must be planned in advance. The downtime needed cannot be estimated precisely, but 2 hours should be enough to complete the task.

Depending on the current and target DD OS releases and other factors, the task may be done either immediately before upgrading to a fixed DD OS release, or immediately after completing the upgrade, before any PCR jobs are scheduled to run. In either case, the goal is to have a fixed DD OS release running without any PCR Btrees carried over from the earlier release, so that it cannot crash on corrupted PCR Btree pages.

Cause

NA

Resolution

There is no fix for DD OS 5.7.x versions, as DD OS 5.7.x is already end of life. The defect causing corrupted PCR Btree pages is fixed in the following DD OS releases:
  • DD OS 6.0.2.50 and later releases in the DD OS 6.0 family have the defect fixed.
  • DD OS 6.1.2.40 and later releases in the DD OS 6.1 family have the defect fixed. (NOTE: DD OS 6.1.2.40 is NOT to be used with DD3300 due to an unrelated defect.)
  • DD OS 6.2.0.20 and later releases in the DD OS 6.2 family have the defect fixed.
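
The release list above can be expressed as a small version check. The following Python sketch is illustrative only and is not a Dell tool: the function names are hypothetical, and it assumes a version string is four dot-separated numbers, optionally followed by a "-<build>" suffix. Release families this article does not cover return None instead of a guess.

```python
from typing import Optional, Tuple

# First fixed release per DD OS family, as listed in this article.
FIRST_FIXED = {
    (6, 0): (6, 0, 2, 50),
    (6, 1): (6, 1, 2, 40),
    (6, 2): (6, 2, 0, 20),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse '6.1.2.40' into (6, 1, 2, 40).

    Assumption: any '-<build>' suffix is not part of the release number.
    """
    numeric = text.strip().split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def has_pcr_btree_fix(version: str) -> Optional[bool]:
    """Return True/False for families covered by the article, else None."""
    parsed = parse_version(version)
    family = parsed[:2]
    if family == (5, 7):
        return False  # no fix exists for DD OS 5.7.x (end of life)
    first_fixed = FIRST_FIXED.get(family)
    if first_fixed is None:
        return None  # family not covered by this article
    return parsed >= first_fixed  # element-wise tuple comparison
```

The check does not encode the DD3300 restriction on DD OS 6.1.2.40, which is a separate constraint on release selection.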

If a user has used PCR in the past and wants to upgrade to any of the fixed releases mentioned above, it is mandatory that Support carries out some manual steps (either immediately before or immediately after the upgrade) to make sure there are no corrupted PCR Btree pages in the system after upgrading DD OS. If this recommendation is not followed, the DD would be running a fixed DD OS release that could still crash in the same way because of pre-existing PCR Btree page corruption.

Users who have never used PCR, or who are upgrading to any DD OS release which does not have the code fix, do not need to go through the mentioned steps.
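
The rule in the two paragraphs above reduces to a simple condition. The Python sketch below is a hypothetical helper for upgrade planning, not part of DD OS; both inputs are assumed to be determined by the administrator beforehand.

```python
def support_steps_required(used_pcr_before: bool, target_has_fix: bool) -> bool:
    """Return True when Support must run the manual PCR Btree steps.

    used_pcr_before: PCR has been used on this system in the past.
    target_has_fix:  the target DD OS release contains the code fix.
    """
    # Manual steps are needed only when both conditions hold.
    return used_pcr_before and target_has_fix


if __name__ == "__main__":
    # Print the full decision table for the four possible cases.
    for used in (False, True):
        for fixed in (False, True):
            needed = support_steps_required(used, fixed)
            print(f"used_pcr={used!s:5} target_fixed={fixed!s:5} -> steps_required={needed}")
```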

Important note for DD HA clusters:
The steps described below to address and resolve the issue with potentially corrupted PCR Btrees are to be run from the DD HA active node only. Running the steps (script) from the standby node can cause problems: low-level DMCK operations may succeed there only because the standby node does not have the FS enabled, while the same data is being accessed concurrently by the enabled FS process on the active node.

If in doubt, or if you are planning for such an upgrade and need assistance with the process, contact your contracted support provider and reference this KB article.

Affected Products

Data Domain

Products

Data Domain
Article Properties
Article Number: 000058816
Article Type: Solution
Last Modified: 22 Apr 2021
Version: 3