Data Domain: Upgrade Instructions for PCR/PCM enabled DD

Summary: Physical Capacity Reporting (PCR) is a feature available since DD OS 5.7. Some versions have a defect, corrected in later releases, that may require manual intervention from Support in some DD OS upgrade scenarios. ...


Symptoms

DD OS 5.7 introduced Physical Capacity Reporting / Measurement (PCR/PCM). It provides a reliable and reproducible estimate of the real (post-compression) disk space used by data in the file system (FS), for charge-back and other business scenarios.

More details about the feature are available in the FAQ (Frequently Asked Questions) KB article 465216:
Data Domain Operating System (DDOS) physical capacity measurement/reporting (PCM/PCR)

To generate PCR reports, the Data Domain file system (DDFS) Directory Manager (DM) creates PCR Btrees on the system.

DELL EMC Data Domain Engineering has identified a code defect where DM PCR Btree pages may become corrupt and DDFS may PANIC when a PCR job is started on the system. The failure is not deterministic: even when corrupted PCR Btree pages exist, PCR jobs may run to completion several times without any FS issues, while at other times the FS may crash.

This defect was present in all DD OS release families from 5.7 through 6.2 and is fixed in later versions of some of those families (listed below). On a fixed release, running PCR no longer risks an FS PANIC, provided no corrupted PCR Btree pages already exist on the system.

Because it is not possible to know up front whether any such corrupt pages exist (which could lead to FS PANICs), it is mandatory to follow the steps described below when upgrading to a fixed release. A fixed release does not create corrupt PCR Btree pages, and the steps ensure the FS will not PANIC on corrupted pages created earlier. Note that the activity requires the DD FS to be down, so downtime needs to be planned in advance. The downtime cannot be estimated precisely, but 2 hours should be enough to complete the task.

Depending on the current and target DD OS releases and other factors, the task may be done either immediately before upgrading to a fixed DD OS release, or immediately after completing the upgrade, before any PCR jobs are scheduled to run. In either case, the goal is to have a DD OS with the fix running without any PCR Btrees carried over from the earlier release, so that there is no chance of crashing on corrupted PCR Btree pages.

Cause

NA

Resolution

The defect causing corrupted PCR Btree pages is fixed in the following DD OS releases. There is no fix for DD OS 5.7.x, as DD OS 5.7.x is already end of life:
  • DD OS 6.0.2.50 and later releases in the DD OS 6.0 family have the defect fixed.
  • DD OS 6.1.2.40 and later releases in the DD OS 6.1 family have the defect fixed. (NOTE: DD OS 6.1.2.40 is NOT to be used with DD3300 due to an unrelated defect.)
  • DD OS 6.2.0.20 and later releases in the DD OS 6.2 family have the defect fixed.
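
The release list above can be expressed as a simple check. The following is a minimal Python sketch, not a Dell tool: the names `MIN_FIXED`, `parse_version`, and `has_pcr_fix` are hypothetical, and it assumes the version is given as a plain dotted numeric string such as `6.1.2.40`. It returns `None` for release families this article does not list, and it does not encode the DD3300 caveat for DD OS 6.1.2.40.

```python
from typing import Optional, Tuple

# Minimum fixed release per DD OS family, copied from the list above.
# (Hypothetical helper data; not part of any DD OS tooling.)
MIN_FIXED = {
    (6, 0): (6, 0, 2, 50),
    (6, 1): (6, 1, 2, 40),
    (6, 2): (6, 2, 0, 20),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string like '6.1.2.40' into (6, 1, 2, 40)."""
    return tuple(int(part) for part in text.strip().split("."))


def has_pcr_fix(version: str) -> Optional[bool]:
    """Return True/False for families covered by the article, else None."""
    v = parse_version(version)
    family = v[:2]
    if family == (5, 7):
        # The article states there is no fix for DD OS 5.7.x.
        return False
    minimum = MIN_FIXED.get(family)
    if minimum is None:
        # Family not listed in the article: do not guess.
        return None
    # Tuples compare element by element, so this is a numeric comparison.
    return v >= minimum


if __name__ == "__main__":
    for candidate in ("5.7.5.0", "6.0.2.40", "6.0.2.50", "6.1.2.40", "6.2.0.10"):
        print(candidate, has_pcr_fix(candidate))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating `6.0.2.5` as later than `6.0.2.50`.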

If a user has used PCR in the past and wants to upgrade to any of the fixed releases mentioned above, it is mandatory that Support carries out some manual steps (either immediately before or immediately after the upgrade) to make sure there are no corrupted PCR Btree pages in the system after upgrading DD OS. If this recommendation is not followed, the DD will be running a fixed DD OS that could still crash in the same way because of pre-existing PCR Btree page corruption.

Users who have never used PCR, or who are upgrading to any DD OS release which does not have the code fix, do not need to go through the mentioned steps.
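
The rule stated in the two paragraphs above reduces to a two-input decision. A minimal Python sketch follows; the function name and its parameters are hypothetical and only restate the article's rule. They are not part of any DD OS tooling, and the actual steps remain a Support activity.

```python
def manual_pcr_steps_required(pcr_used_before: bool, target_has_fix: bool) -> bool:
    """Restate the article's rule for when Support must run the manual steps.

    pcr_used_before: PCR has been used on this DD at any time in the past.
    target_has_fix:  the target DD OS release contains the code fix.
    """
    # Steps are mandatory only when both conditions hold. A system that
    # never used PCR, or an upgrade to a release without the fix, is exempt.
    return pcr_used_before and target_has_fix


if __name__ == "__main__":
    for used in (False, True):
        for fixed in (False, True):
            print(f"PCR used: {used!s:5}  target fixed: {fixed!s:5}  "
                  f"manual steps: {manual_pcr_steps_required(used, fixed)}")
```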

Important note for DD HA clusters:
The steps described below to address and resolve the issue with potentially corrupted PCR Btrees are to be run from the DD HA active node only. Running the steps (script) from the standby node can cause problems: low-level DMCK operations may succeed there only because the standby node does not have the FS enabled, while the same data is being accessed concurrently through the enabled FS process on the active node.

If in doubt, or if you are planning for such an upgrade and need assistance with the process, contact your contracted support provider and reference this KB article.

Affected Products

Data Domain

Products

Data Domain

Article Properties
Article Number: 000058816
Article Type: Solution
Last Modified: 22 Apr 2021
Version: 3