partitions checkpoints are invalid?

Question

I got this from an EMC SPcollect report:

"Partition Checkpoints are invalid. Rebooting or trespassing of LUNS using this drive will cause DU." It says this is documented in Primus emc106749, but I could not find the doc on the site.

This is on a CX-500. Any ideas what that means? A FLARE upgrade was done recently, and the health check on the CLARiiON is OK.

Thanks

DanT2 · Answer

Thank you Sandip. Support is looking at it now.

nandas · Answer

I am unable to attach any files, so I am pasting Primus emc156943 here. The other two Primus solutions are for EMC internal use only, which may be the reason you could not find them.

As I mentioned earlier, please engage the EMC Support team by opening a Service Request if you have not already done so.

Others may provide some more suggestions or valuable inputs here.

Thanks,
Sandip

____________________________________________________________________

Diagnosing invalid checkpoints in FLARE Releases 19 to 26.

ID: emc156943
Usage: 112
Date Created: 03/14/2007
Last Modified: 04/03/2008

Knowledgebase Solution

Question: Diagnosing invalid checkpoints in FLARE Releases 19 to 26.
Environment: Product: CLARiiON CX-series
Environment: Product: CLARiiON CX3-series
Environment: Product: CLARiiON DAE2P
Environment: Product: CLARiiON DAE3P
Environment: EMC Firmware: FLARE Release 19
Environment: EMC Firmware: FLARE Release 22
Environment: EMC Firmware: FLARE Release 24
Environment: EMC Firmware: FLARE Release 26
Problem: CAP2 identifies the following issue: Partition Checkpoint is invalid (or Partition Checkpoints are invalid). Rebooting or trespassing of LUNs using this drive will cause DU.
Problem: LUNs may become unowned after a trespass or SP reboot
Root Cause: The FLARE issue that the CAP tool has detected can be caused by different events. Some of these events have already been identified; others are still under investigation by CLARiiON Engineering.
In Release 19 and 22, LUNs with invalid checkpoints will become unowned after a trespass or SP reboot. In Release 24 and 26, LUNs with invalid checkpoints may not become unowned as a result of this condition.

This message can also be reported erroneously when using older revisions of CAP or TRiiAGE or both.

Fix: In order to properly identify the cause and the recovery, invalid checkpoint issues that are reported in Release 19 and later must be escalated to a CTS within CLARiiON Level 2 Support or EMC Sustaining Engineering.
Conditions resulting in invalid checkpoints are fixed in Release 24 Patch 014. A workaround for this is to leave at least six seconds between drive replacements.

Using Release 26 CAP 6.26.50.1.25 or TRiiAGE 23.2 ensures that the invalid checkpoint message is not reported erroneously.

Notes: Also see EMC Knowledgebase solution emc106952 ("Using CAP tool to proactively identify LUNs in RAID group that could become unowned after a trespass or SP reboot by identifying the affected disk") for a known issue which results in invalid checkpoints in releases prior to Release 14.010 and Release 13.017.
Notes: Also see EMC Knowledgebase solution emc156894 ("Recognizing an invalid checkpoint issue on a DAE2P UltraPoint chassis [Stiletto] 2Gb/s enclosure") for one known issue which results in invalid checkpoints in Release 19, Release 22 and Release 24.
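
As a rough illustration of the release split described in this solution, the expected behaviour can be written down as a small lookup. This is only a sketch in Python: the function name and the wording of the returned strings are assumptions made for illustration and are not part of CAP, TRiiAGE, or any EMC tool.

```python
# Illustrative only: encodes the per-release behaviour stated in emc156943.
# The function name and return strings are hypothetical, not an EMC API.

def unowned_risk(flare_release: int) -> str:
    """Summarise what emc156943 says about LUNs with invalid checkpoints."""
    if flare_release in (19, 22):
        # Stated in the solution: these releases lose LUN ownership.
        return "LUNs will become unowned after a trespass or SP reboot"
    if flare_release in (24, 26):
        # Stated in the solution: ownership may survive in these releases.
        return "LUNs may not become unowned"
    # Any other release is outside the scope of emc156943.
    return "not covered by emc156943"


if __name__ == "__main__":
    for release in (19, 22, 24, 26):
        print(release, "->", unowned_risk(release))
```

Whatever the lookup returns, the solution still requires escalation to Level 2 Support or Sustaining Engineering for Release 19 and later.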

nandas · Answer

My pleasure. If any of my posts were helpful, please mark them "Correct" or "Helpful", which will help with future reference or queries.

Also, please keep us posted, if possible, on the outcome of Support's analysis and the resolution of your issue. I guess you may need a code upgrade.

Regards,
Sandip

nandas · Answer

Was there any unbound disk replaced recently?

However, your case is well discussed in two Primus solutions: emc156943 and emc156894.

I am trying to attach these two Primus solutions, and also Primus emc106749, to this thread for your easy reference.

I suggest engaging EMC CLARiiON Support to investigate further and advise accordingly. The case may need to be escalated to Level 2 Support or CLARiiON Engineering.

Thanks,
Sandip

DanT2 · Answer

Will do.

infernet · Answer

Hey Brother,

Please look at this Primus.

ID: emc106749
Usage: 20
Date Created: 04/19/2005
Last Modified: 05/21/2007
STATUS: Approved
Audience: Support

Knowledgebase Solution

Question: Using CAP tool to proactively identify LUNs in RAID group that could become unowned after a trespass or SP reboot by identifying multiple disks affected in the same RAID group
Environment: Product: CLARiiON CX-Series
Environment: EMC Firmware: FLARE releases prior to Release 13 Patch '017' and Release 14 Patch '010'
Environment: EMC SW: CLARiiON Array Properties (CAP) utility
Environment: CAP tool reports a critical severity error and displays multiple FRU numbers as the failing components with the following text: "Multiple drives contain invalid checkpoints. Rebooting or trespassing of LUNS in this RAID group will cause DU."
Problem: CAP tool has detected an array is vulnerable to the issue described in solution emc93299 ('Some or all LUNs in RAID group become unowned after a trespass or SP reboot').
Root Cause: FLARE issue that the CAP tool has detected, that is, a hot spare of different capacity was once swapped in for the drives identified by CAP. A trespass of the LUNs in the RAID group containing those drives could lead to them becoming unowned.

This issue occurs under the following conditions:

* A hot spare of a different size is swapped in place of a failed or removed drive in that RAID group. The original failed or removed drive has been replaced and the hot spare has equalized back to the new replacement; OR
* A hot spare of the same size is swapped in place of a failed or removed drive in that RAID group. The original failed or removed drive has been replaced with a drive of a different size and the hot spare has equalized back to the new replacement; AND
* Multiple drives in the same RAID group have been equalized in one of the two manners listed above. Any LUNs in that RAID group experience a trespass.

Note: Arrays running FLARE Release 11, Release 12, and all patch releases prior to Release 13 patch .017 and Release 14 patch .010 can encounter this problem. If this issue is observed in a later release, escalate to a CTS within CLARiiON Level 2 Support or Sustaining Engineering.

Warning! Do not trespass any LUNs in the affected RAID group or reboot the array until the recovery procedure has been completed.

Fix: Warning! If LUNs have become unowned, escalate to a CTS within CLARiiON Level 2 Support or EMC Sustaining Engineering as described in solution emc93299. Apply the following fix only if LUNs have NOT become unowned.

If the LUNs are still owned, there are two options.

Option 1 is to escalate the incident to a CTS within CLARiiON Level 2 Support or EMC Sustaining Engineering to perform a non-destructive bind (NDB) on the affected LUNs in the RAID group. This will involve a temporary data unavailable (DU) situation, so hosts connected to those LUNs will have to be brought offline. However, the overall turnaround time in resolving the situation will be faster than Option 2.

Option 2 is to follow the steps below for each FRU identified by CAP, one at a time. The LUNs will be available during this process, but multiple rebuilds will be required for the same RAID group, so this can be a lengthy procedure. ONLY ONE FRU SHOULD BE REMOVED PER RAID GROUP AT A TIME. PULLING MULTIPLE DRIVES FROM THE SAME RAID GROUP WILL RESULT IN A DATA UNAVAILABLE (DU) CONDITION. If multiple RAID groups have been identified, those rebuilds can be done in parallel.

1. Ensure that the system is in a stable state with no hot spares swapped in or rebuilds in progress.
2. Unbind all hot spares from the system.
3. Remove one of the drives identified by CAP and allow it to spin down.
4. Reinsert the drive and observe that it starts rebuilding.
5. Once the rebuild is complete, proceed to the next FRU identified by CAP and follow Steps 1-4.
6. Once all FRUs have been rebuilt, upgrade the array to a release containing the fix for this issue.
7. Once the new release has been committed, rebind all of the previously unbound hot spares.

The following revisions of FLARE permanently fix the problem:

* Release 13 Patch '017' or greater
* Release 14 Patch '010' or greater
* All Release 16 and above revisions

If this issue is observed in a later release, escalate to a CTS within CLARiiON Level 2 Support or EMC Sustaining Engineering immediately without performing the above recovery actions.

Note: There is a bug in CAP (reported June 21, 2005) that results in hot spares being incorrectly reported as having invalid checkpoints. If the hot spares are not in use, unbinding/rebinding them should clear the condition.

Notes: The following is sample CAP output:

Array Report
Data gathered from:
SP A on 06:22:46, 04/07/2005
SP B on 06:28:00, 04/07/2005
Source Files used:
K:\submittals\SPA__APM00040301413_ee16d_04-07-2005_06-22-07_data.zip
K:\submittals\SPB__APM00040301413_f137c_04-07-2005_06-27-18_data.zip
SPCollect versions used:
On SP-A: 6.7
On SP-B: 6.7

Issue Summary
Severity: Critical
Component: RG 5: FRU 161,162
Issue: Multiple drives contain invalid checkpoints. Rebooting or trespassing of LUNS in this RAID group will cause DU
Issue Action: See Primus emc106749 for corrective action.
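
The release conditions in emc106749 are easy to misread, so here is a minimal Python sketch of them. It assumes the FLARE revision can be reduced to a (release, patch) pair of integers; the function name and that representation are hypothetical and are not how CAP or FLARE report versions. Release 15 is not mentioned in the solution, so the sketch returns None for it rather than guessing.

```python
from typing import Optional

# Illustrative only: encodes the release/patch thresholds stated in emc106749.
# The (release, patch) integer pair is an assumed simplification.

def affected_by_emc106749(release: int, patch: int) -> Optional[bool]:
    """Return True if exposed, False if fixed, None if the solution is silent."""
    if release < 13:
        # "FLARE releases prior to Release 13 Patch '017'"; the note names 11 and 12.
        return True
    if release == 13:
        return patch < 17   # fixed in Release 13 Patch '017' or greater
    if release == 14:
        return patch < 10   # fixed in Release 14 Patch '010' or greater
    if release >= 16:
        return False        # all Release 16 and above revisions contain the fix
    return None             # Release 15 is not mentioned in the solution


if __name__ == "__main__":
    for rev in [(12, 5), (13, 16), (13, 17), (14, 9), (14, 10), (15, 1), (19, 0)]:
        print(rev, "->", affected_by_emc106749(*rev))
```

A False result only means this particular cause is fixed. Per the solution, invalid checkpoints seen on a later release should be escalated to Support without attempting the recovery steps.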
