VNX: Cleaning Up Replication When the Replication Root Checkpoints Are Corrupted or Inactive (User Correctable)

Summary: How to clean up replication sessions when the replication root checkpoints are corrupted or inactive (user correctable).

Symptoms

The root checkpoints are corrupted. A common cause is a back-end LUN that holds uncorrectable data after a disk failure.
 
[nasadmin@CS0 ~]$ server_mount ALL | grep unmount
root_rep_ckpt_28_242474_1 on /root_rep_ckpt_28_242474_1 ckpt,perm,ro,<unmounted>
root_rep_ckpt_28_242474_2 on /root_rep_ckpt_28_242474_2 ckpt,perm,ro,<unmounted>
root_rep_ckpt_27_242517_1 on /root_rep_ckpt_27_242517_1 ckpt,perm,ro,<unmounted>
root_rep_ckpt_27_242517_2 on /root_rep_ckpt_27_242517_2 ckpt,perm,ro,<unmounted>

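Only root checkpoints appear in the output above, which indicates that the production file systems are healthy. This typically happens when the SavVol is built on a different storage pool that has become corrupted, while the associated file system is unaffected.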
[nasadmin@CS0 ~]$ nas_replicate -l
Name                      Type       Local Mover               Interconnect         Celerra      Status
rep_fs1                filesystem server_2                  -->Replication       Remote_CS  Critical 8865448248:The replication session encountered an error that halted progress.
rep_fs2                filesystem server_2                  -->Replication       Remote_CS  OK
rep_fs3                filesystem server_2                  -->Replication       Remote_CS  Critical 8865448248:The replication session encountered an error that halted progress.
The output above shows two replication sessions in a Critical state, which matches the two sets of unmounted root checkpoints seen in the server_mount output.
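The cross-check described above can be sketched as two small filters. This is a minimal sketch, not a Dell-provided tool: the function names unmounted_root_ckpts and critical_sessions are hypothetical, and the parsing assumes the output layouts match the examples shown in this article.

```shell
# Print the unique names of unmounted replication root checkpoints.
# Reads `server_mount ALL` output on stdin; the name is the first column.
unmounted_root_ckpts() {
    awk '/<unmounted>/ && $1 ~ /^root_rep_ckpt_/ { print $1 }' | sort -u
}

# Print the names of replication sessions reported as Critical.
# Reads `nas_replicate -l` output on stdin; the session name is the first column.
critical_sessions() {
    awk '/Critical/ { print $1 }'
}

# On the Control Station (not executed here):
#   server_mount ALL | unmounted_root_ckpts
#   nas_replicate -l | critical_sessions
```

Each affected session should account for two root checkpoints, so two Critical sessions correspond to four unmounted root checkpoints.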

Cause

If a Data Mover panics, most often because of disk corruption, VNX marks the file system as corrupted. The root checkpoints also become unmounted.

Resolution

Caution: Fix any back-end issues first. For example, if any disks need to be replaced, replace them before proceeding.

To delete the affected replication sessions, do the following:

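If you try to delete the session directly, the delete task may appear to hang. This happens because the delete attempts to refresh the root checkpoints, and the refresh hangs on the corruption. To recover from a hung delete task, see the Additional Information section. The procedure below must be performed in the order given, from the Control Station command prompt.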

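1) Log in to the Control Station as nasadmin, then identify the file systems behind the affected replication sessions, as in the following example: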
a) Find the full names of the root checkpoints:
[nasadmin@CS0 ~]$ server_mount ALL | grep unmount
root_rep_ckpt_28_242474_1 on /root_rep_ckpt_28_242474_1 ckpt,perm,ro,<unmounted>
root_rep_ckpt_28_242474_2 on /root_rep_ckpt_28_242474_2 ckpt,perm,ro,<unmounted>
b) For each checkpoint, run the following command and note the file system name:
[nasadmin@CS0 ~]$ /nas/sbin/rootnas_fs -info root_rep_ckpt_28_242474_1 | grep checkpt_of
checkpt_of= fs1 Mon Jun 15 16:51:54 EDT 2015

Repeat this for each checkpoint. Every file system has two root checkpoints per replication session, so collect all the file system names before proceeding to the next step.
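Repeating the lookup for every checkpoint can be scripted. The following is a minimal sketch under stated assumptions: ckpt_parent_fs and map_ckpts_to_fs are hypothetical helper names, the INFO_CMD override exists only so the loop can be tried away from the array, and the parsing assumes the "checkpt_of= <fs_name> <date>" layout shown in the example above.

```shell
# Extract the parent file system name from `rootnas_fs -info` output on stdin.
# Only the checkpt_of line is used; everything after the name is discarded.
ckpt_parent_fs() {
    sed -n 's/^ *checkpt_of *= *\([^ ]*\).*/\1/p'
}

# For each checkpoint name on stdin, print "<checkpoint> <file system>".
# INFO_CMD defaults to the real binary; overriding it is an illustration aid.
map_ckpts_to_fs() {
    local ckpt
    while read -r ckpt; do
        [ -n "$ckpt" ] || continue
        # </dev/null keeps the info command from consuming the loop's input.
        printf '%s %s\n' "$ckpt" \
            "$("${INFO_CMD:-/nas/sbin/rootnas_fs}" -info "$ckpt" </dev/null | ckpt_parent_fs)"
    done
}

# On the Control Station (not executed here):
#   server_mount ALL | awk '/<unmounted>/ { print $1 }' | map_ckpts_to_fs
```

The resulting list gives the file system name for each unmounted root checkpoint in one pass.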

2) Identify the replication checkpoints and delete them, as in the following example:
 
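a) List the replication sessions, then display the details of each session that is failing with a critical error to identify its file system name:
nas_replicate -list
nas_replicate -i <replication_session_name>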

Example: 
$ nas_replicate -i rep_fs1
ID                             = 156_APM001_01F4_137_APM002_01F4
Name                           = rep_fs1
Source Status                  = Critical 8865448248: The replication session encountered an error that halted progress.
Network Status                 = OK
Destination Status             = OK
Last Sync Time                 = Wed Jul 13 14:35:15 EDT 2016
Type                           = filesystem
Celerra Network Server         = CS01
Dart Interconnect              = Replication
Peer Dart Interconnect         = Replication
Replication Role               = source  <== note the role 
Source Filesystem              = fs1 <== this is the fs name if the role is source
Source Data Mover              = server_2
Source Interface               = 10.x.x.x
Source Control Port            = 0
Source Current Data Port       = 0
Destination Filesystem         = fs1-DR <== this is the fs name if the role is destination
Destination Data Mover         = server_2
Destination Interface          = 10.x.x.x
...

Confirm that this file system name matches the one obtained earlier from the root checkpoints with rootnas_fs -info.
b) Check the status of the replication checkpoints: fs_ckpt <fs_name> -list -all

Example: 
$ fs_ckpt fs1 -list -all
id    ckpt_name                creation_time           inuse fullmark   total_savvol_used  ckpt_usage_on_savvol
32    root_rep_ckpt_28_242474_ 06/15/2015-16:51:54-EDT   y   90%        INACTIVE           N/A
33    root_rep_ckpt_28_242474_ 06/15/2015-16:51:56-EDT   y   90%        INACTIVE           N/A
34    fs1_ckpt1                06/17/2015-16:51:56-EDT   y   90%        INACTIVE           N/A
Info 26306752329: The value of ckpt_usage_on_savvol for read-only checkpoints may not be consistent with the total_savvol_used.

id    wckpt_name               inuse fullmark total_savvol_used  base  ckpt_usage_on_savvol

INACTIVE indicates that the checkpoint is corrupted.
c) If the "inuse" value is "y", delete the root checkpoint using the following command: /nas/sbin/rootnas_fs -delete id=<root_ckpt_id> -o umount=yes -ALLOW_REP_INT_CKPT_OP
In rare cases a root checkpoint has an "inuse" value of "n". In that case, use: /nas/sbin/rootnas_fs -delete id=<root_ckpt_id> -ALLOW_REP_INT_CKPT_OP

Example:
[nasadmin@CS0 ~]$ /nas/sbin/rootnas_fs -delete id=32  -o umount=yes -ALLOW_REP_INT_CKPT_OP 
id        = 32
name      = root_rep_ckpt_28_242474_1
acl       = 0
in_use    = True
type      = ckpt
worm      = off
..

d) Repeat the step above until all the root checkpoints are deleted.
e) For non-root checkpoints, use the same command without the last argument (example: /nas/sbin/rootnas_fs -delete id=34 -o umount=yes).
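Sub-steps b) through e) can be turned into a dry run that only prints the delete commands for review. This is a minimal sketch: plan_ckpt_deletes is a hypothetical helper, it assumes the column order shown in the fs_ckpt example above (id, name, creation time, inuse, fullmark, total_savvol_used), and applying the "inuse" rule to non-root checkpoints as well is an assumption rather than something this article states.

```shell
# Read `fs_ckpt <fs_name> -list -all` output on stdin and print one delete
# command per INACTIVE checkpoint. Nothing is executed; review before running.
plan_ckpt_deletes() {
    awk '
        # Data rows start with a numeric id; header and Info lines are skipped.
        $1 ~ /^[0-9]+$/ && $6 == "INACTIVE" {
            cmd = "/nas/sbin/rootnas_fs -delete id=" $1
            # Unmount first when the checkpoint is in use.
            if ($4 == "y") cmd = cmd " -o umount=yes"
            # Only replication root checkpoints take the extra argument.
            if ($2 ~ /^root_rep_ckpt_/) cmd = cmd " -ALLOW_REP_INT_CKPT_OP"
            print cmd
        }'
}

# On the Control Station (not executed here):
#   fs_ckpt fs1 -list -all | plan_ckpt_deletes
```

Printing the commands instead of running them keeps the operator in control of the order, with root checkpoints handled before user checkpoints.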
3) Delete the replication session using the following syntax:
For the "source" replication role: nas_replicate -delete <replication_session_name> -mode source -background
For the "destination" replication role: nas_replicate -delete <replication_session_name> -mode destination -background

The command returns a task number, which can be used to check the status with "nas_task -i <task_number>".
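Choosing the right -mode value depends on the Replication Role field of the session. The following minimal sketch derives the delete command from the session details; plan_session_delete is a hypothetical helper, and it assumes the "Field = value" layout shown in the nas_replicate -i example in this article.

```shell
# Read `nas_replicate -i <session>` output on stdin and print the matching
# delete command. Nothing is executed. Exits non-zero if the session name or
# a source/destination role cannot be found.
plan_session_delete() {
    awk -F'=' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        trim($1) == "Name" { name = trim($2) }
        trim($1) == "Replication Role" {
            role = trim($2)
            sub(/[ \t].*$/, "", role)   # keep only the first word of the value
        }
        END {
            if (name != "" && (role == "source" || role == "destination"))
                print "nas_replicate -delete " name " -mode " role " -background"
            else
                exit 1
        }'
}

# On the Control Station (not executed here):
#   nas_replicate -i rep_fs1 | plan_session_delete
```

The printed command can then be reviewed and run by hand, and the same helper applies unchanged on the remote side, where the role is reversed.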

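4) On the remote side, delete the replication session using step 3 above only:
 [ "nas_replicate -delete <session_name> -mode <mode> -background" ]
Once all the steps above are complete, the replication cleanup is finished. A new replication session can then be configured through Unisphere or from the command prompt.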

If any of the steps above fails, contact Dell EMC Technical Support and reference this knowledge base article ID.

Additional Information

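If you try to delete the session directly, either in Unisphere or with the "nas_replicate -delete" command, the delete task may appear to hang. In most cases this can be fixed without rebooting the Data Mover.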

1) Identify the task:
nas_task -list | grep -i run

2) Find the task details, in particular the Data Mover name:
nas_task -info <task_number>

3) Abort the task:
nas_task -abort <task_number> -mover <data_mover_name>

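Aborting the task should also delete the replication session and the root checkpoints. Any user checkpoints must be deleted manually.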

Affected Products

vVNX
Article Properties
Article Number: 000056557
Article Type: Solution
Last Modified: 29 Jul 2025
Version:  4