VNX: Cleaning up replication when the root checkpoints are corrupted/inactive (user correctable)
Summary: Cleaning up replication when the root checkpoints are corrupted/inactive (user correctable)
This article is not tied to any specific product.
Not all product versions are identified in this article.
Symptoms
A root checkpoint is corrupted or inactive (a common cause is a back-end LUN that cannot be recovered after disk failures).
[nasadmin@CS0 ~]$ server_mount ALL | grep unmount
root_rep_ckpt_28_242474_1 on /root_rep_ckpt_28_242474_1 ckpt,perm,ro,<unmounted>
root_rep_ckpt_28_242474_2 on /root_rep_ckpt_28_242474_2 ckpt,perm,ro,<unmounted>
root_rep_ckpt_27_242517_1 on /root_rep_ckpt_27_242517_1 ckpt,perm,ro,<unmounted>
root_rep_ckpt_27_242517_2 on /root_rep_ckpt_27_242517_2 ckpt,perm,ro,<unmounted>
In the output above, only root checkpoints are listed as unmounted, which indicates that the production file systems themselves are healthy. This typically happens when the savvol was built on a different storage pool that has become corrupted while the associated file systems remain intact.
[nasadmin@CS0 ~]$ nas_replicate -l
Name      Type        Local Mover  Interconnect    Celerra    Status
rep_fs1   filesystem  server_2     -->Replication  Remote_CS  Critical 8865448248: The replication session encountered an error that halted progress.
rep_fs2   filesystem  server_2     -->Replication  Remote_CS  OK
rep_fs3   filesystem  server_2     -->Replication  Remote_CS  Critical 8865448248: The replication session encountered an error that halted progress.
The output above shows two replication sessions in Critical state, which matches the two sets of unmounted root checkpoints seen in the server_mount output.
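On a busy system, the two checks above can be run back to back to make the matching easier. The following is only a minimal sketch, assuming a bash shell on the Control Station and the output formats shown above; it is read-only and changes nothing:
# Show unmounted root checkpoints next to the replication sessions reporting Critical,
# so the two symptoms can be matched up.
echo "=== Unmounted root checkpoints ==="
server_mount ALL | grep unmount
echo "=== Replication sessions in Critical state ==="
nas_replicate -l | grep -i critical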
Cause
When a Data Mover hits an error, most commonly caused by disk corruption, VNX marks the affected file system as corrupted. Its root checkpoints are also unmounted.
Resolution
Note: Any back-end problem must be fixed first; for example, if any disks need to be replaced, replace them before proceeding.
To delete the affected replication sessions:
If you try to delete a session directly, the delete task may appear to hang, because the deletion attempts to update the root checkpoints and that update stalls on the corruption. See the Additional Information section for how to recover from a hung task. The procedure below must be followed in the correct order and must be run from the Control Station command prompt.
1) Log in to the Control Station as nasadmin.
2) Identify the affected file systems and replication session names as per the example below:
a) Find the full names of the root checkpoints:
[nasadmin@CS0 ~]$ server_mount ALL | grep unmount
root_rep_ckpt_28_242474_1 on /root_rep_ckpt_28_242474_1 ckpt,perm,ro,<unmounted>
root_rep_ckpt_28_242474_2 on /root_rep_ckpt_28_242474_2 ckpt,perm,ro,<unmounted>
b) For each checkpoint, issue the following command and note the file system name:
[nasadmin@CS0 ~]$ /nas/sbin/rootnas_fs -info root_rep_ckpt_28_242474_1 | grep checkpt_of
checkpt_of= fs1 Mon Jun 15 16:51:54 EDT 2015
Repeat the above for each checkpoint. Every file system has two root checkpoints per replication session, so collect all the file system names before proceeding to the next step (a small loop that automates this is sketched below).
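Because each replicated file system has two root checkpoints, walking through them by hand is error-prone. The loop below is only a sketch, assuming a bash shell on the Control Station and the exact output formats shown in steps 2a and 2b (the awk field positions are based on those samples); it only prints the checkpoint-to-file-system mapping:
# Map every unmounted root replication checkpoint to the file system it belongs to.
for ckpt in $(server_mount ALL | grep unmount | awk '{print $1}' | grep '^root_rep_ckpt'); do
    # "checkpt_of= fs1 Mon Jun 15 ..." -> second field is the file system name
    fs=$(/nas/sbin/rootnas_fs -info "$ckpt" | grep checkpt_of | awk '{print $2}')
    echo "$ckpt -> $fs"
done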
3) Identify the replication checkpoints as per this example and delete them:
a) Identify the file system name from the replication sessions failing with critical errors by:
nas_replicate -list
Example:
$ nas_replicate -i rep_fs1
ID = 156_APM001_01F4_137_APM002_01F4
Name = rep_fs1
Source Status = Critical 8865448248: The replication session encountered an error that halted progress.
Network Status = OK
Destination Status = OK
Last Sync Time = Wed Jul 13 14:35:15 EDT 2016
Type = filesystem
Celerra Network Server = CS01
Dart Interconnect = Replication
Peer Dart Interconnect = Replication
Replication Role = source <== note the role
Source Filesystem = fs1 <== this is the fs name if the role is source
Source Data Mover = server_2
Source Interface = 10.x.x.x
Source Control Port = 0
Source Current Data Port = 0
Destination Filesystem = fs1-DR <== this is the fs name if the role is destination
Destination Data Mover = server_2
Destination Interface = 10.x.x.x
...
Match this name against the names identified in step 2 to ensure they are the same.
b) Check the replication checkpoint status by: fs_ckpt <fs_name> -list -all
Example:
$ fs_ckpt fs1 -list -all
id ckpt_name creation_time inuse fullmark total_savvol_used ckpt_usage_on_savvol
32 root_rep_ckpt_28_242474_ 06/15/2015-16:51:54-EDT y 90% INACTIVE N/A
33 root_rep_ckpt_28_242474_ 06/15/2015-16:51:56-EDT y 90% INACTIVE N/A
34 fs1_ckpt1 06/17/2015-16:51:56-EDT y 90% INACTIVE N/A
Info 26306752329: The value of ckpt_usage_on_savvol for read-only checkpoints may not be consistent with the total_savvol_used.
id wckpt_name inuse fullmark total_savvol_used base ckpt_usage_on_savvol
INACTIVE indicates that the checkpoint is corrupted.
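To pick out the affected checkpoint IDs at a glance, the one-liner below is a minimal sketch, assuming the fs_ckpt output layout shown above (id in the first column, name in the second) and using fs1, the example file system name from this article; it only lists, it does not delete:
# Print the id and name of every checkpoint of fs1 whose status shows INACTIVE.
fs_ckpt fs1 -list -all | grep INACTIVE | awk '{print $1, $2}'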
c) If the "inuse" value is "y", delete the root checkpoints using the following command:
/nas/sbin/rootnas_fs -delete id=<root_ckpt_id> -o umount=yes -ALLOW_REP_INT_CKPT_OP
In the rare case that a root checkpoint has an "inuse" value of "n", use:
/nas/sbin/rootnas_fs -delete id=<root_ckpt_id> -ALLOW_REP_INT_CKPT_OP
Example:
[nasadmin@CS0 ~]$ /nas/sbin/rootnas_fs -delete id=32 -o umount=yes -ALLOW_REP_INT_CKPT_OP
id = 32
name = root_rep_ckpt_28_242474_1
acl = 0
in_use = True
type = ckpt
worm = off
..
d) Repeat the above step until all the root checkpoints are deleted (a scripted variant of steps c and d is sketched after step e).
e) For non-root checkpoints, delete them using the same command without the last argument (example: /nas/sbin/rootnas_fs -delete id=34 -o umount=yes).
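Where several root checkpoints have to be removed, steps c) and d) can be scripted. The loop below is only a sketch: the id list is a placeholder that must be replaced with ids already confirmed (in step 3b) to be corrupted root replication checkpoints with an "inuse" value of "y". Review every id before running anything destructive.
# Delete a confirmed list of corrupted root replication checkpoints by id.
# CKPT_IDS is a placeholder; replace "32 33" with the ids identified in step 3b.
CKPT_IDS="32 33"
for id in $CKPT_IDS; do
    /nas/sbin/rootnas_fs -delete id=$id -o umount=yes -ALLOW_REP_INT_CKPT_OP
done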
4) Delete the replication session using the following syntax:
For the "source" replication role: nas_replicate -delete <replication_session_name> -mode source -background
For the "destination" replication role: nas_replicate -delete <replication_session_name> -mode destination -background
The command returns a task number, which can be used to view the status with "nas_task -i <task_number>".
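As an illustration only: for the two failing sessions in the Symptoms example (rep_fs1 and rep_fs3, both shown with the source role on this Control Station), the commands would look like the sketch below. The session names and role are assumptions taken from the earlier example output.
# Delete the source side of both failing example sessions as background tasks.
nas_replicate -delete rep_fs1 -mode source -background
nas_replicate -delete rep_fs3 -mode source -background
# Each command returns a task number; check progress with:
# nas_task -i <task_number>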
5) Delete the replication on the remote side as well, using only the command from step 4 (nas_replicate -delete <session_name> -mode <mode> -background).
Once all of the above steps are complete, the replication deletion is finished. A new replication can be configured through Unisphere or from the command prompt.
If any of the above steps fail, contact Dell EMC Technical Support and quote this knowledge base article ID.
Additional Information
If you attempt to delete the session directly from Unisphere or with the "nas_replicate -delete" command, the delete task may appear to hang. In most cases this can be resolved without restarting the Data Mover:
1) Identify the task:
nas_task -list | grep -i run
2) Find its details (in particular the Data Mover name):
nas_task -info <task_number>
3) Abort the task:
nas_task -abort <task_number> -mover <data_mover_name>
Aborting the task should also delete the replication session and the root checkpoints. Any user checkpoints must be deleted manually.
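As a usage illustration only: the task number and Data Mover name below are placeholders, to be taken from the output of the first two commands (server_2 is the Data Mover name used in this article's examples), and the abort should be issued only for the hung replication-delete task.
# 1) Find running tasks, 2) inspect the hung one, 3) abort it on its Data Mover.
nas_task -list | grep -i run
nas_task -info 12345
nas_task -abort 12345 -mover server_2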
Affected Products
vVNX
Article Properties
Article Number: 000056557
Article Type: Solution
Last Modified: 29 Jul 2025
Version: 4