Metro Node:升級至 8.0.x 後,中繼資料備份停止運作
Summary: 本文討論升級至 8.0.x 程式碼後,中繼資料備份無法運作的問題。本文提供還原中繼資料備份功能的因應措施步驟。
Symptoms
Dell 受影響的硬體:
Metro node mn114
Metro node mn215
Metro Node - 本機/Metro
Dell 受影響的軟體:
Metro node 作業系統 8.0.0.0.0.267
、Metro Node 作業系統 8.0.0.1.0.21
、Metro Node 作業系統 8.0.1.0.0.220
受影響的變更活動:
升級至 Metro Node 作業系統 8.0.x 後
問題:
-
可使用
ndu pre-check命令會針對 Metro 節點組態中的每個叢集回報以下錯誤:叢集 1 的範例:
VPlexcli:/> ndu pre-check Warning: During the NDU process, multiple directors will be offline for a portion of the time. This is non-disruptive but is dependent on a host-based multipathing solution being installed, configured, and operating on all connected hosts. ================================================================================ Performing NDU pre-checks ================================================================================ Verify NDU is not in progress.. OK Verify that the directors have been running continuously for 15 days.. OK Verify director communication status.. OK . . . Verify meta-volume backup configuration.. ERROR . . . ================================================================================ Errors (x errors found) ================================================================================ cluster-1 Metadata backups are NOT created according to schedule Last backup: Mon Aug 19 00:00:00 UTC 20xx Current time: Fri Dec 13 03:41:33 UTC 20xx There has been no metadata backup for 116 day(s) Run 'metadatabackup local' on cluster-1
叢集 2 的範例:
VPlexcli:/> ndu pre-check Warning: During the NDU process, multiple directors will be offline for a portion of the time. This is non-disruptive but is dependent on a host-based multipathing solution being installed, configured, and operating on all connected hosts. ================================================================================ Performing NDU pre-checks ================================================================================ Verify NDU is not in progress.. OK Verify that the directors have been running continuously for 15 days.. OK Verify director communication status.. OK . . . Verify meta-volume backup configuration.. ERROR . . . ================================================================================ Errors (x errors found) ================================================================================ cluster-2 Metadata backups are NOT created according to schedule Last backup: Sat Mar 16 01:30:00 UTC 20xx Current time: Fri Dec 13 03:41:33 UTC 20xx There has been no metadata backup for 272 day(s) Run 'metadatabackup local' on cluster-2
-
命令
ll ~system-volumes命令執行時,中繼資料備份磁碟區日期會反映先前的日期。在以下範例中,中繼資料備份在 Metro 環境中的兩個叢集上都停止運作:
VPlexcli:/> ll ~system-volumes /clusters/cluster-1/system-volumes: Name Volume Type Operational Health Active Ready Geometry Component Block Block Capacity Slots --------------------------------------- -------------- Status State ------ ----- -------- Count Count Size -------- ----- --------------------------------------- -------------- ----------- ------ ------ ----- -------- --------- -------- ----- -------- ----- meta_C1_xxxxxx meta-volume ok ok true true raid-1 2 20971264 4K 80G 64000 meta_C1_xxxxxxx_backup_20xx-11-21_01-30 meta-volume ok ok false true raid-1 1 20971264 4K 80G 64000 \------------/ date and time the last backup was run /clusters/cluster-2/system-volumes: Name Volume Type Operational Health Active Ready Geometry Component Block Block Capacity Slots --------------------------------------- -------------- Status State ------ ----- -------- Count Count Size -------- ----- --------------------------------------- -------------- ----------- ------ ------ ----- -------- --------- -------- ----- -------- ----- meta_C2_xxxxxx meta-volume ok ok true true raid-1 2 20971264 4K 80G 64000 meta_C2_xxxxxxx_backup_20xx-11-20_12-43 meta-volume ok ok false true raid-1 1 20971264 4K 80G 64000 \------------/ date and time the last backup was run
症狀:
- 中繼資料備份會在 Metro 環境中的兩個叢集 上 停止運作。
- 中繼資料 備份會在 Metro 環境中的任一叢集上停止運作
- 中繼資料備份在本 機 叢集中停止運作
Cause
在排定的每日中繼資料備份期間,服務「daily_metadata_backup.service」偶爾會卡在導向器-1-1-A、director-2-1-A 或兩者上的啟用狀態。
Resolution
永久解決方案:
Metro Node 工程部門正在調查此問題。有可用的修正程式時,就會更新本文。
因應措施:
-
若要檢查服務「daily_metadata_backup.service」的狀態,請在命令提示符下執行以下命令:
sudo systemctl status daily_metadata_backup.service在 A 節點上,例如導向器-1-1-A 或導向器-2-1-A。檢查並確認「Active: acactivation (start)」屬性存在,且執行時間超過一分鐘。如果是,則表示此服務卡在該特定 A 節點上。以下範例顯示導向器-1-1-A 和導向器-2-1-A 均具有服務「daily_metadata_backup.service」屬性「Active: acactivation (start)」,且已執行超過一分鐘,表示此服務停滯在這些節點上,如下所示。
叢集-1:
service@director-2-1-a:~> sudo systemctl status daily_metadata_backup.service ● daily_metadata_backup.service - metronode automated daily metadata backups Loaded: loaded (/etc/systemd/system/daily_metadata_backup.service; static) Active: activating (start) since Sat 2024-10-xx 01:30:18 UTC; 1 month 3 days ago <--------------------------- TriggeredBy: ● daily_metadata_backup.timer Main PID: 22553 (daily_metadata_) Tasks: 1 CGroup: /system.slice/daily_metadata_backup.service └─22553 /usr/bin/python3 /opt/dell/vplex/sbin/daily_metadata_backup.py Oct xx 01:30:18 director-2-1-a systemd[1]: Starting metronode automated daily metadata backups... . . . <truncated>叢集-2:
service@director-1-1-a:~> sudo systemctl status daily_metadata_backup.service ● daily_metadata_backup.service - metronode automated daily metadata backups Loaded: loaded (/etc/systemd/system/daily_metadata_backup.service; static) Active: activating (start) since Sat 2024-10-xx 01:30:18 UTC; 1 month 2 days ago <--------------------------- TriggeredBy: ● daily_metadata_backup.timer Main PID: 22553 (daily_metadata_) Tasks: 1 CGroup: /system.slice/daily_metadata_backup.service └─22553 /usr/bin/python3 /opt/dell/vplex/sbin/daily_metadata_backup.py Oct xx 01:30:18 director-1-1-a systemd[1]: Starting metronode automated daily metadata backups... . . . <truncated> -
在 A 節點上檢查服務「daily_metadata_backup.timer」的狀態 (例如,director-1-1-A、director-2-1-A) 旁,執行下列命令:
sudo systemctl status daily_metadata_backup.timer並確認「觸發器:」屬性顯示為「n/a」。如果是,則表示此服務卡在該特定 A 節點上。下列範例顯示 director-1-1-A 和 director-2-1-A 都有服務「daily_metadata_backup.timer」屬性「Trigger:」,顯示為「n/a」,表示此服務卡在這些節點上。
叢集-1:
service@director-1-1-a:~> sudo systemctl status daily_metadata_backup.timer ● daily_metadata_backup.timer - metronode automated daily metadata backups Loaded: loaded (/etc/systemd/system/daily_metadata_backup.timer; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/daily_metadata_backup.timer.d └─daily_backup.conf Active: active (running) since Wed 2024-11-20 12:46:10 UTC; 18h ago Trigger: n/a <<<<<<<<<<<< Triggers: ● daily_metadata_backup.service Nov 20 12:46:10 director-1-1-a systemd[1]: Started metronode automated daily metadata backups. service@director-1-1-a:~>
叢集-2:
service@director-2-1-a:~> sudo systemctl status daily_metadata_backup.timer ● daily_metadata_backup.timer - metronode automated daily metadata backups Loaded: loaded (/etc/systemd/system/daily_metadata_backup.timer; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/daily_metadata_backup.timer.d └─daily_backup.conf Active: active (running) since Wed 2024-11-xx 12:46:10 UTC; 18h ago Trigger: n/a >>>>>>>>>>>>>>>>>>>>>>> Triggers: ● daily_metadata_backup.service Nov xx 12:46:10 director-2-1-a systemd[1]: Started metronode automated daily metadata backups. service@director-2-1-a:~>
-
確認哪個節點(或可能兩個節點)停滯了提到的兩個服務後,請停止服務“daily_metadata_backup.service”和“daily_metadata_backup.timer”,然後啟動“daily_metadata_backup.timer”服務以解決這種情況並使元數據備份開始運行。
注意:請勿使用 「restart」命令選項在以下範例中,由於兩個 A 節點都受到影響,因此會按如下方式停止和啟動服務:
sudo systemctl stop daily_metadata_backup.service
sudo systemctl stop daily_metadata_backup.timer
sudo systemctl start daily_metadata_backup.timer
-
執行下列命令以檢查狀態,確認其不再停滯,如下所示:
以下範例顯示如何執行「daily_metadata_backup.service」的狀態命令,以檢查「Active: inactive (dead)」行是否表示服務確實未執行,而在等待中繼資料的下一個備份週期時,該行為「inactive (dead)」:
service@director-2-1-a:~> sudo systemctl status daily_metadata_backup.service ● daily_metadata_backup.service - metronode automated daily metadata backups Loaded: loaded (/etc/systemd/system/daily_metadata_backup.service; static) Active: inactive (dead) since Fri 2024-11-22 21:07:41 UTC; 1min 49s ago >>>>>>>>>>>> TriggeredBy: ● daily_metadata_backup.timer Process: 9183 ExecStart=/opt/dell/vplex/sbin/daily_metadata_backup.py (code=exited, status=0/SUCCESS) Main PID: 9183 (code=exited, status=0/SUCCESS) Nov 22 21:07:36 director-2-1-a systemd[1]: Starting metronode automated daily metadata backups... Nov 22 21:07:41 director-2-1-a systemd[1]: daily_metadata_backup.service: Succeeded. Nov 22 21:07:41 director-2-1-a systemd[1]: Finished metronode automated daily metadata backups. service@director-2-1-a:~>service@director-1-1-a:~> sudo systemctl status daily_metadata_backup.service ● daily_metadata_backup.service - metronode automated daily metadata backups Loaded: loaded (/etc/systemd/system/daily_metadata_backup.service; static) Active: inactive (dead) since Fri 2024-11-22 21:07:41 UTC; 1min 49s ago >>>>>>>>>>>> TriggeredBy: ● daily_metadata_backup.timer Process: 9183 ExecStart=/opt/dell/vplex/sbin/daily_metadata_backup.py (code=exited, status=0/SUCCESS) Main PID: 9183 (code=exited, status=0/SUCCESS) Nov 22 21:07:36 director-1-1-a systemd[1]: Starting metronode automated daily metadata backups... Nov 22 21:07:41 director-1-1-a systemd[1]: daily_metadata_backup.service: Succeeded. Nov 22 21:07:41 director-1-1-a systemd[1]: Finished metronode automated daily metadata backups. service@director-2-1-a:~>以下範例顯示服務「daily_metadata_backup.timer」應為「active(waiting)」,而「觸發器」應設為目前或目前,表示服務現在如預期運作:
service@director-2-1-a:~> sudo systemctl status daily_metadata_backup.timer ● daily_metadata_backup.timer - metronode automated daily metadata backups Loaded: loaded (/etc/systemd/system/daily_metadata_backup.timer; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/daily_metadata_backup.timer.d └─daily_backup.conf Active: active (waiting) since Fri 2024-11-22 21:09:24 UTC; 14s ago >>>>>>>>>>> Trigger: Sat 2024-11-23 01:30:00 UTC; 4h 20min left >>>>>>>>>>> Triggers: ● daily_metadata_backup.service Nov 22 21:09:24 director-2-1-a systemd[1]: Started metronode automated daily metadata backups. service@director-2-1-a:~>service@director-1-1-a:~> sudo systemctl status daily_metadata_backup.timer ● daily_metadata_backup.timer - metronode automated daily metadata backups Loaded: loaded (/etc/systemd/system/daily_metadata_backup.timer; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/daily_metadata_backup.timer.d └─daily_backup.conf Active: active (waiting) since Fri 2024-11-22 21:09:24 UTC; 14s ago >>>>>>>>>>> Trigger: Sat 2024-11-23 01:30:00 UTC; 4h 20min left >>>>>>>>>>> Triggers: ● daily_metadata_backup.service Nov 22 21:09:24 director-1-1-a systemd[1]: Started metronode automated daily metadata backups. service@director-2-1-a:~> -
等待並監控下一個中繼資料備份是否完成,方法是執行
ll ~system-volumes命令以確認問題已解決,且中繼資料備份已成功執行,如下所示。範例:
VPlexcli:/> ll ~system-volumes /clusters/cluster-1/system-volumes: Name Volume Type Operational Health Active Ready Geometry Component Block Block Capacity Slots --------------------------------------- -------------- Status State ------ ----- -------- Count Count Size -------- ----- --------------------------------------- -------------- ----------- ------ ------ ----- -------- --------- -------- ----- -------- ----- meta_C1_xxxxxx meta-volume ok ok true true raid-1 2 20971264 4K 80G 64000 meta_C1_xxxxxxx_backup_2024-11-23_01-30 meta-volume ok ok false true raid-1 1 20971264 4K 80G 64000 meta_C1_4UQT429_backup_2024-11-24_01-30 meta-volume ok ok false true raid-1 1 20971264 4K 80G 64000 /clusters/cluster-2/system-volumes: Name Volume Type Operational Health Active Ready Geometry Component Block Block Capacity Slots --------------------------------------- -------------- Status State ------ ----- -------- Count Count Size -------- ----- --------------------------------------- -------------- ----------- ------ ------ ----- -------- --------- -------- ----- -------- ----- meta_C2_xxxxxx meta-volume ok ok true true raid-1 2 20971264 4K 80G 64000 meta_C2_xxxxxxx_backup_2024-11-23_12-43 meta-volume ok ok false true raid-1 1 20971264 4K 80G 64000 meta_C2_xxxxxxx_backup_2024-11-24_12-43 meta-volume ok ok false true raid-1 1 20971264 4K 80G 64000