PowerFlex: MDM Does Not Fail Over When Only the SDS Network Is Lost

Summary: The MDM cluster does not fail over when only the SDS-only network fails.

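Symptoms

When the MDMs are configured with a node's SDC-only IP addresses and the SDS-only network fails, the MDM cluster does not elect a new master MDM.

The PowerFlex event log shows every SDS except the MDM's local SDS (f6a8cda900000000) disconnecting: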
 

2018-01-31 18:43:34.738 SDS_DECOUPLED             ERROR    	 SDS: scaleio-1-13010600 (id: f6a8cdab00000002) decoupled. 
2018-01-31 18:43:34.738 SDS_DECOUPLED             ERROR    	 SDS: scaleio-1-13010503 (id: f6a8cdaf00000006) decoupled. 
2018-01-31 18:43:34.738 SDS_DECOUPLED             ERROR    	 SDS: scaleio-1-13010505 (id: f6a8cdb100000008) decoupled. 
2018-01-31 18:43:34.738 SDS_DECOUPLED             ERROR    	 SDS: scaleio-1-13010504 (id: f6a8cdb200000009) decoupled. 
2018-01-31 18:43:35.740 MDM_DATA_FAILED           CRITICAL 	 The system is now in DATA FAILURE state. Some data is unavailable. 
2018-01-31 18:43:35.740 SDS_DECOUPLED             ERROR    	 SDS: scaleio-1-13010500 (id: f6a8cdaa00000001) decoupled. 
2018-01-31 18:43:35.740 SDS_DECOUPLED             ERROR    	 SDS: scaleio-1-13010602 (id: f6a8cdac00000003) decoupled. 
2018-01-31 18:43:35.741 SDS_DECOUPLED             ERROR    	 SDS: scaleio-1-13010502 (id: f6a8cdad00000004) decoupled. 
2018-01-31 18:43:35.741 SDS_DECOUPLED             ERROR    	 SDS: scaleio-1-13010601 (id: f6a8cdae00000005) decoupled. 
2018-01-31 18:43:35.741 SDS_DECOUPLED             ERROR    	 SDS: scaleio-1-13010603 (id: f6a8cdb000000007) decoupled. 
2018-01-31 18:43:35.741 SDS_DECOUPLED             ERROR    	 SDS: scaleio-1-13010604 (id: f6a8cdb30000000a) decoupled. 
2018-01-31 18:43:35.741 SDS_DECOUPLED             ERROR    	 SDS: scaleio-1-13010605 (id: f6a8cdb40000000b) decoupled.
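
When the event log is long, it can help to pull out the set of decoupled SDS IDs and confirm that the local SDS is the only one missing. The following is a minimal sketch: the regular expression is inferred only from the lines in this excerpt, and the function name `decoupled_sds` is hypothetical, not part of any PowerFlex tooling.

```python
import re

# Pattern inferred from the SDS_DECOUPLED lines in this article (an assumption,
# not a documented log format).
DECOUPLED = re.compile(
    r"SDS_DECOUPLED\s+\w+\s+SDS: (?P<name>\S+) \(id: (?P<id>[0-9a-f]{16})\) decoupled"
)

def decoupled_sds(log_text):
    """Return {sds_id: sds_name} for every SDS_DECOUPLED event in log_text."""
    return {m.group("id"): m.group("name") for m in DECOUPLED.finditer(log_text)}

# Three lines copied from the event log above.
sample = (
    "2018-01-31 18:43:34.738 SDS_DECOUPLED             ERROR    \t SDS: scaleio-1-13010600 (id: f6a8cdab00000002) decoupled.\n"
    "2018-01-31 18:43:35.740 MDM_DATA_FAILED           CRITICAL \t The system is now in DATA FAILURE state. Some data is unavailable.\n"
    "2018-01-31 18:43:35.740 SDS_DECOUPLED             ERROR    \t SDS: scaleio-1-13010500 (id: f6a8cdaa00000001) decoupled.\n"
)

found = decoupled_sds(sample)
local_sds = "f6a8cda900000000"  # the MDM's local SDS named in this article
print(sorted(found), local_sds in found)
```

Run against the full log, the local SDS ID should be absent from the result while every other SDS ID is present.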

Connectivity matrix shows all SDSes as Unavailable except for the MDM's local SDS (f6a8cda900000000), which shows as connected and reports the other SDSes as disconnected:

--------------------------------------------------------------------------
cmatrix status dump (FdID=68e6168500000000, 31/01 18:43:36.744925)
	policy=REBUILD_ALLOWED, issue=SINGLE, coolingOff=TRUE, bypass=FALSE
	nMaxRows=032, nActiveRows=003, nKnownTgts=003
	matrixGen=23, nCycles=767041, duration [ms]: average<1, max=0
	matrix memory foot-print is 17312 [bytes]
row/ column ownership:
	i=000 :: tgtId=f6a8cda900000000 (fsId=f6a8cda900000000)
	i=001 :: tgtId=f6a8cdaa00000001 (fsId=f6a8cdaa00000001)
	i=002 :: tgtId=f6a8cdad00000004 (fsId=f6a8cdad00000004)
cells:
	IDD
	UIU
	UUI
--------------------------------------------------------------------------
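
The cells of the dump can be read programmatically. The sketch below assumes the letter meanings implied by the description above: `D` is a target reported as disconnected, `U` is unavailable, and `I` (an assumption) marks a target's own diagonal cell. The helper names are hypothetical.

```python
def parse_cmatrix(dump_text):
    """Return (target_ids, cell_rows) from a cmatrix status dump."""
    ids, rows, in_cells = [], [], False
    for line in dump_text.splitlines():
        line = line.strip()
        if "tgtId=" in line:
            # e.g. "i=000 :: tgtId=f6a8cda900000000 (fsId=...)"
            ids.append(line.split("tgtId=")[1].split()[0])
        elif line == "cells:":
            in_cells = True
        elif in_cells and line and set(line) <= set("IDU"):
            rows.append(line)
    return ids, rows

def rows_reporting_all_disconnected(ids, rows):
    """IDs whose row marks every other target as 'D' (assumed: disconnected)."""
    result = []
    for i, row in enumerate(rows):
        others = [cell for j, cell in enumerate(row) if j != i]
        if others and all(cell == "D" for cell in others):
            result.append(ids[i])
    return result

dump = """\
row/ column ownership:
	i=000 :: tgtId=f6a8cda900000000 (fsId=f6a8cda900000000)
	i=001 :: tgtId=f6a8cdaa00000001 (fsId=f6a8cdaa00000001)
	i=002 :: tgtId=f6a8cdad00000004 (fsId=f6a8cdad00000004)
cells:
	IDD
	UIU
	UUI
"""

ids, rows = parse_cmatrix(dump)
print(rows_reporting_all_disconnected(ids, rows))
```

For the dump in this article, only the local SDS row reports all of its peers as disconnected.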

 

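Impact

Data unavailability

Cause

The MDM cluster is misconfigured for an environment in which SDS IP roles are in use.

MDM network:

MDMs are added to the MDM cluster with two kinds of IP addresses, "MDM IPs" and "MDM Management IPs":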
 

Master MDM:
    Name: scaleio-1-13010500, ID: 0x1e0f57292c8cb3d0
        IPs: 10.8.88.78, 10.9.88.78, Management IPs: 160.6.40.78, Port: 9011, Virtual IP interfaces: N/A
        Version: 2.0.11000
        Actor ID: 0x29ae453d7f732290, Voter ID: 0x5cbb063079e27880 
Slave MDMs:
    Name: scaleio-1-13010501, ID: 0x61c023380fd9add3
        IPs: 10.8.88.80, 10.9.88.80, Management IPs: 160.6.40.80, Port: 9011, Virtual IP interfaces: N/A
        Status: Normal, Version: 2.0.11000
        Actor ID: 0x62b15c4a5f66df63, Voter ID: 0x54fc1da64efdb503, Replication State: Normal
 
    Name: scaleio-1-13010600, ID: 0x2b51a16a2be29722
        IPs: 10.8.88.79, 10.9.88.79, Management IPs: 160.6.40.79, Port: 9011, Virtual IP interfaces: N/A
        Status: Normal, Version: 2.0.11000
        Actor ID: 0x777bf7f569f01082, Voter ID: 0x158f1c0841d4c712, Replication State: Normal

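TL;DR: The MDMs monitor the "MDM IP" addresses for cluster synchronization. An MDM does not lose sync with a peer MDM unless that peer stops responding on these IP addresses (in this case, 10.8.88.xx and 10.9.88.xx).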

 

SDS network:

Each SDS is configured with four IP addresses (query_all output):

Protection Domain 68e6168500000000 Name: domain_PD_0000
SDS ID: f6a8cdad00000004 Name: scaleio-1-13010502 State: Connected, Joined IP: 10.8.88.85,10.9.88.85,10.10.88.8,10.11.88.8 Port: 7072 Version: 2.0.11000
SDS ID: f6a8cdaa00000001 Name: scaleio-1-13010500 State: Connected, Joined IP: 10.8.88.78,10.9.88.78,10.10.88.1,10.11.88.1 Port: 7072 Version: 2.0.11000
SDS ID: f6a8cda900000000 Name: scaleio-1-13010501 State: Connected, Joined IP: 10.8.88.80,10.9.88.80,10.10.88.3,10.11.88.3 Port: 7072 Version: 2.0.11000

The SDS IP role configuration is split into SDC-only and SDS-only (per TGT_dump in the MDM getinfo):

0: ID: f6a8cda900000000 Name: scaleio-1-13010501 fdId: 68e6168500000000 fsId: 0000000000000000
IP:  10.8.88.80,10.9.88.80,10.10.88.3,10.11.88.3 Port: 7072
 States: NORMAL UpDown: UP Process: IDLE RefCnt: 7 GenNum: 910  KeepaliveState: NORMAL    IPs:  10.8.88.80 (SDC Only) 10.9.88.80 (SDC Only) 10.10.88.3 (SDS Only) 10.11.88.3 (SDS Only)
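
To see at a glance which addresses carry which role, the role-tagged IP list at the end of the TGT_dump line can be extracted. This is a rough sketch: the `(SDC Only)` and `(SDS Only)` labels are taken from the output above, while the `(All)` label and the function names are assumptions.

```python
import re

# "10.8.88.80 (SDC Only)" style pairs; "(All)" is assumed to be the label
# used for an address that carries both roles.
ROLE = re.compile(r"(\d+\.\d+\.\d+\.\d+) \((SDC Only|SDS Only|All)\)")

def ip_roles(tgt_line):
    """Map each IP in a TGT_dump line to its role label."""
    return dict(ROLE.findall(tgt_line))

def sds_role_ips(roles):
    """IPs that carry an SDS role, either 'SDS Only' or 'All'."""
    return [ip for ip, role in roles.items() if role in ("SDS Only", "All")]

line = (
    "States: NORMAL UpDown: UP Process: IDLE RefCnt: 7 GenNum: 910  "
    "KeepaliveState: NORMAL    IPs:  10.8.88.80 (SDC Only) 10.9.88.80 (SDC Only) "
    "10.10.88.3 (SDS Only) 10.11.88.3 (SDS Only)"
)

roles = ip_roles(line)
print(sds_role_ips(roles))
```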

Because this cluster had SDS IP roles in use, the MDM correctly reported in its connectivity matrix output that all SDSes (besides its local SDS) were unavailable when the SDS-only networks failed. Note that the SDSes are not all in the same PD, and cmatrix shows only one PD per file; the dump for this PD is:

--------------------------------------------------------------------------
cmatrix status dump (FdID=68e6168500000000, 31/01 18:43:36.744925)
	policy=REBUILD_ALLOWED, issue=SINGLE, coolingOff=TRUE, bypass=FALSE
	nMaxRows=032, nActiveRows=003, nKnownTgts=003
	matrixGen=23, nCycles=767041, duration [ms]: average<1, max=0
	matrix memory foot-print is 17312 [bytes]
row/ column ownership:
	i=000 :: tgtId=f6a8cda900000000 (fsId=f6a8cda900000000)
	i=001 :: tgtId=f6a8cdaa00000001 (fsId=f6a8cdaa00000001)
	i=002 :: tgtId=f6a8cdad00000004 (fsId=f6a8cdad00000004)
cells:
	IDD
	UIU
	UUI
--------------------------------------------------------------------------

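This happens because the MDM monitors only the IP addresses that have an SDS role ("SDS Only" or "All") for keepalives from the SDSes.

Summary:

In this scenario, node-to-node connectivity over the 10.8.88.xx and 10.9.88.xx (SDC-only) networks was healthy, so MDM cluster synchronization was also healthy.

Node-to-node connectivity over the 10.10.88.x and 10.11.88.x (SDS-only) networks was down, so MDM-SDS keepalives failed.

From the master MDM's perspective, the only thing that happened was that every SDS except its local one timed out.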
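
The summary above can be expressed as a small reachability model. This is an illustrative sketch only, not how the MDM is implemented: the /24 prefix lengths are assumptions (the article gives only the first three octets), and `reachable` is a hypothetical helper.

```python
import ipaddress

def reachable(ips, failed_networks):
    """Return the IPs that are not inside any failed network."""
    nets = [ipaddress.ip_network(n) for n in failed_networks]
    return [ip for ip in ips if not any(ipaddress.ip_address(ip) in n for n in nets)]

# Addresses taken from the outputs in this article.
peer_mdm_ips = ["10.8.88.80", "10.9.88.80"]        # MDM IPs of a slave MDM
remote_sds_ips = ["10.10.88.8", "10.11.88.8"]      # SDS-role IPs of a remote SDS

# Only the SDS-only networks fail (prefix length /24 is assumed).
failed = ["10.10.88.0/24", "10.11.88.0/24"]

mdm_in_sync = bool(reachable(peer_mdm_ips, failed))       # cluster sync path
sds_keepalive_ok = bool(reachable(remote_sds_ips, failed))  # keepalive path

print("MDM cluster in sync:", mdm_in_sync)
print("SDS keepalives OK:", sds_keepalive_ok)
```

In this model the MDM sync path stays up while the keepalive path goes down, which matches the behavior described: no failover is triggered, yet the SDSes are decoupled.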

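Resolution

There is no workaround. The MDM cluster must be configured appropriately for the SDS network configuration.

When SDS IP roles are in use, the MDM IPs should be placed only on the hosts' SDS-only networks.

With that configuration, losing both SDS-only NICs causes the MDM cluster to fail over to another node, rebuild starts, and volume access is not interrupted.

You do not need to change the MDM IP configuration on the SDCs to the SDS-only IPs, because the MDM process listens on all IP addresses.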
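
The rule that MDM IPs should sit only on SDS-only addresses can be checked with a few lines once the MDM IPs and the SDS IP roles for a host have been collected from the query outputs. The sketch below is a hypothetical helper, not a PowerFlex command, and the role dictionary is typed in by hand from the TGT_dump shown earlier.

```python
def misplaced_mdm_ips(mdm_ips, sds_ip_roles):
    """Return the MDM IPs that are not on an 'SDS Only' address of the host."""
    return [ip for ip in mdm_ips if sds_ip_roles.get(ip) != "SDS Only"]

# Host scaleio-1-13010501, values copied from the outputs in this article.
sds_ip_roles = {
    "10.8.88.80": "SDC Only",
    "10.9.88.80": "SDC Only",
    "10.10.88.3": "SDS Only",
    "10.11.88.3": "SDS Only",
}

current_mdm_ips = ["10.8.88.80", "10.9.88.80"]    # as configured in this case
proposed_mdm_ips = ["10.10.88.3", "10.11.88.3"]   # illustrative corrected layout

print(misplaced_mdm_ips(current_mdm_ips, sds_ip_roles))
print(misplaced_mdm_ips(proposed_mdm_ips, sds_ip_roles))
```

An empty result means every MDM IP for that host shares a network with the SDS keepalive traffic, so an SDS-only network failure would also break MDM synchronization and trigger a failover.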

 

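Additional Information

Affected versions

All

Version in which the issue is fixed

N/A, works as designed

Affected Products

VxFlex Product Family

Products

PowerFlex rack, VxFlex Product Family

Article Properties
Article Number: 000040756
Article Type: Solution
Last Modified: 01 Oct 2025
Version: 5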