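PowerFlex SDC logs I/O errors after losing connectivity on a single NIC

Summary: On a system with multiple NICs configured for PowerFlex, the SDC may return I/O errors to the application when connectivity is lost on a single NIC.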


Symptoms

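Scenario
PowerFlex uses multiple connections for each component (for example, 2 connections using the SDS IP role "all", or four connections: 2 for "SDS only" and 2 for "SDC only").

The issue appears when a single connection goes down (for example, after a single switch reboots or a single NIC is shut down).

The system as a whole shows no data unavailability (DU), that is, no DATA_FAILED capacity.

Symptoms
The SDC reports a disconnection from one (or more) SDSs, even though multiple connections are configured: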

<6>2021-09-20T06:52:29.617016+00:00 sdc001 kernel: [5965962.215707] bond-glance: link status down for backup interface eth4.2223, disabling it in 1000 ms
<6>2021-09-20T06:52:29.628748+00:00 sdc001 kernel: [5965962.227665] bond-glance: link status down for backup interface eth4.2223, disabling it in 1000 ms
<3>2021-09-20T06:52:29.628773+00:00 sdc001 kernel: [5965962.227668] bond-glance: invalid new link 1 on slave eth4.2223
<6>2021-09-20T06:52:30.638572+00:00 sdc001 kernel: [5965963.239669] bond-nfs: link status definitely down for interface eth4.2226, disabling it
<6>2021-09-20T06:52:30.662562+00:00 sdc001 kernel: [5965963.263771] bond-migration: link status definitely down for interface eth4.2222, disabling it
<6>2021-09-20T06:52:30.662585+00:00 sdc001 kernel: [5965963.263774] bond-migration: making interface eth5.2222 the new active one
<6>2021-09-20T06:52:30.670568+00:00 sdc001 kernel: [5965963.271749] bond-glance: link status definitely down for interface eth4.2223, disabling it
<3>2021-09-20T06:52:32.600563+00:00 sdc001 kernel: [5965965.175504] ScaleIO netCon_IsKaNeeded:3761 :CON 00000000515dfcb3 didn't receive message for 30 iterations.  Marking as down
<3>2021-09-20T06:52:32.600587+00:00 sdc001 kernel: [5965965.186972] ScaleIO netCon_IsKaNeeded:3761 :CON 0000000030837167 didn't receive message for 30 iterations.  Marking as down
<3>2021-09-20T06:52:32.646130+00:00 sdc001 kernel: [5965965.251039] ScaleIO netCon_IsKaNeeded:3761 :CON 00000000c6b7b707 didn't receive message for 30 iterations.  Marking as down
<3>2021-09-20T06:52:32.657522+00:00 sdc001 kernel: [5965965.251092] [5786457902] Disconnected from SDS with ID 2b16b44c00000001  < ======================================================= unexpected
(...)
<3>2021-09-20T06:52:52.894622+00:00 sdc001: [5965985.494552] ScaleIO mapVolIO_ReportIOErrorIfNeeded:491 :[23145851856] IO-ERROR Type WRITE. comb: 24280000 0332. offsetInComb 1464872. SizeInLB 16. SDS_ID 2b16b44c00000001. Comb Gen 2c3f. Head Gen 2f1c. StartLB c793228.
<3>2021-09-20T06:52:52.894624+00:00 sdc001: [5965985.494555] ScaleIO mapVolIO_ReportIOErrorIfNeeded:512 :Vol ID 0x587d75290000000b. Last vol network error status NOT_CONN(4) Reason (ERROR) RC (ERROR) Retry count (20) chan (2)
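
When triaging a longer log, it can help to tally which SDS IDs show up in both the disconnect messages and the IO-ERROR messages. The following is a minimal sketch, not a Dell tool: the function name and the regular expressions are assumptions modeled only on the two message formats in the excerpt above, and other SDC versions may word these messages differently.

```python
import re
from collections import defaultdict

# Patterns modeled on the excerpt above (assumption: other versions may differ).
DISCONNECT_RE = re.compile(r"Disconnected from SDS with ID ([0-9a-f]{16})")
IO_ERROR_RE = re.compile(r"IO-ERROR Type (\w+)\..*?SDS_ID ([0-9a-f]{16})")


def summarize_sdc_log(lines):
    """Count disconnects and I/O errors per SDS ID in SDC kernel log lines."""
    summary = defaultdict(lambda: {"disconnects": 0, "io_errors": 0})
    for line in lines:
        match = DISCONNECT_RE.search(line)
        if match:
            summary[match.group(1)]["disconnects"] += 1
            continue
        match = IO_ERROR_RE.search(line)
        if match:
            # Group 1 is the I/O type (READ/WRITE); group 2 is the SDS ID.
            summary[match.group(2)]["io_errors"] += 1
    return dict(summary)


# Two lines copied from the excerpt above.
SAMPLE = [
    "<3>2021-09-20T06:52:32.657522+00:00 sdc001 kernel: [5965965.251092] "
    "[5786457902] Disconnected from SDS with ID 2b16b44c00000001",
    "<3>2021-09-20T06:52:52.894622+00:00 sdc001: [5965985.494552] ScaleIO "
    "mapVolIO_ReportIOErrorIfNeeded:491 :[23145851856] IO-ERROR Type WRITE. "
    "comb: 24280000 0332. offsetInComb 1464872. SizeInLB 16. "
    "SDS_ID 2b16b44c00000001. Comb Gen 2c3f. Head Gen 2f1c. StartLB c793228.",
]

print(summarize_sdc_log(SAMPLE))
```

An SDS ID that appears under both counters is a good first candidate for the connectivity checks described in the Resolution section.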

 

Impact

I/O errors are returned to the application.

Cause

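This type of error is caused by a network misconfiguration: one of the NICs on any component (SDS or SDC) may have been placed in the wrong VLAN, may not show up at all, may have been assigned the wrong IP address, and so on.

In this particular case, one of the NICs on SDS "2b16b44c00000001" was assigned to the wrong VLAN, so SDC-SDS communication was effectively running over a single NIC. When that connection went down, the SDC could no longer communicate with this SDS. Because IP roles were in use, this SDS stayed connected to the MDM and the other SDSs through its "SDS only" NICs, so the MDM had no reason to rebuild the data.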
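
The reasoning above can be illustrated with a small model: an SDS whose second path was never working loses all SDC connectivity as soon as the one remaining path fails. This is a sketch under stated assumptions only. The path names `nic-a` and `nic-b`, the ID `sds-healthy-example`, and the function name are hypothetical and do not come from PowerFlex.

```python
def unreachable_after_failure(working_paths, failed_path):
    """Return the SDS IDs left with no working path once failed_path goes down.

    working_paths maps an SDS ID to the set of paths that actually work from
    the SDC, which may be fewer than the number of paths configured.
    """
    return sorted(
        sds_id
        for sds_id, paths in working_paths.items()
        if not (set(paths) - {failed_path})
    )


# Hypothetical state before the outage: the misconfigured SDS works over one
# path only, because its second NIC sits in the wrong VLAN.
working_paths = {
    "2b16b44c00000001": {"nic-a"},
    "sds-healthy-example": {"nic-a", "nic-b"},
}

print(unreachable_after_failure(working_paths, "nic-a"))
```

In this model only the misconfigured SDS becomes unreachable, which matches the symptom of I/O errors pointing at a single SDS ID while the rest of the system stays healthy.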

Resolution

Make sure that all components are connected as expected: use "netstat" and/or scli commands (the exact commands depend on the PowerFlex version) to verify connectivity.
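
As one way to carry out the netstat check, the sketch below compares the SDS data IPs you expect to see against the remote addresses that have an ESTABLISHED TCP connection. It is a minimal example, not a supported procedure. It assumes Linux `netstat -tn` column layout, assumes the SDS listens on port 7072 (the usual default; adjust if your deployment differs), and uses made-up documentation-range IP addresses and hypothetical function names.

```python
def established_peers(netstat_output, port=7072):
    """Return remote IPs with an ESTABLISHED TCP connection to the given port."""
    peers = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote, state = fields[4], fields[5]
        host, _, remote_port = remote.rpartition(":")
        if state == "ESTABLISHED" and remote_port == str(port):
            peers.add(host)
    return peers


def missing_sds_ips(netstat_output, expected_ips, port=7072):
    """Return the expected SDS IPs that have no established connection."""
    return sorted(set(expected_ips) - established_peers(netstat_output, port))


# Made-up sample output; the addresses are documentation-range placeholders.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:41522        192.0.2.21:7072         ESTABLISHED
tcp        0      0 198.51.100.10:41530     198.51.100.22:7072      ESTABLISHED
tcp        0      0 192.0.2.10:41600        192.0.2.22:7072         ESTABLISHED
"""

EXPECTED = ["192.0.2.21", "198.51.100.21", "192.0.2.22", "198.51.100.22"]

print(missing_sds_ips(SAMPLE, EXPECTED))
```

Any address the function reports is a path that is configured but not actually carrying a connection, which is the condition that turned a single NIC failure into I/O errors in this case. On a live host the text could be captured with `subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout`.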

 

Affected Products

ScaleIO, PowerFlex Software

Products

VxFlex Product Family, VxFlex Ready Node
Article Properties
Article Number: 000193330
Article Type: Solution
Last Modified: 17 Apr 2025
Version:  3