PowerFlex SDC logs I/O errors after losing connectivity on a single NIC

Summary: On a system with multiple NICs configured for PowerFlex, the SDC may return I/O errors to applications when connectivity is lost on a single NIC.


Symptoms

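Scenario
PowerFlex uses multiple connections for each component (for example, 2 connections with the SDS IP role "All", or 4 connections: 2 for "SDS-only" and 2 for "SDC-only").

The issue shows up when a single connection is lost (for example, after a single switch is rebooted or a single NIC is shut down).

There is no DU (DATA_FAILED capacity) anywhere in the system.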

Symptom
Although multiple connections are configured, the SDC reports a disconnection from one (or more) SDSs:

 <6>2021-09-20T06:52:29.617016+00:00 sdc001 kernel: [5965962.215707] bond-glance: link status down for backup interface eth4.2223, disabling it in 1000 ms
<6>2021-09-20T06:52:29.628748+00:00 sdc001 kernel: [5965962.227665] bond-glance: link status down for backup interface eth4.2223, disabling it in 1000 ms
<3>2021-09-20T06:52:29.628773+00:00 sdc001 kernel: [5965962.227668] bond-glance: invalid new link 1 on slave eth4.2223
<6>2021-09-20T06:52:30.638572+00:00 sdc001 kernel: [5965963.239669] bond-nfs: link status definitely down for interface eth4.2226, disabling it
<6>2021-09-20T06:52:30.662562+00:00 sdc001 kernel: [5965963.263771] bond-migration: link status definitely down for interface eth4.2222, disabling it
<6>2021-09-20T06:52:30.662585+00:00 sdc001 kernel: [5965963.263774] bond-migration: making interface eth5.2222 the new active one
<6>2021-09-20T06:52:30.670568+00:00 sdc001 kernel: [5965963.271749] bond-glance: link status definitely down for interface eth4.2223, disabling it
<3>2021-09-20T06:52:32.600563+00:00 sdc001 kernel: [5965965.175504] ScaleIO netCon_IsKaNeeded:3761 :CON 00000000515dfcb3 didn't receive message for 30 iterations.  Marking as down
<3>2021-09-20T06:52:32.600587+00:00 sdc001 kernel: [5965965.186972] ScaleIO netCon_IsKaNeeded:3761 :CON 0000000030837167 didn't receive message for 30 iterations.  Marking as down
<3>2021-09-20T06:52:32.646130+00:00 sdc001 kernel: [5965965.251039] ScaleIO netCon_IsKaNeeded:3761 :CON 00000000c6b7b707 didn't receive message for 30 iterations.  Marking as down
<3>2021-09-20T06:52:32.657522+00:00 sdc001 kernel: [5965965.251092] [5786457902] Disconnected from SDS with ID 2b16b44c00000001  < ======================================================= unexpected
(...)
<3>2021-09-20T06:52:52.894622+00:00 sdc001: [5965985.494552] ScaleIO mapVolIO_ReportIOErrorIfNeeded:491 :[23145851856] IO-ERROR Type WRITE. comb: 24280000 0332. offsetInComb 1464872. SizeInLB 16. SDS_ID 2b16b44c00000001. Comb Gen 2c3f. Head Gen 2f1c. StartLB c793228.
<3>2021-09-20T06:52:52.894624+00:00 sdc001: [5965985.494555] ScaleIO mapVolIO_ReportIOErrorIfNeeded:512 :Vol ID 0x587d75290000000b. Last vol network error status NOT_CONN(4) Reason (ERROR) RC (ERROR) Retry count (20) chan (2)
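
When many SDCs are affected, it can help to extract the relevant messages from the kernel log automatically. The following is a minimal sketch, not a Dell-provided tool: it assumes the message formats match the excerpt above ("Disconnected from SDS with ID ..." and "IO-ERROR Type ... SDS_ID ..."), and the function name summarize_sdc_log is hypothetical.

```python
import re
from collections import Counter

# Patterns derived from the log excerpt in this article; other PowerFlex
# versions may format these messages differently (assumption).
DISCONNECT_RE = re.compile(r"Disconnected from SDS with ID ([0-9a-fA-F]{16})")
IO_ERROR_RE = re.compile(r"IO-ERROR Type (\w+)\..*?SDS_ID ([0-9a-fA-F]{16})")


def summarize_sdc_log(lines):
    """Count SDS disconnects and I/O errors found in SDC kernel log lines.

    Returns (disconnects, io_errors), where disconnects is keyed by SDS ID
    and io_errors is keyed by (SDS ID, I/O type).
    """
    disconnects = Counter()
    io_errors = Counter()
    for line in lines:
        match = DISCONNECT_RE.search(line)
        if match:
            disconnects[match.group(1)] += 1
            continue
        match = IO_ERROR_RE.search(line)
        if match:
            io_type, sds_id = match.groups()
            io_errors[(sds_id, io_type)] += 1
    return disconnects, io_errors
```

An SDS ID that appears in both results while the system reports no DATA_FAILED capacity is a candidate for the single-path condition described in this article.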

 

Impact

I/O errors are returned to applications.

Cause

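Errors of this kind stem from some form of network misconfiguration: one of the NICs on any component (SDS or SDC) may have been placed in the wrong VLAN, may not come up at all, may have been assigned the wrong IP address, and so on.

In this particular case, one NIC on SDS "2b16b44c00000001" was assigned to the wrong VLAN, so SDC-SDS communication was effectively running over a single NIC. When that connection went down, the SDC could no longer communicate with this SDS. Because IP roles were in use, this SDS remained connected to the MDM and to the other SDSs through its "SDS-only" NICs, so the MDM had no reason to rebuild the data.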

Resolution

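Make sure that all components are connected as expected: verify the connections with "netstat" and/or scli commands (the exact commands depend on the PowerFlex version).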
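
As an illustration of the netstat-based check, the sketch below counts ESTABLISHED TCP connections from an SDC to each SDS IP and reports SDSs for which an expected path is missing. It is a hedged example under stated assumptions: the input is the text printed by "netstat -tn" on Linux, the SDS listens on port 7072 (a commonly used default; confirm it for your deployment), and the SDS names, IP addresses, and function names are hypothetical.

```python
from collections import Counter

SDS_PORT = 7072  # assumption: SDS listening port; adjust for your deployment


def established_per_sds(netstat_output, sds_port=SDS_PORT):
    """Count ESTABLISHED TCP connections per remote IP on the SDS port.

    Expects Linux `netstat -tn` text with the columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner/header lines and anything that is not a TCP entry.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        ip, _, port = foreign.rpartition(":")
        if state == "ESTABLISHED" and port == str(sds_port):
            counts[ip] += 1
    return counts


def missing_paths(counts, expected_sds_ips):
    """Return {sds_name: [IPs without an ESTABLISHED connection]}.

    expected_sds_ips maps a (hypothetical) SDS name to the list of IPs
    the SDC is expected to reach on that SDS.
    """
    gaps = {}
    for name, ips in expected_sds_ips.items():
        missing = [ip for ip in ips if counts.get(ip, 0) == 0]
        if missing:
            gaps[name] = missing
    return gaps
```

On an SDC, the input text could be captured with subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout. Any SDS listed in the result is reachable over fewer paths than intended, which is the condition that led to the I/O errors in this case.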

 

Affected Products

ScaleIO, PowerFlex Software

Products

VxFlex Product Family, VxFlex Ready Node
Article Properties
Article Number: 000193330
Article Type: Solution
Last Modified: 17 Apr 2025
Version:  3