PowerFlex SDC logging I/O errors after losing connectivity on a single NIC

Summary: SDC might return I/O errors to the application when losing a single NIC connectivity in a system with multiple NICs configured for PowerFlex.

This article applies to This article does not apply to This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

Scenario
PowerFlex uses multiple connections for each component (for example 2 connections with SDS IP role "All" or four connections - 2 for "SDS-only" and 2 for "SDC-only").

The issue manifests when a single connection is lost (that is after a single switch reboot, shutting down a single NIC, etc.).

There is no DU (DATA_FAILED capacity) system-wide.

Symptoms
SDC reports disconnection from a single (or more) SDS despite having multiple connections configured:

 <6>2021-09-20T06:52:29.617016+00:00 sdc001 kernel: [5965962.215707] bond-glance: link status down for backup interface eth4.2223, disabling it in 1000 ms
<6>2021-09-20T06:52:29.628748+00:00 sdc001 kernel: [5965962.227665] bond-glance: link status down for backup interface eth4.2223, disabling it in 1000 ms
<3>2021-09-20T06:52:29.628773+00:00 sdc001 kernel: [5965962.227668] bond-glance: invalid new link 1 on slave eth4.2223
<6>2021-09-20T06:52:30.638572+00:00 sdc001 kernel: [5965963.239669] bond-nfs: link status definitely down for interface eth4.2226, disabling it
<6>2021-09-20T06:52:30.662562+00:00 sdc001 kernel: [5965963.263771] bond-migration: link status definitely down for interface eth4.2222, disabling it
<6>2021-09-20T06:52:30.662585+00:00 sdc001 kernel: [5965963.263774] bond-migration: making interface eth5.2222 the new active one
<6>2021-09-20T06:52:30.670568+00:00 sdc001 kernel: [5965963.271749] bond-glance: link status definitely down for interface eth4.2223, disabling it
<3>2021-09-20T06:52:32.600563+00:00 sdc001 kernel: [5965965.175504] ScaleIO netCon_IsKaNeeded:3761 :CON 00000000515dfcb3 didn't receive message for 30 iterations.  Marking as down
<3>2021-09-20T06:52:32.600587+00:00 sdc001 kernel: [5965965.186972] ScaleIO netCon_IsKaNeeded:3761 :CON 0000000030837167 didn't receive message for 30 iterations.  Marking as down
<3>2021-09-20T06:52:32.646130+00:00 sdc001 kernel: [5965965.251039] ScaleIO netCon_IsKaNeeded:3761 :CON 00000000c6b7b707 didn't receive message for 30 iterations.  Marking as down
<3>2021-09-20T06:52:32.657522+00:00 sdc001 kernel: [5965965.251092] [5786457902] Disconnected from SDS with ID 2b16b44c00000001  < ======================================================= unexpected
(...)
<3>2021-09-20T06:52:52.894622+00:00 sdc001: [5965985.494552] ScaleIO mapVolIO_ReportIOErrorIfNeeded:491 :[23145851856] IO-ERROR Type WRITE. comb: 24280000 0332. offsetInComb 1464872. SizeInLB 16. SDS_ID 2b16b44c00000001. Comb Gen 2c3f. Head Gen 2f1c. StartLB c793228.
<3>2021-09-20T06:52:52.894624+00:00 sdc001: [5965985.494555] ScaleIO mapVolIO_ReportIOErrorIfNeeded:512 :Vol ID 0x587d75290000000b. Last vol network error status NOT_CONN(4) Reason (ERROR) RC (ERROR) Retry count (20) chan (2)

 

Impact

 I/O errors returned to the application.

Cause

This kind of errors come from some sort of a network misconfiguration - one of the NICs on any of the components (SDS or SDC) might be put into a wrong VLAN, not brought up at all, have the wrong IP assigned etc. 

In this particular case one of the NICs on the SDS "2b16b44c00000001" was assigned to a wrong VLAN, so effectively SDC-SDS communication was happening over a single NIC - when this connection went down, the SDC could not longer talk to this SDS. Since IP roles were in use this SDS stayed connected to the MDM and other SDS over "SDS-only" NICs, so the MDM had no reason to rebuild the data.

Resolution

Ensure that all components are connected as expected - use 'netstat' and/or scli commands (exact commands depend on PowerFlex version) to verify the connectivity.

 

Affected Products

ScaleIO, PowerFlex Software

Products

VxFlex Product Family, VxFlex Ready Node
Article Properties
Article Number: 000193330
Article Type: Solution
Last Modified: 17 Apr 2025
Version:  3
Find answers to your questions from other Dell users
Support Services
Check if your device is covered by Support Services.