PowerFlex SDS panic caused by Linux Kernel bug

Summary: This issue affects only systems that use Intel Haswell CPUs. A Linux kernel bug causes an SDS to serve I/O slowly and then panic. The resulting SDC I/O failures mean that a single SDS panic can lead to Data Unavailability (DU).


Symptoms

Scenario

  • Intel Haswell CPU is being used.
  • One of the SDSs reports a "data degraded" state, and SDCs lose their connection to volumes for no obvious reason.
  • SDS panic
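To confirm the first condition on an SDS host, the CPU family and model reported by the kernel can be checked. The following is a minimal sketch, not a Dell-provided tool. It assumes a Linux host exposing /proc/cpuinfo, and it assumes that Haswell parts report family 6 with model 0x3C, 0x3F, 0x45 or 0x46; the function name is_haswell is hypothetical.

```python
# Assumption: Intel family 6 model numbers for Haswell parts
# (client, server EP/EX, ULT and GT3e variants).
HASWELL_MODELS = {0x3C, 0x3F, 0x45, 0x46}


def is_haswell(cpuinfo_text):
    """Return True if the first CPU block in cpuinfo_text looks like Intel Haswell."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    try:
        family = int(fields.get("cpu family", ""))
        model = int(fields.get("model", ""))
    except ValueError:
        return False  # fields missing or not numeric
    return (
        fields.get("vendor_id") == "GenuineIntel"
        and family == 6
        and model in HASWELL_MODELS
    )
```

On an SDS host, the function could be called with the contents of /proc/cpuinfo, for example is_haswell(open("/proc/cpuinfo").read()).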
     

Symptoms

  • ScaleIO system events report "data degraded":
205466 2015-12-10 08:11:49.450 MDM_DATA_DEGRADED ERROR The system is now in DEGRADED state.
205468 2015-12-10 08:12:04.688 MDM_DATA_DEGRADED ERROR The system is now in DEGRADED state.
205470 2015-12-10 08:12:06.699 MDM_DATA_DEGRADED ERROR The system is now in DEGRADED state.
205472 2015-12-10 08:12:16.931 MDM_DATA_DEGRADED ERROR The system is now in DEGRADED state.
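When reviewing a larger event export, the degraded-state events can be pulled out programmatically. The sketch below assumes the events are available as text lines in the layout shown above (event ID, timestamp, event name, severity, message); the function name degraded_events is hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt above:
# <id> <YYYY-MM-DD HH:MM:SS.mmm> <EVENT_NAME> <SEVERITY> <message>
EVENT_RE = re.compile(
    r"^(\d+)\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+(\S+)\s+(\S+)\s+(.*)$"
)


def degraded_events(lines):
    """Return (event_id, timestamp, severity) for each MDM_DATA_DEGRADED line."""
    found = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match and match.group(3) == "MDM_DATA_DEGRADED":
            timestamp = datetime.strptime(match.group(2), "%Y-%m-%d %H:%M:%S.%f")
            found.append((int(match.group(1)), timestamp, match.group(4)))
    return found
```

Several such events within a short window, as in the excerpt, are consistent with the system repeatedly entering the DEGRADED state.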

  • The SDS exp.0 file records the panic:

10/12 02:13:14.134144 Panic in file /emc/svc_flashbld/workspace/ScaleIO-SLES12/src/tgt/ioh/ioh.c, line 70, function iohIo_TimerExpired, PID 22333.Panic Expression !(1).
/opt/emc/scaleio/sds/bin/sds-1.32.3455.5(mosDbg_BackTrace+0x22) [0x479ba9]
/opt/emc/scaleio/sds/bin/sds-1.32.3455.5(mosDbg_Panic+0xf0) [0x4740ad]
/opt/emc/scaleio/sds/bin/sds-1.32.3455.5(iohIo_TimerExpired+0x5d) [0x43d92d]
/opt/emc/scaleio/sds/bin/sds-1.32.3455.5(mosTimerQ_PollUnlocked+0x1b4) [0x46f6e3]
/opt/emc/scaleio/sds/bin/sds-1.32.3455.5(mosTimer_PollQRange+0x83) [0x46fa6c]
/opt/emc/scaleio/sds/bin/sds-1.32.3455.5(netPoll_StartIntr+0x2ef) [0x465808]
/opt/emc/scaleio/sds/bin/sds-1.32.3455.5(mosUmt_StartFunc+0xbe) [0x47f07d]
/opt/emc/scaleio/sds/bin/sds-1.32.3455.5(mosUmt_SignalHandler+0x4a) [0x47fa3a]
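To compare a collected trace against this signature, the panic line and the backtrace symbols can be extracted with regular expressions. This is a minimal sketch that assumes the trace text follows the layout of the excerpt above; the function names parse_panic and matches_known_signature are hypothetical.

```python
import re

# Matches the "Panic in file ..., line N, function F, PID P" line shown above.
PANIC_RE = re.compile(
    r"Panic in file (?P<file>\S+), line (?P<line>\d+), "
    r"function (?P<function>\w+), PID (?P<pid>\d+)"
)
# Matches backtrace frames of the form "(symbol+0xOFFSET) [0xADDRESS]".
FRAME_RE = re.compile(r"\((?P<symbol>\w+)\+0x[0-9a-fA-F]+\)\s+\[0x[0-9a-fA-F]+\]")


def parse_panic(trace_text):
    """Return panic details and backtrace symbols, or None if no panic line is found."""
    match = PANIC_RE.search(trace_text)
    if match is None:
        return None
    return {
        "file": match.group("file"),
        "line": int(match.group("line")),
        "function": match.group("function"),
        "pid": int(match.group("pid")),
        "frames": [f.group("symbol") for f in FRAME_RE.finditer(trace_text)],
    }


def matches_known_signature(trace_text):
    """True if the panic was raised in iohIo_TimerExpired, as in this article."""
    info = parse_panic(trace_text)
    return info is not None and info["function"] == "iohIo_TimerExpired"
```

A match on the function name alone is only an indication; the CPU model and the event log should be checked as well before concluding that this article applies.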

 

Impact

  • Data unavailability
  • SDCs lose connection to volumes
  • I/O failure
  • Long I/O service times and performance degradation

Cause

Because of a Linux kernel bug, the SDS process behaved abnormally: it was under stress and its behavior was unpredictable.
The SDS continued to reply to keep-alive requests, but it was not fully functional and did not respond to SDC I/O requests.
Because the keep-alive replies continued, ScaleIO could not mark the SDS as failed, which eventually led to data unavailability.

 

Resolution

Workaround

  • Upgrade the Linux Kernel version.

Affected Products

PowerFlex rack, ScaleIO
Article Properties
Article Number: 000281636
Article Type: Solution
Last Modified: 06 Feb 2025
Version:  1