PowerFlex: SDS panic with Command took too long (TGT_MSG_TYPE__ADD_DEV) while Clear Device Error attempt

Summary: The Storage Data Server (SDS) process panics when a Clear Device Error operation is attempted under certain underlying disk conditions.

Symptoms

Hardware errors observed on the disk indicate slow disk response.

OS messages:

Mar 14 23:09:49 esx04-svm kernel: sd 6:0:3:0: attempting task abort! scmd(ffffxxxxxa010540)
Mar 14 23:09:49 esx04-svm kernel: sd 6:0:3:0: [sde] CDB: Write(10) 2a 00 24 48 36 60 00 00 08 00
Mar 14 23:09:49 esx04-svm kernel: scsi target6:0:3: handle(0x000d), sas_address(0x58cexxxxx1139e9a), phy(3)
Mar 14 23:09:49 esx04-svm kernel: scsi target6:0:3: enclosure logical id(0x50005xxxxx34abff), slot(3)
Mar 14 23:09:49 esx04-svm kernel: scsi target6:0:3: enclosure level(0x0001), connector name(     )
:
Mar 14 23:13:04 esx04-svm kernel: sd 6:0:3:0: task abort: SUCCESS scmd(ffffxxxxxa010540)
Mar 14 23:13:04 esx04-svm kernel: sd 6:0:3:0: [sde] FAILED Result: hostbyte=DID_RESET driverbyte=DRIVER_OK
Mar 14 23:13:04 esx04-svm kernel: sd 6:0:3:0: [sde] CDB: Write(10) 2a 00 1b 6f eb 20 00 00 18 00
Mar 14 23:13:04 esx04-svm kernel: blk_update_request: I/O error, dev sde, sector 460319520
:
Mar 14 23:13:04 esx04-svm systemd: sds.service: main process exited, code=exited, status=254/n/a
Mar 14 23:13:04 esx04-svm systemd: Unit sds.service entered failed state.
Mar 14 23:13:04 esx04-svm systemd: sds.service failed.
Mar 14 23:13:04 esx04-svm systemd: sds.service has no holdoff time, scheduling restart.
Mar 14 23:13:04 esx04-svm systemd: Stopped scaleio sds.
Mar 14 23:13:04 esx04-svm systemd: Started scaleio sds.
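When reviewing a large messages file, it can help to count how often these kernel messages occur per device. The following is a minimal sketch, assuming the message wording matches the excerpts above; the function name `summarize_disk_errors` and the regular expressions are illustrative and not part of any PowerFlex tooling.

```python
import re
from collections import Counter

# Patterns modeled on the kernel messages quoted in this article (assumption:
# other kernel versions may word these messages differently).
ABORT_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): attempting task abort")
IO_ERROR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def summarize_disk_errors(lines):
    """Count task-abort attempts per SCSI address and I/O errors per block device."""
    aborts, io_errors = Counter(), Counter()
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            aborts[m.group(1)] += 1
        m = IO_ERROR_RE.search(line)
        if m:
            io_errors[m.group(1)] += 1
    return aborts, io_errors


sample = [
    "Mar 14 23:09:49 esx04-svm kernel: sd 6:0:3:0: attempting task abort! scmd(ffffxxxxxa010540)",
    "Mar 14 23:13:04 esx04-svm kernel: blk_update_request: I/O error, dev sde, sector 460319520",
]
aborts, io_errors = summarize_disk_errors(sample)
print(dict(aborts), dict(io_errors))
```

Repeated task aborts and I/O errors against the same device are consistent with the slow disk response described in this article, but the counts alone do not prove a hardware fault.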

SDS trace log:

14/03 23:10:04.535824 0x7fadbde2bdb8:contDevMngr_HandleLongInflightIoViolation:05927: IO on devId: fb775d1000030002 (/dev/sde) took too long, High threshold exceeded - waited for reaper 45830 millis
:
14/03 23:13:11.395813 0x7f4832c55db8:mosAsyncIO_DoIOTimedSyncIntern:01564: Time budget exceeded during IO to /dev/sde (Give-up on inflight IO)
14/03 23:13:11.900810 0x7f4832c55db8:mosAsyncIO_DoIOTimedSyncIntern:01564: Time budget exceeded during IO to /dev/sde (Give-up on inflight IO)
14/03 23:13:11.900846 0x7f4832c55db8:contDev_SendDeviceError:02801: Sending device error to MDM: DevId:fb775dxxxxx30002 deviceName: /dev/sde readError: FALSE WriteError: TRUE
14/03 23:13:11.901335 0x7f4832c55db8:contDev_SendDeviceError:02801: Sending device error to MDM: DevId:fb775dxxxxx30002 deviceName: /dev/sde readError: TRUE WriteError: TRUE

The SDS marked the device as being in an error state.

Metadata Manager (MDM) events:

2021-03-14 23:13:12.031000:0003180:SDS_DEV_ERROR_REPORT    ERROR    Device error reported on SDS: SDS_x.x.x.x, Device: N/A-0002. State: NORMAL upDownState: UP processState: DEV_ERR_INPROGRESS

Later, a user attempted the Clear Device Error operation to add the device back to the system:

2021-03-16 22:24:06.328000:0003258:MDM_CLI_CONF_COMMAND_RECEIVED    INFO     Command clear_sds_device_error received, User: 'admin'. [60856484] SDS Device: ID:fb775dxxxxx30002@N/A.

When the user attempted the Clear Device Error operation, the SDS process panicked and disconnected from the MDM. As a result, the Clear Device Error command could not be completed internally, and the disk could not be added back to the SDS.

The SDS process panicked because it reached an internal timeout while handling the Add Dev command (TGT_MSG_TYPE__ADD_DEV) that is issued as part of the Clear Device Error operation.

SDS exp.0:

16/03 22:25:06.340250 [CHOKE_POINT] Panic in file /data/builds/workspace/ScaleIO-Common-Job@2/src/tgt/cont/cont_cmd.c, line 11459, function contCmd_TimerExpiredInner, PID 29524.Panic Expression ALWAYS_ASSERT Command took too long - 2277 (TGT_MSG_TYPE__ADD_DEV) - Restarting.
/opt/emc/scaleio/sds/bin/sds-3.0.1000.208(mosDbg_PanicPrepare+0x135) [0x8dd885]
/opt/emc/scaleio/sds/bin/sds-3.0.1000.208(contCmd_TimerExpiredInner+0x6e) [0x80c52e]
/opt/emc/scaleio/sds/bin/sds-3.0.1000.208() [0x73e16b]
/opt/emc/scaleio/sds/bin/sds-3.0.1000.208(mosTimerQ_PollUnlocked+0xb2) [0x81f372]
/opt/emc/scaleio/sds/bin/sds-3.0.1000.208(mosTimer_PollQRange+0xf8) [0x821088]
/opt/emc/scaleio/sds/bin/sds-3.0.1000.208(mosTimer_Loop+0x28) [0x8210e8]
/opt/emc/scaleio/sds/bin/sds-3.0.1000.208(mosOsThrd_StartFunc+0x118) [0x67e798]
/lib64/libpthread.so.0(+0x7dd5) [0x7f17301f8dd5]
16/03 22:25:07.339909 Termination due to SIGALRM. PID 29524

SDS trace log:

16/03 22:24:06.338880 0x7f16e6cefdb8:contDev_Add:02034: Add Dev, path found in /dev/sde
16/03 22:24:16.344801 (nil):contCmd_SmallTimerExpired:11491: Long operation detected - 2277 (TGT_MSG_TYPE__ADD_DEV), Umt 0x7f1xxxxxddb8, context=2277
16/03 22:24:22.344795 (nil):contCmd_SmallTimerExpired:11491: Long operation detected - 2277 (TGT_MSG_TYPE__ADD_DEV), Umt 0x7f1xxxxx3db8, context=2277
16/03 22:25:06.339793 (nil):contCmd_TimerExpiredInner:11454: Command took too long - 2277 (TGT_MSG_TYPE__ADD_DEV), Umt 0x7f1xxxxxdb8, context=2277
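The timing of the panic can be read from the trace timestamps. The following is a minimal sketch that computes the elapsed time from the `contDev_Add` entry to the first "Long operation detected" warning and to the "Command took too long" entry. The helper `parse_trace_ts` is hypothetical, and the year 2021 is taken from the MDM events in this article because the trace timestamps omit it. The resulting intervals are observations from this one trace, not documented threshold values.

```python
from datetime import datetime


def parse_trace_ts(ts, year=2021):
    """Parse an SDS trace timestamp such as '16/03 22:24:06.338880' (day/month)."""
    # The trace format has no year; it is supplied here as an assumption.
    return datetime.strptime(f"{year} {ts}", "%Y %d/%m %H:%M:%S.%f")


add_dev_start = parse_trace_ts("16/03 22:24:06.338880")   # contDev_Add
first_warning = parse_trace_ts("16/03 22:24:16.344801")   # Long operation detected
timer_expired = parse_trace_ts("16/03 22:25:06.339793")   # Command took too long

elapsed_warning = (first_warning - add_dev_start).total_seconds()
elapsed_timeout = (timer_expired - add_dev_start).total_seconds()

print(f"first warning after {elapsed_warning:.3f} s")
print(f"timer expired after {elapsed_timeout:.3f} s")
```

In this trace, the first warning appears about 10 seconds after the Add Dev command starts, and the command timer expires about 60 seconds after it starts.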

OS messages:

Mar 16 22:25:05 esx04-svm kernel: scsi target6:0:3: handle(0x000d), sas_address(0x58ce38ee21139e9a), phy(3)
Mar 16 22:25:05 esx04-svm kernel: scsi target6:0:3: enclosure logical id(0x5000xxxxx234abff), slot(3)
Mar 16 22:25:05 esx04-svm kernel: scsi target6:0:3: enclosure level(0x0001), connector name(     )
Mar 16 22:25:05 esx04-svm kernel: sd 6:0:3:0: task abort: SUCCESS scmd(ffffxxxxx9e9a140)
Mar 16 22:25:07 esx04-svm systemd-udevd: worker [2668] /devices/pci0000:00/0000:00:18.1/0000:1c:00.0/host6/port-6:0/expander-6:0/port-6:0:3/end_device-6:0:3/target6:0:3/6:0:3:0/block/sde is taking a long time

The MDM events show that the SDS is decoupled, the system enters a DEGRADED state, and a rebuild is triggered:

2021-03-16 22:24:06.328000:0003258:MDM_CLI_CONF_COMMAND_RECEIVED INFO Command clear_sds_device_error received, User: 'admin'. [60856484] SDS Device: ID:fb775dxxxxx30002@N/A.
2021-03-16 22:24:06.328000:0003259:CLI_COMMAND_SUCCEEDED INFO Command clear_sds_device_error succeeded. [60856484]
:
2021-03-16 22:24:16.856000:0003262:SDS_DECOUPLED ERROR SDS: SDS_x.x.x.x (id: yyyyyyyy00000003) decoupled.
2021-03-16 22:24:23.862000:0003263:SDS_DECOUPLED ERROR SDS: SDS_x.x.x.x (id: yyyyyyyy00000003) decoupled.
2021-03-16 22:24:23.863000:0003264:SDS_IN_COOL_DOWN WARNING SDS: SDS_x.x.x.x (ID yyyyyyyy00000003) will disconnect from MDM for 15 seconds. failed to reconnect multiple times
2021-03-16 22:24:24.108000:0003265:MDM_DATA_DEGRADED ERROR The system is now in DEGRADED state.
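To place these events on a timeline, the event lines can be parsed into structured records. The following is a minimal sketch, assuming the layout `timestamp:sequence:EVENT_NAME SEVERITY message` inferred from the excerpts above; the regular expression and the function `parse_event` are illustrative and not a documented event format.

```python
import re
from datetime import datetime

# Layout inferred from the MDM event excerpts in this article (assumption).
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}):(\d+):(\w+)\s+(\w+)\s+(.*)$"
)


def parse_event(line):
    """Return a dict describing one MDM event line, or None if it does not match."""
    m = EVENT_RE.match(line)
    if m is None:
        return None
    ts, seq, name, severity, message = m.groups()
    return {
        "time": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"),
        "seq": int(seq),
        "name": name,
        "severity": severity,
        "message": message,
    }


received = parse_event(
    "2021-03-16 22:24:06.328000:0003258:MDM_CLI_CONF_COMMAND_RECEIVED INFO "
    "Command clear_sds_device_error received, User: 'admin'. [60856484] "
    "SDS Device: ID:fb775dxxxxx30002@N/A."
)
degraded = parse_event(
    "2021-03-16 22:24:24.108000:0003265:MDM_DATA_DEGRADED ERROR "
    "The system is now in DEGRADED state."
)

delta = (degraded["time"] - received["time"]).total_seconds()
print(f"{received['name']} -> {degraded['name']}: {delta:.2f} s")
```

For the events quoted here, the system is reported as DEGRADED roughly 18 seconds after the clear_sds_device_error command is received.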

Cause

The defective disk was unstable and responded slowly. As a result, the SDS could not add it back to the system in a timely manner and panicked to prevent the process from hanging.

This symptom can occur under certain defective disk conditions. PowerFlex and the SDS are working as designed.

Resolution

Work with the hardware support team to replace the defective disk and hardware to address the issue.

Affected Products

PowerFlex Software

Article Number: 000185002

Article Type: Solution

Last Modified: 07 Nov 2025

Version: 6
