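PowerFlex Multiple SDS Decoupling - Slow Devices with Micron SSDs (5200 and 5300 Series)
Summary: SDS decoupling. If more than one SDS decouples at the same time, or an SDS decouples during a rebuild, the system may occasionally experience data unavailability (DU).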
This article is not tied to any specific product.
Not all product versions are identified in this article.
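Symptoms
Scenario
A slow-responding device causes the SDS process to decouple.
Note: The PowerFlex system is behaving as designed here, since an SDS decouples after it encounters "long I/O". The underlying abnormal condition is caused by a known issue with a specific Micron SSD type (MTFDDAK).
The MDM events show that the SDS decoupled at the request of the SDS itself: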
132103 2021-03-19 16:28:20.952 SDS_DECOUPLED ERROR SDS: Sds-10.141.2.40 (id: a4e88e5e00000003) decoupled.
132104 2021-03-19 16:28:20.994 SDS_IN_COOL_DOWN WARNING SDS: Sds-10.141.2.40 (ID a4e88e5e00000003) will disconnect from MDM for 2 seconds. requested by SDS
132105 2021-03-19 16:28:21.803 MDM_DATA_DEGRADED ERROR The system is now in DEGRADED state.
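As a rough illustration of how such events could be picked out of an exported MDM event list, the Python sketch below matches SDS_DECOUPLED entries and extracts the timestamp, SDS name, and SDS ID. The function name `find_decoupled_sds` and the regular expression are assumptions inferred from the sample event above; this is not an official PowerFlex tool or a documented event format.

```python
import re

# Pattern inferred from the sample event in this article (an assumption);
# the event layout may differ between PowerFlex versions.
DECOUPLED_RE = re.compile(
    r"(?P<seq>\d+)\s+(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"SDS_DECOUPLED\s+\w+\s+SDS:\s+(?P<name>\S+)\s+\(id:?\s*(?P<id>[0-9a-f]+)\)",
    re.IGNORECASE,
)


def find_decoupled_sds(text):
    """Return (timestamp, sds_name, sds_id) for every SDS_DECOUPLED event in text."""
    return [
        (m.group("ts"), m.group("name"), m.group("id"))
        for m in DECOUPLED_RE.finditer(text)
    ]


SAMPLE_EVENTS = (
    "132103 2021-03-19 16:28:20.952 SDS_DECOUPLED ERROR SDS: Sds-10.141.2.40 "
    "(id: a4e88e5e00000003) decoupled.\n"
    "132104 2021-03-19 16:28:20.994 SDS_IN_COOL_DOWN WARNING SDS: Sds-10.141.2.40 "
    "(ID a4e88e5e00000003) will disconnect from MDM for 2 seconds. requested by SDS\n"
)

if __name__ == "__main__":
    for ts, name, sds_id in find_decoupled_sds(SAMPLE_EVENTS):
        print(f"{ts}: {name} ({sds_id}) decoupled")
```

More than one distinct SDS in the result within a short time window would correspond to the multiple-decoupling scenario this article describes.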
The SDS trace may contain messages like the following:
19/03 16:28:33.973188 0x7efea139bdb0:contDevMngr_CheckLongInflightIos:07624: Update device state /dev/mapper/svm_sdf gaps 0, longest gap 0, pending gap 7180, in progress 0, last IO time 0, Longest pend IO time 7180, Slow device 0, Inflight 7180 millis
19/03 16:28:33.973325 0x7efea139bdb0:contDevMngr_CheckLongInflightIos:07624: Update device state /dev/mapper/svm_sdj gaps 0, longest gap 0, pending gap 7180, in progress 0, last IO time 0, Longest pend IO time 7180, Slow device 0, Inflight 7180 millis
19/03 16:28:34.969182 0x7efea139bdb0:contDevMngr_CheckLongInflightIos:07624: Update device state /dev/mapper/svm_sdf gaps 0, longest gap 0, pending gap 8180, in progress 0, last IO time 0, Longest pend IO time 8180, Slow device 0, Inflight 8180 millis
19/03 16:28:34.981678 0x7efea139bdb0:contDevMngr_CheckLongInflightIos:07624: Update device state /dev/mapper/svm_sdj gaps 0, longest gap 0, pending gap 8190, in progress 0, last IO time 0, Longest pend IO time 8190, Slow device 0, Inflight 8190 millis
. . .
trc.u.5.11:19/03 16:28:36.988777 0x7efea139bdb0:contDevMngr_HandleLongInflightIoViolation:07387: IO on devId: af7bdb9b00000005 (/dev/mapper/svm_sdf) took too long, Low threshold exceeded - waited for reaper 10200 millis
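To see which devices are stalling, the in-flight times in these trace lines can be summarized per device. The following Python sketch is one possible way to do that; the function name `max_inflight_by_device` and the regular expression are assumptions based only on the sample lines above, and other PowerFlex builds may log this message differently.

```python
import re

# Pattern inferred from the sample trace lines in this article (an assumption).
INFLIGHT_RE = re.compile(
    r"Update device state (?P<dev>\S+) gaps.*?Inflight (?P<ms>\d+) millis"
)


def max_inflight_by_device(trace_text):
    """Map each device path to the largest 'Inflight N millis' value seen for it."""
    worst = {}
    for match in INFLIGHT_RE.finditer(trace_text):
        dev = match.group("dev")
        worst[dev] = max(worst.get(dev, 0), int(match.group("ms")))
    return worst


SAMPLE_TRACE = (
    "19/03 16:28:33.973188 0x7efea139bdb0:contDevMngr_CheckLongInflightIos:07624: "
    "Update device state /dev/mapper/svm_sdf gaps 0, longest gap 0, pending gap 7180, "
    "in progress 0, last IO time 0, Longest pend IO time 7180, Slow device 0, "
    "Inflight 7180 millis\n"
    "19/03 16:28:34.981678 0x7efea139bdb0:contDevMngr_CheckLongInflightIos:07624: "
    "Update device state /dev/mapper/svm_sdj gaps 0, longest gap 0, pending gap 8190, "
    "in progress 0, last IO time 0, Longest pend IO time 8190, Slow device 0, "
    "Inflight 8190 millis\n"
)

if __name__ == "__main__":
    for dev, ms in sorted(max_inflight_by_device(SAMPLE_TRACE).items()):
        print(f"{dev}: longest in-flight I/O {ms} ms")
```

Devices whose in-flight time keeps climbing across consecutive trace lines, as svm_sdf and svm_sdj do here, are the ones to compare against the drive model check later in this article.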
Eventually, the SDS disconnects (by design):
19/03 16:28:59.026244 0x7efea139bdb0:contDevMngr_CheckLongInflightIos:07624: Update device state /dev/mapper/svm_sdf gaps 1, longest gap 30970, pending gap 0, in progress 44, last IO time 30970, Longest pend IO time 0, Slow device 0, Inflight 0 millis
19/03 16:28:59.026268 0x7efea139bdb0:contDevMngr_CheckLongInflightIos:07624: Update device state /dev/mapper/svm_sdj gaps 1, longest gap 31810, pending gap 0, in progress 14, last IO time 31810, Longest pend IO time 0, Slow device 0, Inflight 0 millis
19/03 16:28:59.026274 0x7efea139bdb0:contDevMngr_CheckLongInflightIos:07711: Low threshold crossed
19/03 16:28:59.026302 0x7efea139bdb0:contNet_AbnormalExitCK:02618: Will pause network
19/03 16:28:59.026306 0x7efea139bdb0:net_Pause:02235: Net paused 1 (reversible 0)
19/03 16:30:29.539338 (nil):mosTrcLayer_Create:00239: ---------- Process started. Version private PowerFlex R3_5.1100.107_Release, CodeBase , Oct 27 2020. PID 26008 ----------
In the system log (/var/log/messages), task aborts may appear shortly before the SDS process restarts:
Mar 19 16:28:39 mtp-pflex4 kernel: sd 0:0:5:0: attempting task abort! scmd(ffff940253393000)
Mar 19 16:28:39 mtp-pflex4 kernel: sd 0:0:5:0: [sdf] tag#51 CDB: ATA command pass through(12)/Blank a1 08 0e 00 01 00 00 00 00 ec 00 00
Mar 19 16:28:39 mtp-pflex4 kernel: scsi target0:0:5: handle(0x000f), sas_address(0x500056b33bee1fc5), phy(5)
Mar 19 16:28:39 mtp-pflex4 kernel: scsi target0:0:5: enclosure logical id(0x500056b31234abff), slot(5)
Mar 19 16:28:39 mtp-pflex4 kernel: scsi target0:0:5: enclosure level(0x0001), connector name( )
Mar 19 16:28:40 mtp-pflex4 kernel: sd 0:0:5:0: task abort: SUCCESS scmd(ffff940253393000)
Mar 19 16:28:40 mtp-pflex4 kernel: sd 0:0:5:0: Power-on or device reset occurred
Mar 19 16:28:41 mtp-pflex4 systemd: sds.service: main process exited, code=exited, status=254/n/a
Mar 19 16:28:41 mtp-pflex4 systemd: Unit sds.service entered failed state.
Mar 19 16:28:41 mtp-pflex4 systemd: sds.service failed.
Mar 19 16:28:41 mtp-pflex4 systemd: sds.service has no holdoff time, scheduling restart.
Mar 19 16:28:41 mtp-pflex4 systemd: Stopped scaleio sds.
Mar 19 16:28:41 mtp-pflex4 systemd: Started scaleio sds.
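The correlation between task aborts and SDS restarts can be checked across a longer log with a small script. The Python sketch below counts "attempting task abort!" messages per disk and counts sds.service exits; the function name `summarize_syslog` and the patterns are assumptions derived from the sample lines above, so the message wording on other kernels or systemd versions may need adjusting.

```python
import re

# Patterns inferred from the sample /var/log/messages lines above (assumptions).
ABORT_RE = re.compile(r"kernel: sd (?P<hctl>\d+:\d+:\d+:\d+): attempting task abort!")
DEVNAME_RE = re.compile(r"kernel: sd (?P<hctl>\d+:\d+:\d+:\d+): \[(?P<dev>sd[a-z]+)\]")
SDS_EXIT_RE = re.compile(r"systemd(\[\d+\])?: sds\.service: main process exited")


def summarize_syslog(lines):
    """Count task aborts per disk and SDS service exits in an iterable of log lines."""
    aborts = {}     # SCSI address (H:C:T:L) -> number of task aborts
    names = {}      # SCSI address -> kernel device name, when the log reveals it
    sds_exits = 0
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            aborts[m.group("hctl")] = aborts.get(m.group("hctl"), 0) + 1
        m = DEVNAME_RE.search(line)
        if m:
            names[m.group("hctl")] = m.group("dev")
        if SDS_EXIT_RE.search(line):
            sds_exits += 1
    # Report by device name where known, otherwise by SCSI address.
    by_disk = {names.get(hctl, hctl): count for hctl, count in aborts.items()}
    return {"aborts": by_disk, "sds_exits": sds_exits}


SAMPLE_LOG = [
    "Mar 19 16:28:39 mtp-pflex4 kernel: sd 0:0:5:0: attempting task abort! scmd(ffff940253393000)",
    "Mar 19 16:28:39 mtp-pflex4 kernel: sd 0:0:5:0: [sdf] tag#51 CDB: ATA command pass through(12)/Blank a1 08 0e 00 01 00 00 00 00 ec 00 00",
    "Mar 19 16:28:41 mtp-pflex4 systemd: sds.service: main process exited, code=exited, status=254/n/a",
]

if __name__ == "__main__":
    print(summarize_syslog(SAMPLE_LOG))
```

To run it against a real log, pass an open file object instead of the sample list, for example `summarize_syslog(open("/var/log/messages"))`.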
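Checking the disk device type shows that the drives belong to the "MTFDDAK" (5200) series, which has a known issue and has been removed from the support matrix: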
[COMMAND: lsscsi] [RET: 0] [OUTPUT:
[0:0:0:0] disk ATA MTFDDAK1T9TDD F004 /dev/sda
[0:0:1:0] disk ATA MTFDDAK1T9TDD F004 /dev/sdb
[0:0:2:0] disk ATA MTFDDAK1T9TDD F004 /dev/sdc
[0:0:3:0] disk ATA MTFDDAK1T9TDD F004 /dev/sdd
[0:0:4:0] disk ATA MTFDDAK1T9TDD F004 /dev/sde
[0:0:5:0] disk ATA MTFDDAK1T9TDD F004 /dev/sdf
[0:0:6:0] disk ATA MTFDDAK1T9TDD F004 /dev/sdg
[0:0:7:0] disk ATA MTFDDAK1T9TDD F004 /dev/sdh
[0:0:8:0] disk ATA MTFDDAK1T9TDD F004 /dev/sdi
[0:0:9:0] disk ATA MTFDDAK1T9TDD F004 /dev/sdj
[0:0:10:0] enclosu DP BP14G+EXP 2.46 -
[15:0:0:0] disk ATA DELLBOSS VD 00-0 /dev/sdk
[17:0:0:0] process Marvell Console 1.01 -
[18:0:0:0] cd/dvd Linux Virtual CD 0399 /dev/sr0
[18:0:0:1] disk Linux Virtual Floppy 0399 /dev/sdl
]
For the 5300 series, the model and firmware revision look like this:
[0:0:0:0] disk ATA MTFDDAK3T8TDT J404 /dev/sda
[0:0:1:0] disk ATA MTFDDAK3T8TDT J404 /dev/sdb
[0:0:2:0] disk ATA MTFDDAK3T8TDT J404 /dev/sdc
[0:0:3:0] disk ATA MTFDDAK3T8TDT J404 /dev/sdd
[0:0:4:0] disk ATA MTFDDAK3T8TDT J404 /dev/sde
[0:0:5:0] disk ATA MTFDDAK3T8TDT J404 /dev/sdf
[0:0:6:0] disk ATA MTFDDAK3T8TDT J404 /dev/sdg
[0:0:7:0] disk ATA MTFDDAK3T8TDT J404 /dev/sdh
[0:0:8:0] disk ATA MTFDDAK3T8TDT J404 /dev/sdi
[0:0:9:0] disk ATA MTFDDAK3T8TDT J404 /dev/sdj
[0:0:10:0] disk ATA MTFDDAK3T8TDT J404 /dev/sdk
[0:0:11:0] disk ATA MTFDDAK3T8TDT J404 /dev/sdl
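The model check can be automated on a node by filtering lsscsi output for the MTFDDAK prefix. The Python sketch below is a minimal example; the function name `find_affected_drives` is hypothetical, and it assumes one device per line with the firmware revision and device node following the model, as in the listings above. It flags every model that starts with the prefix and does not distinguish the 5200 series from the 5300 series.

```python
AFFECTED_PREFIX = "MTFDDAK"  # model prefix named in this article


def find_affected_drives(lsscsi_output):
    """Return (device, model, firmware) for each line whose model starts with the prefix."""
    hits = []
    for line in lsscsi_output.splitlines():
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token.startswith(AFFECTED_PREFIX) and i + 2 < len(tokens):
                # Assumed column order after the model: revision, then device node.
                hits.append((tokens[i + 2], token, tokens[i + 1]))
                break
    return hits


SAMPLE_LSSCSI = """\
[0:0:5:0] disk ATA MTFDDAK1T9TDD F004 /dev/sdf
[0:0:10:0] enclosu DP BP14G+EXP 2.46 -
[15:0:0:0] disk ATA DELLBOSS VD 00-0 /dev/sdk
[0:0:1:0] disk ATA MTFDDAK3T8TDT J404 /dev/sdb
"""

if __name__ == "__main__":
    for device, model, firmware in find_affected_drives(SAMPLE_LSSCSI):
        print(f"{device}: {model} (firmware {firmware})")
```

On a live node, the input could be captured with `subprocess.run(["lsscsi"], capture_output=True, text=True).stdout` and passed to the same function.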
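Impact: The system may enter a degraded or DU state.
Cause
Micron SSDs of type "MTFDDAK" have a firmware defect that causes the device to respond slowly.
Resolution
Workaround
Replace the affected SSDs with devices listed in the support matrix. As of June 15, 2022, no updated firmware that resolves this issue is available.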
Impacted Versions
This is not a PowerFlex issue; all PowerFlex versions may be affected.
Fixed In Version
Not a PowerFlex issue.
Affected Products
PowerFlex Software, VxFlex Product Family, VxFlex Ready Node, ScaleIO Ready Node-PowerEdge 13G, Ready Node Series
Article Properties
Article Number: 000203252
Article Type: Solution
Last Modified: 16 Jan 2026
Version: 6