Dell Unity: Both SPA and SPB panic at the same time due to write request pool size

Summary: Both SPA and SPB panic at the same time due to write request pool size


Symptoms

The user expanded a DAE, which led to an I/O burst. Both SPA and SPB panicked at the same time, which resulted in data unavailability.

SPA panicked two times:
Fri Mar 19 05:57:34 UTC 2021 system-state: set sp-critical-error
Fri Mar 19 06:38:35 UTC 2021 system-state: set sp-critical-error

SPB panicked three times:
Fri Mar 19 06:13:43 UTC 2021 system-state: set sp-critical-error
Fri Mar 19 06:39:35 UTC 2021 system-state: set sp-critical-error
Fri Mar 19 06:51:05 UTC 2021 system-state: set sp-critical-error
 

/spa/EMC/C4Core/log> zgrep -A1 0x81254002 c4_safe_ktrace*
c4_safe_ktrace.log.10.gz:2021/03/19-06:50:11.935709 11 7FA30677870B std:RMD: Allocating write rqst failed! Status is 0x81254002
c4_safe_ktrace.log.10.gz:2021/03/19-06:50:11.935710 ~~~~ 7FA30677870B std:Cmipci1: Notify peer that we are going down NOW. We can't wait!! (0x1)
c4_safe_ktrace.log.15.gz:2021/03/19-06:38:29.041315 ~~~~ 7FA9A02CF709 std:RMD: Allocating write rqst failed! Status is 0x81254002
c4_safe_ktrace.log.15.gz:2021/03/19-06:38:29.041316 ~~~~ 7FA9A0FF4701 std:Cmipci1: Notify peer that we are going down NOW. We can't wait!! (0x1)
c4_safe_ktrace.log.23.gz:2021/03/19-05:57:33.014928 101K 7F4A6A827706 std:RMD: Allocating write rqst failed! Status is 0x81254002
c4_safe_ktrace.log.23.gz:2021/03/19-05:57:33.014932 0 7F4A6A827706 std:Cmipci1: Notify peer that we are going down NOW. We can't wait!! (0x1)
c4_safe_ktrace.log.7.gz:2021/03/19-06:59:52.097333 272 7FC21798570C std:RMD: Allocating write rqst failed! Status is 0x81254002
c4_safe_ktrace.log.7.gz:2021/03/19-06:59:52.097334 ~~~~ 7FC21798570C std:Cmipci1: Notify peer that we are going down NOW. We can't wait!! (0x1)
/spb/EMC/C4Core/log> zgrep -A1 0x81254002 c4_safe_ktrace*
c4_safe_ktrace.log.11.gz:2021/03/19-06:50:59.806541 3880 7FBDE30B570E std:RMD: Allocating write rqst failed! Status is 0x81254002
c4_safe_ktrace.log.11.gz:2021/03/19-06:50:59.806545 ~~~~ 7FBDE30B570E std:Cmipci1: Notify peer that we are going down NOW. We can't wait!! (0x1)
c4_safe_ktrace.log.21.gz:2021/03/19-06:39:29.781207 13 7F1582EEF70D std:RMD: Allocating write rqst failed! Status is 0x81254002
c4_safe_ktrace.log.21.gz:2021/03/19-06:39:29.781208 ~~~~ 7F1582EEF70D std:Cmipci1: Notify peer that we are going down NOW. We can't wait!! (0x1)
c4_safe_ktrace.log.24.gz:2021/03/19-06:10:44.513508 1105 7FDFD279970C std:RMD: Allocating write rqst failed! Status is 0x81254002
c4_safe_ktrace.log.24.gz:2021/03/19-06:10:44.513511 ~~~~ 7FDFD279970C std:Cmipci1: Notify peer that we are going down NOW. We can't wait!! (0x1)
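
The matches above are grouped by log file rather than by time. As a minimal sketch (the helper name list_alloc_failures is hypothetical and not part of any Dell tooling), the allocation failures can be reduced to a chronological list of timestamps, assuming the zgrep output has the same "<file>.gz:<timestamp> ..." layout shown above:

```shell
# Hypothetical helper: list RMD write request allocation failures in time order.
# Reads zgrep-style lines on stdin, e.g.
#   c4_safe_ktrace.log.10.gz:2021/03/19-06:50:11.935709 11 7FA30677870B std:RMD: ...
list_alloc_failures() {
    grep 'Allocating write rqst failed' |
        sed -E 's/^[^:]*\.gz://' |   # drop the "c4_safe_ktrace.log.N.gz:" prefix
        awk '{print $1}' |           # keep only the date-time field
        sort                         # timestamps sort correctly as plain text
}

# Usage from an SP log directory such as /spa/EMC/C4Core/log:
#   zgrep 0x81254002 c4_safe_ktrace* | list_alloc_failures
```

Running this once per SP gives a timeline that can be compared with the sp-critical-error entries listed earlier.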

SPA:
c4_safe_native.log:CSX RT: panic requested at: KLogBugCheck.c:57 (thread: 139957593749248 aka 139957593749248) [PID:30127 TID:24677 CORE:11 [csx_ic_std.x] [asyncFlush185] [03/19/2021 05:57:30 UTC]] (panic action:DEFAULT expr:<no-expr> flags:-) [info:0]
c4_safe_native.log:CSX RT: panic requested at: KLogBugCheck.c:57 (thread: 140366523385600 aka 140366523385600) [PID:29836 TID:31282 CORE:7 [csx_ic_std.x] [asyncFlush116] [03/19/2021 06:38:30 UTC]] (panic action:DEFAULT expr:<no-expr> flags:-) [info:0]
c4_safe_native.log:CSX RT: panic requested at: KLogBugCheck.c:57 (thread: 140338173945600 aka 140338173945600) [PID:29848 TID:7080 CORE:3 [csx_ic_std.x] [asyncFlush46] [03/19/2021 06:50:11 UTC]] (panic action:DEFAULT expr:<no-expr> flags:-) [info:0]
c4_safe_native.log:CSX RT: panic requested at: KLogBugCheck.c:57 (thread: 140471598331648 aka 140471598331648) [PID:29945 TID:25809 CORE:7 [csx_ic_std.x] [asyncFlush100] [03/19/2021 06:59:52 UTC]] (panic action:DEFAULT expr:<no-expr> flags:-) [info:0]

SPB:
c4_safe_native.log:CSX RT: panic requested at: KLogBugCheck.c:57 (thread: 140599284365056 aka 140599284365056) [PID:29864 TID:25385 CORE:13 [csx_ic_std.x] [asyncFlush214] [03/19/2021 06:09:18 UTC]] (panic action:DEFAULT expr:<no-expr> flags:-) [info:0]
c4_safe_native.log:CSX RT: panic requested at: KLogBugCheck.c:57 (thread: 139730383755008 aka 139730383755008) [PID:30143 TID:25017 CORE:8 [csx_ic_std.x] [asyncFlush65] [03/19/2021 06:39:31 UTC]] (panic action:DEFAULT expr:<no-expr> flags:-) [info:0]
c4_safe_native.log:CSX RT: panic requested at: KLogBugCheck.c:57 (thread: 140453508536064 aka 140453508536064) [PID:29844 TID:27770 CORE:9 [csx_ic_std.x] [asyncFlush202] [03/19/2021 06:51:01 UTC]] (panic action:DEFAULT expr:<no-expr> flags:-) [info:0]

Cause

All of the panics have the same cause, a reported issue: RMD panics when it cannot allocate a write request from the global write request pool, and the burst of IOPS exhausted that pool.

Resolution

Workaround:
Increase the size of the global write request pool to avoid the panic. A larger pool consumes additional memory on each SP.

To increase the global write request pool size to 32768 (an example; another value can be used), follow the steps below on both SPs:

  1. Check the current value:

reg_tool get /SYSTEM/CurrentControlSet/Services/RemoteMirroring/Parameters/WriteRequestPoolSize

  2. Set the value to 32768 (0x00008000), as an example:

reg_tool set /SYSTEM/CurrentControlSet/Services/RemoteMirroring/Parameters/WriteRequestPoolSize=REG_DWORD@0x00008000

  3. Reboot the SPs one at a time.

(Note: the parameter reverts to its default value after an NDU. Until the OE is upgraded to a release that contains the fix, the user must set the parameter again after each NDU.)
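
The reg_tool set command takes the pool size as an 8-digit hexadecimal REG_DWORD, so a decimal size has to be converted first. A minimal sketch of that conversion follows; the function name pool_size_to_reg_value is hypothetical, and only the REG_DWORD@0x... format is taken from the command in this article:

```shell
# Hypothetical helper: format a decimal pool size as the value string used by
# "reg_tool set .../WriteRequestPoolSize=<value>" in the steps above.
pool_size_to_reg_value() {
    printf 'REG_DWORD@0x%08X\n' "$1"   # zero-padded, 8 hex digits
}

pool_size_to_reg_value 32768   # prints REG_DWORD@0x00008000
```

The printed string goes after the "=" in the reg_tool set command. The helper only formats the number; it does not check whether a given pool size is appropriate for the system.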


A fix will be available in the next major Unity OE code release. EE plans to increase the RMD write request pool size in the code to avoid the panic.

Products

Dell EMC Unity Family
Article Properties
Article Number: 000184814
Article Type: Solution
Last Modified: 22 Jan 2026
Version: 2