Unsolved

4 Posts

3047

January 7th, 2022 03:00

H755N kernel module hangs

Dear all,

We have some issues with the H755N RAID controller in a PowerEdge R750. The controller firmware is at version 52.16.1-4158 and the driver at version 07.719.04.00. We run Debian 11 with a 5.10 kernel. I replaced the bundled kernel module (07.714.04.00-rc1) with the most recent version, 07.719.04.00, which I found here: https://www.broadcom.com/products/storage/raid-controllers/megaraid-sas-9361-8i
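For anyone comparing driver versions on their own system, here is a minimal sketch that reads the version the running kernel reports for the module. It assumes the module is loaded and exposes a `version` file under `/sys/module`; the function name `loaded_module_version` and the `sysfs_root` parameter are only illustrative.

```python
from pathlib import Path
from typing import Optional


def loaded_module_version(module: str = "megaraid_sas",
                          sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the version string sysfs reports for a loaded module.

    Returns None if the module is not loaded or exposes no version file.
    """
    version_file = Path(sysfs_root) / module / "version"
    try:
        return version_file.read_text().strip()
    except OSError:
        # Missing directory or file: module absent or no version exported.
        return None


if __name__ == "__main__":
    version = loaded_module_version()
    print(version or "megaraid_sas is not loaded or exposes no version")
```

If the printed value differs from the version of the module you installed, the older in-tree module is probably still the one being loaded (for example from the initramfs).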

During most controller interactions (for example, when the OS requests S.M.A.R.T. information, or during shutdown), the kernel module becomes unresponsive. The sample output below shows the hang during a restart of the smartd service.

Any suggestions on how this error could be resolved? Thank you!

 10:21:47 host systemd[1]: Stopping Self Monitoring and Reporting Technology (SMART) Daemon...
 10:21:47 host smartd[1487]: smartd received signal 15: Terminated
 10:21:47 host smartd[1487]: Device: /dev/bus/0 [megaraid_disk_00], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S60.scsi.state
 10:21:47 host smartd[1487]: Device: /dev/bus/0 [megaraid_disk_01], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S61.scsi.state
 10:21:47 host smartd[1487]: Device: /dev/bus/0 [megaraid_disk_02], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S62.scsi.state
 10:21:47 host smartd[1487]: Device: /dev/bus/0 [megaraid_disk_03], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S63.scsi.state
 10:21:47 host smartd[1487]: Device: /dev/bus/0 [megaraid_disk_04], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S64.scsi.state
 10:21:47 host smartd[1487]: smartd is exiting (exit status 0)
 10:21:47 host systemd[1]: smartmontools.service: Succeeded.
 10:21:47 host systemd[1]: Stopped Self Monitoring and Reporting Technology (SMART) Daemon.
 10:21:47 host systemd[1]: Starting Self Monitoring and Reporting Technology (SMART) Daemon...
 10:21:47 host smartd[2756]: smartd 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-10-amd64] (local build)
 10:21:47 host smartd[2756]: Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
 10:21:47 host smartd[2756]: Opened configuration file /etc/smartd.conf
 10:21:47 host smartd[2756]: Configuration file /etc/smartd.conf parsed.
 10:22:29 host kernel: INFO: task megacli.real:2543 blocked for more than 120 seconds.
 10:22:29 host kernel:       Tainted: G           OE     5.10.0-10-amd64 #1 Debian 5.10.84-1
 10:22:29 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 10:22:29 host kernel: task:megacli.real    state:D stack:    0 pid: 2543 ppid:  2542 flags:0x00000000
 10:22:29 host kernel: Call Trace:
 10:22:29 host kernel:  __schedule+0x282/0x870
 10:22:29 host kernel:  schedule+0x46/0xb0
 10:22:29 host kernel:  megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
 10:22:29 host kernel:  ? add_wait_queue_exclusive+0x70/0x70
 10:22:29 host kernel:  megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
 10:22:29 host kernel:  megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
 10:22:29 host kernel:  megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
 10:22:29 host kernel:  __x64_sys_ioctl+0x83/0xb0
 10:22:29 host kernel:  do_syscall_64+0x33/0x80
 10:22:29 host kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 10:22:29 host kernel: RIP: 0033:0x7f2c080f7cc7
 10:22:29 host kernel: RSP: 002b:00007ffec70b1288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 10:22:29 host kernel: RAX: ffffffffffffffda RBX: 000000000236fa50 RCX: 00007f2c080f7cc7
 10:22:29 host kernel: RDX: 000000000236add0 RSI: 00000000c1944d01 RDI: 0000000000000003
 10:22:29 host kernel: RBP: 00007ffec70b12c0 R08: 000000000236add0 R09: 00007f2c081c1be0
 10:22:29 host kernel: R10: 000000000000006e R11: 0000000000000246 R12: 00000000004028a0
 10:22:29 host kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 10:23:17 host systemd[1]: smartmontools.service: start operation timed out. Terminating.
 10:24:30 host kernel: INFO: task megacli.real:2543 blocked for more than 241 seconds.
 10:24:30 host kernel:       Tainted: G           OE     5.10.0-10-amd64 #1 Debian 5.10.84-1
 10:24:30 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 10:24:30 host kernel: task:megacli.real    state:D stack:    0 pid: 2543 ppid:  2542 flags:0x00000000
 10:24:30 host kernel: Call Trace:
 10:24:30 host kernel:  __schedule+0x282/0x870
 10:24:30 host kernel:  schedule+0x46/0xb0
 10:24:30 host kernel:  megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
 10:24:30 host kernel:  ? add_wait_queue_exclusive+0x70/0x70
 10:24:30 host kernel:  megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
 10:24:30 host kernel:  megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
 10:24:30 host kernel:  megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
 10:24:30 host kernel:  __x64_sys_ioctl+0x83/0xb0
 10:24:30 host kernel:  do_syscall_64+0x33/0x80
 10:24:30 host kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 10:24:30 host kernel: RIP: 0033:0x7f2c080f7cc7
 10:24:30 host kernel: RSP: 002b:00007ffec70b1288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 10:24:30 host kernel: RAX: ffffffffffffffda RBX: 000000000236fa50 RCX: 00007f2c080f7cc7
 10:24:30 host kernel: RDX: 000000000236add0 RSI: 00000000c1944d01 RDI: 0000000000000003
 10:24:30 host kernel: RBP: 00007ffec70b12c0 R08: 000000000236add0 R09: 00007f2c081c1be0
 10:24:30 host kernel: R10: 000000000000006e R11: 0000000000000246 R12: 00000000004028a0
 10:24:30 host kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 10:24:30 host kernel: INFO: task smartd:2756 blocked for more than 120 seconds.
 10:24:30 host kernel:       Tainted: G           OE     5.10.0-10-amd64 #1 Debian 5.10.84-1
 10:24:30 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 10:24:30 host kernel: task:smartd          state:D stack:    0 pid: 2756 ppid:     1 flags:0x00000004
 10:24:30 host kernel: Call Trace:
 10:24:30 host kernel:  __schedule+0x282/0x870
 10:24:30 host kernel:  schedule+0x46/0xb0
 10:24:30 host kernel:  megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
 10:24:30 host kernel:  ? add_wait_queue_exclusive+0x70/0x70
 10:24:30 host kernel:  megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
 10:24:30 host kernel:  megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
 10:24:30 host kernel:  megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
 10:24:30 host kernel:  __x64_sys_ioctl+0x83/0xb0
 10:24:30 host kernel:  do_syscall_64+0x33/0x80
 10:24:30 host kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 10:24:30 host kernel: RIP: 0033:0x7f7c43bb0cc7
 10:24:30 host kernel: RSP: 002b:00007fff1df361b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 10:24:30 host kernel: RAX: ffffffffffffffda RBX: 00005575714bfbc0 RCX: 00007f7c43bb0cc7
 10:24:30 host kernel: RDX: 00007fff1df361c0 RSI: 00000000c1944d01 RDI: 0000000000000003
 10:24:30 host kernel: RBP: 00007f7c43639b48 R08: 0000000000000010 R09: 00007fff1df3657a
 10:24:30 host kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff1df361c0
 10:24:30 host kernel: R13: 00007fff1df365d0 R14: 00007fff1df3657a R15: 00007fff1df36520
 10:24:47 host systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)!
 10:24:47 host systemd[1]: systemd-udevd.service: Killing process 1124 (systemd-udevd) with signal SIGABRT.
 10:24:47 host systemd[1]: smartmontools.service: State 'stop-sigterm' timed out. Killing.
 10:24:47 host systemd[1]: smartmontools.service: Killing process 2756 (smartd) with signal SIGKILL.
 10:24:52 host kernel: sd 0:3:111:0: tag#4160 CDB: Test Unit Ready 00 00 00 00 00 00
 10:24:53 host kernel: sd 0:3:111:0: tag#4160 OCR is requested due to IO timeout!!
 10:24:53 host kernel: sd 0:3:111:0: tag#4160 SCSI host state: 5  FW outstanding: 1
 10:24:53 host kernel: sd 0:3:111:0: tag#4160 scmd: (0x0000000048c5788a)  retries: 0x0  allowed: 0x5
 10:24:53 host kernel: sd 0:3:111:0: tag#4160 CDB: Test Unit Ready 00 00 00 00 00 00
 10:24:53 host kernel: sd 0:3:111:0: tag#4160 Request descriptor details:
 10:24:53 host kernel: sd 0:3:111:0: tag#4160 RequestFlags:0x0  MSIxIndex:0x0  SMID:0x1041  LMID:0x0  DevHandle:0x0
 10:24:53 host kernel: IO request frame:
 10:24:53 host kernel: 00000000: f10000ef 00000000 00000000 ab4f8800 00600002 00000020 00000000 00000000
 10:24:53 host kernel: 00000020: 00000000 00000006 00000000 00000000 00000000 00000000 00000000 00000000
 10:24:53 host kernel: 00000040: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 10:24:53 host kernel: 00000060: 001e0000 00ef0000 00000000 00000000 00000000 00000000 00000000 00000000
 10:24:53 host kernel: 00000080: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 10:24:53 host kernel: 000000a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 10:24:53 host kernel: 000000c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 10:24:53 host kernel: 000000e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 10:24:53 host kernel: Chain frame:
 10:24:53 host kernel: Chain frame:
 10:24:53 host kernel: 00000000: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 10:24:53 host kernel: 00000020: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 10:24:53 host kernel: 00000040: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 >> further "00000000" blocks omitted due to length restriction
 10:24:53 host kernel: megaraid_sas 0000:65:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
 10:24:53 host kernel: megaraid_sas 0000:65:00.0: [ 0]waiting for 1 commands to complete for scsi0
 10:24:58 host kernel: megaraid_sas 0000:65:00.0: [ 5]waiting for 1 commands to complete for scsi0
 10:25:01 host CRON[2777]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
 10:25:03 host kernel: megaraid_sas 0000:65:00.0: [10]waiting for 1 commands to complete for scsi0
 10:25:08 host kernel: megaraid_sas 0000:65:00.0: [15]waiting for 1 commands to complete for scsi0
 10:25:13 host kernel: megaraid_sas 0000:65:00.0: [20]waiting for 1 commands to complete for scsi0
 10:25:18 host kernel: megaraid_sas 0000:65:00.0: [25]waiting for 1 commands to complete for scsi0
 10:25:24 host kernel: megaraid_sas 0000:65:00.0: [30]waiting for 1 commands to complete for scsi0
 10:25:29 host kernel: megaraid_sas 0000:65:00.0: [35]waiting for 1 commands to complete for scsi0
 10:25:34 host kernel: megaraid_sas 0000:65:00.0: [40]waiting for 1 commands to complete for scsi0
 10:25:39 host kernel: megaraid_sas 0000:65:00.0: [45]waiting for 1 commands to complete for scsi0
 10:25:44 host kernel: megaraid_sas 0000:65:00.0: [50]waiting for 1 commands to complete for scsi0
 10:25:49 host kernel: megaraid_sas 0000:65:00.0: [55]waiting for 1 commands to complete for scsi0
 10:25:54 host kernel: megaraid_sas 0000:65:00.0: [60]waiting for 1 commands to complete for scsi0
 10:25:59 host kernel: megaraid_sas 0000:65:00.0: [65]waiting for 1 commands to complete for scsi0
 10:26:05 host kernel: megaraid_sas 0000:65:00.0: [70]waiting for 1 commands to complete for scsi0
 10:26:10 host kernel: megaraid_sas 0000:65:00.0: [75]waiting for 1 commands to complete for scsi0
 10:26:15 host kernel: megaraid_sas 0000:65:00.0: [80]waiting for 1 commands to complete for scsi0
 10:26:17 host systemd[1]: systemd-udevd.service: State 'stop-watchdog' timed out. Killing.
 10:26:17 host systemd[1]: systemd-udevd.service: Killing process 1124 (systemd-udevd) with signal SIGKILL.
 10:26:17 host systemd[1]: smartmontools.service: Processes still around after SIGKILL. Ignoring.
 10:26:20 host kernel: megaraid_sas 0000:65:00.0: [85]waiting for 1 commands to complete for scsi0
 10:26:25 host kernel: megaraid_sas 0000:65:00.0: [90]waiting for 1 commands to complete for scsi0
 10:26:30 host kernel: megaraid_sas 0000:65:00.0: [95]waiting for 1 commands to complete for scsi0
 10:26:31 host kernel: INFO: task megacli.real:2543 blocked for more than 362 seconds.
 10:26:31 host kernel:       Tainted: G           OE     5.10.0-10-amd64 #1 Debian 5.10.84-1
 10:26:31 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 10:26:31 host kernel: task:megacli.real    state:D stack:    0 pid: 2543 ppid:  2542 flags:0x00000000
 10:26:31 host kernel: Call Trace:
 10:26:31 host kernel:  __schedule+0x282/0x870
 10:26:31 host kernel:  schedule+0x46/0xb0
 10:26:31 host kernel:  megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
 10:26:31 host kernel:  ? add_wait_queue_exclusive+0x70/0x70
 10:26:31 host kernel:  megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
 10:26:31 host kernel:  megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
 10:26:31 host kernel:  megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
 10:26:31 host kernel:  __x64_sys_ioctl+0x83/0xb0
 10:26:31 host kernel:  do_syscall_64+0x33/0x80
 10:26:31 host kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 10:26:31 host kernel: RIP: 0033:0x7f2c080f7cc7
 10:26:31 host kernel: RSP: 002b:00007ffec70b1288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 10:26:31 host kernel: RAX: ffffffffffffffda RBX: 000000000236fa50 RCX: 00007f2c080f7cc7
 10:26:31 host kernel: RDX: 000000000236add0 RSI: 00000000c1944d01 RDI: 0000000000000003
 10:26:31 host kernel: RBP: 00007ffec70b12c0 R08: 000000000236add0 R09: 00007f2c081c1be0
 10:26:31 host kernel: R10: 000000000000006e R11: 0000000000000246 R12: 00000000004028a0
 10:26:31 host kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 10:26:31 host kernel: INFO: task smartd:2756 blocked for more than 241 seconds.
 10:26:31 host kernel:       Tainted: G           OE     5.10.0-10-amd64 #1 Debian 5.10.84-1
 10:26:31 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 10:26:31 host kernel: task:smartd          state:D stack:    0 pid: 2756 ppid:     1 flags:0x00000004
 10:26:31 host kernel: Call Trace:
 10:26:31 host kernel:  __schedule+0x282/0x870
 10:26:31 host kernel:  schedule+0x46/0xb0
 10:26:31 host kernel:  megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
 10:26:31 host kernel:  ? add_wait_queue_exclusive+0x70/0x70
 10:26:31 host kernel:  megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
 10:26:31 host kernel:  megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
 10:26:31 host kernel:  megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
 10:26:31 host kernel:  __x64_sys_ioctl+0x83/0xb0
 10:26:31 host kernel:  do_syscall_64+0x33/0x80
 10:26:31 host kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 10:26:31 host kernel: RIP: 0033:0x7f7c43bb0cc7
 10:26:31 host kernel: RSP: 002b:00007fff1df361b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 10:26:31 host kernel: RAX: ffffffffffffffda RBX: 00005575714bfbc0 RCX: 00007f7c43bb0cc7
 10:26:31 host kernel: RDX: 00007fff1df361c0 RSI: 00000000c1944d01 RDI: 0000000000000003
 10:26:31 host kernel: RBP: 00007f7c43639b48 R08: 0000000000000010 R09: 00007fff1df3657a
 10:26:31 host kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff1df361c0
 10:26:31 host kernel: R13: 00007fff1df365d0 R14: 00007fff1df3657a R15: 00007fff1df36520
 10:26:35 host kernel: megaraid_sas 0000:65:00.0: [100]waiting for 1 commands to complete for scsi0
 10:26:40 host kernel: megaraid_sas 0000:65:00.0: [105]waiting for 1 commands to complete for scsi0
 10:26:45 host kernel: megaraid_sas 0000:65:00.0: [110]waiting for 1 commands to complete for scsi0
 10:26:51 host kernel: megaraid_sas 0000:65:00.0: [115]waiting for 1 commands to complete for scsi0
 10:26:56 host kernel: megaraid_sas 0000:65:00.0: [120]waiting for 1 commands to complete for scsi0
 10:27:01 host kernel: megaraid_sas 0000:65:00.0: [125]waiting for 1 commands to complete for scsi0
 10:27:06 host kernel: megaraid_sas 0000:65:00.0: [130]waiting for 1 commands to complete for scsi0
 10:27:11 host kernel: megaraid_sas 0000:65:00.0: [135]waiting for 1 commands to complete for scsi0
 10:27:16 host kernel: megaraid_sas 0000:65:00.0: [140]waiting for 1 commands to complete for scsi0
 10:27:21 host kernel: megaraid_sas 0000:65:00.0: [145]waiting for 1 commands to complete for scsi0
 10:27:26 host kernel: megaraid_sas 0000:65:00.0: [150]waiting for 1 commands to complete for scsi0
 10:27:32 host kernel: megaraid_sas 0000:65:00.0: [155]waiting for 1 commands to complete for scsi0
 10:27:37 host kernel: megaraid_sas 0000:65:00.0: [160]waiting for 1 commands to complete for scsi0
 10:27:42 host kernel: megaraid_sas 0000:65:00.0: Trigger snap dump
 10:27:48 host systemd[1]: systemd-udevd.service: Processes still around after SIGKILL. Ignoring.
 10:27:48 host systemd[1]: smartmontools.service: State 'final-sigterm' timed out. Killing.
 10:27:48 host systemd[1]: smartmontools.service: Killing process 2756 (smartd) with signal SIGKILL.
 10:27:57 host kernel: megaraid_sas 0000:65:00.0: resetting fusion adapter scsi0.
 10:27:57 host kernel: megaraid_sas 0000:65:00.0: Outstanding fastpath IOs: 0
 10:28:07 host kernel: megaraid_sas 0000:65:00.0: Waiting for FW to come to ready state
 10:28:23 host kernel: megaraid_sas 0000:65:00.0: FW now in Ready state
 10:28:23 host kernel: megaraid_sas 0000:65:00.0: Current firmware supports maximum commands: 5101  LDIO threshold: 0
 10:28:23 host kernel: megaraid_sas 0000:65:00.0: Performance mode :Balanced (latency index = 8)
 10:28:23 host kernel: megaraid_sas 0000:65:00.0: FW supports sync cache   : Yes
 10:28:23 host kernel: megaraid_sas 0000:65:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
 10:28:24 host kernel: megaraid_sas 0000:65:00.0: FW supports atomic descriptor    : Yes
 10:28:24 host kernel: megaraid_sas 0000:65:00.0: FW provided supportMaxExtLDs: 1  max_lds: 240
 10:28:24 host kernel: megaraid_sas 0000:65:00.0: controller type  : MR(8192MB)
 10:28:24 host kernel: megaraid_sas 0000:65:00.0: Online Controller Reset(OCR)     : Enabled
 10:28:24 host kernel: megaraid_sas 0000:65:00.0: Secure JBOD support      : No
 10:28:24 host kernel: megaraid_sas 0000:65:00.0: NVMe passthru support    : Yes
 10:28:24 host kernel: megaraid_sas 0000:65:00.0: FW provided TM TaskAbort/Reset timeout   : 6 secs/60 secs
 10:28:24 host kernel: megaraid_sas 0000:65:00.0: PCI Lane Margining support       : Yes
 10:28:24 host kernel: megaraid_sas 0000:65:00.0: JBOD sequence map support        : Yes
 10:28:32 host kernel: INFO: task megacli.real:2543 blocked for more than 483 seconds.
 10:28:32 host kernel:       Tainted: G           OE     5.10.0-10-amd64 #1 Debian 5.10.84-1
 10:28:32 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 10:28:32 host kernel: task:megacli.real    state:D stack:    0 pid: 2543 ppid:  2542 flags:0x00000000
 10:28:32 host kernel: Call Trace:
 10:28:32 host kernel:  __schedule+0x282/0x870
 10:28:32 host kernel:  schedule+0x46/0xb0
 10:28:32 host kernel:  megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
 10:28:32 host kernel:  ? add_wait_queue_exclusive+0x70/0x70
 10:28:32 host kernel:  megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
 10:28:32 host kernel:  megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
 10:28:32 host kernel:  megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
 10:28:32 host kernel:  __x64_sys_ioctl+0x83/0xb0
 10:28:32 host kernel:  do_syscall_64+0x33/0x80
 10:28:32 host kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 10:28:32 host kernel: RIP: 0033:0x7f2c080f7cc7
 10:28:32 host kernel: RSP: 002b:00007ffec70b1288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 10:28:32 host kernel: RAX: ffffffffffffffda RBX: 000000000236fa50 RCX: 00007f2c080f7cc7
 10:28:32 host kernel: RDX: 000000000236add0 RSI: 00000000c1944d01 RDI: 0000000000000003
 10:28:32 host kernel: RBP: 00007ffec70b12c0 R08: 000000000236add0 R09: 00007f2c081c1be0
 10:28:32 host kernel: R10: 000000000000006e R11: 0000000000000246 R12: 00000000004028a0
 10:28:32 host kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 10:28:32 host kernel: INFO: task smartd:2756 blocked for more than 362 seconds.
 10:28:32 host kernel:       Tainted: G           OE     5.10.0-10-amd64 #1 Debian 5.10.84-1
 10:28:32 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 10:28:32 host kernel: task:smartd          state:D stack:    0 pid: 2756 ppid:     1 flags:0x00000004
 10:28:32 host kernel: Call Trace:
 10:28:32 host kernel:  __schedule+0x282/0x870
 10:28:32 host kernel:  schedule+0x46/0xb0
 10:28:32 host kernel:  megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
 10:28:32 host kernel:  ? add_wait_queue_exclusive+0x70/0x70
 10:28:32 host kernel:  megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
 10:28:32 host kernel:  megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
 10:28:32 host kernel:  megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
 10:28:32 host kernel:  __x64_sys_ioctl+0x83/0xb0
 10:28:32 host kernel:  do_syscall_64+0x33/0x80
 10:28:32 host kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 10:28:32 host kernel: RIP: 0033:0x7f7c43bb0cc7
 10:28:32 host kernel: RSP: 002b:00007fff1df361b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 10:28:32 host kernel: RAX: ffffffffffffffda RBX: 00005575714bfbc0 RCX: 00007f7c43bb0cc7
 10:28:32 host kernel: RDX: 00007fff1df361c0 RSI: 00000000c1944d01 RDI: 0000000000000003
 10:28:32 host kernel: RBP: 00007f7c43639b48 R08: 0000000000000010 R09: 00007fff1df3657a
 10:28:32 host kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff1df361c0
 10:28:32 host kernel: R13: 00007fff1df365d0 R14: 00007fff1df3657a R15: 00007fff1df36520
 10:28:46 host kernel: megaraid_sas 0000:65:00.0: Iop2SysDoorbellInt for scsi0
 10:28:52 host kernel: megaraid_sas 0000:65:00.0: megasas_get_ld_map_info DCMD timed out, RAID map is disabled
 10:29:02 host kernel: megaraid_sas 0000:65:00.0: Waiting for FW to come to ready state
 10:29:16 host kernel: megaraid_sas 0000:65:00.0: FW now in Ready state
 10:29:16 host kernel: megaraid_sas 0000:65:00.0: Current firmware supports maximum commands: 5101  LDIO threshold: 0
 10:29:16 host kernel: megaraid_sas 0000:65:00.0: Performance mode :Balanced (latency index = 8)
 10:29:16 host kernel: megaraid_sas 0000:65:00.0: FW supports sync cache   : Yes
 10:29:16 host kernel: megaraid_sas 0000:65:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
 10:29:17 host kernel: megaraid_sas 0000:65:00.0: FW supports atomic descriptor    : Yes
 10:29:17 host kernel: megaraid_sas 0000:65:00.0: FW provided supportMaxExtLDs: 1  max_lds: 240
 10:29:17 host kernel: megaraid_sas 0000:65:00.0: controller type  : MR(8192MB)
 10:29:17 host kernel: megaraid_sas 0000:65:00.0: Online Controller Reset(OCR)     : Enabled
 10:29:17 host kernel: megaraid_sas 0000:65:00.0: Secure JBOD support      : No
 10:29:17 host kernel: megaraid_sas 0000:65:00.0: NVMe passthru support    : Yes
 10:29:17 host kernel: megaraid_sas 0000:65:00.0: FW provided TM TaskAbort/Reset timeout   : 6 secs/60 secs
 10:29:17 host kernel: megaraid_sas 0000:65:00.0: PCI Lane Margining support       : Yes
 10:29:17 host kernel: megaraid_sas 0000:65:00.0: JBOD sequence map support        : Yes
 10:29:18 host systemd[1]: systemd-udevd.service: State 'final-sigterm' timed out. Killing.
 10:29:18 host systemd[1]: systemd-udevd.service: Killing process 1124 (systemd-udevd) with signal SIGKILL.
 10:29:18 host systemd[1]: smartmontools.service: Processes still around after final SIGKILL. Entering failed mode.
 10:29:18 host systemd[1]: smartmontools.service: Failed with result 'timeout'.
 10:29:18 host systemd[1]: smartmontools.service: Unit process 2756 (smartd) remains running after unit stopped.
 10:29:18 host systemd[1]: Failed to start Self Monitoring and Reporting Technology (SMART) Daemon.
 10:29:45 host kernel: megaraid_sas 0000:65:00.0: megasas_get_ld_map_info DCMD timed out, RAID map is disabled
 10:29:55 host kernel: megaraid_sas 0000:65:00.0: Waiting for FW to come to ready state
 10:30:00 host systemd[1]: Starting system activity accounting tool...
 10:30:00 host systemd[1]: sysstat-collect.service: Succeeded.
 10:30:00 host systemd[1]: Finished system activity accounting tool.
 10:30:10 host kernel: megaraid_sas 0000:65:00.0: FW now in Ready state
 10:30:10 host kernel: megaraid_sas 0000:65:00.0: Current firmware supports maximum commands: 5101  LDIO threshold: 0
 10:30:10 host kernel: megaraid_sas 0000:65:00.0: Performance mode :Balanced (latency index = 8)
 10:30:10 host kernel: megaraid_sas 0000:65:00.0: FW supports sync cache   : Yes
 10:30:10 host kernel: megaraid_sas 0000:65:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: FW supports atomic descriptor    : Yes
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: FW provided supportMaxExtLDs: 1  max_lds: 240
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: controller type  : MR(8192MB)
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: Online Controller Reset(OCR)     : Enabled
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: Secure JBOD support      : No
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: NVMe passthru support    : Yes
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: FW provided TM TaskAbort/Reset timeout   : 6 secs/60 secs
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: PCI Lane Margining support       : Yes
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: JBOD sequence map support        : Yes
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: return -EBUSY from megasas_refire_mgmt_cmd 4516 cmd 0x5 opcode 0x10b0100
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: return -EBUSY from megasas_refire_mgmt_cmd 4516 cmd 0x4 opcode 0x0
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: return -EBUSY from megasas_mgmt_fw_ioctl 8889 cmd 0x5 opcode 0x10b0100 cmd->cmd_status_drv 0x3
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: return -EBUSY from megasas_mgmt_fw_ioctl 8889 cmd 0x4 opcode 0x0 cmd->cmd_status_drv 0x3
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: waiting for controller reset to finish
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: Adapter is OPERATIONAL for scsi:0
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: Snap dump wait time      : 15
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: Reset successful for scsi0.
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: 14893 (694866487s/0x0020/CRIT) - Controller encountered an error and was reset
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: scanning for scsi0...
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: 14923 (694866525s/0x0020/DEAD) - Fatal firmware error: Line 171 in fw\raid\utils.c

 10:30:11 host kernel: megaraid_sas 0000:65:00.0: 14926 (694866535s/0x0020/CRIT) - Controller encountered an error and was reset
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: scanning for scsi0...
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: 14956 (694866572s/0x0020/DEAD) - Fatal firmware error: Line 171 in fw\raid\utils.c

 10:30:11 host kernel: megaraid_sas 0000:65:00.0: 14959 (694866582s/0x0020/CRIT) - Controller encountered an error and was reset
 10:30:11 host kernel: megaraid_sas 0000:65:00.0: scanning for scsi0...
 10:30:21 host systemd[1]: systemd-udevd.service: Main process exited, code=killed, status=9/KILL
 10:30:21 host systemd[1]: systemd-udevd.service: Failed with result 'watchdog'.
 10:30:21 host systemd[1]: systemd-udevd.service: Consumed 42.066s CPU time.
 10:30:21 host systemd[1]: systemd-udevd.service: Scheduled restart job, restart counter is at 1.
 10:30:21 host systemd[1]: Stopped Rule-based Manager for Device Events and Files.
 10:30:21 host systemd[1]: systemd-udevd.service: Consumed 42.066s CPU time.
 10:30:21 host systemd[1]: Starting Rule-based Manager for Device Events and Files...
 10:30:21 host systemd[1]: Started Rule-based Manager for Device Events and Files.
 10:30:59 host systemd[1]: Starting Self Monitoring and Reporting Technology (SMART) Daemon...
 10:30:59 host smartd[2856]: smartd 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-10-amd64] (local build)
 10:30:59 host smartd[2856]: Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
 10:30:59 host smartd[2856]: Opened configuration file /etc/smartd.conf
 10:30:59 host smartd[2856]: Configuration file /etc/smartd.conf parsed.
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_00], opened
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_00], [NVMe     Dell Ent NVMe v2 .2.0], lu id: 0x3643503052a034100025384100000002, S/N: S60, 3.84 TB
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_00], is SMART capable. Adding to "monitor" list.
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_00], state read from /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S60.scsi.state
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_01], opened
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_01], [NVMe     Dell Ent NVMe v2 .2.0], lu id: 0x3643503052a028580025384100000002, S/N: S61, 3.84 TB
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_01], is SMART capable. Adding to "monitor" list.
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_01], state read from /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S61.scsi.state
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_02], opened
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_02], [NVMe     Dell Ent NVMe v2 .2.0], lu id: 0x3643503052a042550025384100000002, S/N: S62, 3.84 TB
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_02], is SMART capable. Adding to "monitor" list.
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_02], state read from /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S62.scsi.state
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_03], opened
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_03], [NVMe     Dell Ent NVMe v2 .2.0], lu id: 0x3643503052a028600025384100000002, S/N: S63, 3.84 TB
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_03], is SMART capable. Adding to "monitor" list.
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_03], state read from /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S63.scsi.state
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_04], opened
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_04], [NVMe     Dell Ent NVMe v2 .2.0], lu id: 0x3643503052a043550025384100000002, S/N: S64, 3.84 TB
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_04], is SMART capable. Adding to "monitor" list.
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_04], state read from /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S64.scsi.state
 10:30:59 host smartd[2856]: Monitoring 0 ATA/SATA, 5 SCSI/SAS and 0 NVMe devices
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_00], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S60.scsi.state
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_01], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S61.scsi.state
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_02], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S62.scsi.state
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_03], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S63.scsi.state
 10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_04], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S64.scsi.state
 10:30:59 host systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
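The hung-task reports in the log above repeat for the same two processes, so a short summary can be easier to read than the raw dump. Below is a minimal sketch that extracts the longest reported blocked time per task. The function name `summarize_hung_tasks` and the `SAMPLE` list are illustrative; the sample lines are copied from the log above.

```python
import re

# Matches the kernel hung-task detector line, e.g.
# "INFO: task smartd:2756 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(lines):
    """Map (task name, pid) to the longest blocked time reported, in seconds."""
    longest = {}
    for line in lines:
        match = HUNG_TASK.search(line)
        if match is None:
            continue
        key = (match.group("name"), int(match.group("pid")))
        secs = int(match.group("secs"))
        longest[key] = max(longest.get(key, 0), secs)
    return longest


# Lines copied from the log in this post.
SAMPLE = [
    "10:22:29 host kernel: INFO: task megacli.real:2543 blocked for more than 120 seconds.",
    "10:24:30 host kernel: INFO: task smartd:2756 blocked for more than 120 seconds.",
    "10:28:32 host kernel: INFO: task megacli.real:2543 blocked for more than 483 seconds.",
    "10:28:32 host kernel: INFO: task smartd:2756 blocked for more than 362 seconds.",
]

if __name__ == "__main__":
    for (name, pid), secs in summarize_hung_tasks(SAMPLE).items():
        print(f"{name} (pid {pid}): blocked for more than {secs} s")
```

To run it against a full log, pass an open file object (or the output of `journalctl -k` saved to a file) instead of `SAMPLE`.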

4 Apprentice

 • 

278 Posts

January 7th, 2022 07:00

Hello codebold,

 

Thank you for choosing Dell. I am sorry you are facing this issue. Do I understand correctly that the problem started after the firmware update?

 

It would be great if you could send me the iDRAC logs so I can check them. Could you please gather them and send them to me in a private message?

 

How to gather logs:

https://dell.to/3eY5qbV

 

Please ask me if you have any questions,

4 Posts

January 7th, 2022 07:00

Hello Maria,

Thanks for your help! The problem already existed before the firmware upgrade. I will gather the logs and send them to you in a private message.

1 Message

June 8th, 2022 05:00

Hi,

We have the same problem. Was there a solution?

 

Best,

Hp

7 Practitioner

 • 

9.7K Posts

 • 

48K Points

June 8th, 2022 05:00

Rageth,

 

Would you confirm whether the server is configured to boot in UEFI or BIOS mode?

 

Let me know and we can go from there.

 
