Unsolved
January 7th, 2022 03:00
H755N kernel module hangs
Dear all,
We have some issues with the H755N RAID controller in a PowerEdge R750. The firmware is at version 52.16.1-4158 and the driver at version 07.719.04.00. We use Debian 11 with a 5.10 kernel. I replaced the bundled kernel module (07.714.04.00-rc1) with the most recent version, 07.719.04.00, which I found here: https://www.broadcom.com/products/storage/raid-controllers/megaraid-sas-9361-8i
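For anyone comparing setups: a minimal sketch for confirming which megaraid_sas version is actually loaded, assuming the module exports its version under /sys/module/megaraid_sas/version. The helper names are made up for illustration, and the expected version is simply the one named above.

```python
from pathlib import Path
from typing import Optional, Tuple


def read_module_version(module: str, sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the version string a loaded kernel module exports, or None."""
    version_file = Path(sysfs_root) / module / "version"
    try:
        return version_file.read_text().strip()
    except OSError:
        # Module not loaded, or it does not export a version attribute.
        return None


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '07.714.04.00-rc1' into (7, 714, 4, 0); a '-rc1' style suffix is ignored."""
    numeric = text.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


if __name__ == "__main__":
    expected = "07.719.04.00"  # driver version named in this post
    loaded = read_module_version("megaraid_sas")
    if loaded is None:
        print("megaraid_sas is not loaded or exports no version")
    elif parse_version(loaded) == parse_version(expected):
        print(f"running the expected driver: {loaded}")
    else:
        print(f"loaded driver is {loaded}, expected {expected}")
```

If the reported version is still the bundled one, the updated module may not be the one loaded at boot; that is only a possibility to rule out, not a diagnosis.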
During most interactions with the controller (when the OS requests S.M.A.R.T. information or during shutdown), the kernel module becomes unresponsive. The sample output below shows the hang during a restart of the smartd service.
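While the module is unresponsive, the affected processes sit in uninterruptible sleep (state D). A rough sketch for listing them by reading /proc/<pid>/status; the function names are hypothetical, and it looks at processes only, not individual threads.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def parse_status(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (name, state letter) from the contents of /proc/<pid>/status."""
    name = state = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "State":
            # The value looks like "D (disk sleep)"; keep only the letter.
            fields = value.split()
            state = fields[0] if fields else None
    return name, state


def blocked_tasks(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """List (pid, name) for every process currently in state D."""
    found = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            name, state = parse_status((entry / "status").read_text())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((int(entry.name), name or "?"))
    return sorted(found)


if __name__ == "__main__":
    for pid, name in blocked_tasks():
        print(f"{pid}\t{name}")
```

Run during a hang, this should list the same tasks that the kernel's hung-task messages report.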
Any suggestions on how this error could be resolved? Thank you!
10:21:47 host systemd[1]: Stopping Self Monitoring and Reporting Technology (SMART) Daemon...
10:21:47 host smartd[1487]: smartd received signal 15: Terminated
10:21:47 host smartd[1487]: Device: /dev/bus/0 [megaraid_disk_00], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S60.scsi.state
10:21:47 host smartd[1487]: Device: /dev/bus/0 [megaraid_disk_01], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S61.scsi.state
10:21:47 host smartd[1487]: Device: /dev/bus/0 [megaraid_disk_02], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S62.scsi.state
10:21:47 host smartd[1487]: Device: /dev/bus/0 [megaraid_disk_03], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S63.scsi.state
10:21:47 host smartd[1487]: Device: /dev/bus/0 [megaraid_disk_04], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S64.scsi.state
10:21:47 host smartd[1487]: smartd is exiting (exit status 0)
10:21:47 host systemd[1]: smartmontools.service: Succeeded.
10:21:47 host systemd[1]: Stopped Self Monitoring and Reporting Technology (SMART) Daemon.
10:21:47 host systemd[1]: Starting Self Monitoring and Reporting Technology (SMART) Daemon...
10:21:47 host smartd[2756]: smartd 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-10-amd64] (local build)
10:21:47 host smartd[2756]: Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
10:21:47 host smartd[2756]: Opened configuration file /etc/smartd.conf
10:21:47 host smartd[2756]: Configuration file /etc/smartd.conf parsed.
10:22:29 host kernel: INFO: task megacli.real:2543 blocked for more than 120 seconds.
10:22:29 host kernel: Tainted: G OE 5.10.0-10-amd64 #1 Debian 5.10.84-1
10:22:29 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
10:22:29 host kernel: task:megacli.real state:D stack: 0 pid: 2543 ppid: 2542 flags:0x00000000
10:22:29 host kernel: Call Trace:
10:22:29 host kernel: __schedule+0x282/0x870
10:22:29 host kernel: schedule+0x46/0xb0
10:22:29 host kernel: megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
10:22:29 host kernel: ? add_wait_queue_exclusive+0x70/0x70
10:22:29 host kernel: megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
10:22:29 host kernel: megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
10:22:29 host kernel: megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
10:22:29 host kernel: __x64_sys_ioctl+0x83/0xb0
10:22:29 host kernel: do_syscall_64+0x33/0x80
10:22:29 host kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
10:22:29 host kernel: RIP: 0033:0x7f2c080f7cc7
10:22:29 host kernel: RSP: 002b:00007ffec70b1288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
10:22:29 host kernel: RAX: ffffffffffffffda RBX: 000000000236fa50 RCX: 00007f2c080f7cc7
10:22:29 host kernel: RDX: 000000000236add0 RSI: 00000000c1944d01 RDI: 0000000000000003
10:22:29 host kernel: RBP: 00007ffec70b12c0 R08: 000000000236add0 R09: 00007f2c081c1be0
10:22:29 host kernel: R10: 000000000000006e R11: 0000000000000246 R12: 00000000004028a0
10:22:29 host kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
10:23:17 host systemd[1]: smartmontools.service: start operation timed out. Terminating.
10:24:30 host kernel: INFO: task megacli.real:2543 blocked for more than 241 seconds.
10:24:30 host kernel: Tainted: G OE 5.10.0-10-amd64 #1 Debian 5.10.84-1
10:24:30 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
10:24:30 host kernel: task:megacli.real state:D stack: 0 pid: 2543 ppid: 2542 flags:0x00000000
10:24:30 host kernel: Call Trace:
10:24:30 host kernel: __schedule+0x282/0x870
10:24:30 host kernel: schedule+0x46/0xb0
10:24:30 host kernel: megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
10:24:30 host kernel: ? add_wait_queue_exclusive+0x70/0x70
10:24:30 host kernel: megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
10:24:30 host kernel: megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
10:24:30 host kernel: megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
10:24:30 host kernel: __x64_sys_ioctl+0x83/0xb0
10:24:30 host kernel: do_syscall_64+0x33/0x80
10:24:30 host kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
10:24:30 host kernel: RIP: 0033:0x7f2c080f7cc7
10:24:30 host kernel: RSP: 002b:00007ffec70b1288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
10:24:30 host kernel: RAX: ffffffffffffffda RBX: 000000000236fa50 RCX: 00007f2c080f7cc7
10:24:30 host kernel: RDX: 000000000236add0 RSI: 00000000c1944d01 RDI: 0000000000000003
10:24:30 host kernel: RBP: 00007ffec70b12c0 R08: 000000000236add0 R09: 00007f2c081c1be0
10:24:30 host kernel: R10: 000000000000006e R11: 0000000000000246 R12: 00000000004028a0
10:24:30 host kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
10:24:30 host kernel: INFO: task smartd:2756 blocked for more than 120 seconds.
10:24:30 host kernel: Tainted: G OE 5.10.0-10-amd64 #1 Debian 5.10.84-1
10:24:30 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
10:24:30 host kernel: task:smartd state:D stack: 0 pid: 2756 ppid: 1 flags:0x00000004
10:24:30 host kernel: Call Trace:
10:24:30 host kernel: __schedule+0x282/0x870
10:24:30 host kernel: schedule+0x46/0xb0
10:24:30 host kernel: megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
10:24:30 host kernel: ? add_wait_queue_exclusive+0x70/0x70
10:24:30 host kernel: megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
10:24:30 host kernel: megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
10:24:30 host kernel: megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
10:24:30 host kernel: __x64_sys_ioctl+0x83/0xb0
10:24:30 host kernel: do_syscall_64+0x33/0x80
10:24:30 host kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
10:24:30 host kernel: RIP: 0033:0x7f7c43bb0cc7
10:24:30 host kernel: RSP: 002b:00007fff1df361b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
10:24:30 host kernel: RAX: ffffffffffffffda RBX: 00005575714bfbc0 RCX: 00007f7c43bb0cc7
10:24:30 host kernel: RDX: 00007fff1df361c0 RSI: 00000000c1944d01 RDI: 0000000000000003
10:24:30 host kernel: RBP: 00007f7c43639b48 R08: 0000000000000010 R09: 00007fff1df3657a
10:24:30 host kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff1df361c0
10:24:30 host kernel: R13: 00007fff1df365d0 R14: 00007fff1df3657a R15: 00007fff1df36520
10:24:47 host systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)!
10:24:47 host systemd[1]: systemd-udevd.service: Killing process 1124 (systemd-udevd) with signal SIGABRT.
10:24:47 host systemd[1]: smartmontools.service: State 'stop-sigterm' timed out. Killing.
10:24:47 host systemd[1]: smartmontools.service: Killing process 2756 (smartd) with signal SIGKILL.
10:24:52 host kernel: sd 0:3:111:0: tag#4160 CDB: Test Unit Ready 00 00 00 00 00 00
10:24:53 host kernel: sd 0:3:111:0: tag#4160 OCR is requested due to IO timeout!!
10:24:53 host kernel: sd 0:3:111:0: tag#4160 SCSI host state: 5 FW outstanding: 1
10:24:53 host kernel: sd 0:3:111:0: tag#4160 scmd: (0x0000000048c5788a) retries: 0x0 allowed: 0x5
10:24:53 host kernel: sd 0:3:111:0: tag#4160 CDB: Test Unit Ready 00 00 00 00 00 00
10:24:53 host kernel: sd 0:3:111:0: tag#4160 Request descriptor details:
10:24:53 host kernel: sd 0:3:111:0: tag#4160 RequestFlags:0x0 MSIxIndex:0x0 SMID:0x1041 LMID:0x0 DevHandle:0x0
10:24:53 host kernel: IO request frame:
10:24:53 host kernel: 00000000: f10000ef 00000000 00000000 ab4f8800 00600002 00000020 00000000 00000000
10:24:53 host kernel: 00000020: 00000000 00000006 00000000 00000000 00000000 00000000 00000000 00000000
10:24:53 host kernel: 00000040: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
10:24:53 host kernel: 00000060: 001e0000 00ef0000 00000000 00000000 00000000 00000000 00000000 00000000
10:24:53 host kernel: 00000080: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
10:24:53 host kernel: 000000a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
10:24:53 host kernel: 000000c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
10:24:53 host kernel: 000000e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
10:24:53 host kernel: Chain frame:
10:24:53 host kernel: Chain frame:
10:24:53 host kernel: 00000000: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
10:24:53 host kernel: 00000020: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
10:24:53 host kernel: 00000040: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> further "00000000" blocks omitted due to length restriction
10:24:53 host kernel: megaraid_sas 0000:65:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
10:24:53 host kernel: megaraid_sas 0000:65:00.0: [ 0]waiting for 1 commands to complete for scsi0
10:24:58 host kernel: megaraid_sas 0000:65:00.0: [ 5]waiting for 1 commands to complete for scsi0
10:25:01 host CRON[2777]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
10:25:03 host kernel: megaraid_sas 0000:65:00.0: [10]waiting for 1 commands to complete for scsi0
10:25:08 host kernel: megaraid_sas 0000:65:00.0: [15]waiting for 1 commands to complete for scsi0
10:25:13 host kernel: megaraid_sas 0000:65:00.0: [20]waiting for 1 commands to complete for scsi0
10:25:18 host kernel: megaraid_sas 0000:65:00.0: [25]waiting for 1 commands to complete for scsi0
10:25:24 host kernel: megaraid_sas 0000:65:00.0: [30]waiting for 1 commands to complete for scsi0
10:25:29 host kernel: megaraid_sas 0000:65:00.0: [35]waiting for 1 commands to complete for scsi0
10:25:34 host kernel: megaraid_sas 0000:65:00.0: [40]waiting for 1 commands to complete for scsi0
10:25:39 host kernel: megaraid_sas 0000:65:00.0: [45]waiting for 1 commands to complete for scsi0
10:25:44 host kernel: megaraid_sas 0000:65:00.0: [50]waiting for 1 commands to complete for scsi0
10:25:49 host kernel: megaraid_sas 0000:65:00.0: [55]waiting for 1 commands to complete for scsi0
10:25:54 host kernel: megaraid_sas 0000:65:00.0: [60]waiting for 1 commands to complete for scsi0
10:25:59 host kernel: megaraid_sas 0000:65:00.0: [65]waiting for 1 commands to complete for scsi0
10:26:05 host kernel: megaraid_sas 0000:65:00.0: [70]waiting for 1 commands to complete for scsi0
10:26:10 host kernel: megaraid_sas 0000:65:00.0: [75]waiting for 1 commands to complete for scsi0
10:26:15 host kernel: megaraid_sas 0000:65:00.0: [80]waiting for 1 commands to complete for scsi0
10:26:17 host systemd[1]: systemd-udevd.service: State 'stop-watchdog' timed out. Killing.
10:26:17 host systemd[1]: systemd-udevd.service: Killing process 1124 (systemd-udevd) with signal SIGKILL.
10:26:17 host systemd[1]: smartmontools.service: Processes still around after SIGKILL. Ignoring.
10:26:20 host kernel: megaraid_sas 0000:65:00.0: [85]waiting for 1 commands to complete for scsi0
10:26:25 host kernel: megaraid_sas 0000:65:00.0: [90]waiting for 1 commands to complete for scsi0
10:26:30 host kernel: megaraid_sas 0000:65:00.0: [95]waiting for 1 commands to complete for scsi0
10:26:31 host kernel: INFO: task megacli.real:2543 blocked for more than 362 seconds.
10:26:31 host kernel: Tainted: G OE 5.10.0-10-amd64 #1 Debian 5.10.84-1
10:26:31 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
10:26:31 host kernel: task:megacli.real state:D stack: 0 pid: 2543 ppid: 2542 flags:0x00000000
10:26:31 host kernel: Call Trace:
10:26:31 host kernel: __schedule+0x282/0x870
10:26:31 host kernel: schedule+0x46/0xb0
10:26:31 host kernel: megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
10:26:31 host kernel: ? add_wait_queue_exclusive+0x70/0x70
10:26:31 host kernel: megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
10:26:31 host kernel: megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
10:26:31 host kernel: megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
10:26:31 host kernel: __x64_sys_ioctl+0x83/0xb0
10:26:31 host kernel: do_syscall_64+0x33/0x80
10:26:31 host kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
10:26:31 host kernel: RIP: 0033:0x7f2c080f7cc7
10:26:31 host kernel: RSP: 002b:00007ffec70b1288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
10:26:31 host kernel: RAX: ffffffffffffffda RBX: 000000000236fa50 RCX: 00007f2c080f7cc7
10:26:31 host kernel: RDX: 000000000236add0 RSI: 00000000c1944d01 RDI: 0000000000000003
10:26:31 host kernel: RBP: 00007ffec70b12c0 R08: 000000000236add0 R09: 00007f2c081c1be0
10:26:31 host kernel: R10: 000000000000006e R11: 0000000000000246 R12: 00000000004028a0
10:26:31 host kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
10:26:31 host kernel: INFO: task smartd:2756 blocked for more than 241 seconds.
10:26:31 host kernel: Tainted: G OE 5.10.0-10-amd64 #1 Debian 5.10.84-1
10:26:31 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
10:26:31 host kernel: task:smartd state:D stack: 0 pid: 2756 ppid: 1 flags:0x00000004
10:26:31 host kernel: Call Trace:
10:26:31 host kernel: __schedule+0x282/0x870
10:26:31 host kernel: schedule+0x46/0xb0
10:26:31 host kernel: megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
10:26:31 host kernel: ? add_wait_queue_exclusive+0x70/0x70
10:26:31 host kernel: megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
10:26:31 host kernel: megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
10:26:31 host kernel: megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
10:26:31 host kernel: __x64_sys_ioctl+0x83/0xb0
10:26:31 host kernel: do_syscall_64+0x33/0x80
10:26:31 host kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
10:26:31 host kernel: RIP: 0033:0x7f7c43bb0cc7
10:26:31 host kernel: RSP: 002b:00007fff1df361b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
10:26:31 host kernel: RAX: ffffffffffffffda RBX: 00005575714bfbc0 RCX: 00007f7c43bb0cc7
10:26:31 host kernel: RDX: 00007fff1df361c0 RSI: 00000000c1944d01 RDI: 0000000000000003
10:26:31 host kernel: RBP: 00007f7c43639b48 R08: 0000000000000010 R09: 00007fff1df3657a
10:26:31 host kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff1df361c0
10:26:31 host kernel: R13: 00007fff1df365d0 R14: 00007fff1df3657a R15: 00007fff1df36520
10:26:35 host kernel: megaraid_sas 0000:65:00.0: [100]waiting for 1 commands to complete for scsi0
10:26:40 host kernel: megaraid_sas 0000:65:00.0: [105]waiting for 1 commands to complete for scsi0
10:26:45 host kernel: megaraid_sas 0000:65:00.0: [110]waiting for 1 commands to complete for scsi0
10:26:51 host kernel: megaraid_sas 0000:65:00.0: [115]waiting for 1 commands to complete for scsi0
10:26:56 host kernel: megaraid_sas 0000:65:00.0: [120]waiting for 1 commands to complete for scsi0
10:27:01 host kernel: megaraid_sas 0000:65:00.0: [125]waiting for 1 commands to complete for scsi0
10:27:06 host kernel: megaraid_sas 0000:65:00.0: [130]waiting for 1 commands to complete for scsi0
10:27:11 host kernel: megaraid_sas 0000:65:00.0: [135]waiting for 1 commands to complete for scsi0
10:27:16 host kernel: megaraid_sas 0000:65:00.0: [140]waiting for 1 commands to complete for scsi0
10:27:21 host kernel: megaraid_sas 0000:65:00.0: [145]waiting for 1 commands to complete for scsi0
10:27:26 host kernel: megaraid_sas 0000:65:00.0: [150]waiting for 1 commands to complete for scsi0
10:27:32 host kernel: megaraid_sas 0000:65:00.0: [155]waiting for 1 commands to complete for scsi0
10:27:37 host kernel: megaraid_sas 0000:65:00.0: [160]waiting for 1 commands to complete for scsi0
10:27:42 host kernel: megaraid_sas 0000:65:00.0: Trigger snap dump
10:27:48 host systemd[1]: systemd-udevd.service: Processes still around after SIGKILL. Ignoring.
10:27:48 host systemd[1]: smartmontools.service: State 'final-sigterm' timed out. Killing.
10:27:48 host systemd[1]: smartmontools.service: Killing process 2756 (smartd) with signal SIGKILL.
10:27:57 host kernel: megaraid_sas 0000:65:00.0: resetting fusion adapter scsi0.
10:27:57 host kernel: megaraid_sas 0000:65:00.0: Outstanding fastpath IOs: 0
10:28:07 host kernel: megaraid_sas 0000:65:00.0: Waiting for FW to come to ready state
10:28:23 host kernel: megaraid_sas 0000:65:00.0: FW now in Ready state
10:28:23 host kernel: megaraid_sas 0000:65:00.0: Current firmware supports maximum commands: 5101 LDIO threshold: 0
10:28:23 host kernel: megaraid_sas 0000:65:00.0: Performance mode :Balanced (latency index = 8)
10:28:23 host kernel: megaraid_sas 0000:65:00.0: FW supports sync cache : Yes
10:28:23 host kernel: megaraid_sas 0000:65:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
10:28:24 host kernel: megaraid_sas 0000:65:00.0: FW supports atomic descriptor : Yes
10:28:24 host kernel: megaraid_sas 0000:65:00.0: FW provided supportMaxExtLDs: 1 max_lds: 240
10:28:24 host kernel: megaraid_sas 0000:65:00.0: controller type : MR(8192MB)
10:28:24 host kernel: megaraid_sas 0000:65:00.0: Online Controller Reset(OCR) : Enabled
10:28:24 host kernel: megaraid_sas 0000:65:00.0: Secure JBOD support : No
10:28:24 host kernel: megaraid_sas 0000:65:00.0: NVMe passthru support : Yes
10:28:24 host kernel: megaraid_sas 0000:65:00.0: FW provided TM TaskAbort/Reset timeout : 6 secs/60 secs
10:28:24 host kernel: megaraid_sas 0000:65:00.0: PCI Lane Margining support : Yes
10:28:24 host kernel: megaraid_sas 0000:65:00.0: JBOD sequence map support : Yes
10:28:32 host kernel: INFO: task megacli.real:2543 blocked for more than 483 seconds.
10:28:32 host kernel: Tainted: G OE 5.10.0-10-amd64 #1 Debian 5.10.84-1
10:28:32 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
10:28:32 host kernel: task:megacli.real state:D stack: 0 pid: 2543 ppid: 2542 flags:0x00000000
10:28:32 host kernel: Call Trace:
10:28:32 host kernel: __schedule+0x282/0x870
10:28:32 host kernel: schedule+0x46/0xb0
10:28:32 host kernel: megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
10:28:32 host kernel: ? add_wait_queue_exclusive+0x70/0x70
10:28:32 host kernel: megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
10:28:32 host kernel: megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
10:28:32 host kernel: megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
10:28:32 host kernel: __x64_sys_ioctl+0x83/0xb0
10:28:32 host kernel: do_syscall_64+0x33/0x80
10:28:32 host kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
10:28:32 host kernel: RIP: 0033:0x7f2c080f7cc7
10:28:32 host kernel: RSP: 002b:00007ffec70b1288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
10:28:32 host kernel: RAX: ffffffffffffffda RBX: 000000000236fa50 RCX: 00007f2c080f7cc7
10:28:32 host kernel: RDX: 000000000236add0 RSI: 00000000c1944d01 RDI: 0000000000000003
10:28:32 host kernel: RBP: 00007ffec70b12c0 R08: 000000000236add0 R09: 00007f2c081c1be0
10:28:32 host kernel: R10: 000000000000006e R11: 0000000000000246 R12: 00000000004028a0
10:28:32 host kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
10:28:32 host kernel: INFO: task smartd:2756 blocked for more than 362 seconds.
10:28:32 host kernel: Tainted: G OE 5.10.0-10-amd64 #1 Debian 5.10.84-1
10:28:32 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
10:28:32 host kernel: task:smartd state:D stack: 0 pid: 2756 ppid: 1 flags:0x00000004
10:28:32 host kernel: Call Trace:
10:28:32 host kernel: __schedule+0x282/0x870
10:28:32 host kernel: schedule+0x46/0xb0
10:28:32 host kernel: megasas_issue_blocked_cmd+0xc5/0x190 [megaraid_sas]
10:28:32 host kernel: ? add_wait_queue_exclusive+0x70/0x70
10:28:32 host kernel: megasas_mgmt_fw_ioctl+0x2c2/0x6e0 [megaraid_sas]
10:28:32 host kernel: megasas_mgmt_ioctl_fw.constprop.0+0x119/0x170 [megaraid_sas]
10:28:32 host kernel: megasas_mgmt_ioctl+0x24/0x40 [megaraid_sas]
10:28:32 host kernel: __x64_sys_ioctl+0x83/0xb0
10:28:32 host kernel: do_syscall_64+0x33/0x80
10:28:32 host kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
10:28:32 host kernel: RIP: 0033:0x7f7c43bb0cc7
10:28:32 host kernel: RSP: 002b:00007fff1df361b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
10:28:32 host kernel: RAX: ffffffffffffffda RBX: 00005575714bfbc0 RCX: 00007f7c43bb0cc7
10:28:32 host kernel: RDX: 00007fff1df361c0 RSI: 00000000c1944d01 RDI: 0000000000000003
10:28:32 host kernel: RBP: 00007f7c43639b48 R08: 0000000000000010 R09: 00007fff1df3657a
10:28:32 host kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff1df361c0
10:28:32 host kernel: R13: 00007fff1df365d0 R14: 00007fff1df3657a R15: 00007fff1df36520
10:28:46 host kernel: megaraid_sas 0000:65:00.0: Iop2SysDoorbellInt for scsi0
10:28:52 host kernel: megaraid_sas 0000:65:00.0: megasas_get_ld_map_info DCMD timed out, RAID map is disabled
10:29:02 host kernel: megaraid_sas 0000:65:00.0: Waiting for FW to come to ready state
10:29:16 host kernel: megaraid_sas 0000:65:00.0: FW now in Ready state
10:29:16 host kernel: megaraid_sas 0000:65:00.0: Current firmware supports maximum commands: 5101 LDIO threshold: 0
10:29:16 host kernel: megaraid_sas 0000:65:00.0: Performance mode :Balanced (latency index = 8)
10:29:16 host kernel: megaraid_sas 0000:65:00.0: FW supports sync cache : Yes
10:29:16 host kernel: megaraid_sas 0000:65:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
10:29:17 host kernel: megaraid_sas 0000:65:00.0: FW supports atomic descriptor : Yes
10:29:17 host kernel: megaraid_sas 0000:65:00.0: FW provided supportMaxExtLDs: 1 max_lds: 240
10:29:17 host kernel: megaraid_sas 0000:65:00.0: controller type : MR(8192MB)
10:29:17 host kernel: megaraid_sas 0000:65:00.0: Online Controller Reset(OCR) : Enabled
10:29:17 host kernel: megaraid_sas 0000:65:00.0: Secure JBOD support : No
10:29:17 host kernel: megaraid_sas 0000:65:00.0: NVMe passthru support : Yes
10:29:17 host kernel: megaraid_sas 0000:65:00.0: FW provided TM TaskAbort/Reset timeout : 6 secs/60 secs
10:29:17 host kernel: megaraid_sas 0000:65:00.0: PCI Lane Margining support : Yes
10:29:17 host kernel: megaraid_sas 0000:65:00.0: JBOD sequence map support : Yes
10:29:18 host systemd[1]: systemd-udevd.service: State 'final-sigterm' timed out. Killing.
10:29:18 host systemd[1]: systemd-udevd.service: Killing process 1124 (systemd-udevd) with signal SIGKILL.
10:29:18 host systemd[1]: smartmontools.service: Processes still around after final SIGKILL. Entering failed mode.
10:29:18 host systemd[1]: smartmontools.service: Failed with result 'timeout'.
10:29:18 host systemd[1]: smartmontools.service: Unit process 2756 (smartd) remains running after unit stopped.
10:29:18 host systemd[1]: Failed to start Self Monitoring and Reporting Technology (SMART) Daemon.
10:29:45 host kernel: megaraid_sas 0000:65:00.0: megasas_get_ld_map_info DCMD timed out, RAID map is disabled
10:29:55 host kernel: megaraid_sas 0000:65:00.0: Waiting for FW to come to ready state
10:30:00 host systemd[1]: Starting system activity accounting tool...
10:30:00 host systemd[1]: sysstat-collect.service: Succeeded.
10:30:00 host systemd[1]: Finished system activity accounting tool.
10:30:10 host kernel: megaraid_sas 0000:65:00.0: FW now in Ready state
10:30:10 host kernel: megaraid_sas 0000:65:00.0: Current firmware supports maximum commands: 5101 LDIO threshold: 0
10:30:10 host kernel: megaraid_sas 0000:65:00.0: Performance mode :Balanced (latency index = 8)
10:30:10 host kernel: megaraid_sas 0000:65:00.0: FW supports sync cache : Yes
10:30:10 host kernel: megaraid_sas 0000:65:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
10:30:11 host kernel: megaraid_sas 0000:65:00.0: FW supports atomic descriptor : Yes
10:30:11 host kernel: megaraid_sas 0000:65:00.0: FW provided supportMaxExtLDs: 1 max_lds: 240
10:30:11 host kernel: megaraid_sas 0000:65:00.0: controller type : MR(8192MB)
10:30:11 host kernel: megaraid_sas 0000:65:00.0: Online Controller Reset(OCR) : Enabled
10:30:11 host kernel: megaraid_sas 0000:65:00.0: Secure JBOD support : No
10:30:11 host kernel: megaraid_sas 0000:65:00.0: NVMe passthru support : Yes
10:30:11 host kernel: megaraid_sas 0000:65:00.0: FW provided TM TaskAbort/Reset timeout : 6 secs/60 secs
10:30:11 host kernel: megaraid_sas 0000:65:00.0: PCI Lane Margining support : Yes
10:30:11 host kernel: megaraid_sas 0000:65:00.0: JBOD sequence map support : Yes
10:30:11 host kernel: megaraid_sas 0000:65:00.0: return -EBUSY from megasas_refire_mgmt_cmd 4516 cmd 0x5 opcode 0x10b0100
10:30:11 host kernel: megaraid_sas 0000:65:00.0: return -EBUSY from megasas_refire_mgmt_cmd 4516 cmd 0x4 opcode 0x0
10:30:11 host kernel: megaraid_sas 0000:65:00.0: return -EBUSY from megasas_mgmt_fw_ioctl 8889 cmd 0x5 opcode 0x10b0100 cmd->cmd_status_drv 0x3
10:30:11 host kernel: megaraid_sas 0000:65:00.0: return -EBUSY from megasas_mgmt_fw_ioctl 8889 cmd 0x4 opcode 0x0 cmd->cmd_status_drv 0x3
10:30:11 host kernel: megaraid_sas 0000:65:00.0: waiting for controller reset to finish
10:30:11 host kernel: megaraid_sas 0000:65:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
10:30:11 host kernel: megaraid_sas 0000:65:00.0: Adapter is OPERATIONAL for scsi:0
10:30:11 host kernel: megaraid_sas 0000:65:00.0: Snap dump wait time : 15
10:30:11 host kernel: megaraid_sas 0000:65:00.0: Reset successful for scsi0.
10:30:11 host kernel: megaraid_sas 0000:65:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
10:30:11 host kernel: megaraid_sas 0000:65:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
10:30:11 host kernel: megaraid_sas 0000:65:00.0: 14893 (694866487s/0x0020/CRIT) - Controller encountered an error and was reset
10:30:11 host kernel: megaraid_sas 0000:65:00.0: scanning for scsi0...
10:30:11 host kernel: megaraid_sas 0000:65:00.0: 14923 (694866525s/0x0020/DEAD) - Fatal firmware error: Line 171 in fw\raid\utils.c
10:30:11 host kernel: megaraid_sas 0000:65:00.0: 14926 (694866535s/0x0020/CRIT) - Controller encountered an error and was reset
10:30:11 host kernel: megaraid_sas 0000:65:00.0: scanning for scsi0...
10:30:11 host kernel: megaraid_sas 0000:65:00.0: 14956 (694866572s/0x0020/DEAD) - Fatal firmware error: Line 171 in fw\raid\utils.c
10:30:11 host kernel: megaraid_sas 0000:65:00.0: 14959 (694866582s/0x0020/CRIT) - Controller encountered an error and was reset
10:30:11 host kernel: megaraid_sas 0000:65:00.0: scanning for scsi0...
10:30:21 host systemd[1]: systemd-udevd.service: Main process exited, code=killed, status=9/KILL
10:30:21 host systemd[1]: systemd-udevd.service: Failed with result 'watchdog'.
10:30:21 host systemd[1]: systemd-udevd.service: Consumed 42.066s CPU time.
10:30:21 host systemd[1]: systemd-udevd.service: Scheduled restart job, restart counter is at 1.
10:30:21 host systemd[1]: Stopped Rule-based Manager for Device Events and Files.
10:30:21 host systemd[1]: systemd-udevd.service: Consumed 42.066s CPU time.
10:30:21 host systemd[1]: Starting Rule-based Manager for Device Events and Files...
10:30:21 host systemd[1]: Started Rule-based Manager for Device Events and Files.
10:30:59 host systemd[1]: Starting Self Monitoring and Reporting Technology (SMART) Daemon...
10:30:59 host smartd[2856]: smartd 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-10-amd64] (local build)
10:30:59 host smartd[2856]: Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
10:30:59 host smartd[2856]: Opened configuration file /etc/smartd.conf
10:30:59 host smartd[2856]: Configuration file /etc/smartd.conf parsed.
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_00], opened
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_00], [NVMe Dell Ent NVMe v2 .2.0], lu id: 0x3643503052a034100025384100000002, S/N: S60, 3.84 TB
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_00], is SMART capable. Adding to "monitor" list.
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_00], state read from /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S60.scsi.state
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_01], opened
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_01], [NVMe Dell Ent NVMe v2 .2.0], lu id: 0x3643503052a028580025384100000002, S/N: S61, 3.84 TB
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_01], is SMART capable. Adding to "monitor" list.
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_01], state read from /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S61.scsi.state
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_02], opened
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_02], [NVMe Dell Ent NVMe v2 .2.0], lu id: 0x3643503052a042550025384100000002, S/N: S62, 3.84 TB
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_02], is SMART capable. Adding to "monitor" list.
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_02], state read from /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S62.scsi.state
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_03], opened
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_03], [NVMe Dell Ent NVMe v2 .2.0], lu id: 0x3643503052a028600025384100000002, S/N: S63, 3.84 TB
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_03], is SMART capable. Adding to "monitor" list.
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_03], state read from /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S63.scsi.state
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_04], opened
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_04], [NVMe Dell Ent NVMe v2 .2.0], lu id: 0x3643503052a043550025384100000002, S/N: S64, 3.84 TB
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_04], is SMART capable. Adding to "monitor" list.
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_04], state read from /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S64.scsi.state
10:30:59 host smartd[2856]: Monitoring 0 ATA/SATA, 5 SCSI/SAS and 0 NVMe devices
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_00], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S60.scsi.state
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_01], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S61.scsi.state
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_02], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S62.scsi.state
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_03], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S63.scsi.state
10:30:59 host smartd[2856]: Device: /dev/sdb [megaraid_disk_04], state written to /var/lib/smartmontools/smartd.NVMe-Dell_Ent_NVMe_v2-S64.scsi.state
10:30:59 host systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
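To make a long excerpt like the one above easier to read, here is a small sketch that pulls out the kernel's hung-task reports and keeps the longest blocked time per task. The function name is made up for illustration, and the regular expression assumes the exact message wording shown in this log.

```python
import re
from typing import Dict, Iterable, Tuple

# Matches lines such as:
# "10:22:29 host kernel: INFO: task megacli.real:2543 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(lines: Iterable[str]) -> Dict[Tuple[str, int], int]:
    """Map (task name, pid) to the longest blocked time reported for it."""
    longest: Dict[Tuple[str, int], int] = {}
    for line in lines:
        match = HUNG_TASK.search(line)
        if match is None:
            continue
        key = (match["name"], int(match["pid"]))
        longest[key] = max(longest.get(key, 0), int(match["secs"]))
    return longest


if __name__ == "__main__":
    # Three lines copied from the excerpt above; a real run would read the full log.
    sample = [
        "10:22:29 host kernel: INFO: task megacli.real:2543 blocked for more than 120 seconds.",
        "10:24:30 host kernel: INFO: task megacli.real:2543 blocked for more than 241 seconds.",
        "10:24:30 host kernel: INFO: task smartd:2756 blocked for more than 120 seconds.",
    ]
    for (name, pid), secs in summarize_hung_tasks(sample).items():
        print(f"{name} (pid {pid}): blocked for more than {secs} s")
```

Applied to the whole excerpt, the longest reports are 483 seconds for megacli.real (pid 2543) and 362 seconds for smartd (pid 2756).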


Dell- Maria J
January 7th, 2022 07:00
Hello codebold,
Thank you for choosing Dell. I am sorry you are facing this issue. Do I understand correctly that the problem started after the firmware update?
It would be great if you could send me the iDRAC logs so I can check them. Could you please gather them and send them to me in a private message?
How to gather logs:
https://dell.to/3eY5qbV
Please let me know if you have any questions.
codebold
January 7th, 2022 07:00
Hello Maria,
Thanks for your help! The problem already existed before the firmware upgrade. I will gather the logs and send them to you in a private message.
rageth
June 8th, 2022 05:00
Hi,
We have the same problem. Was there a solution?
Best,
Hp
DELL-Chris H
June 8th, 2022 05:00
Rageth,
Could you confirm whether the server is configured for UEFI or BIOS?
Let me know and we can go from there.