semprunl
1 Copper

SDC drv_cfg --query_guid causes kernel panic on CentOS 7

Steps to reproduce:

  • Install CentOS 7 (minimal install)
  • Execute yum update
  • Compile and install ixgbe 4.3.13 (needed for Ethernet controller: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T)
  • Install ScaleIO using the gateway
  • Execute /opt/emc/scaleio/sdc/bin/drv_cfg --query_guid on any SDC

output of kdump crash:

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-327.4.5.el7.x86_64/vmlinux

    DUMPFILE: /var/crash/127.0.0.1-2016-02-12-17:41:36/vmcore  [PARTIAL DUMP]

        CPUS: 8

        DATE: Fri Feb 12 17:41:23 2016

      UPTIME: 00:12:39

LOAD AVERAGE: 1.21, 1.10, 0.69

       TASKS: 199

    NODENAME: node2

     RELEASE: 3.10.0-327.4.5.el7.x86_64

     VERSION: #1 SMP Mon Jan 25 22:07:14 UTC 2016

     MACHINE: x86_64  (2200 Mhz)

      MEMORY: 31.9 GB

       PANIC: "BUG: unable to handle kernel paging request at 00007fff3e4b8570"

         PID: 3366

     COMMAND: "drv_cfg"

        TASK: ffff88084d073980  [THREAD_INFO: ffff880836604000]

         CPU: 7

       STATE: TASK_RUNNING (PANIC)
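
A side note on the PANIC line in the summary above: on x86_64 with 48-bit virtual addresses, the lower canonical half (below 0x0000800000000000) is user space, so the faulting address looks like a user-space pointer that the kernel tried to access while running drv_cfg. The following is a minimal sketch that only classifies an address, assuming 4-level paging; the helper name is made up for illustration and is not part of ScaleIO or the crash utility.

```python
# Illustrative helper, assuming x86_64 with 48-bit virtual addresses
# (4-level paging). Not part of ScaleIO or the crash utility.

USER_SPACE_END = 0x0000_8000_0000_0000      # first address above user space
KERNEL_SPACE_START = 0xFFFF_8000_0000_0000  # start of the upper canonical half


def classify_address(addr):
    """Return 'user', 'kernel', or 'non-canonical' for a 64-bit virtual address."""
    if 0 <= addr < USER_SPACE_END:
        return "user"
    if KERNEL_SPACE_START <= addr <= 0xFFFF_FFFF_FFFF_FFFF:
        return "kernel"
    return "non-canonical"


if __name__ == "__main__":
    # Faulting address from the PANIC line, then the TASK pointer for contrast.
    print(classify_address(0x00007FFF3E4B8570))
    print(classify_address(0xFFFF88084D073980))
```

This does not say why the access faulted; it only narrows down what kind of pointer was involved.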

output of lspci:

00:00.0 Host bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2 (rev 02)

00:01.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1 (rev 02)

00:02.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 (rev 02)

00:02.2 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 (rev 02)

00:03.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 02)

00:05.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Map/VTd_Misc/System Management (rev 02)

00:05.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO Hot Plug (rev 02)

00:05.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO RAS/Control Status/Global Errors (rev 02)

00:05.4 PIC: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D I/O APIC (rev 02)

00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)

00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)

00:16.1 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #2 (rev 04)

00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)

00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)

00:1c.4 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 (rev d5)

00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)

00:1f.0 ISA bridge: Intel Corporation C224 Series Chipset Family Server Standard SKU LPC Controller (rev 05)

00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)

00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)

00:1f.6 Signal processing controller: Intel Corporation 8 Series Chipset Family Thermal Management Controller (rev 05)

02:00.0 System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 0

02:00.1 System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 1

02:00.2 System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 2

02:00.3 System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 3

03:00.0 Ethernet controller: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T

03:00.1 Ethernet controller: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T

06:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)

07:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)

ff:0b.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link 0/1 (rev 02)

ff:0b.1 Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link 0/1 (rev 02)

ff:0b.2 Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link 0/1 (rev 02)

ff:0b.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link Debug (rev 02)

ff:0c.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)

ff:0c.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)

ff:0c.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)

ff:0c.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)

ff:0f.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)

ff:0f.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)

ff:0f.5 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)

ff:0f.6 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)

ff:10.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R2PCIe Agent (rev 02)

ff:10.1 Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R2PCIe Agent (rev 02)

ff:10.5 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Ubox (rev 02)

ff:10.6 Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Ubox (rev 02)

ff:10.7 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Ubox (rev 02)

ff:12.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Home Agent 0 (rev 02)

ff:12.1 Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Home Agent 0 (rev 02)

ff:13.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Target Address/Thermal/RAS (rev 02)

ff:13.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Target Address/Thermal/RAS (rev 02)

ff:13.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder (rev 02)

ff:13.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder (rev 02)

ff:13.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder (rev 02)

ff:13.5 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder (rev 02)

ff:13.6 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 0/1 Broadcast (rev 02)

ff:13.7 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Global Broadcast (rev 02)

ff:1e.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)

ff:1e.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)

ff:1e.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)

ff:1e.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)

ff:1e.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)

ff:1f.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)

ff:1f.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)

I have been able to reproduce it several times, and I'm out of ideas for how to make it work.

Thanks a lot in advance for any help and best regards,

Luis Semprun

16 Replies
alexkh
1 Nickel

Re: SDC drv_cfg --query_guid causes kernel panic on CentOS 7

Can you please post the output of:

1. uname -a

2. cat /etc/centos-release

semprunl
1 Copper

Re: SDC drv_cfg --query_guid causes kernel panic on CentOS 7

uname -a

Linux node2 3.10.0-327.4.5.el7.x86_64 #1 SMP Mon Jan 25 22:07:14 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


cat /etc/centos-release

CentOS Linux release 7.2.1511 (Core)

alexkh
1 Nickel

Re: SDC drv_cfg --query_guid causes kernel panic on CentOS 7

Unfortunately, CentOS 7.2 is not yet supported.

Please downgrade to CentOS 7.1 and make sure SELinux is disabled.

semprunl
1 Copper

Re: SDC drv_cfg --query_guid causes kernel panic on CentOS 7

Hi,


Unfortunately, it also crashes on CentOS 7.1.


uname -a

Linux node1 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/centos-release

CentOS Linux release 7.1.1503 (Core)

getenforce

Disabled

Mount, read, and write operations work as expected (so far). My guess is that there is a bug in the drv_cfg tool that only shows up on bare metal, since it works when the OS is virtualized.

Any ideas?

daverush
1 Nickel

Re: SDC drv_cfg --query_guid causes kernel panic on CentOS 7

semprunl,

I just tested this on an (admittedly different, but similar) virtual machine on both 7.1.1503 and 7.2.1511.

I am unable to get drv_cfg to fail in this way, which jibes with your experience of "if virtualized, no issue".

Do you have successful volume access otherwise? Is it only --query_guid that it crashes on, or any drv_cfg command?

Did you check the md5sum of the RHEL7 installer archive, or re-download and extract it on other boxes and run md5sum there, to verify it's not just a faulty copy of the SDC binary?
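
For anyone who wants to script the checksum comparison across several boxes, here is a minimal sketch using Python's standard hashlib. The function names are made up for illustration, and the expected digest has to come from wherever you obtained the installer; none is given in this thread.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Return the hex MD5 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def matches(actual_hex, expected_hex):
    """Compare two digests, ignoring case and surrounding whitespace."""
    return actual_hex.strip().lower() == expected_hex.strip().lower()


if __name__ == "__main__":
    # Self-contained demo on in-memory bytes; on a real host, call
    # md5_of_file() on the installer archive and on the installed binary.
    print(md5_of_stream(io.BytesIO(b"hello")))
```

Running md5_of_file() on each host and comparing the results with matches() gives the same answer as comparing md5sum output by eye.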

Since it works virtualized I'd investigate the difference in driver modules running between bare metal and virtual installs, and test with/without them.
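
One way to make that comparison concrete is to save the lsmod output from a bare-metal host and from a virtual one, then diff the module names. The sketch below assumes the usual three-column lsmod layout; the sample text in the demo is invented purely to show the mechanics and is not output from any machine in this thread.

```python
def parse_lsmod(text):
    """Return the set of module names found in `lsmod` output."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def diff_modules(bare_metal, virtual):
    """Report modules loaded on only one of the two hosts."""
    return {
        "only_bare_metal": sorted(bare_metal - virtual),
        "only_virtual": sorted(virtual - bare_metal),
    }


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    bare = parse_lsmod("Module Size Used by\nscini 100 0\nixgbe 200 0\n")
    virt = parse_lsmod("Module Size Used by\nscini 100 0\nvirtio_net 50 0\n")
    print(diff_modules(bare, virt))
```

Modules that appear only on the bare-metal side are the first candidates to unload or blacklist for a test run.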


The fact that you need to compile the Ethernet driver yourself also doesn't reassure me, and I wonder whether the kernel is 100% happy with the NIC.

A non-answer, I know, but let me know what you find.

semprunl
1 Copper

Re: SDC drv_cfg --query_guid causes kernel panic on CentOS 7

Hi Rush,

I do have successful volume access but the following drv_cfg commands are crashing the kernel:

--query_diag_counters

--query_guid


The md5sum is valid and the compiled driver is the most current release of the ixgbe driver for Linux, which supports kernel versions 2.6.18 up through 4.3.3.  It also has been tested on the following distributions:

  - RHEL 6.7

  - RHEL 7.2

  - SLES 11SP4

  - SLES 12SP1

I also tried with an older version of ixgbe (4.1.2) which was tested with RHEL 7.1 but still the same outcome (kernel crash).

Thanks for taking a look.

semprunl
1 Copper

Re: SDC drv_cfg --query_guid causes kernel panic on CentOS 7

It may be useful to know that I cannot reproduce the crash on CentOS 6.7 using the same server and the same Ethernet driver.

ulrix
1 Copper

Re: SDC drv_cfg --query_guid causes kernel panic on CentOS 7

Hi Semprunl,

Were you able to solve the problem? I also get a kernel panic with CentOS 7.2 and ScaleIO 1.32.3, 1.32.4, and 2.0. All panics are related to the SDC. I can reproduce the panic on 3 machines, all Intel NUCs running CentOS on bare metal.

output of lspci:

00:00.0 Host bridge: Intel Corporation Broadwell-U Host Bridge -OPI (rev 09)

00:02.0 VGA compatible controller: Intel Corporation Broadwell-U Integrated Graphics (rev 09)

00:03.0 Audio device: Intel Corporation Broadwell-U Audio Controller (rev 09)

00:14.0 USB controller: Intel Corporation Wildcat Point-LP USB xHCI Controller (rev 03)

00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03)

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (3) I218-V (rev 03)

00:1b.0 Audio device: Intel Corporation Wildcat Point-LP High Definition Audio Controller (rev 03)

00:1c.0 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #1 (rev e3)

00:1c.3 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #4 (rev e3)

00:1d.0 USB controller: Intel Corporation Wildcat Point-LP USB EHCI Controller (rev 03)

00:1f.0 ISA bridge: Intel Corporation Wildcat Point-LP LPC Controller (rev 03)

00:1f.2 SATA controller: Intel Corporation Wildcat Point-LP SATA Controller [AHCI Mode] (rev 03)

00:1f.3 SMBus: Intel Corporation Wildcat Point-LP SMBus Controller (rev 03)

02:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)

kdump:

[  653.630233] BUG: unable to handle kernel paging request at 00007ffd5b2d5e20

[  653.630282] IP: [<ffffffffa099df3d>] ioctl_Handler+0x8bd/0xda0 [scini]

[  653.630333] PGD 43a808067 PUD 43afa1067 PMD 43afa3067 PTE 800000042a9a3067

[  653.630382] Oops: 0003 [#1] SMP

[  653.630403] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver scini(POE) nfs fscache intel_powerclamp coretemp intel_rapl kvm_intel kvm arc4 crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek snd_hda_codec_generic iwlmvm mac80211 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep aesni_intel lrw snd_seq gf128mul snd_seq_device glue_helper ablk_helper cryptd sg iwlwifi snd_pcm ir_lirc_codec lirc_dev ir_sony_decoder ir_sanyo_decoder ir_mce_kbd_decoder pcspkr btusb ir_jvc_decoder cfg80211 bluetooth ir_rc6_decoder rfkill ir_nec_decoder ir_rc5_decoder snd_timer snd mei_me soundcore mei i2c_i801 lpc_ich shpchp mfd_core rc_rc6_mce nuvoton_cir rc_core acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif

[  653.630740]  crct10dif_generic i915 ahci libahci crct10dif_pclmul crct10dif_common crc32c_intel i2c_algo_bit libata drm_kms_helper serio_raw e1000e drm ptp pps_core sdhci_acpi sdhci video mmc_core i2c_hid i2c_core dm_mirror dm_region_hash dm_log dm_mod

[  653.630831] CPU: 2 PID: 11545 Comm: drv_cfg Tainted: P           OE  ------------   3.10.0-327.13.1.el7.x86_64 #1

[  653.630863] Hardware name: TAROX ECO 44 G5/NUC5i5RYB, BIOS RYBDWi35.86A.0355.2016.0224.1501 02/24/2016

[  653.630894] task: ffff8800974bc500 ti: ffff88043a81c000 task.ti: ffff88043a81c000

[  653.630918] RIP: 0010:[<ffffffffa099df3d>]  [<ffffffffa099df3d>] ioctl_Handler+0x8bd/0xda0 [scini]

[  653.630950] RSP: 0018:ffff88043a81fe20  EFLAGS: 00010246

[  653.630970] RAX: 00000000c30b5096 RBX: 00007ffd5b2d5e10 RCX: 0000000000000000

[  653.630993] RDX: 0000000000000018 RSI: ffff88043a81fe50 RDI: 00007ffd5b2d5e28

[  653.631015] RBP: ffff88043a81feb0 R08: 0000000000000000 R09: 0000000000000000

[  653.631039] R10: 00007ffd5b2d5b80 R11: 0000000000000000 R12: 0000000000000000

[  653.631061] R13: 00007ffd5b2d5e10 R14: 00007ffd5b2d5e10 R15: 0000000000000000

[  653.631084] FS:  00007f737ef28780(0000) GS:ffff880456d00000(0000) knlGS:0000000000000000

[  653.631111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[  653.631130] CR2: 00007ffd5b2d5e20 CR3: 000000043a80a000 CR4: 00000000003407e0

[  653.631152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

[  653.631175] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

[  653.631198] Stack:

[  653.631205]  ffff880406e04b00 c24410ced1ad23a3 1f2562d7c30b5096 c24410ced1ad23a3

[  653.631232]  1f2562d7c30b5096 ffffffff811efeeb ffff880440159f20 ffff88043a81fec8

[  653.631261]  000fffff6e6963dc fffdffef00000000 0000000000000001 000000001c8a8a4f

[  653.631288] Call Trace:

[  653.631299]  [<ffffffff811efeeb>] ? do_filp_open+0x4b/0xb0

[  653.631318]  [<ffffffff811f1fa5>] do_vfs_ioctl+0x2e5/0x4c0

[  653.631340]  [<ffffffff8128bdde>] ? file_has_perm+0xae/0xc0

[  653.631360]  [<ffffffff81641301>] ? __do_page_fault+0x91/0x450

[  653.631379]  [<ffffffff811f2221>] SyS_ioctl+0xa1/0xc0

[  653.631399]  [<ffffffff81645ec9>] system_call_fastpath+0x16/0x1b

[  653.631418] Code: 18 00 00 00 48 89 df 48 89 44 24 18 48 8b 44 24 10 48 89 44 24 20 e8 a3 21 96 e0 85 c0 49 89 c4 0f 85 bf 02 00 00 e8 53 71 ff ff <89> 43 10 45 31 e4 e8 68 71 ff ff 89 43 14 e9 d0 f7 ff ff 48 8d

[  653.631538] RIP  [<ffffffffa099df3d>] ioctl_Handler+0x8bd/0xda0 [scini]

[  653.631566]  RSP <ffff88043a81fe20>

[  653.631577] CR2: 00007ffd5b2d5e20
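
A few fields in this oops can be decoded mechanically, which may help whoever picks this up. The sketch below uses the x86 page-fault error-code bits and the CR4 SMEP/SMAP bit positions as documented in the Intel SDM; the function names are made up for illustration. The reading it suggests (a kernel-mode write to a present user-space page on a CPU with SMAP enabled) is an interpretation of the dump, not something confirmed in this thread.

```python
# Illustrative decoder for values copied from the oops above.
# Bit meanings follow the Intel SDM; names are made up for this post.

PF_FLAGS = {
    0: "protection violation (page was present)",
    1: "write access",
    2: "fault raised in user mode",
    3: "reserved bit set in page tables",
    4: "instruction fetch",
}

CR4_SMEP = 1 << 20  # Supervisor Mode Execution Prevention
CR4_SMAP = 1 << 21  # Supervisor Mode Access Prevention


def decode_pf_error(code):
    """List the descriptions of the flags set in a page-fault error code."""
    return [text for bit, text in PF_FLAGS.items() if code & (1 << bit)]


def cr4_features(cr4):
    """Report whether SMEP and SMAP are enabled in a CR4 value."""
    return {"SMEP": bool(cr4 & CR4_SMEP), "SMAP": bool(cr4 & CR4_SMAP)}


if __name__ == "__main__":
    print(decode_pf_error(0x0003))           # from "Oops: 0003"
    print(cr4_features(0x00000000003407E0))  # from the CR4 field
    # The marked bytes <89> 43 10 in the Code line encode
    # "mov %eax,0x10(%rbx)", so the store target is RBX + 0x10.
    rbx, cr2 = 0x00007FFD5B2D5E10, 0x00007FFD5B2D5E20
    print(hex(rbx + 0x10), rbx + 0x10 == cr2)
```

Bit 2 being clear in the error code means the fault was raised in kernel mode, and the computed store target equals CR2, so the faulting instruction in ioctl_Handler appears to be writing through a user-space pointer directly. Whether SMAP is what turns that into a fault here is something support would need to confirm.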


Re: SDC drv_cfg --query_guid causes kernel panic on CentOS 7

Can you please supply the get_info output (including the crash dump, if you have it)?

Either collect the logs from the IM-web -> maintenance view, or run /opt/emc/scaleio/mdm/bin/get_info.sh
