alexkh
60 Posts
1
February 15th, 2016 07:00
can you please post the output of:
1. uname -a
2. cat /etc/release
semprunl
5 Posts
0
February 15th, 2016 23:00
uname -a
Linux node2 3.10.0-327.4.5.el7.x86_64 #1 SMP Mon Jan 25 22:07:14 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)
alexkh
60 Posts
1
February 16th, 2016 04:00
Unfortunately, CentOS 7.2 is not yet supported.
Please downgrade to CentOS 7.1 and make sure SELinux is disabled.
semprunl
5 Posts
0
February 18th, 2016 04:00
Hi,
Unfortunately, it also crashes on CentOS 7.1.
uname -a
Linux node1 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/centos-release
CentOS Linux release 7.1.1503 (Core)
getenforce
Disabled
Mounting, reading & writing operations do work as expected (so far). My guess is that there's some kind of bug with the drv_cfg tool when installed on bare metal (as it works when the OS is virtualized).
Any ideas?
daverush
51 Posts
0
February 18th, 2016 14:00
semprunl,
I just tested this on an (admittedly different, but similar) virtual machine at both 7.1.1503 as well as 7.2.1511.
I am unable to get drv_cfg to error in this way, which jibes with your experience of "if virtualized, no issue".
Do you have successful volume access otherwise? Is it only --query_guid that it crashes on, or any drv_cfg command?
Did you check the md5sum of the RHEL7 installer archive or re-download and extract on other boxes, md5summing there to verify it's not just a faulty copy of the SDC binary?
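A minimal sketch of that checksum comparison, in case it helps. The function name `verify_md5`, the file name, and the checksum in the usage comment are placeholders, not values from the actual installer archive:

```shell
#!/usr/bin/env bash
# Compare a file's MD5 digest against an expected value.
# Usage: verify_md5 <file> <expected-md5>
verify_md5() {
    local file=$1 expected=$2 actual
    # md5sum prints "<digest>  <filename>"; bail out if it cannot read the file.
    actual=$(md5sum "$file") || return 2
    actual=${actual%% *}   # keep only the digest
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual, expected $expected)"
        return 1
    fi
}

# Example (hypothetical file name and placeholder checksum):
#   verify_md5 ./sdc-installer.rpm 0123456789abcdef0123456789abcdef
```

Running the same check on a second box after a fresh download would show whether the copy on the crashing host is the odd one out.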
Since it works virtualized I'd investigate the difference in driver modules running between bare metal and virtual installs, and test with/without them.
The fact that you need to compile the ethernet driver also doesn't comfort me, and I wonder if the kernel is 100% happy with the NIC.
A non-answer, I know, but let me know what you find.
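One rough way to compare the loaded driver modules between the bare-metal and virtual installs is to save the `lsmod` module names on each host and diff the two lists. This is only a sketch; the helper names and the file names in the comments are made up for illustration:

```shell
#!/usr/bin/env bash
# Print sorted, unique module names from `lsmod`-style output on stdin.
# The first line (the "Module Size Used by" header) is skipped.
module_names() {
    awk 'NR > 1 { print $1 }' | LC_ALL=C sort -u
}

# Show modules present in only one of two saved lists.
# Unindented lines: only in the first list.
# Tab-indented lines: only in the second list.
diff_modules() {
    LC_ALL=C comm -3 "$1" "$2"
}

# On each host:
#   lsmod | module_names > "modules.$(hostname).txt"
# Then, with both files on one machine (hypothetical file names):
#   diff_modules modules.baremetal.txt modules.vm.txt
```

Modules that appear only on the bare-metal side are the first candidates to unload or blacklist for a test run.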
semprunl
5 Posts
0
February 19th, 2016 03:00
Hi Rush,
I do have successful volume access but the following drv_cfg commands are crashing the kernel:
--query_diag_counters
--query_guid
The md5sum is valid and the compiled driver is the most current release of the ixgbe driver for Linux, which supports kernel versions 2.6.18 up through 4.3.3. It also has been tested on the following distributions:
- RHEL 6.7
- RHEL 7.2
- SLES 11SP4
- SLES 12SP1
I also tried with an older version of ixgbe (4.1.2) which was tested with RHEL 7.1 but still the same outcome (kernel crash).
Thanks for taking a look.
semprunl
5 Posts
0
February 19th, 2016 03:00
It may be useful to know that I cannot reproduce the crash on CentOS 6.7 using the same server and the same Ethernet driver.
tomer__engineer
155 Posts
0
April 21st, 2016 05:00
Can you please supply the get_info output (including the crash dump, if you have it)?
Either collect the logs from the IM-web -> maintenance view, or run /opt/emc/scaleio/mdm/bin/get_info.sh
pawelw1
306 Posts
0
April 22nd, 2016 05:00
Hi Tomer,
We will try to reproduce it in the lab and escalate to L3 if necessary.
Thank you,
Pawel
Matas1
22 Posts
0
September 22nd, 2016 01:00
Hi, was anyone able to solve the kernel crash problem? We ran into the same issue, and we found that the crash is related to the Intel Xeon E5 v4 CPU (on v3, the SDC works fine).
Thanks,
Matas
Matas1
22 Posts
0
September 22nd, 2016 05:00
Hi, it seems we found the solution: the SDC version EMC-ScaleIO-sdc-2.0-7120.0.el7.x86_64.rpm doesn't crash the kernel.
Our setup: CentOS, kernel 3.10.0-327.36.1.el7.x86_64
Host bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2 (rev 01)
Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+
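For anyone checking whether a host matches the combination reported in this thread (Xeon E5 v4 plus an older SDC), here is a small sketch. The helper names are invented, the match pattern is a rough assumption about how E5 v4 parts appear in `/proc/cpuinfo`, and the rpm package name is inferred from the file name above:

```shell
#!/usr/bin/env bash
# Print the first "model name" value from /proc/cpuinfo-style input on stdin.
cpu_model() {
    awk -F': ' '/^model name/ { print $2; exit }'
}

# Succeed if the model string looks like a Xeon E5 v4 part,
# e.g. "Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz".
is_xeon_e5_v4() {
    case $1 in
        *E5-*" v4"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on a live host:
#   model=$(cpu_model < /proc/cpuinfo)
#   is_xeon_e5_v4 "$model" && echo "E5 v4 CPU: $model"
#   rpm -q EMC-ScaleIO-sdc    # installed SDC package (name inferred)
#   uname -r                  # running kernel
```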
pawelw1
306 Posts
1
September 22nd, 2016 08:00
Hi Matas,
It sounds like:
https://support.emc.com/kb/486909
The drv_cfg problem is fixed in ScaleIO 2.0.0.2 and onwards.
Thanks!
Pawel