Qoqnos
Bronze

Dell XPS 13 9360 - lots of machine check exception-messages

Is anyone seeing lots of messages like this on a DELL XPS 13 9360?

I get this with every kernel I've tried: the Ubuntu 16.04.1 stock kernel, the latest vanilla kernel, and mainline (on both Ubuntu and Arch Linux).

[    0.041088] mce: CPU supports 8 MCE banks
[    0.041097] mce: [Hardware Error]: Machine check events logged
[  179.824402] mce: [Hardware Error]: Machine check events logged
[  470.709754] mce: [Hardware Error]: Machine check events logged
[ 6943.396310] mce: [Hardware Error]: Machine check events logged
[10054.482683] mce: [Hardware Error]: Machine check events logged
[10166.746861] mce: [Hardware Error]: Machine check events logged
[11087.548612] mce: [Hardware Error]: Machine check events logged
[12090.893965] mce: [Hardware Error]: Machine check events logged
[12718.384254] mce: [Hardware Error]: Machine check events logged
[13489.322288] mce: [Hardware Error]: Machine check events logged
[14100.885807] mce: [Hardware Error]: Machine check events logged
[15566.970294] mce: [Hardware Error]: Machine check events logged
[16261.061946] mce: [Hardware Error]: Machine check events logged
[17441.723740] mce: [Hardware Error]: Machine check events logged
[24777.496754] mce: [Hardware Error]: Machine check events logged
[27846.728994] mce: [Hardware Error]: Machine check events logged

Nothing really seems wrong though. It's just this darn message. Anyone else seeing them with dmesg | grep mce?
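To get a feel for how often these show up, here is a rough Python sketch that pulls the timestamps out of saved dmesg output and computes the gaps between events. The function names are made up for illustration, and it assumes the message text looks exactly like the lines above.

```python
import re

# Matches lines such as:
# [  179.824402] mce: [Hardware Error]: Machine check events logged
MCE_LINE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+mce: \[Hardware Error\]: Machine check events logged"
)


def mce_timestamps(dmesg_text):
    """Return the seconds-since-boot timestamp of every MCE notice."""
    stamps = []
    for line in dmesg_text.splitlines():
        match = MCE_LINE.match(line.strip())
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def gaps(stamps):
    """Return the time in seconds between consecutive events."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = """\
[    0.041088] mce: CPU supports 8 MCE banks
[    0.041097] mce: [Hardware Error]: Machine check events logged
[  179.824402] mce: [Hardware Error]: Machine check events logged
[  470.709754] mce: [Hardware Error]: Machine check events logged
"""
    stamps = mce_timestamps(sample)
    print(len(stamps), "events")
    print([round(g, 1) for g in gaps(stamps)])
```

Feeding it the output of dmesg saved to a file should work the same way, as long as the timestamps are in the default seconds-since-boot format.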

23 Replies

RE: Dell XPS 13 9360 - lots of machine check exception-messages

Hi,

I have a 9360 DE and don't see such messages in syslog. As far as I remember (I could be completely wrong), an MCE is a hardware event that gets logged in the BIOS, and there should be a dedicated Linux tool for viewing it.

0 Kudos
Qoqnos
Bronze

RE: Dell XPS 13 9360 - lots of machine check exception-messages

*** it! I was hoping it was just a software error, that is a kernel error on this particular hardware, but I guess something's wrong with the hardware then. I've run the BIOS diagnostics test, and it found no errors either...

0 Kudos
beurle
Bronze

RE: Dell XPS 13 9360 - lots of machine check exception-messages

I get them too. It seems you need mce logging set up to get more detail. I tried Fedora 25 Beta and they were there too, possibly causing ABRT traps. I wonder whether the kernel has trouble detecting the processor architecture.

0 Kudos
Qoqnos
Bronze

RE: Dell XPS 13 9360 - lots of machine check exception-messages

ah, good to know i'm not alone with this!

i'd be happy to look into logs and whatnot if someone tells me where to look. 

0 Kudos
beurle
Bronze

RE: Dell XPS 13 9360 - lots of machine check exception-messages

I don't think mce logging is installed by default on this distro. Maybe try:

sudo apt-get install mcelog

On Fedora I did this to see the log output:

journalctl -u mcelog

0 Kudos
Qoqnos
Bronze

RE: Dell XPS 13 9360 - lots of machine check exception-messages

Ah, thanks for the journalctl command. I don't understand most of this, but there are a bunch of funky things going on here:

nov 19 11:21:10 expis mcelog[22189]: Processor context corrupt
nov 19 11:21:10 expis mcelog[22189]: MCA: corrected filtering (some unreported errors in same region)
nov 19 11:21:10 expis mcelog[22189]: Generic CACHE Level-2 Generic Error
nov 19 11:21:10 expis mcelog[22189]: STATUS ee2000000040110a MCGSTATUS 0
nov 19 11:21:10 expis mcelog[22189]: MCGCAP c08 APICID 0 SOCKETID 0
nov 19 11:21:10 expis mcelog[22189]: CPUID Vendor Intel Family 6 Model 142
nov 19 11:21:10 expis mcelog[22189]: mcelog: Family 6 Model 8e CPU: only decoding architectural errors
nov 19 11:21:10 expis mcelog[22189]: Hardware event. This is not a software error.
nov 19 11:21:10 expis mcelog[22189]: MCE 5
nov 19 11:21:10 expis mcelog[22189]: CPU 0 BANK 7
nov 19 11:21:10 expis mcelog[22189]: MISC 4f880000086 ADDR fef1dbc0
nov 19 11:21:10 expis mcelog[22189]: TIME 1479546791 Sat Nov 19 10:13:11 2016
nov 19 11:21:10 expis mcelog[22189]: MCG status:
nov 19 11:21:10 expis mcelog[22189]: MCi status:
nov 19 11:21:10 expis mcelog[22189]: Error overflow
nov 19 11:21:10 expis mcelog[22189]: Uncorrected error
nov 19 11:21:10 expis mcelog[22189]: MCi_MISC register valid
nov 19 11:21:10 expis mcelog[22189]: MCi_ADDR register valid
nov 19 11:21:10 expis mcelog[22189]: Processor context corrupt
nov 19 11:21:10 expis mcelog[22189]: MCA: corrected filtering (some unreported errors in same region)
nov 19 11:21:10 expis mcelog[22189]: Generic CACHE Level-2 Generic Error
nov 19 11:21:10 expis mcelog[22189]: STATUS ee2000000040110a MCGSTATUS 0
nov 19 11:21:10 expis mcelog[22189]: MCGCAP c08 APICID 0 SOCKETID 0
nov 19 11:21:10 expis mcelog[22189]: CPUID Vendor Intel Family 6 Model 142
nov 19 11:21:10 expis mcelog[22189]: mcelog: warning: 16 bytes ignored in each record
nov 19 11:21:10 expis mcelog[22189]: mcelog: consider an update

Would be great if a Dell person would say whether there's anything to worry about here or not.
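For anyone who wants to check what mcelog is decoding, here is a small Python sketch that splits the STATUS value above into its architectural fields. The bit positions and the cache-error code layout are taken from my reading of the machine-check chapter of the Intel SDM, so treat them as an assumption to verify; the function name is made up for illustration.

```python
# Flag bits of IA32_MCi_STATUS (architectural part only, per the Intel SDM).
STATUS_FLAGS = [
    (63, "VAL"),    # register holds a valid error
    (62, "OVER"),   # error overflow
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting enabled
    (59, "MISCV"),  # MCi_MISC register valid
    (58, "ADDRV"),  # MCi_ADDR register valid
    (57, "PCC"),    # processor context corrupt
]

TRANSACTION_TYPE = {0: "instruction", 1: "data", 2: "generic"}
CACHE_LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_mci_status(status):
    """Split a raw MCi_STATUS value into flags and error codes."""
    mca_code = status & 0xFFFF
    result = {
        "flags": [name for bit, name in STATUS_FLAGS if (status >> bit) & 1],
        "mca_code": mca_code,
        "model_code": (status >> 16) & 0xFFFF,
        "cache_error": None,
    }
    # Cache hierarchy errors have the form 000F 0001 RRRR TTLL,
    # where F (bit 12) is the "corrected filtering" bit.
    if (mca_code & 0xEF00) == 0x0100:
        result["cache_error"] = {
            "filtering": bool(mca_code & 0x1000),
            "request": (mca_code >> 4) & 0xF,  # 0 means generic error
            "type": TRANSACTION_TYPE.get((mca_code >> 2) & 0x3, "reserved"),
            "level": CACHE_LEVEL[mca_code & 0x3],
        }
    return result


if __name__ == "__main__":
    print(decode_mci_status(0xEE2000000040110A))
```

For ee2000000040110a this gives the flags VAL, OVER, UC, MISCV, ADDRV and PCC plus a generic Level-2 cache error with the filtering bit set, which lines up with what mcelog printed. It says nothing about whether the report is harmless.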

0 Kudos

RE: Dell XPS 13 9360 - lots of machine check exception-messages

I think the best way is to ask support.

0 Kudos
OCmylife
Copper

RE: Dell XPS 13 9360 - lots of machine check exception-messages

I have this error too, on Arch Linux with kernel-4.8.10-r1. I think we have nothing to worry about, as the laptop runs really fine. In the next few days I will definitely give Gentoo a try and report back on whether the issue also exists there with a self-configured kernel.

0 Kudos
abrahm
Bronze

RE: Dell XPS 13 9360 - lots of machine check exception-messages

I am seeing quite a few mce events with my new xps13-9360 as well.

1. Lots of "Processor context corrupt"/"Generic CACHE Level-2 Generic Error" messages. I seem to see them at each boot and on resume. I haven't seen any stability issues, but the message still makes me nervous.

2. Lots of "Processor 1 heated above trip temperature. Throttling enabled." events. It looks like the processor's thermal throttling kicks in somewhere just above 96C, but the fan doesn't actually start spinning up in time to cool the CPU back down and avoid the throttling. Once the fan does kick in, everything seems fine... until I'm idle for a bit and then start real work, and it happens all over again.
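To watch the temperatures while this happens, a minimal Python sketch like the one below reads the kernel's thermal zones from sysfs. It assumes the usual /sys/class/thermal layout, where each thermal_zone directory has a type file and a temp file in millidegrees Celsius; the function name is made up, and which zone corresponds to the CPU package will differ between machines.

```python
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone label: temperature in degrees Celsius} for each thermal zone."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Some zones cannot be read or report nothing useful; skip them.
            continue
        temps[f"{zone.name} ({kind})"] = millideg / 1000.0
    return temps


if __name__ == "__main__":
    for name, celsius in read_zone_temps().items():
        print(f"{name}: {celsius:.1f} C")
```

Running it in a loop while starting a heavy job should show how close the readings get to the trip point before the fan reacts.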

I updated mcelog to v144 to see if I get any more detail, but I don't see any change.

0 Kudos