Understanding Machine Check Exceptions (MCE)


Solution 1

First, I fear that I cannot really give good answers to your questions. I also own a Dell XPS 13 (9360) and see the same MCE messages. I'm in contact with Dell Support because of these. They replaced the mainboard but it did not help. Same messages in the logs. At some point they concluded that it is probably a false positive. They had no idea what is causing it, though (mcelog/kernel/Intel problem?). The correspondence with Support is still ongoing.

<rant> By the way, talking to Dell Support is a very unpleasant experience. They seem to suggest only the "standard" solutions, such as resetting the firmware, running self-health tests, and so on. I did not get the impression that I was talking to someone with technical insight. </rant>

To add more details, I see the same issue on Fedora 24 so it seems not to be related to Ubuntu.

Regarding your questions:

What do these errors mean and should I worry about them?

I don't know. Dell Support thinks those are false positives.

Could these hardware errors be the cause of the freezes of the entire system?

Besides the messages my system works fine. I'd guess the freeze is a different issue.

Should I have the laptop (or parts) replaced by the manufacturer?

Replacing the mainboard did not fix the MCE issue. It might solve the freezing issue, although it seems that this was fixed by a kernel update.

Are there any other actions I should take?

If you are not already in contact with Support, contact them. Maybe they will come up with a real solution once they see that it affects more customers.

Solution 2


I got the same MCE errors. They started popping up at boot after one of the last few kernel updates (Fedora 25), but I lost track of which exact update introduced them. The notebook is a Dell Inspiron 5567 (Intel i5-7200U). However, the system works perfectly fine after boot, so I'm 100% sure these are false positives appearing for some reason.


Author: justfortherec

Using Linux on laptop and servers for professional and private use.

Updated on September 18, 2022

Comments

  • justfortherec
    justfortherec almost 2 years

    While trying to debug frequent freezes of my new laptop (KabyLake architecture) running Ubuntu 16.04 I've stumbled upon these entries in kern.log:

    kernel: [    0.041634] mce: [Hardware Error]: Machine check events logged
    

    Since then I have installed mcelog but do not know what to make of the logs. Content of /var/log/mcelog is:

    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 0
    CPU 0 BANK 6 
    MISC 3880018086 ADDR fef1cf00 
    TIME 1479298799 Wed Nov 16 13:19:59 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 1
    CPU 0 BANK 7 
    MISC 43880018086 ADDR fef1ff00 
    TIME 1479298799 Wed Nov 16 13:19:59 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 0
    CPU 0 BANK 6 
    MISC 3880018086 ADDR fef1cf00 
    TIME 1479321645 Wed Nov 16 19:40:45 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 1
    CPU 0 BANK 7 
    MISC 43880018086 ADDR fef1ff00 
    TIME 1479321645 Wed Nov 16 19:40:45 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 0
    CPU 0 BANK 6 
    MISC 43880000086 ADDR fef1db80 
    TIME 1479328438 Wed Nov 16 21:33:58 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 1
    CPU 0 BANK 7 
    MISC 13880000086 ADDR fef1dc00 
    TIME 1479328438 Wed Nov 16 21:33:58 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 0
    CPU 0 BANK 6 
    MISC 43880000086 ADDR fef1db80 
    TIME 1479333991 Wed Nov 16 23:06:31 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 1
    CPU 0 BANK 7 
    MISC 13880000086 ADDR fef1dc00 
    TIME 1479333991 Wed Nov 16 23:06:31 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 0
    CPU 0 BANK 6 
    MISC 43880000086 ADDR fef1db80 
    TIME 1479373350 Thu Nov 17 10:02:30 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 1
    CPU 0 BANK 7 
    MISC 13880000086 ADDR fef1dc00 
    TIME 1479373350 Thu Nov 17 10:02:30 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 0
    CPU 0 BANK 6 
    MISC 3880018086 ADDR fef1cf00 
    TIME 1479373810 Thu Nov 17 10:10:10 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee0000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 1
    CPU 0 BANK 7 
    MISC 43880018086 ADDR fef1ff00 
    TIME 1479373810 Thu Nov 17 10:10:10 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee0000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 0
    CPU 0 BANK 6 
    MISC 3880018086 ADDR fef1cf00 
    TIME 1479375712 Thu Nov 17 10:41:52 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 1
    CPU 0 BANK 7 
    MISC 43880018086 ADDR fef1ff00 
    TIME 1479375712 Thu Nov 17 10:41:52 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 0
    CPU 0 BANK 6 
    MISC 3880018086 ADDR fef1cf00 
    TIME 1479385932 Thu Nov 17 13:32:12 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 1
    CPU 0 BANK 7 
    MISC 43880018086 ADDR fef1ff00 
    TIME 1479385932 Thu Nov 17 13:32:12 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 0
    CPU 0 BANK 6 
    MISC 3880018086 ADDR fef1cf00 
    TIME 1479387666 Thu Nov 17 14:01:06 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 1
    CPU 0 BANK 7 
    MISC 43880018086 ADDR fef1ff00 
    TIME 1479387666 Thu Nov 17 14:01:06 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 0
    CPU 0 BANK 6 
    MISC 43880000086 ADDR fef1db80 
    TIME 1479456710 Fri Nov 18 09:11:50 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 1
    CPU 0 BANK 7 
    MISC 13880000086 ADDR fef1dc00 
    TIME 1479456710 Fri Nov 18 09:11:50 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 0
    CPU 0 BANK 6 
    MISC 43880000086 ADDR fef1db80 
    TIME 1479459374 Fri Nov 18 09:56:14 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    mcelog: Family 6 Model 8e CPU: only decoding architectural errors
    Hardware event. This is not a software error.
    MCE 1
    CPU 0 BANK 7 
    MISC 13880000086 ADDR fef1dc00 
    TIME 1479459374 Fri Nov 18 09:56:14 2016
    MCG status:
    MCi status:
    Error overflow
    Uncorrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    Processor context corrupt
    MCA: corrected filtering (some unreported errors in same region)
    Generic CACHE Level-2 Generic Error
    STATUS ee2000000040110a MCGSTATUS 0
    MCGCAP c08 APICID 0 SOCKETID 0 
    CPUID Vendor Intel Family 6 Model 142
    

    Some observations (please correct me if any of them are wrong):

    • Almost all errors seem to occur in the same address range (ADDR fef1xxxx).
    • Only banks 6 and 7 seem to be affected.
    • All entries contain "Error overflow" and "Uncorrected error".
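
The observations above can be checked mechanically. The following is a minimal sketch, not part of mcelog itself: it assumes the log keeps the layout quoted above (a `BANK` line before the `ADDR` line of each record) and a 4 KiB page size. The names `SAMPLE` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Two records trimmed from the log quoted above (only the lines the parser needs).
SAMPLE = """\
CPU 0 BANK 6
MISC 3880018086 ADDR fef1cf00
MCi_ADDR register valid
STATUS ee2000000040110a MCGSTATUS 0
CPU 0 BANK 7
MISC 43880018086 ADDR fef1ff00
MCi_ADDR register valid
STATUS ee2000000040110a MCGSTATUS 0
"""

def summarize(text):
    """Count records per (bank, 4 KiB page number).

    Assumes each record lists its BANK before its ADDR, as in the quoted log.
    """
    counts = Counter()
    bank = None
    for line in text.splitlines():
        m = re.search(r"\bBANK (\d+)", line)
        if m:
            bank = int(m.group(1))
        # \b keeps "MCi_ADDR register valid" from matching.
        m = re.search(r"\bADDR ([0-9a-fA-F]+)", line)
        if m and bank is not None:
            page = int(m.group(1), 16) >> 12  # drop the 12-bit page offset
            counts[(bank, page)] += 1
    return counts

if __name__ == "__main__":
    for (bank, page), n in sorted(summarize(SAMPLE).items()):
        print(f"bank {bank} page {page:#x}: {n}")
```

Run against the full contents of /var/log/mcelog, this would show which banks are involved and whether the addresses cluster in a few pages.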

    The mcelog FAQ mentions that a "low rate of corrected memory errors is expected and does not require replacing hardware or other action". The log entries contain the phrase "Uncorrected error" which suggests I actually should take some action.
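
To see where phrases such as "Uncorrected error" and "Error overflow" come from, the STATUS value can be split into its bit fields. This is a rough sketch based on my understanding of the architectural IA32_MCi_STATUS layout in the Intel SDM (flag bits 63 down to 57, model-specific code in bits 31:16, MCA error code in bits 15:0). The function name `decode_status` is made up for illustration, and the sketch does not interpret model-specific fields.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed from the Intel SDM layout).
FLAG_BITS = {
    "VAL": 63,    # register contents valid
    "OVER": 62,   # error overflow
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting enabled
    "MISCV": 59,  # MCi_MISC register valid
    "ADDRV": 58,  # MCi_ADDR register valid
    "PCC": 57,    # processor context corrupt
}

def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error codes."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return {
        "flags": flags,
        "mscod": (status >> 16) & 0xFFFF,  # model-specific error code
        "mcacod": status & 0xFFFF,         # architectural MCA error code
    }

if __name__ == "__main__":
    decoded = decode_status(0xEE2000000040110A)  # STATUS value from the log above
    for name, is_set in decoded["flags"].items():
        print(f"{name:6} {int(is_set)}")
    print(f"mscod  {decoded['mscod']:#06x}")
    print(f"mcacod {decoded['mcacod']:#06x}")
```

For the value in the log, OVER, UC, MISCV, ADDRV and PCC are set, which matches the lines mcelog prints, while the EN bit is clear. The sketch only decodes the bits; it does not say whether the report is a false positive.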

    My questions are:

    1. What do these errors mean and should I worry about them?
    2. Could these hardware errors be the cause of the freezes of the entire system?
    3. Should I have the laptop (or parts) replaced by the manufacturer?
    4. Are there any other actions I should take?
  • justfortherec
    justfortherec over 7 years
    Thanks a lot for your insights. May I ask what Linux you are running to not experience the freezes? Indeed, updating to a 4.8 kernel fixed the issue for me. Are you running on stock Ubuntu 16.04? I will follow your advice and contact Dell.
  • Josef Eisl
    Josef Eisl over 7 years
    I'm currently on an up-to-date Fedora 24 which comes with a 4.8.10 kernel. I did not use the stock Ubuntu 16.04 long enough to tell if there are problems. Good luck with support!
  • Josef Eisl
    Josef Eisl over 7 years
    Another update: Support was able to reproduce it on their test machine. This needs to be fixed upstream. They forwarded the issue internally to some department that will look into it (whatever that means). In addition, they suggested sending error reports, e.g. to Ubuntu.
  • radesix
    radesix over 7 years
    Not that you need another "me too" but I have a new XPS 9360 and just installed Fedora 25 and get the same MCE errors. They always seem to happen a couple minutes after boot, then I'm fine (and nothing is broken, just annoying Oops messages)
  • Kan-Ru Chen
    Kan-Ru Chen over 7 years
    Same hardware (XPS 9360) and same MCE errors. I'm running Debian sid.
  • Scott
    Scott about 7 years
    I too have this issue. Dell Precision 5520. Fedora 25, Kernel 4.10.8
  • Josef Eisl
    Josef Eisl about 7 years
    @Scott is that also a KabyLake?
  • Scott
    Scott about 7 years
    @JosefEisl yes. CPU family: 6 Model: 158 Model name: Intel(R) Core(TM) i5-7440HQ CPU @ 2.80GHz
  • NikhilWanpal
    NikhilWanpal about 7 years
    Hate to say "me too", but before I realised what MCE meant, I asked the same question on AskUbuntu, raised a Dell support request, and ran all the hardware check tests (DellSupportCenter and the pre-boot test), all of which passed. Dell told me that it was a 'driver' issue that occurs only when you dual-boot, and that they have apparently already raised it and Ubuntu devs/Intel are working on it (I couldn't get a link to the issue report). So, for now, their suggestion was that I either remove Windows completely or live with it.
  • Josef Eisl
    Josef Eisl over 6 years
    @NikhilWanpal I don't have a dual-boot setup.
  • NikhilWanpal
    NikhilWanpal over 6 years
    @JosefEisl Ha! This was quite an old comment. In my case the issue was resolved by a subsequent BIOS update Dell released for the laptop. I installed it while battling a different issue related to the sound card, but at least this is no longer a concern.
  • Josef Eisl
    Josef Eisl over 6 years
    @NikhilWanpal Glad to hear that!