MCE Error Codes/Pink Screen - Should they be a cause for concern?

Solution 1

You see this error (MCE, machine check exception) precisely because the system has ECC RAM.

You have some broken hardware somewhere, most likely a memory stick but possibly one or more processors (CPU 10 perhaps?) or something in between. Invoke your support contract.

It can also be other parts of the hardware, but every time I have seen this it has been faulty ECC RAM experiencing multiple-bit faults. If the MCE decodes as "internal timer error", the next most likely culprit is a faulty CPU or mainboard.
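If you want to sanity-check a decode yourself, here is a minimal sketch in Python. It assumes the Intel layout of the IA32_MCi_STATUS register as I understand it from the SDM's machine-check chapter (MCA error code in bits 15:0, model-specific code in bits 31:16, flag bits near the top), so verify it against the manual for your CPU. The function name `decode_mci_status` and the sample value are made up for illustration; the sample is not taken from the crash screen in the question.

```python
# Simple (non-compound) MCA error codes found in IA32_MCi_STATUS[15:0].
# Assumption: values follow the Intel SDM table of simple error codes.
SIMPLE_CODES = {
    0x0000: "No error",
    0x0001: "Unclassified",
    0x0002: "Microcode ROM parity error",
    0x0003: "External error",
    0x0004: "FRC error",
    0x0005: "Internal parity error",
    0x0400: "Internal timer error",
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its commonly used fields."""
    mca_code = status & 0xFFFF
    if mca_code in SIMPLE_CODES:
        description = SIMPLE_CODES[mca_code]
    elif (mca_code & 0xFC00) == 0x0400:
        # Bit pattern 0000 01xx xxxx xxxx
        description = "Internal unclassified"
    else:
        description = "Compound or unlisted code (memory, cache, bus, ...)"

    return {
        "valid": bool((status >> 63) & 1),        # VAL: register holds an error
        "overflow": bool((status >> 62) & 1),     # OVER: an earlier error was overwritten
        "uncorrected": bool((status >> 61) & 1),  # UC: hardware could not correct it
        "enabled": bool((status >> 60) & 1),      # EN: reporting was enabled
        "misc_valid": bool((status >> 59) & 1),   # MISCV: MCi_MISC holds extra info
        "addr_valid": bool((status >> 58) & 1),   # ADDRV: MCi_ADDR holds an address
        "pcc": bool((status >> 57) & 1),          # PCC: processor context corrupt
        "model_specific_code": (status >> 16) & 0xFFFF,  # vendor/model specific
        "mca_code": mca_code,
        "description": description,
    }


# Hypothetical status value built by hand for the example:
# VAL | UC | EN | PCC with MCA error code 0x0400.
SAMPLE_STATUS = (1 << 63) | (1 << 61) | (1 << 60) | (1 << 57) | 0x0400

if __name__ == "__main__":
    for field, value in decode_mci_status(SAMPLE_STATUS).items():
        print(f"{field}: {value}")
```

The model-specific bits are returned raw on purpose: their meaning differs per CPU family, so only the vendor's documentation or support can interpret them.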

Solution 2

Yes, it's a cause for concern. The server crashed!

Check your RAM and your CPU socket pins (if you hand-assembled the server).

That's about all the info you'll get. You can open a support case with VMware and they'll analyze the crash dump for you.

Author: davewolfs

Updated on September 18, 2022

Comments

  • davewolfs almost 2 years

    So I recently purchased a server-grade system along with all server-grade peripherals. I'm licensed for ESXi 6 and have all recent patches installed. The system has been running for around 2 weeks, and all of a sudden I had a complete crash.

    I've interpreted this error code as "Internal Timer Error". I've forwarded the info to SuperMicro, but to be honest I'm not very confident in their responses so far. My expectation was that the system simply should not crash, since it's a Xeon with ECC memory running ESXi.

    Is it possible that this was a one-off error that shouldn't happen again? How would you handle this? I'm looking for advice from those who have seen these types of errors, and what they actually ended up doing.

  • davewolfs over 8 years
    Is there any way to tell the difference between the two? I'm pretty confident that I have decoded it correctly.
  • Falcon Momot over 8 years
    I believe the codes are vendor-specific, and I don't actually see the MCE code in there. But surely your vendor (awful though Supermicro may be) has some kind of diagnostic tool you can run. Either way, you should get them to fix the hardware or go fix the hardware yourself. Just like any other time, go isolate the broken component.
  • davewolfs over 8 years
    Would a utility like memtest86+ be useful in this case, or is it unlikely to help?
  • Falcon Momot over 8 years
    It can be useful.
  • davewolfs over 8 years
    Any opinions on Intel's Product Specification Updates? I'm seeing some stuff in there related to Internal Timer Errors. I suppose the CPUs themselves can have bugs (or their BIOSes).
  • Falcon Momot over 8 years
    They can, but if there is a bug in there, chances are there is a microcode update too.
  • davewolfs over 8 years
    Kinda odd that the stack shows "Power_Halt" and there is a reported bug with a potential BIOS fix listed here under BDE54, which also shows the same MCE code. www3.intel.com/content/dam/www/public/us/en/documents/…