Windows Server 2012 R2 (Hyper-V VMs) - random BSOD
Solution 1
Setting "Package C State Limit" to "C0/C1 State" causes the BSODs (as does setting "Power Technology" to [Disable]). Because I can't use "C0/C1 State", I chose "C2 state", which works without problems. In a nutshell: the higher the Package C State Limit you choose, the more energy efficient the CPU is (it stops clocks, reduces voltage, and so on).
The best performance settings in this case should be:
Advanced Power Management Configuration:
Power Technology - [Custom]
Energy Performance Tuning - [Disable]
Energy Performance BIAS setting. - [Performance]
Energy Efficient Turbo - [Disable]
CPU P State Control:
EIST(P-States) - [Enable]
Turbo Mode - [Enable]
P-state Coordination - [HW_ALL]
CPU C State Control:
Package C State Limit - [C2 state]
CPU C3 Report - [Disable]
CPU C6 Report - [Disable]
Enhanced Halt State (C1E) - [Disable]
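A minimal sketch for checking a server against the list above. It assumes you have copied the current BIOS values into a dictionary by hand; it does not read the BIOS. The `current` values and the helper name `diff_settings` are hypothetical and only serve as illustration.

```python
# Recommended values, copied from the settings list above.
RECOMMENDED = {
    "Power Technology": "Custom",
    "Energy Performance Tuning": "Disable",
    "Energy Performance BIAS setting.": "Performance",
    "Energy Efficient Turbo": "Disable",
    "EIST(P-States)": "Enable",
    "Turbo Mode": "Enable",
    "P-state Coordination": "HW_ALL",
    "Package C State Limit": "C2 state",
    "CPU C3 Report": "Disable",
    "CPU C6 Report": "Disable",
    "Enhanced Halt State (C1E)": "Disable",
}


def diff_settings(current, recommended=RECOMMENDED):
    """Return {option: (current_value, recommended_value)} for every mismatch."""
    return {
        name: (current.get(name), wanted)
        for name, wanted in recommended.items()
        if current.get(name) != wanted
    }


# Hypothetical example: a server still using the two settings reported as problematic.
current = {
    **RECOMMENDED,
    "Power Technology": "Disable",
    "Package C State Limit": "C0/C1 State",
}

for name, (have, want) in diff_settings(current).items():
    print(f"{name}: {have} -> {want}")
```

Options missing from `current` show up with a current value of `None`, so an incomplete transcription is reported rather than silently accepted.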
I found that this type of problem has appeared a few times in the past and was fixed by updating the ROM or by a host microcode update such as KB2970215. But I haven't found any working update yet.
sources:
http://www.supermicro.com/support/faqs/faq.cfm?faq=21555
http://www.supermicro.com/support/faqs/faq.cfm?faq=21499
Solution 2
Solution that worked for me:
- Set the following Custom Power Settings under Advanced Power Management Configuration:
Note: The highlighted lines are the important changes, but make sure the other settings are also the same as in the pictures
Other things that I did, which may have helped (I did these before doing the above, so I'm not sure if it is relevant or not):
- Installed KB2970215 from Microsoft - this fixes "random blue screens" on specific CPU chipsets
- Installed the latest drivers for the Intel Chipset from Supermicro's web site (for me, it is ftp://ftp.supermicro.nl/driver/Intel_INF/C612_Series_Chipset/Chipset_v10.1.2.8.zip - locate one best suited for you)
- Installed the latest Network Driver (example: ftp://ftp.supermicro.nl/driver/LAN/Intel/PRO_v20.3.zip)
- Installed the RSTE Utility & Driver (example: ftp://ftp.supermicro.nl/driver/SATA/Intel_PCH_RAID_Romley_RSTE/Management/4.3.0.1219/IATA_CD.exe)
Sources:
- https://social.technet.microsoft.com/Forums/en-US/f8ba6d82-b79d-4b17-b13b-269841a9f236/vm-going-down-bugcheck-0x109?forum=winserverhyperv
- Supermicro Partner Support
devlin
Updated on September 18, 2022
Comments
-
devlin over 1 year
I have a problem. My Hyper-V VMs running Windows Server 2012 R2 restart themselves quite often (BSOD: CRITICAL_STRUCTURE_CORRUPTION (109)). Last time it happened 11 times over a weekend. I have new hardware: 2x Supermicro servers. I installed Windows Server 2012 R2 with the Hyper-V role on both servers (the drivers from the Supermicro website are installed as well). As guest systems (VMs) I have 2x Windows Server 2012 and 1x Windows Server 2012 R2 on each Hyper-V host. As I wrote, the problem is that the W2012R2 VMs randomly restart themselves, and only the W2012R2 VMs; the W2012 VMs are OK. All systems are clean: no applications are installed and there is no workload.
After a reboot, these events are logged on the VMs:
Kernel-Power 41
EventData: BugcheckCode 265 BugcheckParameter1 0xa3a01f59e148b50a BugcheckParameter2 0xb3b72be033c8b301 BugcheckParameter3 0x1a0 BugcheckParameter4 0x7 SleepInProgress 0 PowerButtonTimestamp 0 BootAppStatus 0
BugCheck 1001
EventData param1 0x00000109 (0xa3a01f59e148b50a, 0xb3b72be033c8b301, 0x00000000000001a0, 0x0000000000000007) param2 C:\Windows\MEMORY.DMP param3 021516-3093-01
WinDbg output:
CRITICAL_STRUCTURE_CORRUPTION (109)
This bugcheck is generated when the kernel detects that critical kernel code or data have been corrupted. There are generally three causes for a corruption:
1) A driver has inadvertently or deliberately modified critical kernel code or data. See http://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx
2) A developer attempted to set a normal kernel breakpoint using a kernel debugger that was not attached when the system was booted. Normal breakpoints, "bp", can only be set if the debugger is attached at boot time. Hardware breakpoints, "ba", can be set at any time.
3) A hardware corruption occurred, e.g. failing RAM holding kernel code or data.
Arguments:
Arg1: a3a01f5a69a8b6bb, Reserved
Arg2: b3b72be0bc28b4a2, Reserved
Arg3: 00000000000001a0, Failure type dependent information
Arg4: 0000000000000007, Type of corrupted region, can be
  0 : A generic data region
  1 : Modification of a function or .pdata
  2 : A processor IDT
  3 : A processor GDT
  4 : Type 1 process list corruption
  5 : Type 2 process list corruption
  6 : Debug routine modification
  7 : Critical MSR modification
Debugging Details:
PG_MISMATCH: 40000
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT_SERVER
BUGCHECK_STR: 0x109
PROCESS_NAME: System
CURRENT_IRQL: 2
ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre
STACK_TEXT:
ffffd001`1bb7e088 00000000`00000000 : 00000000`00000109 a3a01f5a`69a8b6bb b3b72be0`bc28b4a2 00000000`000001a0 : nt!KeBugCheckEx
STACK_COMMAND: kb
SYMBOL_NAME: ANALYSIS_INCONCLUSIVE
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: Unknown_Module
IMAGE_NAME: Unknown_Image
DEBUG_FLR_IMAGE_TIMESTAMP: 0
IMAGE_VERSION:
BUCKET_ID: BAD_STACK
FAILURE_BUCKET_ID: BAD_STACK
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:bad_stack
FAILURE_ID_HASH: {75814664-faf6-4b70-bbc7-dc592132ecdd}
Followup: MachineOwner
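For readers matching the events to the dump: the Kernel-Power 41 event shows the bugcheck code in decimal (265), while the BugCheck 1001 event and WinDbg show it in hex (0x109). A small sketch of that conversion, together with a lookup of the fourth parameter against the region table printed by WinDbg above. The function name `describe_bugcheck` is hypothetical.

```python
# Region types for Arg4, copied from the WinDbg text above.
REGION_TYPES = {
    0: "A generic data region",
    1: "Modification of a function or .pdata",
    2: "A processor IDT",
    3: "A processor GDT",
    4: "Type 1 process list corruption",
    5: "Type 2 process list corruption",
    6: "Debug routine modification",
    7: "Critical MSR modification",
}


def describe_bugcheck(code_decimal, param4_hex):
    """Convert a decimal BugcheckCode to hex and name the corrupted region type."""
    code_hex = f"0x{code_decimal:X}"
    region = REGION_TYPES.get(int(param4_hex, 16), "unknown region type")
    return code_hex, region


# Values taken from the Kernel-Power 41 event: BugcheckCode 265, BugcheckParameter4 0x7
print(describe_bugcheck(265, "0x7"))  # ('0x109', 'Critical MSR modification')
```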
Sometimes this event is also logged on the host server, but not every time a VM fails:
Hyper-V-Worker 18590
VmErrorCode0 0x109 VmErrorCode1 0xbb8d251d VmErrorCode2 0xe0d2304 VmErrorCode3 0x1a0 VmErrorCode4 0x7
Could you help me solve this problem please?
-
TomTom over 8 years
Defective hardware or a crappy driver. One of them. If that is a brand machine, why do you ask here? Get it fixed. We cannot fix broken hardware.
-
devlin over 8 years
I don't think it's broken hardware, because I have 2 brand new servers and both suffer in the same way. I don't think that both servers are broken. And the host systems (OSs) are running without problems.
-
TomTom over 8 years
Then you have a driver problem. Simple as that.
-
Jonathan Piccirilli over 8 years
What Supermicro drivers are you using? I would suggest using the drivers for the hardware exposed to the virtual machine by the host (which means letting Windows Update handle it). Your virtual boxes should have no idea what physical hardware you are using, as it is all abstracted.
-
devlin about 8 years
Jonathan Piccirilli: I'm using the latest drivers from the Supermicro website for Windows Server 2012 R2. I'm not sure I understand what you mean, but I use the Supermicro drivers only for my host OS, of course :)
-
devlin about 8 years
I already found out that the problem is in the power management settings. Disabling "Power Technology" causes the trouble. Now I have this option set to "Custom" and I'm trying to find out which specific option is problematic. But the hotfix in the link you provided looks interesting. I will try it. Thanks
-
Admin about 8 years
For what it is worth, I have had this same issue, where the host OS is stable but the guest OS in Hyper-V has BSOD issues, on the X10DRL-i and the X10SRL-F. The only stable systems I have seen in the X10 series so far are the X10DRi/X10DRi-T and the X10SAE. In each of those cases I have power management in the BIOS completely turned off, and the host and guest OS energy profile set to High performance with all power controls shut off for max 24/7 performance for these high-demand SQL systems. I will try some of the suggested power configurations on my "failing" boards and see what works. T
-
Daniel Nachtrub about 8 years
We're seeing this issue on one of our servers as well. We have already tried several power settings over the last two days (according to the MS forums). The issue just persists.
-
KeyszerS about 8 years
Thanks for the feedback - it's why I've put in "Potential" solution :) I'm trying this out myself since I have the same issue. Do you have Supermicro too?
-
Daniel Nachtrub about 8 years
Yep - we're seeing it on an X10SRW-F with an E5 2660v3. Our supplier probably saw the same issue on another X10 board. We have two other X10SRW-F boards with Xeon E5 1650v3 running without any issues.
-
KeyszerS about 8 years
Thx - I'm seeing this on a SYS-1028U-TR4+, which has the X10DRU-i+ motherboard.
-
devlin about 8 years
I tested the mentioned hotfix, but it didn't work. When I disable "Power Technology", the VMs fail. The problematic option (with "Power Technology" set to "Custom") seems to be "Package C State Limit - C0/C1 State". When I leave that option on the "Custom" default, "C6(Retention) state", it works. At least for now. I need more time to be sure. "Power Technology - Energy Efficient" works for me too, but nobody wants that.