CPU temperatures in Linux: throttling or wrong reading?

15,633

Solution 1

The difference is due to Windows and Linux using different CPU throttling profiles.

You do have some control over this on Linux. For example, the following command shows which profile (scaling governor) is currently in use on the first core:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
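
The command above only covers cpu0. As a minimal sketch, assuming the usual cpufreq sysfs layout, the same file can be read for every core. The function name `list_governors` and its optional directory argument are hypothetical, added only so the loop can be pointed at a test directory:

```shell
#!/bin/sh
# Print the active cpufreq governor for every CPU core.
# The optional first argument (hypothetical, handy for testing) overrides
# the directory that holds the cpuN entries.
list_governors() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for gov_file in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded glob when no cpufreq driver is loaded.
        [ -r "$gov_file" ] || continue
        cpu_dir="${gov_file%/cpufreq/scaling_governor}"
        printf '%s: %s\n' "${cpu_dir##*/}" "$(cat "$gov_file")"
    done
}

list_governors
```

If the cores report different governors, whatever tool set them was probably applied to only some of the cores.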

There are ways to choose which profile to use. The Arch Linux wiki has good information on this and may be worth a read:

CPU Frequency Scaling - Arch Wiki

There is an additional issue of fan control: you need to make sure you have the proper drivers for controlling your fans and that they are set to a high enough speed when gaming.

Linux on Laptops can be a helpful resource.

Solution 2

I believe the difference arises because your Windows installation has the Intel Dynamic Platform and Thermal Framework (DPTF) driver, which adjusts the frequencies to manage the temperature. Your Linux installation does not have it, so your CPU just runs at maximum frequency regardless of temperature. It jumps to the 100C limit, the CPU's firmware then drops all frequencies to their minimum until the temperature falls, the CPU returns to maximum frequency, and the cycle repeats.

Unfortunately, the Intel DPTF driver is not available for Linux AFAIK, so unless you can set up some software that does the equivalent (thermald maybe?), I guess you can only limit the maximum frequency.

I do not understand how the temperature changes so instantaneously though (it changes between 60C and 100C in a fraction of a second on my friend's Y520 that's missing the Intel DPTF driver; I've never seen any CPU temp change anywhere near that rapidly). I thought it's caused by bad thermal contact between the CPU and heatsink but maybe the chip is designed like this and it's supposed to be managed by the Intel DPTF driver to work properly.
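
To make the idea of "software that does the equivalent" concrete, here is a rough sketch of a temperature-driven frequency cap. It is not how DPTF or thermald actually work. Every threshold, the step size, the function names `next_cap` and `run_loop`, and the choice of `thermal_zone0` as the CPU sensor are assumptions for illustration only:

```shell
#!/bin/sh
# Crude temperature-driven frequency cap. A sketch only; all values below
# are illustrative assumptions, not tuned for any particular machine.
HOT_MC=85000      # above this reading (millidegrees C), lower the cap
COOL_MC=70000     # below this reading, raise the cap again
STEP_KHZ=100000   # adjust in 100 MHz steps
MIN_KHZ=800000    # never cap below 800 MHz
MAX_KHZ=3800000   # never raise the cap above 3.8 GHz

# next_cap CURRENT_KHZ TEMP_MC: print the cap to apply next.
next_cap() {
    cap=$1
    temp=$2
    if [ "$temp" -gt "$HOT_MC" ]; then
        cap=$((cap - STEP_KHZ))
    elif [ "$temp" -lt "$COOL_MC" ]; then
        cap=$((cap + STEP_KHZ))
    fi
    if [ "$cap" -lt "$MIN_KHZ" ]; then cap=$MIN_KHZ; fi
    if [ "$cap" -gt "$MAX_KHZ" ]; then cap=$MAX_KHZ; fi
    echo "$cap"
}

# run_loop: poll once per second and write the cap to every core (needs root).
run_loop() {
    # Assumption: zone 0 is the CPU sensor; check the zone's "type" file first.
    zone=/sys/class/thermal/thermal_zone0/temp
    cap=$MAX_KHZ
    while :; do
        cap=$(next_cap "$cap" "$(cat "$zone")")
        for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq; do
            echo "$cap" > "$f"
        done
        sleep 1
    done
}
```

The block only defines the functions. Calling `run_loop` as root would start the polling. A one-second poll is far too slow to catch the sub-second spikes described above, so at best this settles on a cap that avoids the firmware's emergency throttling rather than reacting to each spike.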


Author: zjeffer

Multimedia & Creative Technology student at Howest, Kortrijk, Belgium.

Updated on September 18, 2022

Comments

  • zjeffer
    zjeffer over 1 year

    Using a Lenovo Legion Y520 with i7-7700HQ (base clock 2.8GHz) and GTX 1050.

    I'm getting CPU overheating warnings in Linux and it's affecting my performance in games (found in Payday 2 and CS:GO). I've never had problems in Windows.

    This is what I found when trying to troubleshoot this issue:

    In Windows 10 (using AIDA64)

    • Windows stays at around 3.4GHz on idle (because my power settings are set to 'high performance' instead of the default 'balanced'), with a temperature of around 50C.

    • When stressing the CPU, the temperature goes slowly (over a couple of seconds instead of instantly) from about 50C to around 75C and stays there comfortably. Clock speeds are about 2.9GHz when stressing. Utilization is always 100%. AIDA64 doesn't report throttling. The voltage on the CPU core goes from about 1.1V to 0.9V when stressing.

    In Arch Linux (using s-tui)

    • Linux stays at around 2.0GHz on idle, with a temperature of around 50C.

    • Here's where it gets weird: when stressing the CPU, the temperature IMMEDIATELY goes from 50C to about 93C. Clock speeds are exactly 3.4GHz when stressing. Utilization is always 100%. When turning the stress test off, the temperature IMMEDIATELY goes back to about 50C, as if nothing ever happened. The laptop certainly doesn't feel like it heats up to 90C+ when doing this, even after a long stress.

    Here's an image that shows how temperature, power, and frequency all go down at the exact same time. Notice how much cpu temperature changes in so little time. Image of throttling in linux
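
For anyone wanting to check how fast the reading really changes, here is a small sketch that samples a kernel thermal zone at a fixed interval. The helper names `mc_to_c` and `sample_temp` are made up for this example, `thermal_zone0` is only an assumption about which zone is the CPU, and fractional `sleep` values rely on GNU coreutils:

```shell
#!/bin/sh
# Convert the kernel's millidegree reading (e.g. 93000) to degrees C
# with one decimal place.
mc_to_c() {
    printf '%d.%d\n' "$(( $1 / 1000 ))" "$(( ($1 % 1000) / 100 ))"
}

# sample_temp FILE COUNT INTERVAL: print COUNT readings from FILE,
# INTERVAL seconds apart, each prefixed with its sample index.
sample_temp() {
    file=$1 count=$2 interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%d %s\n' "$i" "$(mc_to_c "$(cat "$file")")"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, `sample_temp /sys/class/thermal/thermal_zone0/temp 50 0.1` would take 50 readings about 100 ms apart while a stress test is started or stopped in another terminal.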

    How do I fix this throttling issue? Do I undervolt my CPU in Linux? How come it reads temperatures wrong in Linux but not in Windows?

    I changed the profile using cpupower from powersave to performance. I still see the same throttling in s-tui. The idle CPU frequency does jump when set to performance (from around 2000-2500MHz to a constant 3400MHz), but that's the only thing that has changed.

    Fan control

    I tried to control fans using fancontrol (lm_sensors), but pwmconfig says there are no pwm-capable sensor modules installed.

    I tried it with NBFC, but it doesn't seem to be doing anything, no matter what profile I choose. I don't even know if NBFC can control my fans, but it doesn't report any errors when choosing a profile.

    I also tried thinkfan, but it doesn't seem to help with throttling. It also thinks my fan's speed is at 8RPM, see this thread

    Solution

    I found that lowering the maximum allowed cpu frequency using cpupower to something like 3100MHz instead of the default 3800 fixes all issues.

    sudo cpupower frequency-set -u 3100MHz

    I also changed max_freq in /etc/default/cpupower to the same value, to make it permanent. I found that this does result in a slight fps drop in games, but nothing serious. At least my fps is stable :)
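
To confirm that the cap actually took effect on every core (for example after a reboot), the kernel's per-core `scaling_max_freq` files can be read back. This is a minimal sketch assuming the standard cpufreq sysfs layout. The function name `show_caps_mhz` and its optional directory argument are hypothetical:

```shell
#!/bin/sh
# Print each core's current frequency cap in MHz (sysfs stores kHz).
# The optional first argument overrides the cpu directory, for testing.
show_caps_mhz() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_max_freq; do
        # Skip the unexpanded glob when no cpufreq driver is loaded.
        [ -r "$f" ] || continue
        d="${f%/cpufreq/scaling_max_freq}"
        printf '%s: %d MHz\n' "${d##*/}" "$(( $(cat "$f") / 1000 ))"
    done
}

show_caps_mhz
```

With the 3100MHz limit applied, each core should report a value at or near 3100 MHz.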

    Sadly I think this might result in decreased performance in non-gaming tasks like when compiling something.

    After 1.5 years

    I just stability tested Windows again (with AIDA64) and found it now also thermal throttles. As you can see in the image below, the temperatures jump quickly to the high 90s and AIDA64 reports throttling. The clock speed idles at 3.4GHz and a few seconds after starting the test it drops to around 800MHz, before jumping up to 3.4GHz again a second later. It no longer settles on a lower clock speed such as 2.9GHz while stress testing (like before).

    AIDA64 on Windows, power setting = High Performance

    How come it suddenly stopped lowering the maximum frequency in Windows?

    • Jeff A
      Jeff A over 5 years
      Yes temperatures on microchips can jump really high almost instantaneously depending on if/how they are cooled, especially when using up lots of power. Ever touched a running CPU? I don't recommend it. Additionally, the high performance profile could still be scaling the frequency down during parts of the stress test that aren't using as much number crunching. Source: am computer engineer
    • zjeffer
      zjeffer over 5 years
      Thanks for the explanation. So how can I eliminate throttling when both powersave and performance don't help against throttling?
    • Jeff A
      Jeff A over 5 years
      If I had to take a guess, I'd say you need better cooling while you are putting high stress on the CPU/GPU. I can't really vouch for the temperature sensors and whether they actually function properly on Linux -- every motherboard is different. If you have some extra cash try getting a gaming laptop cooling pad with a bunch of fans on it. The scaling/lagging could be happening due to temperature spikes. You really want to stay below 80 degrees Celsius at all times
    • zjeffer
      zjeffer over 5 years
      Is there no way to automatically undervolt my cpu just like it does in Windows? I've never seen it go above 80C in Windows, but it seems to go to 95-100 in Linux
    • Jeff A
      Jeff A over 5 years
      I was thinking that maybe linux isn't turning all your fans up to the proper speed when things get hot, leading to the throttling. Have you messed around with programs that would let you control the fan speed and just put them on full blast before stress testing or gaming?
    • Jeff A
      Jeff A over 5 years
      Hey looks to me like you are missing the proper drivers for your fan... you might want to do some searching on the exact model of motherboard or laptop you are using to see if anyone has encountered this exact issue before. Sometimes you get lucky. This site can also help: linux-laptop.net
    • fra-san
      fra-san over 5 years
      Hi @zjeffer! Stack Exchange sites are not really well suited for forum-like discussion. Significantly editing a question, especially if it has an accepted answer, makes the whole Q&A hard to read (and is usually not well received). I think you should post a new answer instead. E.g. leave this one (about what may be the cause for the overheating you experienced) as it was before your second or third update and post a new one, say, on cpupower or thinkfan.
    • zjeffer
      zjeffer over 5 years
      Hi @fra-san, sorry :). Do you mean I should make a new thread?
    • fra-san
      fra-san over 5 years
      I thought you should have posted a new, self-contained question, possibly linking to this one to provide context (only if you felt it substantial enough, of course). But it looks like you solved your issues, so it doesn't matter anymore.
    • zjeffer
      zjeffer over 5 years
      I reformatted the whole post so other people can more easily find a solution if they have the same problem.
    • Lamp
      Lamp almost 4 years
      This is interesting. I know someone who has a Y520 and the temperatures change bizarrely instantaneously, but in Windows. And under any heavy load the temp just slams against the 100C limit and repeatedly triggers PROCHOT (minimizes all frequencies), so the temp drops to 60C for a moment, then jumps back to 100C for a few seconds. It changes so fast it's hard to see what's going on with a 1000ms update interval, but with 100ms it's clear. I assume it's because of bad thermal paste, but the heatsink screws are stripped. I have to limit CPU freq to like 2.7GHz to avoid PROCHOT.
    • zjeffer
      zjeffer almost 4 years
      @Lamp Yeah so from what I've read that's just classic thermal throttling. I often even have to go down to 2.4GHz (even for light games like Terraria). In Windows, for heavy games I have to lower the max cpu freq as well. Looks like a major issue with this laptop.
    • Lamp
      Lamp almost 4 years
      @zjeffer throttling doesn't make the CPU change from 60C to 100C in a fraction of a second. Actually there seems to be no software throttling at all. I learned about the optional drivers for my Lenovo laptop and now I realize my friend's Y520 is probably missing the Intel Dynamic Platform and Thermal Framework driver, because the hard drive was swapped from a completely different (HP?) computer. This explains why it turbos at max frequency all the time until the CPU's built-in firmware sends a PROCHOT command to minimize all clock speeds to prevent damage.
    • Lamp
      Lamp almost 4 years
      As I have learned, the Intel DPTF driver should adjust frequencies to balance the temperature around 70C. You probably had this driver in Windows, where it was probably pre-installed with your laptop. But clearly Linux doesn't have it, and Intel doesn't make the driver for Linux. So I guess you just have to set a fixed frequency limit, or if possible manually set up some software that does what the Intel DPTF driver does for Windows.
    • Lamp
      Lamp almost 4 years
      I do not believe the software throttling is related to the fact that the temperature changes so quickly though. It seems to me these laptops have bad thermal contact between the heatsink and chip. But maybe I'm wrong, maybe the chips themselves have a thicker silicon die between the heatsink or something like that, and they are designed to operate with the Intel DPTF driver in Windows. Anyway, I think I'll add what I've learned as an answer to this question.
    • Lamp
      Lamp almost 4 years
      I wonder why your CPU temp seemed to change slowly on Windows with DPTF though. What update interval did you monitor that with? Maybe there's more going on that you would see with a faster update interval. Or maybe DPTF does more than just adjusting frequencies 🤔
    • zjeffer
      zjeffer almost 4 years
      @Lamp tyvm for the info. I stresstested Windows again and got very strange results. I edited the post.
  • K7AAY
    K7AAY over 5 years
    Jeff A, please click edit and put your findings in your original question so all may see them, then return to these comments and delete yours by clicking on the grey (X) after each. Why? Comments pile up and get hidden.