Kworker is at 100% - I think I've tried everything!


Solution 1

This question seems abandoned, as it hasn't been updated in a while, but I'll give it a try anyway: I have seen quite a few cases where excessive interrupts occurred, slowing the machine down. You can check for this with grep . -r /sys/firmware/acpi/interrupts/ and look for counters that are very large or climbing quickly.
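To make the output of that grep easier to read, here is a small sketch that ranks the counters, highest first. It assumes each file under /sys/firmware/acpi/interrupts/ starts with the count followed by status flags, which is what I have seen on the machines I checked; the function name top_acpi_interrupts is just something I made up for illustration.

```shell
# Print the busiest ACPI interrupt counters as "count name", highest first.
# $1: directory to scan (defaults to the sysfs path mentioned above)
# $2: how many entries to show (defaults to 5)
top_acpi_interrupts() {
    dir="${1:-/sys/firmware/acpi/interrupts}"
    for f in "$dir"/*; do
        [ -r "$f" ] || continue
        # Assumption: the first whitespace-separated field is the count.
        read -r count _ < "$f"
        printf '%s %s\n' "$count" "${f##*/}"
    done | sort -rn | head -n "${2:-5}"
}
```

Run top_acpi_interrupts twice, a few seconds apart. If one gpeXX entry grows by thousands between runs, that is probably the interrupt storm to look into.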

Related: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/887793 https://bugzilla.kernel.org/show_bug.cgi?id=53071 https://forum.ubuntuusers.de/topic/kworker-cpu-load/ (German)

Solution 2

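I believe the "Permission denied" error is easy to solve: in sudo echo ... > file, the redirection is performed by your unprivileged shell before sudo even runs, so the write to /sys/kernel/debug/tracing/set_event has to happen from a root shell (or through something like sudo tee).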
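A minimal sketch of the sudo tee variant, in case you would rather not open a root shell. The function name write_as_root and the SUDO override are my own inventions for illustration; the override only exists so the function can be tried on an ordinary file without root.

```shell
# Write a value to a root-owned file.
# In "sudo echo value > file" the calling shell opens the file, which fails.
# Piping into "sudo tee" lets the privileged process open it instead.
# SUDO is a hypothetical override (for example SUDO=env) for trying this without root.
write_as_root() {
    file=$1
    shift
    printf '%s\n' "$*" | ${SUDO-sudo} tee "$file" > /dev/null
}
```

With that, write_as_root /sys/kernel/debug/tracing/set_event workqueue:workqueue_queue_work should get past the permission error, and something like sudo timeout 5 cat /sys/kernel/debug/tracing/trace_pipe > out.txt would then capture a few seconds of events.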

Another idea is to figure out whether your issue is there right from the start, or whether something triggers it. In the first case, the naive approach would be to boot with most drivers disabled and then re-enable them one by one to find the culprit.
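One way to do the "boot with drivers disabled" part, as far as I know, is the modprobe.blacklist= kernel command-line option, which takes a comma-separated list of modules. The sketch below only builds that string; the function name blacklist_param is made up, and the module names in the usage note are placeholders, not a diagnosis of your machine.

```shell
# Build a "modprobe.blacklist=..." kernel parameter from a list of module names.
blacklist_param() {
    # Inside the subshell, "$*" joins the arguments using the first character of IFS.
    list=$(IFS=,; printf '%s' "$*")
    printf 'modprobe.blacklist=%s\n' "$list"
}
```

For example, blacklist_param suspect_wifi suspect_usb prints modprobe.blacklist=suspect_wifi,suspect_usb, which you could append to the kernel line in the GRUB boot menu for a single boot. Note that blacklisting only stops automatic loading, so a module can still be pulled in as a dependency of another one.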

In case something triggers the issue, we'll have to know what it is. I have seen cases where excessive CPU usage was triggered by a spike in disk IO, and tuning /proc/sys/vm/ parameters related to caching helped a lot.
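Before tuning anything, it helps to record the current values. This is a rough sketch that prints a handful of cache-related settings. The four names are my assumption about which parameters are worth looking at first (the cases I mentioned did not all involve the same ones), and show_vm_params is a made-up helper name.

```shell
# Print selected cache-related VM settings as "vm.name = value".
# $1: directory to read from (defaults to /proc/sys/vm)
show_vm_params() {
    dir="${1:-/proc/sys/vm}"
    # Assumption: these four are reasonable first candidates for writeback/cache tuning.
    for name in dirty_background_ratio dirty_ratio swappiness vfs_cache_pressure; do
        if [ -r "$dir/$name" ]; then
            printf 'vm.%s = %s\n' "$name" "$(cat "$dir/$name")"
        fi
    done
}
```

A value can then be changed until the next reboot with, for example, sudo sysctl -w vm.dirty_background_ratio=5 (the 5 is purely illustrative). Change one setting at a time and watch whether the kworker load during disk IO spikes goes down.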


Author: jonathanbsyd

Updated on September 18, 2022

Comments

  • jonathanbsyd
    jonathanbsyd over 1 year

    Thanks for checking this one out.

    jonathan@melange:~$ top
    
    top - 05:21:08 up 44 min,  2 users,  load average: 1.21, 1.68, 1.98
    Tasks: 351 total,   2 running, 349 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  4.3 us, 14.0 sy,  2.1 ni, 70.4 id,  8.9 wa,  0.0 hi,  0.3 si,  0.0 st
    GiB Mem :   15.579 total,    0.173 free,    4.141 used,   11.264 buff/cache
    GiB Swap:   15.910 total,   15.868 free,    0.042 used.   11.014 avail Mem 
    
      PID  PPID   UID USER     RUSER    TTY          TIME+  %CPU %MEM S COMMAND                                                                                                                                   
       67     2     0 root     root     ?         22:22.40 100.0  0.0 R kworker/0:1 
    

    The setup: Ubuntu 16.10 with kernel 4.8.0-41-generic, on a modern Intel-based laptop with Nvidia drivers and not-quite-perfect wifi. I have these working acceptably and I don't see any reason to believe they are involved in this issue. Let me know and I can provide you with whatever info you need.

    I've actually already asked this on askubuntu and a couple of times over at Freenode #ubuntu over the last week, but no one will even respond to my question :(

    I've taken some perf reports with

    sudo perf record -a -g sleep 10
    sudo perf report
    

    With some results

    Samples: 92K of event 'cycles:ppp', Event count (approx.): 58330337004406                                                                                                                                     
      Children      Self  Command          Shared Object                        Symbol                                                                                                                           ◆
    +   94.27%     0.00%  swapper          [kernel.kallsyms]                    [k] cpu_startup_entry                                                                                                            ▒
    +   94.27%     0.00%  swapper          [kernel.kallsyms]                    [k] start_secondary                                                                                                              ▒
    +   77.29%     0.00%  swapper          [kernel.kallsyms]                    [k] schedule_preempt_disabled                                                                                                    ▒
    -   77.29%    77.29%  swapper          [kernel.kallsyms]                    [k] __schedule                                                                                                                   ▒
         77.29% start_secondary                                                                                                                                                                                  ▒
            cpu_startup_entry                                                                                                                                                                                    ▒
          - schedule_preempt_disabled                                                                                                                                                                            ▒
             - 77.29% schedule                                                                                                                                                                                   ▒
                  __schedule                                                                                                                                                                                     ▒
    +   77.29%     0.00%  swapper          [kernel.kallsyms]                    [k] schedule                                                                                                                     ▒
    +   16.99%     0.00%  swapper          [kernel.kallsyms]                    [k] call_cpuidle                                                                                                                 ▒
    +   16.99%     0.00%  swapper          [kernel.kallsyms]                    [k] cpuidle_enter                                                                                                                ▒
    +   16.99%     0.00%  swapper          [kernel.kallsyms]                    [k] cpuidle_enter_state                                                                                                          ▒
    -   16.99%    16.99%  swapper          [kernel.kallsyms]                    [k] intel_idle                                                                                                                   ▒
         16.98% start_secondary                                                                                                                                                                                  ▒
            cpu_startup_entry                                                                                                                                                                                    ▒
            call_cpuidle                                                                                                                                                                                         ▒
          - cpuidle_enter                                                                                                                                                                                        ▒
             - 16.98% cpuidle_enter_state                                                                                                                                                                        ▒
                  intel_idle                                                                                                                                                                                     ▒
    +    5.65%     0.00%  pool             [unknown]                            [.] 0000000000000000                                                                                                             ▒
    +    5.65%     5.65%  pool             libc-2.24.so                         [.] re_compile_internal                                                                                                          ▒
    +    5.65%     0.00%  pool             [unknown]                            [.] 0x00007f049804d628                                                                                                           ▒
    +    5.65%     0.00%  pool             [unknown]                            [.] 0x00007f049804d6a8                                                                                                           ▒
    +    5.65%     0.00%  pool             [unknown]                            [.] 0x00007f049804d3d8                                                                                                           ▒
    +    5.65%     0.00%  pool             [unknown]                            [.] 0x00007f049804d768                                                                                                           ▒
    Cannot load tips.txt file, please install perf!
    

    I've checked dmesg: there are overheating messages (that's why I'm here) and some other messages about MSFT0101:00, which I believe have something to do with the kernel not recognising my bios-enabled TPM module. I think this should be insignificant in this matter.

    There is another question about kworker threads suggesting the following, as per this thread:

    $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace_pipe > out.txt
    (wait a few secs)
    ^C
    

    but it doesn't work!

    jonathan@melange:~$ sudo mount -t debugfs nodev /sys/kernel/debug
    mount: nodev is already mounted or /sys/kernel/debug busy
    jonathan@melange:~$ sudo echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
    bash: /sys/kernel/debug/tracing/set_event: Permission denied
    jonathan@melange:~$ sudo cat /proc/67/stack
    [<ffffffffffffffff>] 0xffffffffffffffff
    

    Any ideas?!

    Update

    Before submitting this question I had been using "Kworker, what is it and why is it hogging so much CPU?" as a reference. So I had tried disabling or uninstalling long-running processes such as dropbox, insync (google drive), crashplan, keybase, Variety background, multiload indicator, psensor and guake (I feel like I have a pretty slick setup most of the time...), but nothing seemed to help.

    There had been other questions lurking around suggesting malfunctioning wifi, nvidia drivers or usb drivers, but nothing in my logs suggested this either. I'm somewhat thankful, as the solution in those was almost always simply to find newer nvidia drivers, update the kernel, or "Deal with it." My laptop is pretty up to date already, I have no enterprise reason to stay on 16.04, and I already have the nvidia ppa activated, as with the intel drivers, so this wasn't much help.

    Perhaps the kworker was actually the result of the laptop overheating (-> cpu throttling + cpu fan management), not the cause, as suggested by "Stop cpu from overheating". So I've just used some compressed air to clean out the fans (I didn't think this would be a problem on a laptop only 9 months old, yet there was actually a bit of dust) and am investigating thermal-conf.xml, which suggests that the fan kicks in at 55°C (although I'm still working out what I can do here).

    Thinking this may actually be the solution. Will report back soon.

    Update 2

    So doing the Acer bios update totally ruined everything related to my secureboot setup and corrupted the efi files, so it took me a few days to work out how to regenerate the ubuntu efi keys and the windows efi keys.

    I tried cleaning out the dust, and it definitely helped for the two days until I started with the bios issues.

    But the kworker is back (and yes, it is the same one as far as I can tell). I also have some more information now. I can see that the cpu is not throttling down, but rather staying at the maximum. The fan is running, but the device is only sitting around the 60°C mark, so I wouldn't call this serious overheating.

    The commands from the other thread require a root shell, not just sudo. So I ran sudo su, and then getting the stack trace gives the following.

    [<ffffffff98a9dcea>] worker_thread+0xca/0x500
    [<ffffffff98aa40d8>] kthread+0xd8/0xf0
    [<ffffffff992a071f>] ret_from_fork+0x1f/0x40
    [<ffffffffffffffff>] 0xffffffffffffffff
    

    Doesn't look particularly helpful to me.

    Long time later....

    I see this answer is still getting lots of views so I thought I'd add in what I remember of what else happened. I run an Aspire V 15 Nitro 592G laptop with Nvidia gpu. The wifi is flaky, the mic doesn't work, Nvidia drivers cause the gnome shell and monitors to repeatedly crash and more. This isn't the best ubuntu machine even though it's pretty powerful when it is working. To be honest, I now run Ubuntu 17.10, and I still have major issues making this machine work.

    I wrote that cleaning the fan seemed to help in the comments. It certainly made things quieter. But I suspect it was actually a combination of the following:

    • Tracker (the full text search daemon) -> crazy resource hog
    • The wifi drivers at the time were just awful
    • Variety (wallpaper switcher) + dual monitor + nvidia + gnome shell -> unreliable monitor setup and massive ram leak on gnome shell
    • The multiload shell extension also had a memory leak

    I realise this doesn't help new users with different problems. Perhaps one day things will be easier to diagnose. Until then, good luck!

    • Admin
      Admin about 7 years
      Did you install the intel-microcode package and check whether there is a bios update for your notebook?
    • Admin
      Admin about 7 years
      You know I think just cleaning the fans may have done the trick - somehow when the fans effectively did the cooling things went back to normal. But I'm not sure because doing the bios update has totally nuked all my boot options and I've been running off a live usb for the last few days trying to fix that. Acer secure boot issues.
    • Admin
      Admin about 7 years
      What was the kernel version? Also, on previous occasions I've found mcelog useful when attempting to distinguish between software and hardware issues.
    • Admin
      Admin about 7 years
      Can you please check whether your machine has excessive interrupts: grep . -r /sys/firmware/acpi/interrupts/?
    • Admin
      Admin about 7 years
      Excessive interrupts may be the cause - please check: grep . -r /sys/firmware/acpi/interrupts/ and update your question accordingly.
    • Admin
      Admin over 3 years
      I had a kworker at 100% and in my case this solution fixed the issue: askubuntu.com/questions/1044872/… -- I think the kernel should do a better job of reporting what it's doing... I had not seen any message pointing to the USB ports as a potential source of the issue.