High CPU utilization for threads that seem to be waiting


As Brendan Gregg points out in the blog article you linked, hprof samples all threads that the JVM considers runnable. As you can see in the Javadoc of Thread.State, the JVM distinguishes the following thread states:

  • NEW: A thread that has not yet started is in this state.
  • RUNNABLE: A thread executing in the Java virtual machine is in this state.
  • BLOCKED: A thread that is blocked waiting for a monitor lock is in this state.
  • WAITING: A thread that is waiting indefinitely for another thread to perform a particular action is in this state.
  • TIMED_WAITING: A thread that is waiting for another thread to perform an action for up to a specified waiting time is in this state.
  • TERMINATED: A thread that has exited is in this state.
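Some of the states above can be observed directly with Thread.getState(). A minimal sketch (the class name and sleep durations are illustrative, not from the original post):

```java
// Sketch: observing a few of the Thread.State values listed above.
public class ThreadStateDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // expected: main interrupts us below
            }
        });
        System.out.println(sleeper.getState()); // NEW: not yet started
        sleeper.start();
        Thread.sleep(200);                      // give it time to reach sleep()
        System.out.println(sleeper.getState()); // TIMED_WAITING: parked in sleep()
        sleeper.interrupt();
        sleeper.join();
        System.out.println(sleeper.getState()); // TERMINATED: run() has returned
    }
}
```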

As we can see, the JVM does not have a dedicated state for a thread waiting for I/O. That's because such a thread is actually blocked by the operating system, not the JVM. That is, as far as the JVM is concerned, a thread waiting for the network adapter is runnable. Indeed, the Javadoc for the RUNNABLE state says:

Thread state for a runnable thread. A thread in the runnable state is executing in the Java virtual machine but it may be waiting for other resources from the operating system such as processor.

Therefore, the presence of I/O methods in an hprof "cpu" sampling does not imply that these methods consumed CPU: their wait time is counted as well.
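You can demonstrate this directly: a thread blocked in a native accept() call (exactly like the socketAccept frames in your sample) is still reported as RUNNABLE. A minimal sketch, using an ephemeral port:

```java
import java.net.ServerSocket;

public class IoWaitState {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0); // bind to any free port
        Thread acceptor = new Thread(() -> {
            try {
                server.accept(); // blocks in native socketAccept
            } catch (Exception ignored) {
                // expected when main closes the socket below
            }
        });
        acceptor.start();
        Thread.sleep(300); // let it block inside accept()
        // The thread is waiting on the OS, yet the JVM reports RUNNABLE:
        System.out.println(acceptor.getState()); // RUNNABLE
        server.close();
        acceptor.join();
    }
}
```

This is why hprof's sampler, which only checks for RUNNABLE, attributes "CPU" samples to such threads.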

You can either:

  • assume that I/O methods are not responsible for the CPU consumption, and focus on the other methods
  • use a better profiler that takes into account waiting for OS level resources
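As a quick cross-check without a new profiler, you can ask the OS which threads actually burn CPU and then match them to Java stacks by hand. A rough sketch for Linux (the PID and thread id below are placeholders, not values from your run; jstack prints thread ids as hex in its nid= field):

```shell
# Placeholder: your JVM's process id
PID=12345

# 1. Per-thread CPU usage from the OS (not from hprof):
top -H -b -n 1 -p "$PID" | head -20

# 2. Pick a hot thread id from the PID column above (placeholder here),
#    convert it to hex, and find its stack in a jstack dump:
TID=12360
NID=$(printf '0x%x' "$TID")
jstack "$PID" | grep -A 10 "nid=$NID"
```

If the threads that top shows at the top of the CPU column are not the epollWait/socketAccept threads, that confirms the hprof ranking is misleading.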
Author: AHungerArtist

Updated on June 15, 2022

Comments

  • AHungerArtist
    AHungerArtist almost 2 years

    I'm currently running some JMeter tests to test the performance of a web service. It's using a very high amount of CPU. For one JMeter request thread, it's using anywhere from 10-30% (depending on the type of request). When I pump it up to a mere 15 threads, I get ~95% CPU utilization. Naturally, I want to figure out what's going on. I did an Hprof CPU sample (I tried the times option but it took an hour and a half to start my service and no messages would go through). Below are the snippets of the results from that sampling (over a 15 minute span).

    CPU SAMPLES BEGIN (total = 220846) Fri Aug 22 13:38:54 2014
    rank   self  accum   count trace method
       1 14.96% 14.96%   33038 300514 java.net.PlainSocketImpl.socketAccept
       2 14.84% 29.80%   32776 301258 sun.nio.ch.EPollArrayWrapper.epollWait
       3 12.45% 42.26%   27505 313002 sun.nio.ch.EPollArrayWrapper.epollWait
       4  7.48% 49.73%   16517 300604 java.net.PlainSocketImpl.socketAccept
       5  7.18% 56.91%   15856 303203 sun.nio.ch.EPollArrayWrapper.epollWait
       6  6.18% 63.09%   13639 313001 sun.nio.ch.ServerSocketChannelImpl.accept0
       7  6.04% 69.13%   13329 304259 sun.nio.ch.EPoll.epollWait
       8  5.11% 74.23%   11275 307102 sun.nio.ch.EPollArrayWrapper.epollWait
    

    And the corresponding stack for those top samples:

    TRACE 300514:
        java.net.PlainSocketImpl.socketAccept(:Unknown line)
        java.net.AbstractPlainSocketImpl.accept(:Unknown line)
        java.net.ServerSocket.implAccept(:Unknown line)
        java.net.ServerSocket.accept(:Unknown line)
        sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(:Unknown line)
        sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(:Unknown line)
        java.lang.Thread.run(:Unknown line)
    TRACE 301258:
        sun.nio.ch.EPollArrayWrapper.epollWait(:Unknown line)
        sun.nio.ch.EPollArrayWrapper.poll(:Unknown line)
        sun.nio.ch.EPollSelectorImpl.doSelect(:Unknown line)
        sun.nio.ch.SelectorImpl.lockAndDoSelect(:Unknown line)
        sun.nio.ch.SelectorImpl.select(:Unknown line)
        org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:327)
    TRACE 313002:
        sun.nio.ch.EPollArrayWrapper.epollWait(:Unknown line)
        sun.nio.ch.EPollArrayWrapper.poll(:Unknown line)
        sun.nio.ch.EPollSelectorImpl.doSelect(:Unknown line)
        sun.nio.ch.SelectorImpl.lockAndDoSelect(:Unknown line)
        sun.nio.ch.SelectorImpl.select(:Unknown line)
        org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1163)
        java.lang.Thread.run(:Unknown line)
    TRACE 300604:
        java.net.PlainSocketImpl.socketAccept(:Unknown line)
        java.net.AbstractPlainSocketImpl.accept(:Unknown line)
        java.net.ServerSocket.implAccept(:Unknown line)
        java.net.ServerSocket.accept(:Unknown line)
        sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(:Unknown line)
        sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(:Unknown line)
        sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(:Unknown line)
        java.lang.Thread.run(:Unknown line)
    TRACE 303203:
        sun.nio.ch.EPollArrayWrapper.epollWait(:Unknown line)
        sun.nio.ch.EPollArrayWrapper.poll(:Unknown line)
        sun.nio.ch.EPollSelectorImpl.doSelect(:Unknown line)
        sun.nio.ch.SelectorImpl.lockAndDoSelect(:Unknown line)
        sun.nio.ch.SelectorImpl.select(:Unknown line)
        net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:217)
        net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:836)
    TRACE 313001:
        sun.nio.ch.ServerSocketChannelImpl.accept0(:Unknown line)
        sun.nio.ch.ServerSocketChannelImpl.accept(:Unknown line)
        org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:793)
        java.lang.Thread.run(:Unknown line)
    TRACE 304259:
        sun.nio.ch.EPoll.epollWait(:Unknown line)
        sun.nio.ch.EPollPort$EventHandlerTask.poll(:Unknown line)
        sun.nio.ch.EPollPort$EventHandlerTask.run(:Unknown line)
        java.lang.Thread.run(:Unknown line)
    TRACE 307102:
        sun.nio.ch.EPollArrayWrapper.epollWait(:Unknown line)
        sun.nio.ch.EPollArrayWrapper.poll(:Unknown line)
        sun.nio.ch.EPollSelectorImpl.doSelect(:Unknown line)
        sun.nio.ch.SelectorImpl.lockAndDoSelect(:Unknown line)
        sun.nio.ch.SelectorImpl.select(:Unknown line)
        net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:217)
        net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:836)
    

    As you can see, over half of the CPU usage seems to be coming from threads that should be waiting. Shouldn't that not be taking up CPU time?

    I saw this thread http://www.brendangregg.com/blog/2014-06-09/java-cpu-sampling-using-hprof.html which might make me think this result is misleading, but my "top -H" results showed maximum CPU usage, as did Zabbix monitoring. So, it appears that it is actually consuming CPU. However, there is a link there to a quote from the hprof author which states:

    When you have Java threads that are somehow not using the CPU, but managing to stay active, then it will appear as if those stack traces are consuming large amounts of CPU time when they aren't.

    Could someone explain why this might be the case and what I can do to lessen the CPU usage in these cases? Or are all the CPU usage indicators actually misleading? If so, what would be a better way to understand true CPU utilization in my service?

  • AHungerArtist
    AHungerArtist over 9 years
    So what does that mean for the top -H results or the Zabbix results that show ~95% CPU utilization? Do they too not understand that those threads are simply waiting?
  • meriton
    meriton over 9 years
    No, these tools query the operating system, which is aware of whether a thread is executing.
  • Zameer Ansari
    Zameer Ansari over 8 years
    @meriton Can you resolve this confusion, please?