Java performance problem with LinkedBlockingQueue

31,312

Solution 1

Your producer thread simply puts more elements than the consumer consumes, so the queue eventually hits its capacity limit, thus the producer waits.

Consolidating my original answer since now we have basically the full picture:

  • You hit the inherent throughput limit of the LinkedBlockingQueue (every queue has one) by doing extremely fast put()s, where even continual take()s, with zero further processing, cannot keep up. (By the way this shows that in this structure, on your JVM and machine anyway, put()s are at least slightly more costly than the reads).
  • Since there is a particular lock that consumers lock, putting more consumer threads could not possibly help (if your consumer was actually doing some processing and that was bounding the throughput, then adding more consumers would help. There are better queue implementations for a scenario with more than one consumers (or producers), you could try SynchronousQueue, ConcurrentLinkedQueue, and the upcoming TransferQueue of jsr166y).

Some suggestions:

  • Try to make more coarse-grained objects, so that the overhead of queueing each is balanced with the actual work that is offloaded from the producing thread (in your case, it seems you create much communication overhead for objects that represent negligible amounts of work)
  • You could also have the producer help the consumer by offloading some consuming work (not much point in waiting idly when there is work to be done).

/updated after John W. rightly pointed out my original answer was misleading

Solution 2

I would generally recommend not using a LinkedBlockingQueue in a performance sensitive area of code, use an ArrayBlockingQueue. It will giving a much nicer garbage collection profile and is more cache friendly than the LinkedBlockingQueue.

Try the ArrayBlockingQueue and measure the performance.

The only advantage of the LinkedBlockingQueue is that it can be unbounded, however this is rarely what you actually want. If you have a case where a consumer fails and queues start backing up, having bounded queues allows the system to degrade gracefully rather risk OutOfMemoryErrors that may occur if queues are unbounded.

Solution 3

Here are a couple of things to try:

Replace the LinkedBlockingQueue with an ArrayBlockingQueue. It has no dangling references and so is better behaved when the queue fills up. Specifically, given the 1.6 implementation of LinkedBlockingQueue, full GC of the elements will not happen until the queue actually becomes empty.

If the producer side is consistently out performing the consumer side, consider using drain or drainTo to perform a "bulk" take operation.

Alternatively, have the queue take arrays or Lists of message objects. The the producer fills a List or array with message objects and each put or take moves multiple messages with the same locking overhead. Think of it as a secretary handing you a stack of "While you were out" messages vs. handing them to you one at a time.

Solution 4

It's hard to say what happens without knowing something about the filling process.

If addMyMessage is called less frequently - perhaps because of a performance problem in a whole different part of your application - the take method has to wait.

That way it looks like take is the culprit, but actually it's the filling part of your application.

Solution 5

Found this interesting post about performance problems due to queue size and garbage collection.

Share:
31,312
lofthouses
Author by

lofthouses

Updated on July 23, 2020

Comments

  • lofthouses
    lofthouses almost 4 years

    this is my first post on stackoverflow... I hope someone can help me

    I have a big performance regression with Java 6 LinkedBlockingQueue. In the first thread i generate some objects which i push in to the queue In the second thread i pull these objects out. The performance regression occurs when the take() method of the LinkedBlockingQueue is called frequently. I monitored the whole program and the take() method claimed the most time overall. And the throughput goes from ~58Mb/s to 0.9Mb/s...

    the queue pop and take methods ar called with a static method from this class

    public class C_myMessageQueue {
    
        private static final LinkedBlockingQueue<C_myMessageObject> x_queue = new LinkedBlockingQueue<C_myMessageObject>( 50000 );
    
        /**
         * @param message
         * @throws InterruptedException
         * @throws NullPointerException
         */
        public static void addMyMessage( C_myMessageObject message )
                throws InterruptedException, NullPointerException {
            x_queue.put( message );
        }
    
        /**
         * @return Die erste message der MesseageQueue
         * @throws InterruptedException
         */
        public static C_myMessageObject getMyMessage() throws InterruptedException {
            return x_queue.take();
        }
    }
    

    how can I tune the take() method to accomplish at least 25Mb/s, or is there a other class I can use which will block when the "queue" is full or empty.

    kind regards

    Bart

    P.S.: sorry for my bad english, I'm from Germany ;)

  • lofthouses
    lofthouses about 14 years
    The filling thread is not the culprit, the filling thread wait on the consumer thread when the throughput goes down (seen with the visualvm profiler). when the filling thread had not to wait on the consumer the throughput is realy good, at least 58Mb/s. the consumer thread waits on the take method, the operations in the consumer task ar minimal... i tried also the ArrayBlockingQueue...the same game, the consumer wait for the slow take method, and the filler wait because the queue is full p.s.: the program is a copy task p.p.s: thank u for ur answer ;)
  • lofthouses
    lofthouses about 14 years
    Thank you for the clue, but it changed nothing... take() takes 96.7% of time in the consumer thread. in my opinion the take method is to slow...
  • Daniel Rikowski
    Daniel Rikowski about 14 years
    Perhaps it helps if you look at the implementation of that method. I could not find anything weird in there.
  • lofthouses
    lofthouses about 14 years
    You're rigth, but there is only a minimal performance difference between arrayBQ and linkedBQ. and the regression still resists
  • lofthouses
    lofthouses about 14 years
    that will be my last try... and then i will play with javolution or something like that... or is this a bad idea?
  • lofthouses
    lofthouses about 14 years
    damn, why i didn't try it my self... so i will try it now :)
  • lofthouses
    lofthouses about 14 years
    i could not find anything to, but the fact is that take took most of time in this thread
  • Dimitris Andreou
    Dimitris Andreou about 14 years
    I think he means a single-producer, single-consumer scenario, so fairness should be irrelevant. Curious that LinkedBlockingQueue doesn't expose a fair parameter though.
  • Dimitris Andreou
    Dimitris Andreou about 14 years
    I doubt the drainTo() would help. The consumer's problem is not contention for the consumer's lock (it's only a single consumer thread, taking the consumer lock is uncontended and /fast/). The fundamental issue is that the consumer is slow, less data flow through it, so the producer, no matter what (without an overflowing queue that is), is going to be bounded by that less flow - see my answer too.
  • lofthouses
    lofthouses about 14 years
    I dont belive that a StringBuilder.append( " ----- " ); in the consumer thread is so much slower than a lot of arithmetic operations in the producer thread
  • Dimitris Andreou
    Dimitris Andreou about 14 years
    Please clarify the size of the queue when the problem occurs. That's the only way we can be sure whether it's the producer or the consumer that is slow. String operations can be slow too, don't depend on such assumptions, just when you notice the problem, just check whether the queue is nearly empty or nearly full.
  • Dimitris Andreou
    Dimitris Andreou about 14 years
    (I suspect though that you create very fine-grained objects unnecessarily. You might also try bundling some of those in bigger packs to decrease queue overhead, if needed).
  • lofthouses
    lofthouses about 14 years
    I tried several capacitys for the queue, but the same problems occured with 20 or 50000 elements... at the moment i try the approach to transfer lists of objects with the queue.
  • Dimitris Andreou
    Dimitris Andreou about 14 years
    "The same problems": please clarify the size of the queue when said problems occur.
  • lofthouses
    lofthouses about 14 years
    i disabled all operations in the consumer thread, it takes only the objects from the queue, and the problems resist (the producer thread has to wait) :(
  • Dimitris Andreou
    Dimitris Andreou about 14 years
    Ok, the producer waits, so the queue is full, as was my initial impression. Well, your producer creates a huge amount of objects, those arithmetic operations you mentioned aren't slow enough. You can try having more readers, as per my original answer, or create more coarse grained objects.
  • jezg1993
    jezg1993 about 14 years
    It sounds like if the take (consumer) is taking alot of time then the queue is empty, no? He would want to increase the number of producers
  • Dimitris Andreou
    Dimitris Andreou about 14 years
    If someone tells you "take() is taking a long time", yes, you assume the queue is empty, thus you need more producers. But in fact it is full! The consumer that only ever calls take() is still too slow to keep up with that producer! The throughput of a LinkedBlockingQueue (as in any other queue) is limited by the minimum time that "take()" can take. At a stable condition, if you can't take more than X objects per time unit out, you can't put more than X elements (or the queue is unbounded). His producer creates more than X elements, thus it hits the inherent throughput limit of the queue.
  • jezg1993
    jezg1993 about 14 years
    Thats true, I just dont see how the queue being full and producers waiting has anything to do with the take spending alot of time processing. Maybe thats your point but then your answer missleads the op to think the speed decrease by take is affected by the producers.
  • Dimitris Andreou
    Dimitris Andreou about 14 years
    You are totally right, I have to modify my original answer. As you can see from the comments, the problem was rather unclear initially - my assumption had been that the consumer (and not just the take()) was slow, only later it was apparent that the bottleneck was occurring even when there was no additional processing on the consumer.
  • Amrish Pandey
    Amrish Pandey over 9 years
    ArrayBlockingQueue will be slower than LinkedBlockingQueue as it uses single lock for put() and take() whereas LinkedBlockingQueue uses 2 locks for put() and take() and synchronizes producer and consumer only when queue is empty / full.