Multiprocessing - Pipe vs Queue

python performance queue multiprocessing pipe

Solution 1

A Pipe() can only have two endpoints.
A Queue() can have multiple producers and consumers.
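
To make the "two endpoints" point concrete, here is a minimal sketch that runs in a single process. The function name pipe_endpoints_demo is just for illustration; the only library calls are Pipe(), send(), recv() and close().

```python
from multiprocessing import Pipe


def pipe_endpoints_demo():
    # duplex=False gives a one-way pipe: the first connection can only
    # receive and the second can only send.
    receiver, sender = Pipe(duplex=False)
    sender.send({"n": 1})
    one_way = receiver.recv()

    # Sending on the receive-only end is rejected with an OSError.
    try:
        receiver.send("wrong direction")
        rejected = False
    except OSError:
        rejected = True

    # The default Pipe() is duplex: either end can send and receive.
    left, right = Pipe()
    left.send("ping")
    request = right.recv()
    right.send("pong")
    reply = left.recv()

    for conn in (receiver, sender, left, right):
        conn.close()
    return one_way, rejected, request, reply


if __name__ == "__main__":
    print(pipe_endpoints_demo())
```

Either way there are exactly two Connection objects per pipe, which is why fan-in or fan-out needs either several pipes or a Queue().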

When to use them

If you need more than two points to communicate, use a Queue().
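
As a rough sketch of the multi-producer case (not taken from the benchmark code), several processes can write to one Queue() while a single reader collects everything. The names producer, collect and SENTINEL are made up for this example, and using None as an end marker is my own convention.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical end-of-stream marker, one per producer


def producer(worker_id, count, queue):
    """Put `count` tagged messages on the shared queue, then a sentinel."""
    for i in range(count):
        queue.put((worker_id, i))
    queue.put(SENTINEL)


def collect(queue, n_producers):
    """Read until every producer has sent its sentinel."""
    received = []
    finished = 0
    while finished < n_producers:
        msg = queue.get()
        if msg is SENTINEL:
            finished += 1
        else:
            received.append(msg)
    return received


if __name__ == "__main__":
    queue = Queue()
    workers = [Process(target=producer, args=(wid, 5, queue)) for wid in range(3)]
    for worker in workers:
        worker.start()
    messages = collect(queue, n_producers=len(workers))
    for worker in workers:
        worker.join()
    print("received {0} messages from {1} producers".format(len(messages), len(workers)))
```

Doing the same with Pipe() would need one pipe per producer plus something to multiplex the reads, which is essentially the bookkeeping Queue() does for you.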

If raw throughput matters most, a Pipe() is faster (about three times in the benchmark below), because Queue() is built on top of Pipe() and adds locks and a feeder thread.

Performance Benchmarking

Let's assume you want to spawn two processes and send messages between them as quickly as possible. These are the timing results of a drag race between similar tests using Pipe() and Queue(). This is on a ThinkPad T61 running Ubuntu 11.10 and Python 2.7.2.

FYI, I threw in results for JoinableQueue() as a bonus; JoinableQueue() accounts for tasks when queue.task_done() is called (it doesn't even know about the specific task, it just counts unfinished tasks in the queue), so that queue.join() knows the work is finished.
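
The "it just counts" behaviour can be seen without a second process. This is only an illustrative sketch; joinable_counting_demo is a made-up name, and the calls used are put(), get(), task_done() and join().

```python
from multiprocessing import JoinableQueue


def joinable_counting_demo():
    queue = JoinableQueue()
    for item in ("a", "b", "c"):
        queue.put(item)            # each put() adds one unfinished task

    handled = []
    for _ in range(3):
        handled.append(queue.get(timeout=5))
        queue.task_done()          # lowers the count; no specific task is named

    queue.join()                   # returns at once, the count is back to zero

    # One call too many is an error, because only a counter is kept.
    try:
        queue.task_done()
        extra_call_rejected = False
    except ValueError:
        extra_call_rejected = True
    return handled, extra_call_rejected


if __name__ == "__main__":
    print(joinable_counting_demo())
```

In the benchmark further down, get() and task_done() run in the reader process and join() runs in the writer, but the counting works the same way.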

The code for each is at the bottom of this answer.

mpenning@mpenning-T61:~$ python multi_pipe.py 
Sending 10000 numbers to Pipe() took 0.0369849205017 seconds
Sending 100000 numbers to Pipe() took 0.328398942947 seconds
Sending 1000000 numbers to Pipe() took 3.17266988754 seconds
mpenning@mpenning-T61:~$ python multi_queue.py 
Sending 10000 numbers to Queue() took 0.105256080627 seconds
Sending 100000 numbers to Queue() took 0.980564117432 seconds
Sending 1000000 numbers to Queue() took 10.1611330509 seconds
mpenning@mpenning-T61:~$ python multi_joinablequeue.py 
Sending 10000 numbers to JoinableQueue() took 0.172781944275 seconds
Sending 100000 numbers to JoinableQueue() took 1.5714070797 seconds
Sending 1000000 numbers to JoinableQueue() took 15.8527247906 seconds
mpenning@mpenning-T61:~$

In summary, Pipe() is about three times faster than a Queue(). Don't even think about the JoinableQueue() unless you really must have its benefits.

BONUS MATERIAL 2

Multiprocessing introduces subtle changes in information flow that make debugging hard unless you know some shortcuts. For instance, you might have a script that works fine when indexing through a dictionary under many conditions, but infrequently fails with certain inputs.

Normally we get clues to the failure when the entire python process crashes; however, you don't get unsolicited crash tracebacks printed to the console if the multiprocessing function crashes. Tracking down unknown multiprocessing crashes is hard without a clue to what crashed the process.

The simplest way I have found to track down multiprocessing crash information is to wrap the entire multiprocessing function in a try / except and use traceback.print_exc():

import traceback

def run(self, args):
    try:
        # Insert stuff to be multiprocessed here
        return args[0]['that']
    except Exception:
        # Report which call failed, then print the full traceback
        print("FATAL: reader({0}) exited while multiprocessing".format(args))
        traceback.print_exc()

Now, when you find a crash you see something like:

FATAL: reader([{'crash': 'this'}]) exited while multiprocessing
Traceback (most recent call last):
  File "foo.py", line 19, in __init__
    self.run(args)
  File "foo.py", line 46, in run
    KeyError: 'that'

Source Code:

"""
multi_pipe.py
"""
from multiprocessing import Process, Pipe
import time

def reader_proc(pipe):
    ## Read from the pipe; this will be spawned as a separate Process
    p_output, p_input = pipe
    p_input.close()    # We are only reading
    while True:
        msg = p_output.recv()    # Read from the output pipe and do nothing
        if msg=='DONE':
            break

def writer(count, p_input):
    for ii in range(0, count):
        p_input.send(ii)             # Write 'count' numbers into the input pipe
    p_input.send('DONE')

if __name__=='__main__':
    for count in [10**4, 10**5, 10**6]:
        # Pipe() is duplex by default; this test only uses it one way:  p_input ------> p_output
        p_output, p_input = Pipe()  # writer() writes to p_input from _this_ process
        reader_p = Process(target=reader_proc, args=((p_output, p_input),))
        reader_p.daemon = True
        reader_p.start()     # Launch the reader process

        p_output.close()       # We no longer need this part of the Pipe()
        _start = time.time()
        writer(count, p_input) # Send a lot of stuff to reader_proc()
        p_input.close()
        reader_p.join()
        print("Sending {0} numbers to Pipe() took {1} seconds".format(count,
            (time.time() - _start)))

"""
multi_queue.py
"""

from multiprocessing import Process, Queue
import time

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        if (msg == 'DONE'):
            break

def writer(count, queue):
    ## Write to the queue
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue
    queue.put('DONE')

if __name__=='__main__':
    pqueue = Queue() # writer() writes to pqueue from _this_ process
    for count in [10**4, 10**5, 10**6]:             
        ### reader_proc() reads from pqueue as a separate process
        reader_p = Process(target=reader_proc, args=(pqueue,))
        reader_p.daemon = True
        reader_p.start()        # Launch reader_proc() as a separate python process

        _start = time.time()
        writer(count, pqueue)    # Send a lot of stuff to reader()
        reader_p.join()         # Wait for the reader to finish
        print("Sending {0} numbers to Queue() took {1} seconds".format(count, 
            (time.time() - _start)))

"""
multi_joinablequeue.py
"""
from multiprocessing import Process, JoinableQueue
import time

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        queue.task_done()

def writer(count, queue):
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue

if __name__=='__main__':
    for count in [10**4, 10**5, 10**6]:
        jqueue = JoinableQueue() # writer() writes to jqueue from _this_ process
        # reader_proc() reads from jqueue as a different process...
        reader_p = Process(target=reader_proc, args=(jqueue,))
        reader_p.daemon = True
        reader_p.start()     # Launch the reader process
        _start = time.time()
        writer(count, jqueue) # Send a lot of stuff to reader_proc() (in different process)
        jqueue.join()         # Wait for the reader to finish
        print("Sending {0} numbers to JoinableQueue() took {1} seconds".format(count, 
            (time.time() - _start)))

Solution 2

One additional feature of Queue() that is worth noting is the feeder thread. The documentation notes: "When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe." As a result, an unbounded number of items (or up to maxsize, if it is set) can be put on a Queue() without any call to queue.put() blocking. This lets you store multiple items in a Queue() until your program is ready to process them.
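
A small sketch of that buffering, assuming no maxsize is given. The function name buffer_without_reader and the item count are arbitrary; the count is only chosen to be likely larger than what an OS pipe buffer holds.

```python
from multiprocessing import Queue


def buffer_without_reader(n_items=50000):
    queue = Queue()                 # no maxsize, so put() never waits for a free slot
    for i in range(n_items):
        queue.put(i)                # returns right away; the feeder thread does the pipe I/O
    # Nothing has been read so far. Drain the queue so the feeder thread can finish.
    return [queue.get(timeout=10) for _ in range(n_items)]


if __name__ == "__main__":
    items = buffer_without_reader()
    print("buffered and read back {0} items".format(len(items)))
```

The put() loop finishes even though nobody is reading yet, because the items wait in the queue's in-process buffer until the pipe has room for them.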

Pipe(), on the other hand, has a finite amount of storage for items that have been sent to one connection but not yet received from the other. Once this storage is used up, calls to connection.send() block until there is space to write the entire item. This stalls the thread doing the writing until some other thread reads from the pipe. Connection objects give you access to the underlying file descriptor. On *nix systems, you can prevent connection.send() calls from blocking by calling os.set_blocking() on that descriptor. However, this will cause problems if you try to send a single item that does not fit in the pipe's buffer. Recent versions of Linux allow you to increase the size of a pipe's buffer, but the maximum size allowed varies with system configuration. You should therefore never rely on Pipe() to buffer data: calls to connection.send() could block until data gets read from the pipe somewhere else.
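
To see the finite storage for yourself, here is a Unix-only sketch that puts the sending end into non-blocking mode with os.set_blocking() and counts how many small messages fit before the pipe is full. The function name count_messages_until_full and the limit parameter are made up for this example, and the count you get depends on your system.

```python
import os
from multiprocessing import Pipe


def count_messages_until_full(limit=1000000):
    receiver, sender = Pipe(duplex=False)        # one-way pipe, backed by an OS pipe on Unix
    os.set_blocking(sender.fileno(), False)      # make a full pipe raise instead of block
    sent = 0
    try:
        while sent < limit:
            sender.send(sent)                    # small message, written in one piece
            sent += 1
    except BlockingIOError:
        pass                                     # pipe is full; a blocking send() would stall here
    # Read everything back so nothing is left in the pipe.
    received = [receiver.recv() for _ in range(sent)]
    receiver.close()
    sender.close()
    return sent, received


if __name__ == "__main__":
    sent, _ = count_messages_until_full()
    print("pipe was full after {0} small messages".format(sent))
```

This only stays safe because each message is tiny. A large object sent in non-blocking mode can fail partway through, which is the problem described above.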

In conclusion, Queue() is a better choice than Pipe() when you need to buffer data, even when you only need to communicate between two points.

Solution 3

If - like me - you are wondering whether to use a multiprocessing construct (Pipe or Queue) in your threading programs for performance, I have adapted Mike Pennington's script to compare against queue.Queue and queue.SimpleQueue:

Sending 10000 numbers to mp.Pipe() took 65.051 ms
Sending 10000 numbers to mp.Queue() took 78.977 ms
Sending 10000 numbers to queue.Queue() took 14.781 ms
Sending 10000 numbers to queue.SimpleQueue() took 0.939 ms
Sending 100000 numbers to mp.Pipe() took 449.564 ms
Sending 100000 numbers to mp.Queue() took 811.938 ms
Sending 100000 numbers to queue.Queue() took 149.387 ms
Sending 100000 numbers to queue.SimpleQueue() took 9.264 ms
Sending 1000000 numbers to mp.Pipe() took 4660.451 ms
Sending 1000000 numbers to mp.Queue() took 8499.743 ms
Sending 1000000 numbers to queue.Queue() took 1490.062 ms
Sending 1000000 numbers to queue.SimpleQueue() took 91.238 ms
Sending 10000000 numbers to mp.Pipe() took 45095.935 ms
Sending 10000000 numbers to mp.Queue() took 84829.042 ms
Sending 10000000 numbers to queue.Queue() took 15179.356 ms
Sending 10000000 numbers to queue.SimpleQueue() took 917.562 ms

Unsurprisingly, using the queue package yields much better results if all you have are threads. That said, I was surprised how performant queue.SimpleQueue is.

"""
pipe_performance.py
"""
import threading as td
import queue
import multiprocessing as mp
import multiprocessing.connection as mp_connection
import time
import typing

def reader_pipe(p_out: mp_connection.Connection) -> None:
    while True:
        msg = p_out.recv()
        if msg=='DONE':
            break

def reader_queue(p_queue: queue.Queue[typing.Union[str, int]]) -> None:
    while True:
        msg = p_queue.get()
        if msg=='DONE':
            break

if __name__=='__main__':
    for count in [10**4, 10**5, 10**6, 10**7]:
        # first: mp.Pipe
        p_mppipe_out, p_mppipe_in = mp.Pipe()
        reader_p = td.Thread(target=reader_pipe, args=(p_mppipe_out,))
        reader_p.start()
        _start = time.time()
        for ii in range(0, count):
            p_mppipe_in.send(ii)
        p_mppipe_in.send('DONE')
        reader_p.join()
        print(f"Sending {count} numbers to mp.Pipe() took {(time.time() - _start)*1e3:.3f} ms")

        # second: mp.Queue
        p_mpqueue = mp.Queue()
        reader_p = td.Thread(target=reader_queue, args=(p_mpqueue,))
        reader_p.start()
        _start = time.time()
        for ii in range(0, count):
            p_mpqueue.put(ii)
        p_mpqueue.put('DONE')
        reader_p.join()
        print(f"Sending {count} numbers to mp.Queue() took {(time.time() - _start)*1e3:.3f} ms")

        # third: queue.Queue
        p_queue = queue.Queue()
        reader_p = td.Thread(target=reader_queue, args=(p_queue,))
        reader_p.start()
        _start = time.time()
        for ii in range(0, count):
            p_queue.put(ii)
        p_queue.put('DONE')
        reader_p.join()
        print(f"Sending {count} numbers to queue.Queue() took {(time.time() - _start)*1e3:.3f} ms")

        # fourth: queue.SimpleQueue
        p_squeue = queue.SimpleQueue()
        reader_p = td.Thread(target=reader_queue, args=(p_squeue,))
        reader_p.start()
        _start = time.time()
        for ii in range(0, count):
            p_squeue.put(ii)
        p_squeue.put('DONE')
        reader_p.join()
        print(f"Sending {count} numbers to queue.SimpleQueue() took {(time.time() - _start)*1e3:.3f} ms")

Jonathan Livni

Updated on March 04, 2022

Comments

Jonathan Livni over 2 years

What are the fundamental differences between queues and pipes in Python's multiprocessing package?

In what scenarios should one choose one over the other? When is it advantageous to use Pipe()? When is it advantageous to use Queue()?
James Brady over 12 years

@Jonathan "In summary Pipe() is about three times faster than a Queue()"
Seun Osewa over 12 years

But Pipe() cannot safely be used with multiple producers/consumers.
JJC about 12 years

Excellent! Good answer and nice that you provided benchmarks! I only have two tiny quibbles: (1) "orders of magnitude faster" is a bit of an overstatement. The difference is x3, which is about a third of one order of magnitude. Just saying. ;-); and (2) a more fair comparison would be running N workers, each communicating with main thread via point-to-point pipe compared to performance of running N workers all pulling from a single point-to-multipoint queue.
travc over 11 years

To your "Bonus Material"... Yeah. If you're subclassing Process, put the bulk of the 'run' method in a try block. That is also a useful way to do logging of exceptions. To replicate the normal exception output: sys.stderr.write(''.join(traceback.format_exception(*(sys.exc_info()))))
alexpinho98 about 11 years

Wouldn't it be better to send error messages through the pipe to the other process and handle errors in the other process?
scytale almost 11 years

@alexpinho98 - but you're going to need some out-of-band data, and associated signalling mode, to indicate that what you're sending is not regular data but error data. seeing as the originating process is already in an unpredictable state this may be too much to ask.
Will almost 11 years

@Mike, Just wanted to say you're awesome. This answer helped me a lot.
jab almost 11 years

@JJC To quibble with your quibble, 3x is about half an order of magnitude, not a third -- sqrt(10) =~ 3.
ideoutrea almost 6 years

In multi_pipe.py, how can you know that all the items have been put into the pipe before inp_p.close is called?
Mike Pennington almost 6 years

@ideoutrea, agreed explicit is better than implicit
MC-8 over 4 years

In my tests, sending small packages (up to ~500 integer values per "send/put" "recv/get") is faster when using Queue than when using Pipe (even unidirectional with duplex=False). So if you want to go for absolute performance, check with a representative data size before using one or the other. As an example, sending messages with just 3 int is about ~40% faster with Queue.
Anab almost 4 years

The section you link makes a note about a feeder thread, but the documentation of the put method still declares it a blocking or failing method: "If the optional argument block is True (the default) and timeout is None (the default), block if necessary until a free slot is available. If timeout is a positive number, it blocks at most timeout seconds and raises the queue.Full exception if no free slot was available within that time." Are you sure about your answer?
akindofyoga almost 4 years

I am sure about my answer. The put method will block if the maxsize parameter to the constructor of Queue is specified. But this will be because of the number of items in the queue, not the size of individual items.
Anab almost 4 years

Thanks for the clarification, I had missed that part.
Marshies over 3 years

@MikePennington regarding this line in the multi-pipe code: p_output.close() # We no longer need this part of the Pipe() why is it ok to close the output end of the pipe? Do we have to do this?
nijave about 3 years

Don't see this mentioned--Queue uses pickle to convert data to binary before sending it over the pipe. Depending on what you're sending, it may be better to avoid pickle and handle the conversion yourself (for instance, if you're sending a bunch of text back and forth very quickly)
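
For what it's worth, Connection objects also have send_bytes() and recv_bytes(), which move raw bytes without pickling. A minimal single-process sketch follows; the function name send_text_without_pickle is made up, and whether this is actually faster depends on what you send, so measure with your own data.

```python
from multiprocessing import Pipe


def send_text_without_pickle(lines):
    receiver, sender = Pipe(duplex=False)
    for line in lines:
        # Encode by hand and send raw bytes; nothing is pickled here.
        sender.send_bytes(line.encode("utf-8"))
    # recv_bytes() returns one complete message per call.
    received = [receiver.recv_bytes().decode("utf-8") for _ in lines]
    receiver.close()
    sender.close()
    return received


if __name__ == "__main__":
    print(send_text_without_pickle(["first line", "second line"]))
```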