How to use multiprocessing pool.map with multiple arguments

Solution 1

The answer to this is version- and situation-dependent. The most general answer for recent versions of Python (since 3.3) was first described below by J.F. Sebastian.1 It uses the Pool.starmap method, which accepts a sequence of argument tuples. It then automatically unpacks the arguments from each tuple and passes them to the given function:

import multiprocessing
from itertools import product

def merge_names(a, b):
    return '{} & {}'.format(a, b)

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.starmap(merge_names, product(names, repeat=2))
    print(results)

# Output: ['Brown & Brown', 'Brown & Wilson', 'Brown & Bartlett', ...

For earlier versions of Python, you'll need to write a helper function to unpack the arguments explicitly. If you want to use with, you'll also need to write a wrapper to turn Pool into a context manager. (Thanks to muon for pointing this out.)

import multiprocessing
from itertools import product
from contextlib import contextmanager

def merge_names(a, b):
    return '{} & {}'.format(a, b)

def merge_names_unpack(args):
    return merge_names(*args)

@contextmanager
def poolcontext(*args, **kwargs):
    pool = multiprocessing.Pool(*args, **kwargs)
    yield pool
    pool.terminate()

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with poolcontext(processes=3) as pool:
        results = pool.map(merge_names_unpack, product(names, repeat=2))
    print(results)

# Output: ['Brown & Brown', 'Brown & Wilson', 'Brown & Bartlett', ...

In simpler cases, with a fixed second argument, you can also use partial, but only in Python 2.7+.

import multiprocessing
from functools import partial
from contextlib import contextmanager

@contextmanager
def poolcontext(*args, **kwargs):
    pool = multiprocessing.Pool(*args, **kwargs)
    yield pool
    pool.terminate()

def merge_names(a, b):
    return '{} & {}'.format(a, b)

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with poolcontext(processes=3) as pool:
        results = pool.map(partial(merge_names, b='Sons'), names)
    print(results)

# Output: ['Brown & Sons', 'Wilson & Sons', 'Bartlett & Sons', ...

1. Much of this was inspired by his answer, which should probably have been accepted instead. But since this one is stuck at the top, it seemed best to improve it for future readers.

Solution 2

Is there a variant of pool.map which supports multiple arguments?

Python 3.3 includes the pool.starmap() method:

#!/usr/bin/env python3
from functools import partial
from itertools import repeat
from multiprocessing import Pool, freeze_support

def func(a, b):
    return a + b

def main():
    a_args = [1,2,3]
    second_arg = 1
    with Pool() as pool:
        L = pool.starmap(func, [(1, 1), (2, 1), (3, 1)])
        M = pool.starmap(func, zip(a_args, repeat(second_arg)))
        N = pool.map(partial(func, b=second_arg), a_args)
        assert L == M == N

if __name__=="__main__":
    freeze_support()
    main()

For older versions:

#!/usr/bin/env python2
import itertools
from multiprocessing import Pool, freeze_support

def func(a, b):
    print a, b

def func_star(a_b):
    """Convert `f([1,2])` to `f(1,2)` call."""
    return func(*a_b)

def main():
    pool = Pool()
    a_args = [1,2,3]
    second_arg = 1
    pool.map(func_star, itertools.izip(a_args, itertools.repeat(second_arg)))

if __name__=="__main__":
    freeze_support()
    main()

Output

1 1
2 1
3 1

Notice how itertools.izip() and itertools.repeat() are used here.

Due to the bug mentioned by @unutbu, you can't use functools.partial() or similar capabilities on Python 2.6, so the simple wrapper function func_star() has to be defined explicitly. See also the workaround suggested by uptimebox.

Solution 3

I think the version below is simpler:

def multi_run_wrapper(args):
    # Unpack the tuple of arguments and forward them to add()
    return add(*args)

def add(x, y):
    return x + y

if __name__ == "__main__":
    from multiprocessing import Pool
    pool = Pool(4)
    results = pool.map(multi_run_wrapper, [(1, 2), (2, 3), (3, 4)])
    pool.close()
    pool.join()
    print(results)

Output

[3, 5, 7]

Solution 4

Using Python 3.3+ with pool.starmap():

from multiprocessing.dummy import Pool as ThreadPool 

def write(i, x):
    print(i, "---", x)

a = ["1","2","3"]
b = ["4","5","6"] 

pool = ThreadPool(2)
pool.starmap(write, zip(a,b)) 
pool.close() 
pool.join()

Result:

1 --- 4
2 --- 5
3 --- 6

You can also zip() more arguments if you like: zip(a,b,c,d,e)

In case you want to have a constant value passed as an argument:

import itertools

zip(itertools.repeat(constant), a)

In case your function should return something:

results = pool.starmap(write, zip(a,b))

This gives a List with the returned values.
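Putting these tips together, here is a minimal sketch (the combine function and the "row" constant are made up for illustration):

import itertools
from multiprocessing.dummy import Pool as ThreadPool

def combine(prefix, x, y):
    return "{}: {} + {}".format(prefix, x, y)

a = ["1", "2", "3"]
b = ["4", "5", "6"]

pool = ThreadPool(2)
# A constant first argument via itertools.repeat(), plus two varying iterables
results = pool.starmap(combine, zip(itertools.repeat("row"), a, b))
pool.close()
pool.join()

print(results)  # ['row: 1 + 4', 'row: 2 + 5', 'row: 3 + 6']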

Solution 5

Another way to take multiple arguments is to pack them into a single list and unpack it inside the function:

import multiprocessing

def f1(args):
    # Unpack the single list argument into its three components
    a, b, c = args
    return a + b + c

if __name__ == "__main__":
    pool = multiprocessing.Pool(4)
    result1 = pool.map(f1, [[1, 2, 3]])
    pool.close()
    print(result1)
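This prints [6]. Passing several argument lists, e.g. pool.map(f1, [[1, 2, 3], [4, 5, 6]]), returns one result per list: [6, 15].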

Comments

  • user642897
    user642897 over 2 years

    In the Python multiprocessing library, is there a variant of pool.map which supports multiple arguments?

    import multiprocessing
    
    text = "test"
    
    def harvester(text, case):
        X = case[0]
        text + str(X)
    
    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=6)
        case = RAW_DATASET
        pool.map(harvester(text, case), case, 1)
        pool.close()
        pool.join()
    
    • senderle
      senderle about 13 years
      To my surprise, I could make neither partial nor lambda do this. I think it has to do with the strange way that functions are passed to the subprocesses (via pickle).
    • unutbu
      unutbu about 13 years
      @senderle: This is a bug in Python 2.6, but has been fixed as of 2.7: bugs.python.org/issue5228
    • Tung Nguyen
      Tung Nguyen almost 8 years
      Simply replace pool.map(harvester(text, case), case, 1) with pool.apply_async(harvester(text, case), case, 1)
    • Ricalsin
      Ricalsin about 7 years
      @Syrtis_Major, please don't edit OP questions in ways that effectively skew answers that have been previously given. Adding return to harvester() made @senderle's response inaccurate. That does not help future readers.
    • H S Rathore
      H S Rathore over 4 years
      I would say an easy solution would be to pack all the args into a tuple and unpack it in the executing func. I did this when I needed to send complicated multiple args to a func being executed by a pool of processes.
    • John Curry
      John Curry over 3 years
      Maybe there is some complexity I am missing for this particular use case, but partial works for my similar use case and is very succinct and easy to use. python.omics.wiki/multiprocessing_map/…
  • Björn Pollex
    Björn Pollex about 13 years
    F.: You can unpack the argument tuple in the signature of func_star like this: def func_star((a, b)). Of course, this only works for a fixed number of arguments, but if that is the only case he has, it is more readable.
  • jfs
    jfs about 13 years
    @Space_C0wb0y: f((a,b)) syntax is deprecated and removed in py3k. And it is unnecessary here.
  • xgdgsc
    xgdgsc over 10 years
    It seems to me that RAW_DATASET in this case should be a global variable? But I want partial_harvester to change the value of case in every call of harvester(). How can I achieve that?
  • Mike McKerns
    Mike McKerns about 9 years
    This is a near-exact duplicate of the answer from @J.F.Sebastian in 2011 (with 60+ votes).
  • user136036
    user136036 about 9 years
    No. First of all, it removes lots of unnecessary stuff and clearly states it's for Python 3.3+, and it is intended for beginners who are looking for a simple and clean answer. As a beginner myself, it took some time to figure it out that way (yes, with J.F. Sebastian's posts), and this is why I wrote my post to help other beginners, because his post simply said "there is starmap" but did not explain it - this is what my post intends to do. So there is absolutely no reason to bash me with two downvotes.
  • dylam
    dylam almost 9 years
    perhaps more pythonic: func = lambda x: func(*x) instead of defining a wrapper function
  • jfs
    jfs almost 9 years
    @dylam: read the last paragraph in the answer or try your suggestion on Python 2.6 (it fails)
  • WeizhongTu
    WeizhongTu over 8 years
    This is an easy way, but you need to change your original functions. What's more, sometimes you need to call other people's functions, which may not be modifiable.
  • nehem
    nehem over 8 years
    I will say this sticks to the Zen of Python: there should be one and only one obvious way to do it. If by chance you are the author of the calling function, you should use this method; for other cases we can use imotai's method.
  • nehem
    nehem over 8 years
    My choice is to use a tuple, and then immediately unpack it as the first thing in the first line.
  • Emerson Xu
    Emerson Xu almost 8 years
    The most important thing here is assigning the =RAW_DATASET default value to case. Otherwise pool.map will get confused about the multiple arguments.
  • Dave
    Dave over 7 years
    I'm confused, what happened to the text variable in your example? Why is RAW_DATASET seemingly passed twice. I think you might have a typo?
  • zthomas.nc
    zthomas.nc over 7 years
    So ... the above doesn't work if you are calling a class function within a class (wants self passed as an argument?)
  • jfs
    jfs over 7 years
    @zthomas.nc this question is about how to support multiple arguments for multiprocessing pool.map. If you want to know how to call a method instead of a function in a different Python process via multiprocessing, then ask a separate question (if all else fails, you could always create a global function that wraps the method call, similar to func_star() above)
  • Ahmed
    Ahmed over 7 years
    Easiest solution. There is a small optimization; remove the wrapper function and unpack args directly in add; it works for any number of arguments: def add(args): (x, y) = args
  • Andre Holzner
    Andre Holzner about 7 years
    you could also use a lambda function instead of defining multi_run_wrapper(..)
  • Andre Holzner
    Andre Holzner about 7 years
    hm... in fact, using a lambda does not work because pool.map(..) tries to pickle the given function
  • muon
    muon over 6 years
    not sure why using with .. as .. gives me AttributeError: __exit__, but works fine if i just call pool = Pool(); then close manually pool.close() (python2.7)
  • senderle
    senderle over 6 years
    @muon, good catch. It appears Pool objects don't become context managers until Python 3.3. I've added a simple wrapper function that returns a Pool context manager.
  • machen
    machen over 6 years
    @muon How do you use call_back in pool.starmap?
  • machen
    machen over 6 years
    Does this starmap support a generator function which yields an infinite sequence?
  • senderle
    senderle over 6 years
    @machen, it depends on what you mean. But I wouldn't recommend using infinite generators with multiprocessing unless they are paired with finite generators. For example, you could probably do something like pool.starmap(twoarg_func, zip(finite, infinite)). It's possible that pool.imap and pool.imap_unordered could tolerate infinite generators but that still sounds like a pretty bad idea to me.
  • jfs
    jfs over 6 years
    @machen starmap supports generators (such as zip() above). It returns a list and therefore you shouldn't pass it an infinite generator (it will just consume all memory)
  • Fábio Dias
    Fábio Dias about 6 years
    Indeed it does, still looking for a better way :(
  • Amir
    Amir about 6 years
    @jfs This is a bit unrelated, but I want to run a function that does not take any arguments in the background. However, I have some resource limitations and cannot run the function as many times as I want, so I want to queue the extra executions of the function. Do you have any idea how I should do that? I have my question here. Could you please take a look at my question and see if you can give me some hints (or even better, an answer) on how I should do that?
  • Tedo Vrbanec
    Tedo Vrbanec over 5 years
    Results are not as expected: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]. I would expect: [0,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,10,2,3,4,5,6,7,8,9,10,11, ...
  • Syrtis Major
    Syrtis Major over 5 years
    @TedoVrbanec Results just should be [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]. If you want the later one, you may use itertools.product instead of zip.
  • Константин Ван
    Константин Ван about 5 years
    I wish there were starstarmap.
  • jfs
    jfs about 5 years
    @КонстантинВан starstar is to accept an iterable of dicts with parameters?
  • Константин Ван
    Константин Ван about 5 years
    @jfs Right. Keyword arguments.
  • ScipioAfricanus
    ScipioAfricanus about 5 years
    Does the order of arguments matter in the function call?
  • Prav001
    Prav001 over 4 years
    Neat and elegant.
  • as - if
    as - if over 4 years
    Why b=233? It defeats the purpose of the question.
  • Vivek Subramanian
    Vivek Subramanian over 4 years
    How do you use this if you want to store the result of add in a list?
  • Michael Dorner
    Michael Dorner over 4 years
    @Ahmed I like it as it is, because IMHO the method call should fail whenever the number of parameters is not correct.
  • Scott
    Scott about 4 years
    I want to note that this doesn't address the structure in the original question. [[1,2,3], [4,5,6]] would unpack with starmap to [pow(1,2,3), pow(4,5,6)], not [pow(1,4), pow(2,5), pow(3, 6)]. If you don't have good control over the inputs being passed to your function, you may need to restructure them first (see the sketch after these comments).
  • Mike McKerns
    Mike McKerns about 4 years
    @Scott: ah, I didn't notice that... over 5 years ago. I'll make a small update. Thanks.
  • toti
    toti almost 4 years
    I don't understand why I have to scroll all the way over here to find the best answer.
  • pauljohn32
    pauljohn32 almost 4 years
    Should zip the input vectors. More understandable than transposing an array, don't you think?
  • Mike McKerns
    Mike McKerns almost 4 years
    The array transpose, while possibly less clear, should be less expensive.
  • Hammad
    Hammad over 2 years
    This answer should literally have been at the very top.
  • Michael Silverstein
    Michael Silverstein over 2 years
    You mean c instead of case here, right?: res = pool.apply_async(harvester, (text, case, q = None))
  • Peter Mortensen
    Peter Mortensen over 2 years
    An explanation would be in order. E.g., what is the idea/gist? Please respond by editing (changing) your answer, not here in comments (without "Edit:", "Update:", or similar - the answer should appear as if it was written today).
  • Peter Mortensen
    Peter Mortensen over 2 years
    Still, an explanation would be in order. E.g., what is the idea/gist? What language features does it use and why? Please respond by editing (changing) your answer, not here in comments (without "Edit:", "Update:", or similar - the answer should appear as if it was written today).
  • Peter Mortensen
    Peter Mortensen over 2 years
    What do you mean by "a list lists of arguments" (seems incomprehensible)? Preferably, please respond by editing (changing) your answer, not here in comments (without "Edit:", "Update:", or similar - the answer should appear as if it was written today).
  • Sean William
    Sean William about 2 years
    Please add pool.close() and pool.join() after getting results = pool.map(...); otherwise this might run forever.
  • root-11
    root-11 almost 2 years
    starmap was the answer I was looking for.
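
Regarding the restructuring that Scott describes in the comments above: if your data arrives as parallel argument lists rather than as one tuple per call, zip() pairs them element-wise before they reach starmap. A minimal sketch (the list names are made up):

from multiprocessing import Pool

bases = [1, 2, 3]
exponents = [4, 5, 6]

if __name__ == '__main__':
    with Pool(2) as pool:
        # zip() pairs the i-th base with the i-th exponent: (1, 4), (2, 5), (3, 6)
        print(pool.starmap(pow, zip(bases, exponents)))  # [1, 32, 729]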