How do I avoid this pickling error, and what is the best way to parallelize this code in Python?


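You are most likely seeing this error because of where and in what order you define your pool, objects, and functions. multiprocessing is not the same as using threads: each worker is a separate process with its own copy of the environment, and the pool hands work to it by pickling the task. Functions are pickled by name, so a function defined in a scope the worker processes cannot look up, such as one nested inside another function, cannot be sent to them, and the map call fails. Order can matter as well, since worker processes only know about what they can load or inherit when the pool starts.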
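To see the name lookup problem in isolation, here is a minimal sketch that uses pickle directly, with no pool involved. The names top_level, make_nested and can_pickle are made up for illustration and are not part of the question's code. The exact exception type differs between Python versions, so the helper catches the common ones.

```python
import pickle


def top_level(x):
    # Defined at module scope, so pickle can refer to it by name.
    return x


def make_nested():
    def nested(x):
        # Defined inside another function, so it has no module-level name
        # that another process could look up.
        return x
    return nested


def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print(can_pickle(top_level))      # module-level function
    print(can_pickle(make_nested()))  # nested function
```

When run as a script, the first call should report that the module-level function pickles fine and the second that the nested one does not, which is the same failure pool.map runs into.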

First, try creating one pool before your big loop:

# sys.argv[0] is the script name, and every argument arrives as a string
(minI, maxI, iStep, minJ, maxJ, jStep, a, b, numProcessors) = sys.argv[1:]
pool = multiprocessing.Pool(processes=int(numProcessors))
for i in range(int(minI), int(maxI), int(iStep)):
    ...

Then, move your target callable outside the dynamic loop:

def functionB(a, b):
    ...

def main():
    ...
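Putting both suggestions together, the overall shape could look something like the sketch below. This is only an illustration under a few assumptions: functionB has a placeholder body (a multiplication) standing in for the subprocess calls from the question, and run_all, num_processes and outer_steps are hypothetical names I made up. The loop bounds are shrunk so it finishes quickly.

```python
import itertools
import multiprocessing


def functionB(pair):
    # Module-level function, so the pool can pickle a reference to it.
    # Placeholder body: the real code would run its subprocess calls here.
    a, b = pair
    return a * b


def run_all(num_processes, outer_steps):
    ab_product = list(itertools.product(range(0, 3), range(0, 3)))
    # The pool is created once, before the loop, and reused below.
    pool = multiprocessing.Pool(processes=num_processes)
    results = []
    try:
        for _ in range(outer_steps):
            # Each iteration hands the same module-level function to the pool.
            results.append(pool.map(functionB, ab_product))
    finally:
        # Release the worker processes even if a map call raises.
        pool.close()
        pool.join()
    return results


if __name__ == "__main__":
    print(run_all(num_processes=2, outer_steps=2))
```

The `if __name__ == "__main__"` guard keeps worker processes from re-running the entry point on platforms that start workers by importing the main module.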

Consider this example...

broken

import multiprocessing

def broken():
    vals = [1,2,3]

    def test(x):
        return x

    pool = multiprocessing.Pool()
    output = pool.map(test, vals)
    print output

broken()
# PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

working

import multiprocessing

def test(x):
    return x

def working():
    vals = [1,2,3]

    pool = multiprocessing.Pool()
    output = pool.map(test, vals)
    print output

working()
# [1, 2, 3]

Author: idealistikz

Updated on June 12, 2022

Comments

  • idealistikz almost 2 years

    I have the following code.

    def main():
      (minI, maxI, iStep, minJ, maxJ, jStep, a, b, numProcessors) = sys.argv
      for i in range(minI, maxI, iStep):
        for j in range(minJ, maxJ, jStep): 
          p = multiprocessing.Process(target=functionA, args=(minI, minJ))
          p.start()
          def functionB((a, b)):
            subprocess.call('program1 %s %s %s %s %s %s' %(c, a, b, 'file1', 
              'file2', 'file3'), shell=True)
            for d in ['a', 'b', 'c']:
              subprocess.call('program2 %s %s %s %s %s' %(d, 'file4', 'file5', 
                'file6', 'file7'), shell=True)
          abProduct = list(itertools.product(range(0, 10), range(0, 10)))
          pool = multiprocessing.Pool(processes=numProcessors)
          pool.map(functionB, abProduct) 
    

    It produces the following error.

    Exception in thread Thread-1:
    Traceback (most recent call last):
      File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
        self.run()
      File "/usr/lib64/python2.6/threading.py", line 484, in run 
        self.__target(*self.__args, **self.__kwargs)
      File "/usr/lib64/python2.6/multiprocessing/pool.py", line 255, in _handle_tasks
        put(task)
    PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
    

    The contents of functionA are unimportant, and do not produce an error. The error seems to occur when I try to map functionB. How do I remove this error, and what is the best way to parallelize this code in Python 2.6?