A task failed to un-serialize


Solution 1

This commonly happens when using multiprocessing from an IPython console in Spyder. A workaround is to run the script from the command line instead.
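For illustration, here is a minimal sketch of that workaround (the estimator and data are placeholders, not the asker's ANN, and the file name evaluate.py is made up):

# evaluate.py -- run as `python evaluate.py` from the command line,
# not inside the Spyder/IPython console
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

if __name__ == '__main__':
    # On Windows, worker processes re-import this module on startup,
    # so the parallel call must sit behind the __main__ guard.
    X_train, y_train = make_classification(n_samples=500, random_state=0)
    classifier = LogisticRegression(max_iter=1000)
    accuracies = cross_val_score(estimator=classifier, X=X_train,
                                 y=y_train, cv=10, n_jobs=-1)
    print(accuracies.mean())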

Solution 2

Just posting this for others, in case it's helpful. I ran into the same issue today running a GridSearchCV on a Dask array / cluster (scikit-learn v0.24).

I solved it by using the joblib context manager, as described here: https://joblib.readthedocs.io/en/latest/parallel.html#thread-based-parallelism-vs-process-based-parallelism
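For reference, a rough sketch of that approach (the grid search below is a placeholder, not the original Dask setup; with an actual Dask cluster you would register Dask's joblib backend and use parallel_backend('dask') instead):

from joblib import parallel_backend
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
search = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=5)

# Everything inside the block uses thread-based workers instead of
# processes, which skips the pickling step that raises the error.
with parallel_backend('threading', n_jobs=-1):
    search.fit(X, y)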

Solution 3

I get this error as well, but only on Windows. I am using joblib to run a function (call it func_x) in parallel. That function is imported from a module, let's call it module_a.

module_a also uses a function (call it func_y) from another module, module_b, which it imports using the syntax import module_b.

I found that I can avoid the BrokenProcessPool error if I edit module_a and change the import line to from module_b import func_y.

I also had to remove the if __name__ == '__main__': guard from the main script that imports module_a.

I think this subtle difference in how modules are imported into the namespace determines whether the module can then be pickled by joblib for parallel processing on Windows.

I hope this helps!

--

A minimal reproducible example is below:

Original main.py

from joblib import Parallel, delayed
import module_a

if __name__ == '__main__':
    Parallel(n_jobs=4, verbose=3)(delayed(module_a.func_x)(i) for i in range(50))

Original module_a.py (fails on Windows with a BrokenProcessPool error; a kernel restart is required)

import module_b

def func_x(i):
    j = i ** 3
    k = module_b.func_y(j)
    return k

Edited main.py

from joblib import Parallel, delayed
import module_a

Parallel(n_jobs=4, verbose=3)(delayed(module_a.func_x)(i) for i in range(50))

Edited module_a.py (succeeds on Windows)

from module_b import func_y # changed

def func_x(i):
    j = i ** 3
    k = func_y(j) # changed
    return k

module_b.py

def func_y(m):
    k = m ** 3
    return k


Comments

  • Javier Perez (almost 2 years ago)

    I'm trying to evaluate an ANN. I get the accuracies if I use n_jobs = 1; however, when I use n_jobs = -1 I get the following error: BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

    I have tried using other numbers, but it only works if I use n_jobs = 1.

    This is the code I am running: accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = -1)

    This is the error I am getting:

     Traceback (most recent call last):

       File "<ipython-input-12-cc51c2d2980a>", line 1, in <module>
         accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = -1)

       File "C:\Users\javie\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 402, in cross_val_score
         error_score=error_score)

       File "C:\Users\javie\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 240, in cross_validate
         for train, test in cv.split(X, y, groups))

       File "C:\Users\javie\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 930, in __call__
         self.retrieve()

       File "C:\Users\javie\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 833, in retrieve
         self._output.extend(job.get(timeout=self.timeout))

       File "C:\Users\javie\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 521, in wrap_future_result
         return future.result(timeout=timeout)

       File "C:\Users\javie\Anaconda3\lib\concurrent\futures\_base.py", line 432, in result
         return self.__get_result()

       File "C:\Users\javie\Anaconda3\lib\concurrent\futures\_base.py", line 384, in __get_result
         raise self._exception

     BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
    

    Spyder should have analyzed each batch in parallel, but even when I use n_jobs = 1 it only analyzes 10 epochs.

  • ongenz (over 4 years ago)
    I was having the same problem and moving from Spyder to command line fixed this error (although I now have others to deal with!).
  • David Daverio (about 3 years ago)
    Hmm, I am running in IPython and I had this issue, but only if I import joblib in IPython itself. If I don't, and only import it in the module where my functions are defined, it works fine.
  • Metrician (almost 3 years ago)
    Life saver! (Y)