Getting TypeError: can't pickle _thread.lock objects

The PyMongo documentation says that a client cannot be copied from the main process to a child process. Each process has to create its own connection after the fork:

On Unix systems the multiprocessing module spawns processes using fork(). Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient.

http://api.mongodb.com/python/current/faq.html#id21
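
As a rough sketch of what that advice could look like for the code in the question: the worker below receives only plain, picklable values and opens its own MongoClient. The connection string, the database and collection names ("mydb", "docs", "results"), the chunk_starts helper, and the placeholder insert are all assumptions made for illustration, not the asker's real code.

```python
import multiprocessing

MONGO_URI = "mongodb://localhost:27017"  # assumed connection string


def chunk_starts(total_docs, n_processes):
    """Return the starting offset of each chunk, one per process."""
    chunk_size = total_docs // n_processes
    return list(range(0, n_processes * chunk_size, chunk_size))


def save_similar_docs(doc_id, fields, start, chunk_size):
    """Worker: opens its own MongoClient instead of receiving one."""
    # Imported here so that only the worker needs the driver installed;
    # a module-level import would work equally well.
    from pymongo import MongoClient

    client = MongoClient(MONGO_URI)  # created inside the child process
    try:
        docs = client["mydb"]["docs"]        # hypothetical names
        results = client["mydb"]["results"]  # hypothetical names
        for doc in docs.find().skip(start).limit(chunk_size):
            # Placeholder for the real comparison logic.
            results.insert_one(
                {"source": doc_id, "other": doc["_id"], "fields": fields}
            )
    finally:
        client.close()


def save_for_doc(doc_id, fields, total_docs=30000, n_processes=5):
    """Start one worker per chunk, passing only picklable arguments."""
    chunk_size = total_docs // n_processes
    processes = [
        multiprocessing.Process(
            target=save_similar_docs,
            args=(doc_id, fields, start, chunk_size),
        )
        for start in chunk_starts(total_docs, n_processes)
    ]
    for prc in processes:
        prc.start()
    for prc in processes:
        prc.join()
```

On Windows, where child processes are spawned rather than forked, save_for_doc should be called from under an if __name__ == "__main__": guard so that the children can import the module safely.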

Author: Bharath Bharath

Updated on July 29, 2022

Comments

  • Bharath Bharath
    Bharath Bharath over 1 year

    I am querying MongoDB to get a list of dictionaries, and for each dict in the list I compare some values. Based on the result of the comparison, I store the dictionary's values, the comparison result, and other calculated values in a MongoDB collection. I am trying to do this with multiprocessing and am getting this error.

    def save_for_doc(doc_id):
    
        #function to get the fields of doc
        fields = get_fields(doc_id)
        no_of_process = 5
        doc_col_size = 30000
        chunk_size = round(doc_col_size/no_of_process)
        chunk_ranges = range(0, no_of_process*chunk_size, chunk_size)
        processes = [multiprocessing.Process(target=save_similar_docs,
                     args=(doc_id, client, fields, chunks, chunk_size))
                     for chunks in chunk_ranges]
        for prc in processes:
            prc.start()
    
    def save_similar_docs(arguments):
    
         #This function process the args and saves the results to MongoDB. Does not 
         #return anything as the end result is directly stored.
    

    Below is the error:

      File "H:/Desktop/Performance Improvement/With_Process_Pool.py", line 144, in save_for_doc
        prc.start()
      File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
        self._popen = self._Popen(self)
      File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    TypeError: can't pickle _thread.lock objects
    

    What does this error mean, and how can I get past it?

    • tdelaney
      tdelaney almost 6 years
      Python serializes (pickles) the arguments for the process, sends them, and deserializes (unpickles) them in the subprocess. But not all objects can be serialized, hence the error. One of the arguments in doc_id, client, fields, chunks, chunk_size contains a lock object. On Windows, more process state is pickled, and the problem could be there. Try pickle.dumps() on each of your parameters to see if one fails.
    • Bharath Bharath
      Bharath Bharath almost 6 years
      Yes. If I run pickle.dumps on the args separately, it fails there too. But the error is general and does not say which specific arg fails. How can I identify it and solve it? Thanks for the information.
    • tdelaney
      tdelaney almost 6 years
      So you know which one is the problem. Is it client? If that's something like a login to a database, it may not be sharable. The general solution is to convert it to basic types that can be shared and rebuild the original object on the other side. Or rewrite so that you don't need this particular type of object. Without knowing more about your code, I can only guess.
    • Bharath Bharath
      Bharath Bharath almost 6 years
      Great. I think I found the error. One of the args is a cursor object, and I believe that is the problem. I looped through the cursor and passed its data as a list, and now the error is gone. But I end up with a 'Broken Pipe' error. Thanks for all your help.
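
To find out which argument fails, as asked in the comments, one option is to call pickle.dumps() on each argument separately and report the ones that raise. The sketch below does that. FakeClient is a hypothetical stand-in for any object that holds a lock internally (a database client or cursor, for example), and the argument values are made up.

```python
import pickle
import threading


def find_unpicklable(named_args):
    """Return the names of the arguments that pickle cannot serialize."""
    failures = []
    for name, value in named_args.items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # usually TypeError or pickle.PicklingError
            print(f"{name}: {type(exc).__name__}: {exc}")
            failures.append(name)
    return failures


class FakeClient:
    """Hypothetical stand-in for an object that holds a lock internally."""

    def __init__(self):
        self._lock = threading.Lock()


# Made-up values that mirror the argument names from the question.
args = {
    "doc_id": 42,
    "client": FakeClient(),
    "fields": ["title", "body"],
    "chunks": 0,
    "chunk_size": 6000,
}
bad = find_unpicklable(args)
```

Here only "client" ends up in bad. Any argument reported this way needs to be replaced by plain data (a list of dicts instead of a cursor, a connection string instead of a client) before it is handed to multiprocessing.Process.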