Getting TypeError: can't pickle _thread.lock objects
The documentation says that you cannot copy a client from a parent process to a child process: a MongoClient instance cannot be pickled, so each process must create its own connection after the fork.
On Unix systems the multiprocessing module spawns processes using fork(). Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient.
http://api.mongodb.com/python/current/faq.html#id21
Comments
-
Bharath Bharath over 1 year
I am querying MongoDB to get list of dictionary and for each dict in the list, I am doing some comparison of values. Based on the result of comparison, I am storing the values of dictionary, comparison result, and other values calculated in a mongoDB collection. I am trying to do this by invoking multiprocessing and am getting this error.
```python
def save_for_doc(doc_id):
    # function to get the fields of doc
    fields = get_fields(doc_id)
    no_of_process = 5
    doc_col_size = 30000
    chunk_size = round(doc_col_size / no_of_process)
    chunk_ranges = range(0, no_of_process * chunk_size, chunk_size)
    processes = [
        multiprocessing.Process(target=save_similar_docs,
                                args=(doc_id, client, fields, chunks, chunk_size))
        for chunks in chunk_ranges
    ]
    for prc in processes:
        prc.start()

def save_similar_docs(doc_id, client, fields, chunks, chunk_size):
    # This function processes the args and saves the results to MongoDB.
    # Does not return anything, as the end result is stored directly.
```
Below is the error:
```
  File "H:/Desktop/Performance Improvement/With_Process_Pool.py", line 144, in save_for_doc
    prc.start()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.lock objects
```
What does this error mean? Please explain it, and how can I get past it?
-
tdelaney almost 6 years Python serializes (pickles) the arguments for the process, sends them, and deserializes (unpickles) them in the subprocess. But not all objects can be serialized, hence the error. One of the arguments in `doc_id, client, fields, chunks, chunk_size` holds a lock object. On Windows, more of the process state is pickled, so the problem could be there too. Try `pickle.dumps()` on each of your parameters to see which one fails.
-
Bharath Bharath almost 6 years Yes. If I run pickle.dumps on the args separately, it fails there as well. But it reports the failure in general, not which specific arg is the culprit. How can I identify and fix it? Thanks for your information.
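The per-argument `pickle.dumps()` check suggested above can be sketched with stdlib stand-ins (all names and values here are hypothetical; `FakeClient` just mimics the fact that a MongoClient's connection pool holds locks):

```python
import pickle
import threading

class FakeClient:
    """Hypothetical stand-in for a MongoClient: it carries a lock,
    much like the real client's internal connection pool does."""
    def __init__(self):
        self._lock = threading.Lock()

# Stand-ins for the five arguments passed to multiprocessing.Process
args = {
    "doc_id": "abc123",      # hypothetical values
    "client": FakeClient(),
    "fields": ["a", "b"],
    "chunks": 0,
    "chunk_size": 6000,
}

failed = []
for name, value in args.items():
    try:
        pickle.dumps(value)
        print(f"{name}: picklable")
    except TypeError as exc:
        failed.append(name)
        print(f"{name}: NOT picklable ({exc})")
```

Looping argument by argument, instead of pickling the whole tuple at once, pinpoints exactly which value is unpicklable.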
-
tdelaney almost 6 years So you know which one is the problem. Is it `client`? If that's something like a login to a database, it may not be sharable. The general solution is to convert it to basic types that can be shared and rebuild the original object on the other side, or to rewrite the code so that you don't need this particular type of object. Without knowing more about your code, I can only guess.
-
Bharath Bharath almost 6 years Great, I think I found the error: one of the args is a cursor object, and I believe that is the problem. I looped through the cursor and collected its documents into a list, and now that error is gone. But now I end up with a 'Broken Pipe' error. Thanks for all your help.
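The cursor diagnosis can be reproduced with plain stdlib objects: a generator (standing in here as a hypothetical substitute for a pymongo cursor, which likewise yields documents lazily) refuses to pickle, while the materialized list of documents pickles fine. Note that shipping a very large materialized result set through the pipe to each child brings its own problems; re-running the query inside each child, as the answer above suggests for the connection, avoids sending documents at all.

```python
import pickle

def fake_cursor():
    """Hypothetical stand-in for a database cursor: documents are
    produced lazily, and the generator object cannot be pickled."""
    for i in range(3):
        yield {"_id": i, "value": i * 10}

# The lazy cursor itself is not picklable...
try:
    pickle.dumps(fake_cursor())
    cursor_picklable = True
except TypeError:
    cursor_picklable = False

# ...but materializing it first yields plain dicts, which pickle fine.
docs = list(fake_cursor())
payload = pickle.dumps(docs)

print("cursor picklable:", cursor_picklable)
print("materialized docs:", len(docs))
```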
-