How to use Python multiprocessing Pool.map to fill numpy array in a for loop
Solution 1
The following works. First, it is a good idea to protect the main part of your code inside an `if __name__ == '__main__'` block in order to avoid weird side effects. The result of `pool.map()` is a list containing the evaluations for each value in the iterator `list_start_vals`, so you don't have to create `array_2D` beforehand.
```python
import numpy as np
from multiprocessing import Pool

def fill_array(start_val):
    return list(range(start_val, start_val + 10))

if __name__ == '__main__':
    pool = Pool(processes=4)
    list_start_vals = range(40, 60)
    array_2D = np.array(pool.map(fill_array, list_start_vals))
    pool.close()  # ATTENTION HERE
    print(array_2D)
```
Perhaps you will have trouble using `pool.close()`; per the comments from @hpaulj, you can simply remove this line in case you have problems...
Solution 2
If you still want to fill the array row by row, you can use `pool.apply_async` instead of `pool.map`. Working from Saullo's answer:
```python
import numpy as np
from multiprocessing import Pool

def fill_array(start_val):
    return range(start_val, start_val + 10)

if __name__ == '__main__':
    pool = Pool(processes=4)
    list_start_vals = range(40, 60)
    array_2D = np.zeros((20, 10))
    for line, val in enumerate(list_start_vals):
        result = pool.apply_async(fill_array, [val])
        array_2D[line, :] = result.get()
    pool.close()
    print(array_2D)
```
This runs a bit slower than the `map` version, but it does not produce the runtime error my test of the `map` version did: `Exception RuntimeError: RuntimeError('cannot join current thread',) in <Finalize object, dead> ignored`
MoTSCHIGGE
Updated on June 13, 2022

Comments
-
MoTSCHIGGE about 2 years
I want to fill a 2D numpy array within a for loop and speed up the calculation by using multiprocessing.
```python
import numpy
from multiprocessing import Pool

array_2D = numpy.zeros((20, 10))
pool = Pool(processes=4)

def fill_array(start_val):
    return range(start_val, start_val + 10)

list_start_vals = range(40, 60)
for line in xrange(20):
    array_2D[line, :] = pool.map(fill_array, list_start_vals)
pool.close()
print array_2D
```
When executed, Python runs 4 subprocesses and occupies 4 CPU cores, BUT the execution doesn't finish and the array is not printed. If I try to write the array to disk, nothing happens.
Can anyone tell me why?
-
pylang over 6 years Do you recall how you ran this code? In the command line, Jupyter, or a script?
-
MoTSCHIGGE almost 10 years Thanks for your reply. Unfortunately the effect is the same: Python starts subprocesses and occupies the PC, but nothing happens. I'm running the code on a Windows 7 machine (dual-core CPU with hyperthreading, so virtually a quad-core), Python 2.7.5 32-bit, and I use SpyderLib as the programming interface.
-
Ram almost 10 years @MoTSCHIGGE I ran the code I posted in a Windows environment and it seems to be working. I think you are running the code without the `if __name__ == "__main__":` guard; if that's the case, the code will run indefinitely on Windows. Please refer to the Stack Overflow link regarding the importance of the if condition on Windows: stackoverflow.com/questions/20222534/…
-
MoTSCHIGGE almost 10 years I just tried to run the sample code above including `if __name__ == "__main__":`, but nothing happens. I don't know what's wrong here..
-
hpaulj almost 10 years With larger arrays, I get an error: `Exception RuntimeError: RuntimeError('cannot join current thread',) in <Finalize object, dead> ignored`. `.apply_async` does not give this warning.
-
hpaulj almost 10 years Without the `pool.close()` command, I don't get this error.
-
Saullo G. P. Castro almost 10 years @hpaulj thank you for the feedback... I tried producing an array which is 10000 x 10000 with no problem, changing 60 to 10040 and 10 to 10000...
-
hpaulj almost 10 years Maybe it's an issue of machine size and speed. Mine's relatively old.
-
hpaulj almost 10 years On further testing, it appears that a `pool.join()` is more important if the mapping is too slow.