Very simple concurrent programming in Python
Solution 1
import multiprocessing
import Foo
import Bar

def get_a(results):
    results['a'] = Foo.get_something()

def get_b(results):
    results['b'] = Bar.get_something_else()

if __name__ == '__main__':
    # Processes do not share ordinary Python objects, so use a
    # Manager dict to send the results back to the parent process.
    manager = multiprocessing.Manager()
    results = manager.dict()
    process_a = multiprocessing.Process(target=get_a, args=(results,))
    process_b = multiprocessing.Process(target=get_b, args=(results,))
    process_a.start()
    process_b.start()
    process_a.join()  # note the parentheses: bare .join does nothing
    process_b.join()
Here is the process version of your program.

NOTE: in threading there are shared data structures, so you have to worry about locking to avoid incorrect manipulation of the data. As Amber mentioned above, threading also has the GIL (Global Interpreter Lock) problem, and since both of your tasks are CPU-intensive, the threaded version can actually take more time because of the overhead of threads acquiring and releasing the lock. If your tasks were I/O-intensive, the GIL would not matter much.

Since processes have no shared data structures, there is no worrying about locks, and because processes run independently of the GIL, you get the real power of multiple processors.

Simple note to remember: a process is the same as a thread, just without shared data structures (everything works in isolation and is focused on messaging).

Check out dabeaz.com - David Beazley has given good presentations on concurrent programming.
Solution 2
In general, you'd use threading to do this.
First, create a thread for each thing you want to run in parallel:
import threading
import Foo
import Bar

results = {}

def get_a():
    results['a'] = Foo.get_something()

a_thread = threading.Thread(target=get_a)
a_thread.start()

def get_b():
    results['b'] = Bar.get_something_else()

b_thread = threading.Thread(target=get_b)
b_thread.start()
Then to require both of them to have finished, use .join() on both:
a_thread.join()
b_thread.join()
at which point your results will be in results['a'] and results['b'], so if you wanted an ordered list:
output = [results['a'], results['b']]
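The same pattern generalizes to any number of slow functions: spawn one thread per function and collect the results in a fixed order, regardless of which thread finishes first. A self-contained sketch (using a simple stand-in function rather than the Foo/Bar modules, which are placeholders from the question):

```python
import threading

def slow_square(n):
    return n * n  # stand-in for an expensive call

results = {}

def run(key, func, arg):
    results[key] = func(arg)

# One thread per task; each writes to its own key.
threads = [
    threading.Thread(target=run, args=(key, slow_square, arg))
    for key, arg in [('a', 3), ('b', 4)]
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread before reading results

output = [results['a'], results['b']]  # order fixed regardless of finish order
print(output)  # [9, 16]
```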
Note: if both tasks are inherently CPU-intensive, you might want to consider multiprocessing instead - due to Python's GIL, a given Python process will only ever use one CPU core at a time, whereas multiprocessing can distribute the tasks to separate cores. However, it has slightly higher overhead than threading, so for tasks that are less CPU-intensive it might not be as efficient.
Comments
-
Ivy almost 2 years
I have a simple Python script that uses two much more complicated Python scripts, and does something with the results.
I have two modules, Foo and Bar, and my code is like the following:
import Foo
import Bar

output = []
a = Foo.get_something()
b = Bar.get_something_else()
output.append(a)
output.append(b)
Both methods take a long time to run, and neither depends on the other, so the obvious solution is to run them in parallel. How can I achieve this, but make sure that the order is maintained: Whichever one finishes first must wait for the other one to finish before the script can continue.
Let me know if I haven't made myself clear enough, I've tried to make the example code as simple as possible.
-
akaRem about 12 years: Threading is helpful for code that spends time idling. If it does a lot of heavy calculation, you should use multiprocessing instead, because threading will only add lag in that case.
-
Ivy about 12 years: Yes, it's very CPU-intensive. I'll get this working first, however; I had a quick look and the APIs are identical.
-
Ivy about 12 years: I've been trying for a while now, but I'm constantly getting the error:
AttributeError: 'NoneType' object has no attribute 'join'
That's referring to a_thread every time; b_thread works as expected.
-
Amber about 12 years: @Blazemore I've edited the code slightly; there was a slight issue with it.
-
Ivy about 12 years: Do you know why I might have issues accessing the results dictionary? I get key errors. Getting the functions to write to any global variable is also giving me issues; it's just not writing them.