Shared variable in python's multiprocessing
When you use Value you get a ctypes object in shared memory that, by default, is synchronized using an RLock. When you use Manager you get a SyncManager object that controls a server process, which allows object values to be manipulated by other processes. You can create multiple proxies using the same manager; there is no need to create a new manager in your loop:
```python
manager = Manager()
for i in range(5):
    new_value = manager.Value('i', 0)
```
A Manager can be shared across computers, while a Value is limited to one computer. Value will be faster (run the code below to see), so I think you should use it unless you need to support arbitrary objects or access them over a network.
```python
import time
from multiprocessing import Process, Manager, Value

def foo(data, name=''):
    print(type(data), data.value, name)
    data.value += 1

if __name__ == "__main__":
    manager = Manager()
    x = manager.Value('i', 0)
    y = Value('i', 0)

    for i in range(5):
        Process(target=foo, args=(x, 'x')).start()
        Process(target=foo, args=(y, 'y')).start()

    print('Before waiting: ')
    print('x = {0}'.format(x.value))
    print('y = {0}'.format(y.value))

    time.sleep(5.0)
    print('After waiting: ')
    print('x = {0}'.format(x.value))
    print('y = {0}'.format(y.value))
```
To summarize:
- Use Manager to create multiple shared objects, including dicts and lists. Use Manager to share data across computers on a network.
- Use Value or Array when it is not necessary to share information across a network and the types in ctypes are sufficient for your needs.
- Value is faster than Manager.
Warning

By the way, sharing data across processes/threads should be avoided if possible. The code above will probably run as expected, but increase the time it takes to execute foo and things will get weird. Compare the above with:

```python
def foo(data, name=''):
    print(type(data), data.value, name)
    for j in range(1000):
        data.value += 1
```

You'll need a Lock to make this work correctly.
I am not especially knowledgeable about all of this, so maybe someone else will come along and offer more insight. I figured I would contribute an answer since the question was not getting attention. I hope that helps a little.
user2435611
Updated on July 09, 2022

Comments
- user2435611 almost 2 years: First question: what is the difference between Value and Manager().Value? Second, is it possible to share an integer variable without using Value? Below is my sample code. What I want is to end up with a dict holding plain integers, not Value objects. What I did is just convert them all after the processes finish. Is there an easier way?

```python
from multiprocessing import Process, Manager

def f(n):
    n.value += 1

if __name__ == '__main__':
    d = {}
    p = []
    for i in range(5):
        d[i] = Manager().Value('i', 0)
        p.append(Process(target=f, args=(d[i],)))
        p[i].start()
    for q in p:
        q.join()
    for i in d:
        d[i] = d[i].value
    print d
```
- Chris_Rands about 7 years: Relevant: eli.thegreenplace.net/2012/01/04/…
- user2435611 almost 11 years: Can we add any value to Array? I can't append any value to Array.
- ChrisP almost 11 years: @user2435611, Array will give you a shared ctypes array. You need to decide what type of data you are storing beforehand and supply a type code. For example, a = Array('c', 10) creates an array of one-character strings of length 10. New entries can be assigned to the array like so: a[0] = 'b'. You cannot add just any value to an array; see the list of type codes.
- user2435611 almost 11 years: So we should decide the size of the array beforehand and can't expand it? If so, it's better for me to use manager.list(). Thanks for the help :)
- ChrisP almost 11 years: @user2435611: Yes, I think that's right. multiprocessing.Array is allocated memory at the time of creation and, unlike array.array, cannot be expanded. Use manager.list if you really have no idea how much space you need, but you might want to experiment with allocating an Array with some extra space if you can find an upper bound on the size. I hope that helps.
- Travis Leleu over 9 years: @ChrisP, I'm late to the party, but how would you recommend sharing a simple int variable across processes on one machine? I use multiprocessing for IO-bound workers (until I learn async) and would like a counter they share so I know how many iterations they've gone through. Any recommendations on how best to implement this?
- Connor almost 6 years: @TravisLeleu, did you find a solution for this?
- RS1 over 5 years: How about sharing data by adding it to another module and accessing that module from all processes, as explained in this? This does not use any features of the multiprocessing module, only plain Python. Is this OK if I don't want synchronized access?
- RS1 over 5 years: Can you comment on whether we can share any datatype, such as a pandas dataframe, using this approach?
- Jirka about 5 years: It seems that it does not run for Python 3.
- pdaawr almost 4 years: It doesn't work for Python 3.7: AttributeError: 'ForkAwareLocal' object has no attribute 'connection'