Shared variable in python's multiprocessing


When you use Value you get a ctypes object in shared memory that is synchronized with an RLock by default. When you use Manager you get a SyncManager object that controls a server process, which lets other processes manipulate the objects it holds through proxies. You can create multiple proxies using the same manager; there is no need to create a new manager in your loop:

manager = Manager()
for i in range(5):
    new_value = manager.Value('i', 0)

A Manager can be shared across computers, while Value is limited to one computer. Value will be faster (run the code below to see), so I think you should use that unless you need to support arbitrary objects or access them over a network.

import time
from multiprocessing import Process, Manager, Value


def foo(data, name=''):
    print(type(data), data.value, name)
    data.value += 1


if __name__ == "__main__":
    manager = Manager()
    x = manager.Value('i', 0)
    y = Value('i', 0)

    for i in range(5):
        Process(target=foo, args=(x, 'x')).start()
        Process(target=foo, args=(y, 'y')).start()

    print('Before waiting: ')
    print('x = {0}'.format(x.value))
    print('y = {0}'.format(y.value))

    time.sleep(5.0)
    print('After waiting: ')
    print('x = {0}'.format(x.value))
    print('y = {0}'.format(y.value))

To summarize:

  1. Use Manager to create multiple shared objects, including dicts and lists. Use Manager to share data across computers on a network.
  2. Use Value or Array when it is not necessary to share information across a network and the types in ctypes are sufficient for your needs.
  3. Value is faster than Manager.
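The summary above can be sketched roughly as follows. This is only an illustration, not a benchmark: the function names record and run are made up for the example, and it assumes Python 3 on a single machine. One manager hands out both a dict and a list proxy, while Array holds a fixed number of C ints in shared memory.

```python
from multiprocessing import Array, Manager, Process


def record(idx, shared_dict, shared_list, shared_array):
    """Store idx squared in each of the three shared containers."""
    shared_dict[idx] = idx * idx      # manager proxy: accepts any picklable value
    shared_list.append(idx * idx)     # manager proxy: can grow as needed
    shared_array[idx] = idx * idx     # ctypes array: fixed size, fixed type


def run(n=4):
    """Start n processes that each write one entry, then copy the results out."""
    # One manager serves every proxy object; Array lives in shared memory.
    with Manager() as manager:
        shared_dict = manager.dict()
        shared_list = manager.list()
        shared_array = Array('i', n)  # n C ints, initialised to zero

        procs = [Process(target=record,
                         args=(i, shared_dict, shared_list, shared_array))
                 for i in range(n)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

        # Copy out of the proxies before the manager shuts down.
        return dict(shared_dict), sorted(shared_list), shared_array[:]


if __name__ == "__main__":
    print(run())
```

The dict returned by run holds plain integers rather than Value objects, which may be closer to what the question was after.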

Warning

By the way, sharing data across processes/threads should be avoided if possible. The code above will probably run as expected, but if foo takes longer to execute, updates start to get lost, because data.value += 1 is a separate read and write rather than one atomic operation. Compare the above with:

def foo(data, name=''):
    print(type(data), data.value, name)
    for j in range(1000):
        data.value += 1

You'll need a Lock to make this work correctly.
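A minimal sketch of one way to do that, assuming Python 3 and the lock that Value creates by default (reachable through get_lock()). The names add_many and run_workers are made up for this example. It also joins the workers instead of sleeping, so the final count is read only after every process has finished.

```python
from multiprocessing import Process, Value


def add_many(counter, n=1000):
    """Increment the shared counter n times, holding its lock for each update."""
    for _ in range(n):
        # counter.value += 1 is a read followed by a write, so guard both
        # steps with the lock that Value creates by default.
        with counter.get_lock():
            counter.value += 1


def run_workers(num_procs=5, n=1000):
    """Start num_procs processes that all increment one shared counter."""
    counter = Value('i', 0)
    procs = [Process(target=add_many, args=(counter, n))
             for _ in range(num_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # wait for every worker instead of sleeping
    return counter.value


if __name__ == "__main__":
    print(run_workers())
```

With the lock held around each increment, the five workers should add up to 5000; without it, the total can come out lower because some updates overwrite each other.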

I am not especially knowledgeable about all of this, so maybe someone else will come along and offer more insight. I figured I would contribute an answer since the question was not getting attention. Hope that helps a little.


Author: user2435611

Updated on July 09, 2022

Comments

  • user2435611
    user2435611 almost 2 years

    First question is what is the difference between Value and Manager().Value?

    Second, is it possible to share an integer variable without using Value? Below is my sample code. What I want is a dict whose values are plain integers, not Value objects. What I did was convert them all after the processes finished. Is there an easier way?

    from multiprocessing import Process, Manager
    
    def f(n):
        n.value += 1
    
    if __name__ == '__main__':
        d = {}
        p = []
    
        for i in range(5):
            d[i] = Manager().Value('i',0)
            p.append(Process(target=f, args=(d[i],)))
            p[i].start()
    
        for q in p:
            q.join()
    
        for i in d:
            d[i] = d[i].value
    
        print(d)
    
  • user2435611
    user2435611 almost 11 years
    can we add any value to Array? I can't append any value to Array.
  • ChrisP
    ChrisP almost 11 years
    @user2435611, Array will give you a shared ctypes array. You need to decide what type of data you are storing beforehand, and supply a type code. For example, a = Array('c', 10) creates an array of ten one-byte characters. Entries can be assigned like so: a[0] = b'b' (in Python 3 the elements are bytes). You cannot store arbitrary values in an Array; see the list of type codes.
  • user2435611
    user2435611 almost 11 years
    So we should decide the size of array beforehand and can't expand it? if so, it's better for me to use manager.list(). Thanks for help :)
  • ChrisP
    ChrisP almost 11 years
    @user2435611: Yes, I think that's right. The multiprocessing.Array is allocated memory at the time of creation and unlike array.array cannot be expanded. Use manager.list if you really have no idea how much space you need, but you might want to experiment with allocating an Array with some extra space if you can find an upper-bound on the size. I hope that helps.
  • Travis Leleu
    Travis Leleu over 9 years
    @ChrisP i'm late to the party, but how would you recommend sharing a simple int variable across processes on one machine? I use multiprocessing for IO bound workers (until I learn async), and would like to have a counter they share so I know how many iterations they've gone through. Recs on how best to implement?
  • Connor
    Connor almost 6 years
    @TravisLeleu did you find a solution for this?
  • RS1
    RS1 over 5 years
    How about sharing data by adding it to another module and accessing that module from all processes, something like what is explained in this? This does not use any features of the multiprocessing module at all, just plain Python. Is this OK if I don't need synchronized access?
  • RS1
    RS1 over 5 years
    Can you just comment if we can share any datatype such as pandas dataframe using this approach?
  • soulmachine
    soulmachine over 5 years
    You should use a Lock to protect Value, see eli.thegreenplace.net/2012/01/04/…
  • Jirka
    Jirka about 5 years
    It seems that it does not run for py3
  • pdaawr
    pdaawr almost 4 years
    It doesn't work for python3.7: AttributeError: 'ForkAwareLocal' object has no attribute 'connection'