How to make built-in containers (sets, dicts, lists) thread safe?

10,764

Solution 1

You can use Python's metaprogramming facilities to accomplish this. (Note: written quickly and not thoroughly tested.) I prefer to use a class decorator.

I also think you may need to lock more than add and remove to make a set thread-safe, but I'm not sure. I'll ignore that problem and just concentrate on your question.

Also consider whether delegation (proxying) is a better fit than subclassing. Wrapping objects is the usual approach in Python.

Finally, there is no "magic wand" of metaprogramming that will magically add fine-grained locking to any mutable Python collection. The safest thing to do is to lock any method or attribute access using RLock, but this is very coarse-grained and slow and probably still not a guarantee that your object will be thread-safe in all cases. (For example, you may have a collection that manipulates another non-threadsafe object accessible to other threads.) You really do need to examine each and every data structure and think about what operations are atomic or require locks and which methods might call other methods using the same lock (i.e., deadlock itself).

That said, here are some techniques at your disposal in increasing order of abstraction:

Delegation

class LockProxy(object):
    def __init__(self, obj):
        self.__obj = obj
        self.__lock = RLock()
        # RLock because object methods may call own methods
    def __getattr__(self, name):
        def wrapped(*a, **k):
            with self.__lock:
                getattr(self.__obj, name)(*a, **k)
        return wrapped

lockedset = LockProxy(set([1,2,3]))

Context manager

class LockedSet(set):
    """A set where add(), remove(), and 'in' operator are thread-safe"""

    def __init__(self, *args, **kwargs):
        self._lock = Lock()
        super(LockedSet, self).__init__(*args, **kwargs)

    def add(self, elem):
        with self._lock:
            super(LockedSet, self).add(elem)

    def remove(self, elem):
        with self._lock:
            super(LockedSet, self).remove(elem)

    def __contains__(self, elem):
        with self._lock:
            super(LockedSet, self).__contains__(elem)

Decorator

def locked_method(method):
    """Method decorator. Requires a lock object at self._lock"""
    def newmethod(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return newmethod

class DecoratorLockedSet(set):
    def __init__(self, *args, **kwargs):
        self._lock = Lock()
        super(DecoratorLockedSet, self).__init__(*args, **kwargs)

    @locked_method
    def add(self, *args, **kwargs):
        return super(DecoratorLockedSet, self).add(elem)

    @locked_method
    def remove(self, *args, **kwargs):
        return super(DecoratorLockedSet, self).remove(elem)

Class Decorator

I think this is the cleanest and easiest-to-understand of the abstract methods, so I've expanded it to allow one to specify the methods to lock and a lock object factory.

def lock_class(methodnames, lockfactory):
    return lambda cls: make_threadsafe(cls, methodnames, lockfactory)

def lock_method(method):
    if getattr(method, '__is_locked', False):
        raise TypeError("Method %r is already locked!" % method)
    def locked_method(self, *arg, **kwarg):
        with self._lock:
            return method(self, *arg, **kwarg)
    locked_method.__name__ = '%s(%s)' % ('lock_method', method.__name__)
    locked_method.__is_locked = True
    return locked_method


def make_threadsafe(cls, methodnames, lockfactory):
    init = cls.__init__
    def newinit(self, *arg, **kwarg):
        init(self, *arg, **kwarg)
        self._lock = lockfactory()
    cls.__init__ = newinit

    for methodname in methodnames:
        oldmethod = getattr(cls, methodname)
        newmethod = lock_method(oldmethod)
        setattr(cls, methodname, newmethod)

    return cls


@lock_class(['add','remove'], Lock)
class ClassDecoratorLockedSet(set):
    @lock_method # if you double-lock a method, a TypeError is raised
    def frobnify(self):
        pass

Override Attribute access with __getattribute__

class AttrLockedSet(set):
    def __init__(self, *args, **kwargs):
        self._lock = Lock()
        super(AttrLockedSet, self).__init__(*args, **kwargs)

    def __getattribute__(self, name):
        if name in ['add','remove']:
            # note: makes a new callable object "lockedmethod" on every call
            # best to add a layer of memoization
            lock = self._lock
            def lockedmethod(*args, **kwargs):
                with lock:
                    return super(AttrLockedSet, self).__getattribute__(name)(*args, **kwargs)
            return lockedmethod
        else:
            return super(AttrLockedSet, self).__getattribute__(name)

Dynamically-added wrapper methods with __new__

class NewLockedSet(set):
    def __new__(cls, *args, **kwargs):
        # modify the class by adding new unbound methods
        # you could also attach a single __getattribute__ like above
        for membername in ['add', 'remove']:
            def scoper(membername=membername):
                # You can also return the function or use a class
                def lockedmethod(self, *args, **kwargs):
                    with self._lock:
                        m = getattr(super(NewLockedSet, self), membername)
                        return m(*args, **kwargs)
                lockedmethod.__name__ = membername
                setattr(cls, membername, lockedmethod)
        self = super(NewLockedSet, cls).__new__(cls, *args, **kwargs)
        self._lock = Lock()
        return self

Dynamically-added wrapper methods with __metaclass__

def _lockname(classname):
    return '_%s__%s' % (classname, 'lock')

class LockedClass(type):
    def __new__(mcls, name, bases, dict_):
        # we'll bind these after we add the methods
        cls = None
        def lockmethodfactory(methodname, lockattr):
            def lockedmethod(self, *args, **kwargs):
                with getattr(self, lockattr):
                    m = getattr(super(cls, self), methodname)
                    return m(*args,**kwargs)
            lockedmethod.__name__ = methodname
            return lockedmethod
        lockattr = _lockname(name)
        for methodname in ['add','remove']:
            dict_[methodname] = lockmethodfactory(methodname, lockattr)
        cls = type.__new__(mcls, name, bases, dict_)
        return cls

    def __call__(self, *args, **kwargs):
        #self is a class--i.e. an "instance" of the LockedClass type
        instance = super(LockedClass, self).__call__(*args, **kwargs)
        setattr(instance, _lockname(self.__name__), Lock())
        return instance



class MetaLockedSet(set):
    __metaclass__ = LockedClass

Dynamically-created Metaclasses

def LockedClassMetaFactory(wrapmethods):
    class LockedClass(type):
        def __new__(mcls, name, bases, dict_):
            # we'll bind these after we add the methods
            cls = None
            def lockmethodfactory(methodname, lockattr):
                def lockedmethod(self, *args, **kwargs):
                    with getattr(self, lockattr):
                        m = getattr(super(cls, self), methodname)
                        return m(*args,**kwargs)
                lockedmethod.__name__ = methodname
                return lockedmethod
            lockattr = _lockname(name)
            for methodname in wrapmethods:
                dict_[methodname] = lockmethodfactory(methodname, lockattr)
            cls = type.__new__(mcls, name, bases, dict_)
            return cls

        def __call__(self, *args, **kwargs):
            #self is a class--i.e. an "instance" of the LockedClass type
            instance = super(LockedClass, self).__call__(*args, **kwargs)
            setattr(instance, _lockname(self.__name__), Lock())
            return instance
    return LockedClass

class MetaFactoryLockedSet(set):
    __metaclass__ = LockedClassMetaFactory(['add','remove'])

I'll bet using a simple, explicit try...finally doesn't look so bad now, right?

Exercise for the reader: let the caller pass in their own Lock() object (dependency injection) using any of these methods.

Solution 2

This is my first attempt to play with decorators (although my code doesn't actually use the @decorate syntax), and I don't have much experience with multi-threading/multiprocessing. With that disclaimer, though, here's an attempt I made:

from multiprocessing import Lock

def decorate_all(obj):
    lock = Lock()
    #you'll want to make this more robust:
    fnc_names = [fnctn for fnctn in dir(obj) if '__' not in fnctn]
    for name in fnc_names:
        print 'decorating ' + name
        fnc = getattr(obj, name)
        setattr(obj, name, decorate(fnc, lock))
    return obj

def decorate(fnctn, lock):
    def decorated(*args):
        print 'acquiring lock'
        lock.acquire()
        try:
            print 'calling decorated function'
            return fnctn(*args)
        finally:
            print 'releasing lock'
            lock.release()
    return decorated


def thread_safe(superclass):
    lock = Lock()
    class Thread_Safe(superclass):
        def __init__(self, *args, **kwargs):
            super(Thread_Safe, self).__init__(*args, **kwargs)
    return decorate_all(Thread_Safe)


>>> thread_safe_set = thread_safe(set)
decorating add
decorating clear
decorating copy
decorating difference
decorating difference_update
decorating discard
decorating intersection
decorating intersection_update
decorating isdisjoint
decorating issubset
decorating issuperset
decorating pop
decorating remove
decorating symmetric_difference
decorating symmetric_difference_update
decorating union
decorating update
>>> s = thread_safe_set()
>>> s.add(1)
acquiring lock
calling decorated function
releasing lock
>>> s.add(4)
acquiring lock
calling decorated function
releasing lock
>>> s.pop()
acquiring lock
calling decorated function
releasing lock
1
>>> s.pop()
acquiring lock
calling decorated function
releasing lock
4
>>>

Solution 3

[Indeed, see the comments, it is not true]

If you are running CPython you can see from the set source code that it doesn't release the GIL (http://hg.python.org/cpython/file/db20367b20de/Objects/setobject.c) so all its operations should be atomic.

If it is all what you need and you are sure to run your code on CPython you can just use it directly.

Share:
10,764

Related videos on Youtube

E.Z.
Author by

E.Z.

.

Updated on June 14, 2022

Comments

  • E.Z.
    E.Z. almost 2 years

    I understand from this question that if I want to have a set which is thread-safe I have to implement the thread-safety part on my own.

    Therefore I could come up with:

    from threading import Lock
    
    class LockedSet(set):
        """A set where add() and remove() are thread-safe"""
    
        def __init__(self, *args, **kwargs):
            # Create a lock
            self._lock = Lock()
            # Call the original __init__
            super(LockedSet, self).__init__(*args, **kwargs)
    
        def add(self, elem):
            self._lock.acquire()
            try:
                super(LockedSet, self).add(elem)
            finally:
                self._lock.release()
    
        def remove(self, elem):
            self._lock.acquire()
            try:
                super(LockedSet, self).remove(elem)
            finally:
                self._lock.release()
    

    So, of course only add() and remove() are thread-safe in this implementation. The other methods are not because they were not overwritten in the subclass.

    Now, the pattern is pretty simple: acquire lock, call original method, release lock. If I follow the logic above, I would have to overwrite all methods exposed by set in essentially the same way, e.g.:

    (pseudo-code)

    def <method>(<args>):
        1. acquire lock
        2. try:
        3.     call original method passing <args>
        4. finally:
        5.     release lock
    

    (/pseudo-code)

    This is not only tedious but also prone to errors. So, any ideas/suggestions on how to approach this in a better way?

    • Janne Karila
      Janne Karila over 11 years
      Some methods of containers access other containers. You can easily run into a deadlock if you have a lock in each.
    • Niklas R
      Niklas R over 11 years
      Di you know the Lock class implements the with interface?
    • E.Z.
      E.Z. over 11 years
      I was stuck with Python 2.4 until a short time ago (no context managers in there), so I didn't get used to it yet. Indeed, the above code would look cleaner by using with instead of try...finally
    • E.Z.
      E.Z. over 11 years
      Thanks for pointing that out @Janne, I didn't think about that. Perhaps the only safe way then is analysing each method and implementing it individually.
    • Brenden Brown
      Brenden Brown over 11 years
      Francis Avila's answer uses RLock, and I think that resolves Janne Karila's objection. stackoverflow.com/questions/1822541/…
  • Admin
    Admin over 11 years
    Useful, but I think this question is more about fine-grained locking for individual operations.
  • unddoch
    unddoch over 11 years
    Still three statements less than try/finally. You can put it in the code for methods, too. (Or implement a @lock decorator).
  • Admin
    Admin over 11 years
    Yeah, then put that in your question. But for internal use, it would be better to just use with self._lock: instead of exposing a context manager.
  • E.Z.
    E.Z. over 11 years
    In this case the methods would always have to be called within a with statement, is that right? I'd find it a bit cumbersome; I'd rather go for for an implementation where the methods are called just as the original ones.
  • E.Z.
    E.Z. over 11 years
    :) What exactly am I looking for in there? (I'm not really familiar with the cpython source)
  • Janne Karila
    Janne Karila over 11 years
    While set itself is implemented in C, it may call Python functions, eg. the __hash__ method of elements. Python interpreter may release GIL during those calls.
  • E.Z.
    E.Z. over 11 years
    wow, that's a complete answer :) Thanks a lot! +1, Also, thanks for pointing RLock out.
  • martineau
    martineau over 11 years
    Very nice and comprehensive answer. Curious as to why do you prefer the class decorator. Personally I like the Delegator -- something I was trying to get working but got side-tracked because I was unaware of RLock. One reason I like delegation more is because it can be applied to an existing object (unlike a class decorator).
  • Francis Avila
    Francis Avila over 11 years
    Proxying/delegation is a solid, straightforward approach that obeys the principle of least power. The only downside is that it does not preserve the type of the object: a check isinstance(myDelegatedThreadsafeSet, set) will fail even though the delegate preserves the same interface. The metaprogramming approaches would still work because they produce objects which are still subclasses of set. Idiomatic Python code will not usually rely on explicit type checks (it will use "duck-typing" and document interfaces), but such code exists and a class decorator will still work with it.
  • martineau
    martineau over 11 years
    Ah, yes, all good points. Thanks for explaining. It just dawned on me you didn't include a "Mix-In" (multiple base-class) solution...so your answer isn't quite complete. ;-) BTW, I didn't get notified of your reply and read it just until now because you didn't put @martineau in it.
  • martineau
    martineau over 11 years
    One thing I like about delegation is that there's essentially two interfaces to the same object, one non-locking and faster than corresponding locking version. This which might allow portions of the code to be optimized.
  • martineau
    martineau about 11 years
    +1 for not deleting your answer (thereby possibly preventing others from drawing the same mistaken conclusion).
  • martineau
    martineau over 7 years
    The LockableSet class makes no sense (or is, at best, incomplete). Where are its lock() and unlock() methods defined?
  • Wood
    Wood almost 7 years
    With the Context Manager code, if I run s = LockedSet({'foo'}) and for k in s: print(k in s), it prints False. It prints True if I delete the last 3 lines (the __contains__ method).
  • martineau
    martineau about 5 years
    @Wood: Yes, it looks like you found a bug—which could probably be fixed by changing the last line to return super(LockedSet, self).__contains__(elem).
  • martineau
    martineau over 3 years
    I think it should be return getattr(self.__obj, name)(*a, **k) at the end of the wrapped() nested function shown in the Delegation example.