Modifying a Python dictionary from different threads

15,450

Does the same apply to dictionaries? Or is a dictionary a collection of variables?

Let's be more general:

What does "atomic operation" mean?

From Wikipedia :

In concurrent programming, an operation (or set of operations) is atomic, linearizable, indivisible or uninterruptible if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from concurrent processes.

Now what does this mean in Python?

This means that each bytecode instruction is atomic (at least for Python <3.2, before the new GIL).

Why is that???

Because Python (CPython) use a Global Interpreter Lock (GIL). The CPython interpreter uses a lock to make sure that only one thread runs in the interpreter at a time, and uses a "check interval" (see sys.getcheckinterval()) to know how many bytecode instructions to execute before switching between threads (by default set to 100).

So now what does this mean??

It means that operations that can be represented by only one bytecode instruction are atomic. For example, incrementing a variable is not atomic, because the operation is done in three bytecode instructions:

>>> import dis

>>> def f(a):
        a += 1

>>> dis.dis(f)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_CONST               1 (1)      <<<<<<<<<<<< Operation 1 Load
              6 INPLACE_ADD                         <<<<<<<<<<<< Operation 2 iadd
              7 STORE_FAST               0 (a)      <<<<<<<<<<<< Operation 3 store
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

So what about dictionaries??

Some operations are atomic; for example, this operation is atomic:

d[x] = y
d.update(d2)
d.keys()

See for yourself:

>>> def f(d):
        x = 1
        y = 1
        d[x] = y

>>> dis.dis(f)
  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               1 (x)

  3           6 LOAD_CONST               1 (1)
              9 STORE_FAST               2 (y)

  4          12 LOAD_FAST                2 (y)
             15 LOAD_FAST                0 (d)
             18 LOAD_FAST                1 (x)
             21 STORE_SUBSCR                      <<<<<<<<<<< One operation 
             22 LOAD_CONST               0 (None)
             25 RETURN_VALUE   

See this to understand what STORE_SUBSCR does.

But as you see, it is not totally true, because this operation:

             ...
  4          12 LOAD_FAST                2 (y)
             15 LOAD_FAST                0 (d)
             18 LOAD_FAST                1 (x)
             ...

can make the entire operation not atomic. Why? Let's say the variable x can also be changed by another thread...or that you want another thread to clear your dictionary...we can name many cases when it can go wrong, so it is complicated! And so here we will apply Murphy's Law: "Anything that can go wrong, will go wrong".

So what now?

If you still want to share variables between thread, use a lock:

import threading

mylock = threading.RLock()

def atomic_operation():
    with mylock:
        print "operation are now atomic"
Share:
15,450
Jelle De Loecker
Author by

Jelle De Loecker

Updated on June 06, 2022

Comments

  • Jelle De Loecker
    Jelle De Loecker almost 2 years

    When it comes to threading, I know you have to make sure you're not editing a variable at the same time another thread is editing it, as your changes can be lost (when incrementing a counter, for example)

    Does the same apply to dictionaries? Or is a dictionary a collection of variables?

    If every thread were to lock the dictionary it would slow the program down significantly, while every thread only needs write access to its own little piece of the dictionary.

    If it isn't possible, is there some sort of variable variable in python, like in php?

  • HeyWatchThis
    HeyWatchThis over 10 years
    Thanks for being thorough. Have you heard if anyone has implemented an eventual consistency layer on top of threading? So use cases that can tolerate stale data can get better performance, that is.
  • mouad
    mouad over 10 years
    @HeyWatchThis: Hi, not that i am aware of, sorry :(, but probably now the big new think is green thread (back from the past :)) see library like gevent, eventlet ... that may simplify threading for you specially if you have a I/O bound process, else you still have the multiprocessing option which is advocate by most dynamic language that have GIL (like Python, Ruby) or maybe new shinning think like transactional memory from pypy, experimental only :) Else if you want performant concurrent code look to Erlang or such ... Cheers,