Are locks unnecessary in multi-threaded Python code because of the GIL?

Solution 1

You will still need locks if you share state between threads. The GIL only protects the interpreter internally. You can still have inconsistent updates in your own code.

For example:

#!/usr/bin/env python3
import threading

shared_balance = 0

class Deposit(threading.Thread):
    def run(self):
        global shared_balance
        for _ in range(1000000):
            # Unprotected read-modify-write of the shared state
            balance = shared_balance
            balance += 100
            shared_balance = balance

class Withdraw(threading.Thread):
    def run(self):
        global shared_balance
        for _ in range(1000000):
            # Same unprotected read-modify-write, in the other direction
            balance = shared_balance
            balance -= 100
            shared_balance = balance

threads = [Deposit(), Withdraw()]

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

# Would be 0 if every update were applied; lost updates can leave any multiple of 100
print(shared_balance)

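Here, your code can be interrupted between reading the shared state (balance = shared_balance) and writing the changed result back (shared_balance = balance). If the other thread updates shared_balance in that window, its write is overwritten when the first thread resumes, which is a lost update. The result is an unpredictable value for the shared state.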

To make the updates consistent, the run methods would need to lock the shared state around the read-modify-write sections (inside the loops) or have some way to detect that the shared state has changed since it was read.
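As a minimal sketch of the locking approach described above, the same read-modify-write can be wrapped in a threading.Lock. The names balance_lock, adjust and run_transfers are hypothetical and chosen only for illustration; the example uses plain functions rather than Thread subclasses to stay short.

```python
import threading

shared_balance = 0
balance_lock = threading.Lock()  # hypothetical name; guards shared_balance


def adjust(amount, iterations):
    """Add `amount` to shared_balance `iterations` times, one locked update at a time."""
    global shared_balance
    for _ in range(iterations):
        # Only one thread at a time can be inside this block, so the
        # read and the write-back cannot be separated by the other thread.
        with balance_lock:
            balance = shared_balance
            balance += amount
            shared_balance = balance


def run_transfers(iterations=100_000):
    """Run one depositing and one withdrawing thread and return the final balance."""
    global shared_balance
    shared_balance = 0
    threads = [
        threading.Thread(target=adjust, args=(100, iterations)),
        threading.Thread(target=adjust, args=(-100, iterations)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared_balance


if __name__ == "__main__":
    print(run_transfers())
```

With the lock held for the whole read-modify-write, every deposit is matched by a withdrawal and the final balance is 0.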

Solution 2

No - the GIL just protects Python internals from multiple threads altering their state. This is a very low level of locking, sufficient only to keep Python's own structures in a consistent state. It doesn't cover the application-level locking you'll need to do to ensure thread safety in your own code.

The essence of locking is to ensure that a particular block of code is only executed by one thread at a time. The GIL enforces this for blocks the size of a single bytecode, but usually you want the lock to span a larger block of code than this.
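To see why a single-bytecode guarantee is usually too small, here is a small sketch that uses the standard dis module to list the bytecodes behind a one-line increment. The names counter, increment and opnames are hypothetical, and the exact opcode names vary between CPython versions.

```python
import dis

counter = 0


def increment():
    """A single source line that compiles to several bytecodes."""
    global counter
    counter += 1


def opnames(func):
    """Return the opcode names that make up `func`, in order."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


if __name__ == "__main__":
    # The listing contains a separate load of the global and a later store,
    # so the statement is not one indivisible step.
    for name in opnames(increment):
        print(name)
```

Because the load and the store are separate instructions, the one-line form is not guaranteed to be atomic either; it should be protected in the same way as the longer version.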

Solution 3

Adding to the discussion:

Because the GIL exists, some operations are atomic in Python and do not need a lock.

http://www.python.org/doc/faq/library/#what-kinds-of-global-value-mutation-are-thread-safe

As stated by the other answers, however, you still need to use locks whenever the application logic requires them (such as in a Producer/Consumer problem).
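For the producer/consumer case mentioned above, one common option is the standard library's queue.Queue, which does its own locking internally. The following is only a sketch under that assumption; producer, consumer, run_pipeline and SENTINEL are hypothetical names, and doubling each item stands in for real work.

```python
import queue
import threading

SENTINEL = object()  # hypothetical marker telling the consumer to stop


def producer(q, items):
    """Put every item on the queue, then signal completion."""
    for item in items:
        q.put(item)  # blocks while the queue is full
    q.put(SENTINEL)


def consumer(q, results):
    """Take items off the queue until the sentinel arrives."""
    while True:
        item = q.get()  # blocks while the queue is empty
        if item is SENTINEL:
            break
        results.append(item * 2)


def run_pipeline(items):
    """Run one producer and one consumer thread and return the consumer's results."""
    q = queue.Queue(maxsize=10)
    results = []
    threads = [
        threading.Thread(target=producer, args=(q, items)),
        threading.Thread(target=consumer, args=(q, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(range(5)))
```

The queue handles the blocking and hand-off, so no explicit lock appears in the application code; the results list is only ever touched by the single consumer thread.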

Solution 4

This post describes the GIL at a fairly high level:

Of particular interest are these quotes:

Every ten instructions (this default can be changed), the core releases the GIL for the current thread. At that point, the OS chooses a thread from all the threads competing for the lock (possibly choosing the same thread that just released the GIL – you don't have any control over which thread gets chosen); that thread acquires the GIL and then runs for another ten bytecodes.

and

Note carefully that the GIL only restricts pure Python code. Extensions (external Python libraries usually written in C) can be written that release the lock, which then allows the Python interpreter to run separately from the extension until the extension reacquires the lock.

It sounds like the GIL just reduces the number of points at which a context switch can occur, and makes multi-core/multi-processor systems behave like a single core with respect to each Python interpreter instance. So yes, you still need to use synchronization mechanisms.
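The quote above describes an interval counted in instructions. On CPython 3.2 and later the interpreter instead uses a time-based switch interval, which can be read with sys.getswitchinterval(). A small sketch (the helper name switch_interval_ms is hypothetical):

```python
import sys


def switch_interval_ms():
    """Return the interpreter's thread switch interval in milliseconds."""
    # sys.getswitchinterval() returns the interval in seconds as a float.
    return sys.getswitchinterval() * 1000.0


if __name__ == "__main__":
    print("Switch interval: %.3f ms" % switch_interval_ms())
```

Whatever the value, it only controls how often a running thread is asked to give up the GIL; it does not make multi-step updates atomic.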

Solution 5

The Global Interpreter Lock prevents threads from accessing the interpreter simultaneously (thus CPython only ever runs Python bytecode on one core at a time). However, as I understand it, the threads are still interrupted and scheduled preemptively, which means you still need locks on shared data structures, lest your threads stomp on each other's toes.

The answer I've encountered time and time again is that multithreading in Python is rarely worth the overhead, because of this. I've heard good things about the PyProcessing project, which makes running multiple processes as "simple" as multithreading, with shared data structures, queues, etc. (PyProcessing will be introduced into the standard library of the upcoming Python 2.6 as the multiprocessing module.) This gets you around the GIL, as each process has its own interpreter.

Author by

Corey Goldberg

Updated on November 14, 2020

Comments

  • Corey Goldberg
    Corey Goldberg over 3 years

    If you are relying on an implementation of Python that has a Global Interpreter Lock (i.e. CPython) and writing multithreaded code, do you really need locks at all?

    If the GIL doesn't allow multiple instructions to be executed in parallel, wouldn't it be unnecessary to protect shared data?

    Sorry if this is a dumb question, but it is something I have always wondered about Python on multi-processor/core machines.

    The same thing would apply to any other language implementation that has a GIL.

    • L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳
      L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ almost 14 years
      Also note that the GIL is an implementation detail. IronPython and Jython, for example, do not have a GIL.
  • jimx
    jimx over 15 years
    OK, it looks like the GIL does not lock the thread the whole time, and a context switch can still happen. So I was wrong; a lock is still needed.
  • Peter Hansen
    Peter Hansen about 14 years
    Note, sys.getcheckinterval() tells you how many bytecode instructions are executed between "GIL releases" (and it's been 100 (not 10) since at least 2.5). In 3.2 it may be switching to a time-based interval (5ms or so) rather than instruction counts. The change may be applied to 2.7 as well though it's still a work in progress.
  • RayLuo
    RayLuo about 11 years
    The code example gives a clear and visual understanding! Nice post Harris! I wish I could up-vote twice!
  • mrgloom
    mrgloom almost 5 years
    Would it be safe if each loop body were just the single line shared_balance += 100 (or shared_balance -= 100)?