Why do pthreads’ condition variable functions require a mutex?

Solution 1

It's just the way that condition variables are (or were originally) implemented.

The mutex is used to protect the condition variable itself. That's why you need it locked before you do a wait.

The wait will "atomically" unlock the mutex, allowing others access to the condition variable (for signalling). Then when the condition variable is signalled or broadcast to, one or more of the threads on the waiting list will be woken up and the mutex will be magically locked again for that thread.

You typically see the following operation with condition variables, illustrating how they work. The following example is a worker thread which is given work via a signal to a condition variable.

thread:
    initialise.
    lock mutex.
    while thread not told to stop working:
        wait on condvar using mutex.
        if work is available to be done:
            do the work.
    unlock mutex.
    clean up.
    exit thread.

The work is done within this loop provided that there is some available when the wait returns. When the thread has been flagged to stop doing work (usually by another thread setting the exit condition then kicking the condition variable to wake this thread up), the loop will exit, the mutex will be unlocked and this thread will exit.
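As a rough C sketch of that worker (not the original implementation): the names `lock`, `cond`, `work_available`, `stop_requested`, `jobs_done` and `run_single_consumer_demo` are made up for illustration. Unlike the pseudocode, it checks the flags before every wait, so a signal sent before the thread starts waiting is not missed.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state; everything below is guarded by `lock`. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool work_available = false;
static bool stop_requested = false;
static int  jobs_done = 0;

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        /* Sleep only while there is neither work nor a stop request. */
        while (!work_available && !stop_requested)
            pthread_cond_wait(&cond, &lock);
        if (!work_available)
            break;                /* stop requested and nothing left to do */
        work_available = false;
        jobs_done++;              /* "do the work", mutex still held */
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Hands the worker one job, then asks it to stop; returns jobs completed. */
int run_single_consumer_demo(void)
{
    pthread_t thread;
    if (pthread_create(&thread, NULL, worker, NULL) != 0)
        return -1;

    pthread_mutex_lock(&lock);
    work_available = true;        /* flag work as available */
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);

    pthread_mutex_lock(&lock);
    stop_requested = true;        /* set the exit condition, then kick */
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);

    pthread_join(thread, NULL);
    return jobs_done;
}
```

Calling `run_single_consumer_demo()` from `main` should complete exactly one job regardless of how the two threads are scheduled, because the worker drains pending work before honouring the stop flag.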

The code above is a single-consumer model as the mutex remains locked while the work is being done. For a multi-consumer variation, you can use, as an example:

thread:
    initialise.
    lock mutex.
    while thread not told to stop working:
        wait on condvar using mutex.
        if work is available to be done:
            copy work to thread local storage.
            unlock mutex.
            do the work.
            lock mutex.
    unlock mutex.
    clean up.
    exit thread.

which allows other consumers to receive work while this one is doing work.
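A minimal C sketch of the multi-consumer variation, assuming a tiny fixed-size array as the work queue and squaring a number as the "work". The names `items`, `head`, `tail`, `total` and `run_multi_consumer_demo` are invented for the example, and error checking on the pthread calls is left out for brevity.

```c
#include <pthread.h>
#include <stdbool.h>

#define NUM_WORKERS 3
#define NUM_ITEMS   5

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int  items[NUM_ITEMS];
static int  head = 0, tail = 0;        /* items[head..tail) are pending */
static bool stop_requested = false;
static long total = 0;                 /* guarded by lock */

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (head == tail && !stop_requested)
            pthread_cond_wait(&cond, &lock);
        if (head == tail)
            break;                          /* stop requested, queue drained */
        int item = items[head++];           /* copy work to a local variable */
        pthread_mutex_unlock(&lock);        /* let other consumers take work */
        long result = (long)item * item;    /* do the work without the mutex */
        pthread_mutex_lock(&lock);
        total += result;
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Queues 1..NUM_ITEMS, stops the workers, returns the sum of squares. */
long run_multi_consumer_demo(void)
{
    pthread_t threads[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);

    for (int i = 1; i <= NUM_ITEMS; i++) {
        pthread_mutex_lock(&lock);
        items[tail++] = i;
        pthread_cond_signal(&cond);         /* wake one waiting consumer */
        pthread_mutex_unlock(&lock);
    }

    pthread_mutex_lock(&lock);
    stop_requested = true;
    pthread_cond_broadcast(&cond);          /* every consumer must see this */
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(threads[i], NULL);
    return total;
}
```

The stop request uses a broadcast rather than a signal because all consumers, not just one, have to re-check the flags and exit.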

The condition variable relieves you of the burden of polling some condition, instead allowing another thread to notify you when something needs to happen. Another thread can tell that thread that work is available as follows:

lock mutex.
flag work as available.
signal condition variable.
unlock mutex.

The vast majority of what are often erroneously called spurious wakeups arose because multiple threads had been signalled within their pthread_cond_wait call (broadcast): one would return with the mutex, do the work, then re-wait.

Then the second signalled thread could come out when there was no work to be done. So you had to have an extra variable indicating that work should be done (this was inherently mutex-protected with the condvar/mutex pair here - other threads needed to lock the mutex before changing it however).

It was technically possible for a thread to return from a condition wait without being kicked by another process (this is a genuine spurious wakeup) but, in all my many years working on pthreads, both in development/service of the code and as a user of them, I never once received one of these. Maybe that was just because HP had a decent implementation :-)

In any case, the same code that handled the erroneous case also handled genuine spurious wakeups as well since the work-available flag would not be set for those.

Solution 2

A condition variable would be quite limited if you could only signal a condition; usually you need to handle some data that's related to the condition that was signalled. Signalling and wakeup have to be done atomically with respect to that data to achieve this without introducing race conditions or becoming overly complex.

pthreads can also give you, for rather technical reasons, a spurious wakeup. That means you need to check a predicate, so you can be sure the condition actually was signalled and distinguish that from a spurious wakeup. Checking such a predicate while waiting for it needs to be guarded, so a condition variable needs a way to atomically wait/wake up while locking/unlocking a mutex guarding that condition.

Consider a simple example where you're notified that some data are produced. Maybe another thread made some data that you want, and set a pointer to that data.

Imagine a producer thread giving some data to another consumer thread through a 'some_data' pointer.

while(1) {
    pthread_cond_wait(&cond); //imagine cond_wait did not have a mutex
    char *data = some_data;
    some_data = NULL;
    handle(data);
}

You'd naturally get a lot of race conditions: what if the other thread did some_data = new_data right after you got woken up, but before you did data = some_data?

You cannot really create your own mutex to guard this case either, e.g.:

while(1) {

    pthread_cond_wait(&cond); //imagine cond_wait did not have a mutex
    pthread_mutex_lock(&mutex);
    char *data = some_data;
    some_data = NULL;
    pthread_mutex_unlock(&mutex);
    handle(data);
}

This will not work: there's still a chance of a race condition in between waking up and grabbing the mutex. Placing the mutex lock before the pthread_cond_wait doesn't help you, as you will now hold the mutex while waiting, i.e. the producer will never be able to grab the mutex. (Note: in this case you could create a second condition variable to signal the producer that you're done with some_data, though this will become complex, especially if you want many producers/consumers.)

Thus you need a way to atomically release/grab the mutex when waiting/waking up from the condition. That's what pthread condition variables do, and here's what you'd do:

while(1) {
    pthread_mutex_lock(&mutex);
    while(some_data == NULL) { // predicate to account for spurious wakeups; would also
                               // make it robust if there were several consumers
       pthread_cond_wait(&cond,&mutex); // atomically unlocks mutex and waits; re-locks before returning
    }
    }

    char *data = some_data;
    some_data = NULL;
    pthread_mutex_unlock(&mutex);
    handle(data);
}

(the producer would naturally need to take the same precautions, always guarding 'some_data' with the same mutex, and making sure it doesn't overwrite some_data if some_data is currently != NULL)
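One way the producer side might look, as a sketch only: it assumes a second condition variable (here called `slot_free`, with the consumer's one renamed `data_ready`) so the producer can wait until the consumer has taken the previous value. `produce`, `consume` and `run_handoff_demo` are hypothetical names, and `strlen` stands in for handle().

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  data_ready = PTHREAD_COND_INITIALIZER; /* some_data != NULL */
static pthread_cond_t  slot_free  = PTHREAD_COND_INITIALIZER; /* some_data == NULL */
static const char *some_data = NULL;                          /* guarded by mutex */

static void produce(const char *new_data)
{
    pthread_mutex_lock(&mutex);
    while (some_data != NULL)               /* never overwrite unconsumed data */
        pthread_cond_wait(&slot_free, &mutex);
    some_data = new_data;
    pthread_cond_signal(&data_ready);
    pthread_mutex_unlock(&mutex);
}

static const char *consume(void)
{
    pthread_mutex_lock(&mutex);
    while (some_data == NULL)               /* same predicate loop as above */
        pthread_cond_wait(&data_ready, &mutex);
    const char *data = some_data;
    some_data = NULL;
    pthread_cond_signal(&slot_free);        /* tell the producer the slot is empty */
    pthread_mutex_unlock(&mutex);
    return data;
}

static void *producer_thread(void *arg)
{
    (void)arg;
    produce("alpha");
    produce("beta");
    produce("gamma");
    return NULL;
}

/* Receives three strings and returns their combined length. */
size_t run_handoff_demo(void)
{
    pthread_t producer;
    size_t total_len = 0;
    pthread_create(&producer, NULL, producer_thread, NULL);
    for (int i = 0; i < 3; i++)
        total_len += strlen(consume());     /* handle(data) happens unlocked */
    pthread_join(producer, NULL);
    return total_len;
}
```

With one producer and one consumer a signal is enough on each side; with several of either, the predicate loops still keep it correct, though you may prefer a real queue at that point.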

Solution 3

POSIX condition variables are stateless. So it is your responsibility to maintain the state. Since the state will be accessed by both threads that wait and threads that tell other threads to stop waiting, it must be protected by a mutex. If you think you can use condition variables without a mutex, then you haven't grasped that condition variables are stateless.

Condition variables are built around a condition. Threads that wait on a condition variable are waiting for some condition. Threads that signal condition variables change that condition. For example, a thread might be waiting for some data to arrive. Some other thread might notice that the data has arrived. "The data has arrived" is the condition.

Here's the classic use of a condition variable, simplified:

while(1)
{
    pthread_mutex_lock(&work_mutex);

    while (work_queue_empty())       // wait for work
       pthread_cond_wait(&work_cv, &work_mutex);

    work = get_work_from_queue();    // get work

    pthread_mutex_unlock(&work_mutex);

    do_work(work);                   // do that work
}

See how the thread is waiting for work. The work is protected by a mutex. The wait releases the mutex so that another thread can give this thread some work. Here's how it would be signalled:

void AssignWork(WorkItem work)
{
    pthread_mutex_lock(&work_mutex);

    add_work_to_queue(work);           // put work item on queue

    pthread_cond_signal(&work_cv);     // wake worker thread

    pthread_mutex_unlock(&work_mutex);
}

Notice that you need the mutex to protect the work queue. Notice that the condition variable itself has no idea whether there's work or not. That is, a condition variable must be associated with a condition, that condition must be maintained by your code, and since it's shared among threads, it must be protected by a mutex.
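To make the two fragments above something you can compile, here is one possible filling-in of the helpers they leave undefined. The ring buffer, the `int` WorkItem, the convention that a negative item tells the worker to quit, and `run_work_queue_demo` are all assumptions for the sake of the example; overflow of the small queue is not handled.

```c
#include <pthread.h>
#include <stdbool.h>

typedef int WorkItem;               /* assumption: a work item is just a number */
#define QUEUE_CAPACITY 8

static pthread_mutex_t work_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  work_cv    = PTHREAD_COND_INITIALIZER;
static WorkItem queue[QUEUE_CAPACITY];
static int queue_head = 0, queue_count = 0;
static int processed_sum = 0;       /* touched only by the worker thread */

/* The three queue helpers must be called with work_mutex held. */
static bool work_queue_empty(void) { return queue_count == 0; }

static void add_work_to_queue(WorkItem work)
{
    queue[(queue_head + queue_count) % QUEUE_CAPACITY] = work;
    queue_count++;
}

static WorkItem get_work_from_queue(void)
{
    WorkItem work = queue[queue_head];
    queue_head = (queue_head + 1) % QUEUE_CAPACITY;
    queue_count--;
    return work;
}

void AssignWork(WorkItem work)
{
    pthread_mutex_lock(&work_mutex);
    add_work_to_queue(work);            /* change the condition... */
    pthread_cond_signal(&work_cv);      /* ...then wake a waiter */
    pthread_mutex_unlock(&work_mutex);
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&work_mutex);
        while (work_queue_empty())      /* the state lives in the queue, not the cv */
            pthread_cond_wait(&work_cv, &work_mutex);
        WorkItem work = get_work_from_queue();
        pthread_mutex_unlock(&work_mutex);

        if (work < 0)                   /* hypothetical "quit" sentinel */
            break;
        processed_sum += work;          /* do_work(work) */
    }
    return NULL;
}

/* Queues 10, 20, 30 and a quit item; returns the sum the worker processed. */
int run_work_queue_demo(void)
{
    pthread_t thread;
    pthread_create(&thread, NULL, worker, NULL);
    AssignWork(10);
    AssignWork(20);
    AssignWork(30);
    AssignWork(-1);
    pthread_join(thread, NULL);
    return processed_sum;
}
```

It does not matter whether the worker is already waiting when AssignWork runs: if it is not, it will find the queue non-empty and skip the wait entirely.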

Solution 4

Not all condition variable functions require a mutex: only the waiting operations do. The signal and broadcast operations do not require a mutex. A condition variable also is not permanently associated with a specific mutex; the external mutex does not protect the condition variable. If a condition variable has internal state, such as a queue of waiting threads, this must be protected by an internal lock inside the condition variable.

The wait operations bring together a condition variable and a mutex, because:

  • a thread has locked the mutex, evaluated some expression over shared variables and found it to be false, such that it needs to wait.
  • the thread must atomically move from owning the mutex, to waiting on the condition.

For this reason, the wait operation takes as arguments both the mutex and condition: so that it can manage the atomic transfer of a thread from owning the mutex to waiting, so that the thread does not fall victim to the lost wake up race condition.

A lost wakeup race condition will occur if a thread gives up a mutex, and then waits on a stateless synchronization object, but in a way which is not atomic: there exists a window of time when the thread no longer has the lock, and has not yet begun waiting on the object. During this window, another thread can come in, make the awaited condition true, signal the stateless synchronization and then disappear. The stateless object doesn't remember that it was signaled (it is stateless). So then the original thread goes to sleep on the stateless synchronization object, and does not wake up, even though the condition it needs has already become true: lost wakeup.

The condition variable wait functions avoid the lost wake up by making sure that the calling thread is registered to reliably catch the wakeup before it gives up the mutex. This would be impossible if the condition variable wait function did not take the mutex as an argument.
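A small C illustration of why the application-maintained state matters (a sketch with invented names `ready`, `waits` and `run_early_signal_demo`). The signal is deliberately sent before any thread is waiting, so the condition variable forgets it; the waiter still does not sleep forever, because it holds the mutex while checking `ready` and only waits if the condition is false.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool ready = false;   /* the awaited condition, guarded by lock */
static int  waits = 0;       /* how many times the waiter had to sleep */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!ready) {                        /* check state before waiting */
        waits++;
        pthread_cond_wait(&cond, &lock);    /* unlock + sleep as one step */
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Signals first, starts the waiter afterwards; returns the sleep count. */
int run_early_signal_demo(void)
{
    pthread_t thread;

    pthread_mutex_lock(&lock);
    ready = true;                           /* make the condition true */
    pthread_cond_signal(&cond);             /* nobody is waiting: forgotten */
    pthread_mutex_unlock(&lock);

    pthread_create(&thread, NULL, waiter, NULL);
    pthread_join(thread, NULL);             /* completes: waiter saw ready */
    return waits;
}
```

If the waiter instead called a wait with no predicate and no mutex, this ordering would leave it asleep indefinitely, which is the lost wakeup described above.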

Solution 5

I do not find the other answers to be as concise and readable as this page. Normally the waiting code looks something like this:

mutex.lock()
while(!check())
    condition.wait(mutex) # atomically unlocks mutex and sleeps. Calls 
                          # mutex.lock() once the thread wakes up.
mutex.unlock()

There are three reasons to wrap the wait() in a mutex:

  1. without a mutex another thread could signal() before the wait() and we'd miss this wake up.
  2. normally check() is dependent on modification from another thread, so you need mutual exclusion on it anyway.
  3. to ensure that the highest priority thread proceeds first (the queue for the mutex allows the scheduler to decide who goes next).

The third point is not always a concern - historical context is linked from the article to this conversation.

Spurious wake-ups are often mentioned with regard to this mechanism (i.e. the waiting thread is awoken without signal() being called). However, such events are handled by the looped check().

Author by

ELLIOTTCABLE

That one.

Updated on February 10, 2021

Comments

  • ELLIOTTCABLE
    ELLIOTTCABLE over 3 years

    I’m reading up on pthread.h; the condition variable related functions (like pthread_cond_wait(3)) require a mutex as an argument. Why? As far as I can tell, I’m going to be creating a mutex just to use as that argument? What is that mutex supposed to do?

  • ELLIOTTCABLE
    ELLIOTTCABLE about 14 years
    So… is there a reason for me to not just leave the mutex always-unlocked, and then lock it right before waiting, and then unlock it right after waiting finishes?
  • nos
    nos about 14 years
    'do something ' shouldn't be inside the while loop. You'd want your while loop to just check the condition, otherwise you might also 'do something' if you get a spurious wakeup.
  • Hasturkun
    Hasturkun about 14 years
    The mutex also solves some potential races between the waiting and signalling threads. As long as the mutex is always locked when changing the condition and signalling, you'll never find yourself missing the signal and sleeping forever.
  • paxdiablo
    paxdiablo about 14 years
    Well, yes, you need to check error condition, I'd think that would go without saying. But, assuming there were none, you would have the mutex and it would be safe to "do something". I'll clarify.
  • nos
    nos about 14 years
    No, error handling is second to this. With pthreads, you can be woken up for no apparent reason (a spurious wakeup), and without any error. Thus you need to recheck 'some condition' after you're woken up.
  • ELLIOTTCABLE
    ELLIOTTCABLE about 14 years
    I’m not sure I understand. I had the same reaction as nos; why is do something inside the while loop?
  • ELLIOTTCABLE
    ELLIOTTCABLE about 14 years
    So… I should first wait-on-mutex on the conditionvar’s mutex, before waiting on the conditionvar? I’m not sure I understand at all.
  • paxdiablo
    paxdiablo about 14 years
    Because that's when you've been signalled with the condition variable. nos is right that your thread can wake up with no work to be done (it was never spurious by the way, what would happen is that it was possible for two threads to be wakened within their cond_wait then one would return with the mutex and do the work, then when it rewaited on the condition, the second would return and no work would be there for it). I consider that an error condition hence my changes. Obviously further clarification is needed.
  • paxdiablo
    paxdiablo about 14 years
    Perhaps I'm not making it clear enough. The loop is not to wait for work to be ready so you can do it. The loop is the main "infinite" work loop. If you return from cond_wait and the work flag is set, you do the work then loop around again. "while some condition" will only be false when you want the thread to stop doing work at which point it will release the mutex and most likely exit.
  • ELLIOTTCABLE
    ELLIOTTCABLE about 14 years
    Ahhhhhhhhhhh I see. Thanks for the clarification, that was a little ambiguous. +1’d now, though you should edit it and make that clearer.
  • Judge Maygarden
    Judge Maygarden about 14 years
    Shouldn't the while (some_data != NULL) be a do-while loop so that it waits for the condition variable at least once?
  • ELLIOTTCABLE
    ELLIOTTCABLE about 14 years
    A new problem, since your last edit: Um, you lock/unlock the mutex in the consumer outside the work loop. That means, except when it’s blocked due to the condvar-wait, the mutex would always be locked… so how could you have multiple consumers? Shouldn’t, then, the lock/unlock be inside the loop?
  • nos
    nos about 14 years
    No. What you're really waiting for, is for 'some_data' to be non-null. If it is non-null the "first time", great, you're holding the mutex and can safely use the data. If you had a do/while loop you would miss the notification if someone signalled the condition variable before you waited on it (it's nothing like the events found on win32 which stay signalled until someone waits for them)
  • paxdiablo
    paxdiablo about 14 years
    No, the cond_wait unlocks the mutex automatically and re-locks it before returning. While the thread is within the cond_wait call, it does not have the mutex locked.
  • ELLIOTTCABLE
    ELLIOTTCABLE about 14 years
    Yes, but I’m saying that that code would prevent two ‘worker threads’ from being active at once, working on different elements of the posited queue of work-to-be-done. Right?
  • paxdiablo
    paxdiablo about 14 years
    Yes it would (sorry, I misunderstood your last comment), the example is a clear multi-producer, single-consumer model. It's easy enough to move to a multi-consumer option if you copy the work items and release the mutex before doing the work, claiming it again afterwards. That's just a minor mod.
  • ELLIOTTCABLE
    ELLIOTTCABLE about 14 years
    Another picky bit about your latest modified pseudocode: Unless I’m mistaken, it seems that only one thread can ever be waiting on the condvar at a given time (the others will be locked against the mutex, before beginning to wait on the condvar); doesn’t that sort of defeat the point? i.e. having multiple threads waiting on the condvar, and then calling pthread_cond_signal() against it, to cause just one of those to wake up and take a piece of work.
  • ELLIOTTCABLE
    ELLIOTTCABLE about 14 years
    I’m curious what you think of my implementation so far. Since you can’t exactly paste code in comments: gist.github.com/390498
  • paxdiablo
    paxdiablo about 14 years
    @elliot, while a thread is within the condvarwait, it doesn't have the mutex locked (see the third paragraph in my answer). So your contention that other threads would be waiting on the mutex is not correct. It wouldn't matter even if that were the case since, in the multi-consumer model, the thread would unlock the mutex as soon as it began the work (after copying to thread-local storage), allowing another thread to enter condvarwait. But that isn't actually the case (as explained at the start of this comment) so it doesn't matter.
  • paxdiablo
    paxdiablo about 14 years
    Re the code review, I suggest you just try it under load. If you find a problem, I'd be happy to look at any specifics then. I'm happy to help out with specific questions and problems but my day job unfortunately precludes me from large-effort code reviews. Anyway, I hate code reviews almost as much as I hate documentation :-)
  • ELLIOTTCABLE
    ELLIOTTCABLE about 14 years
    No problem, man. You’ve already been a huge, huge help. I was just trying to figure out if my application of the mutex/condvar pattern was, you know, ‘correct’ (for a given definition of ‘correct.’)
  • caf
    caf about 12 years
    I'm curious how you know that you've never received a spurious wakeup since, as you say, the same code that handled the erroneous case also handled genuine spurious wakeups as well.
  • paxdiablo
    paxdiablo about 12 years
    Possibly because, early in the development, we assumed that all wakeups were genuine and coded for that. Even though that case always held, we may have added the spurious check later. That'd be my best guess, you're stretching my memory a bit though :-)
  • David Schwartz
    David Schwartz over 11 years
    @elliottcable: Without holding the mutex, how could you know whether you should or shouldn't wait? What if what you're waiting for just happened?
  • stefaanv
    stefaanv almost 11 years
    I just stumbled on this question and frankly, it is weird to find that this answer, which is just correct has so much less points than paxdiablo's answer which has definite flaws (atomicity is still needed, the mutex is only needed for handling the condition, not for handling or notifying). I guess that is just how stackoverflow works...
  • paxdiablo
    paxdiablo over 10 years
    @stefaanv, if you'd like to detail the flaws, as comments to my answer so I see them in a timely fashion, rather than months later :-), I'll be happy to fix them. Your brief phrases don't really give me enough detail to work out what you're trying to say.
  • stefaanv
    stefaanv over 10 years
    Just because you asked, the flaws: 1) as already corrected now, the mutex is still to protect the condition variable, there is no other way to protect it. 2) there should not be a different way to handle a single and multiple consumer, as given in the answer of nos, where the handling is done outside the lock.
  • stefaanv
    stefaanv over 10 years
    3) It isn't necessarily a better solution if there aren't spurious wakeups, there might be a more efficient way to handle condition variables that just happen to generate a spurious wakeup, your code just shouldn't care. 4) speaking of efficiency, you might want to unlock before signalling the condition variable, especially if the receiver has higher priority.
  • paxdiablo
    paxdiablo over 10 years
    I don't really see any of those as flaws. (1) is corrected as you state, (2) is just two ways of doing things, the multi-consumer handles both cases fine but you can use simpler code for a single consumer (no need for local storage copy), (3) is irrelevant since there are spurious wakeups possible and (4) it's moot since signaller has to perform two ops anyway before waiter can be scheduled. So I think I'll leave the answer as-is. If you think that makes it genuinely non-useful, that's what the voting system is for. But I don't, obviously. Thank you for clarifying, I do appreciate it.
  • Eric Z
    Eric Z over 10 years
    @nos, shouldn't while(some_data != NULL) be while(some_data == NULL)?
  • WhozCraig
    WhozCraig over 10 years
    @stefaanv "the mutex is still to protect the condition variable, there is no other way to protect it" : the mutex is not to protect the condition variable; it is to protect the predicate data, but I think you know that from reading your comment that followed that statement. You can signal a condition variable legally, and fully supported by implementations, post-unlock of the mutex wrapping the predicate, and in fact you will relieve contention in doing so in some cases.
  • Arun
    Arun almost 8 years
    @WhozCraig, +1, yes, the mutex is NOT to protect the condition variable.
  • David Schwartz
    David Schwartz over 7 years
    Or, to put it more concisely, the entire point of condition variables is to provide an atomic "unlock and wait" operation. Without a mutex, there would be nothing to unlock.
  • Youda008
    Youda008 over 7 years
    What's the cause of the spurious wakeups? If it was guaranteed, that thread will only wake by signal, would the mutex be still needed?
  • xvan
    xvan about 7 years
    Could you provide reference that broadcast operations do not require to acquire the mutex? On MSVC the broadcast is ignored.
  • Kaz
    Kaz about 7 years
    @xvan The POSIX pthread_cond_broadcast and pthread_cond_signal operations (that this SO question is about) do not even take the mutex as an argument; only the condition. The POSIX spec is here. The mutex is only mentioned in reference to what happens in the waiting threads when they wake up.
  • Catskul
    Catskul over 6 years
    @WhozCraig can you point to documentation that might indicate this. I've run into lots of conflicting claims and this answer is being referenced in other places to support the idea that the mutex protects the condition variable itself.
  • Catskul
    Catskul over 6 years
    @youda008 the reason for the spurious wakeups is that the kernel's implementation has a race condition. While it would be possible to eliminate the race condition, it would have performance costs to do so. Therefore the standard allows for spurious wakeups to allow for the best possible performance from the kernel.
  • Catskul
    Catskul over 6 years
    @paxdiablo do you have a source for the claim "The mutex is used to protect the condition variable itself." I've found indirect contradictory evidence here: linux.die.net/man/3/pthread_cond_wait under the heading Features of Mutexes and Condition Variables
  • paxdiablo
    paxdiablo over 6 years
    @Catskul, just my knowledge of the actual implementation that was used in HPUX many moons ago. It's basically why the condvar and mutex were bound so tightly. Whether that's the case with modern implementations, I couldn't say.
  • Soner from The Ottoman Empire
    Soner from The Ottoman Empire over 5 years
    Would you mind explaining the meaning of stateless?
  • Kaz
    Kaz over 5 years
    @snr A stateless synchronization object doesn't remember any state related to signaling. When signaled, if something is waiting on it now, it is woken up, otherwise the wakeup is forgotten. Condition variables are stateless like this. The necessary state to make synchronization reliable is maintained by the application and protected by the mutex that is used in conjunction with the condition variables, according to correctly written logic.
  • David Schwartz
    David Schwartz over 5 years
    @snr They don't have any state. They aren't "locked" or "signaled" or "unsignaled". So it is your responsibility to keep track of whatever state is associated with the condition variable. For example, if the condition variable lets a thread know when a queue becomes non-empty, it must be the case that one thread can make the queue non-empty and some other thread needs to know when the queue becomes non-empty. That is shared state, and you must protect it with a mutex. You can use the condition variable, in association with that shared state protected by a mutex, as the wakeup mechanism.
  • guan boshen
    guan boshen almost 5 years
    I think there is something wrong in the answer. I don't think cond-ready and re-acquire mutex is atomic. It is unlock mutex and wait on cond that is an atomic operation.
  • Abhilash anand
    Abhilash anand almost 4 years
    However, are race conditions really a problem in a producer consumer scenario. I believe there is no check and act pattern here. In scenario 2, even if another thread has a chance to grab the mutex in between, it will just write at the end of the queue ( or a consumer will consume from the beginning ). The end result would be the same. What am I missing?
  • Validus Oculus
    Validus Oculus almost 4 years
    This is the true answer.
  • ch271828n
    ch271828n over 3 years
    Great explanation, thanks! Also makes me clear why we should do that "extra" check