How exactly does the wait function work wrt to condition variables

Question

Background

I am somewhat confused about my understanding of how condition variables operate in conjunction with concurrent access to shared data. The following is pseudo code to depict my current issue.

// Thread 1: Producer
void cakeMaker()
{
    lock(some_lock);
    while(number_of_cakes == MAX_CAKES)
        wait(rack_has_space);

    number_of_cakes++;

    signal(rack_has_cakes);
    unlock(some_lock);
}

// Thread 2: Consumer
void cakeEater()
{
    lock(some_lock);
    while(number_of_cakes == 0)
        wait(rack_has_cakes);

    number_of_cakes--;

    signal(rack_has_space);
    unlock(some_lock);
}

Let's consider the scenario where the value of number_of_cakes is 0. As a result, Thread 2 is blocked at wait(rack_has_cakes). When Thread 1 runs and increments the value of number_of_cakes to 1, it signals rack_has_cakes. However, Thread 2 wakes up before Thread 1 releases the lock on some_lock, causing it to go back to sleep and miss the signal.

I am unclear about the operation of wait and signal. Are they like a toggle switch that gets set to 1 when signal is called and 0 when wait succeeds? Can someone explain what is happening behind the scenes?

Question

Can someone walk me through one iteration of the above code step-by-step, with a strong emphasis on the events that occur during the signal and wait method calls?

@user3386109 Care to elaborate on a more detailed response with explanations? — AlanSTACK, Dec 14 '17 at 02:04

score 13 · Accepted Answer · answered Dec 14 '17 at 02:08

thread 2 wakes up before Thread 1 calls unlock(some_lock), so it goes back to sleep again and the signal has been missed.

No, that's not how it works. I will use C++ std::condition_variable for my cite, but POSIX threads, and most run-of-the-mill implementation of mutexes and condition variables work the same way. The underlying concepts are the same.

Thread 2 has the mutex locked, when it starts waiting on a condition variable. The wait() operation unlocks the mutex and waits on the condition variable atomically:

Atomically releases lock, blocks the current executing thread, and adds it to the list of threads waiting on *this.

This operation is considered "atomic"; in other words, indivisible.

Then, when the condition variable is signaled, the thread re-locks the mutex:

When unblocked, regardless of the reason, lock is reacquired and wait exits.

The thread does not "go back to sleep" before the other thread "calls unlock". If the mutex has not yet been unlocked: when the thread wakes up upon being signaled by a condition variable, the thread will always wait until it succeeds in locking the mutex again. This is unconditional. When wait() returns the mutex is still locked. Then, and only then, the wait() function returns. So, the sequence of events is:

One thread has the mutex locked, sets some counter, variable, or any kind of mutex-protected data to the state that the other thread is waiting for. After doing so the thread signals the condition variable, and then unlocks the mutex at its leisure.
The other thread has locked the mutex before it wait()s on the condition variable. One of wait()'s prerequisites is that the mutex must be locked before wait()ing on the linked condition variable. So, the wait() operation unlocks the mutex "atomically". That is, there is no instance when the mutex is unlocked, and the thread is not yet waiting on the condition variable. When wait() unlocks the mutex, you are guaranteed that the thread will be waiting, and it will wake up. You can take it to the bank.
Once the condition variable is signaled, the wait()ing thread does not return from wait() until it can re-lock the mutex. Having received a signal from the condition variable is just the first step, the mutex must be locked again, by thread, in the final step of the wait() operation. Which, of course, only happens after the signaling thread unlocks the mutex.

When a thread gets signaled by a condition variable, it will return from wait(). But not immediately, it must wait until the thread locks the mutex again, however long it takes. It will not go "back to sleep", but wait until it has the mutex locked again, and then return. You are guaranteed that a received condition variable signal will cause the thread to return from wait(), and the mutex will be re-locked by the thread. And because the original unlock-then-wait operation was atomic, you are guaranteed to receive the condition variable signal.

Hey thanks for the reply. It really helped clear things up for me. However, I still have some concerns. In the comments. Some guy with user(bunch_of_digits) here suggested that my code is wrong and that each condition variable needs its own mutex. (I only have 1 mutex, and 2 condition variables based on it) Is it wrong? — AlanSTACK, Dec 14 '17 at 02:11
No, I do not think this is wrong. You are not required to have a dedicated mutex; furthermore you can even go the other way: use a single condition variable. The condition variable's definition can simply be "the number of cakes has changed". In both cases, you're using a condition variable to wait until the number of cakes meets a specific condition. When "the number of cakes has changed", you check again if this specific condition is now met. The End. Job's done. The fact that the new "number of cakes" may not meet the specific condition -- it's ok, you just continue to wait until it does. — Sam Varshavchik, Dec 14 '17 at 02:17
@SamVarshavchik So it could happen that the thread gets woken up, but a different thread acquires the lock, also calls notify_one, and this notify is then essentially lost? Meaning that for such scenarios, you will have to keep track of this count, if it's important to you that a notification corresponds to ending the wait of exactly one waiting thread? (If there is any waiting thread in the first place) — Jake1234, Jun 21 '22 at 05:20
What does "essentially lost" mean, @jake1234? The thread will eventually return from `wait()` and relock the mutex, in response to the condition variable getting signaled. Nothing is "lost". — Sam Varshavchik, Jun 21 '22 at 10:58
@SamVarshavchik Lost in the sense that a second notify doesn't change anything, because the first ``notify_one`` already put the thread out from the waiting queue. — Jake1234, Jun 21 '22 at 18:08
You have never any guarantees, whatsoever, that condition variable notifications by themselves have stringent semantics. In fact, a thread that's waiting on a condition variable can wake up without the condition variable being signaled. It is called a "spurious wakeup". This is a part of the C++ standard. You never had any guarantees that an execution thread will wake up only in response to a condition variable notification. A spurious, extra, wakeup can always happen. Any expectation of strict notify/wakeup semantics will always fail. — Sam Varshavchik, Jun 21 '22 at 21:29

score 0 · Answer 2 · answered Jan 26 '21 at 23:04

Lets say we currently have number_of_cakes = 0, so Thread 2 is currently stuck on wait(rack_has_cakes). Thread 1 runs and increments number_of_cakes by 1. Then it calls signal(rack_has_cakes) - this wakes up Thread 2, unfortunately Thread 2 wakes up before Thread 1 calls unlock(some_lock), so it goes back to sleep again and the signal has been missed.

You are right, that might be happens, because your signal command order was not correct. In both Producer and Consumer, you have set the following order of commands:

signal(rack_has_cakes);
unlock(some_lock);

But the order should be:

unlock(some_lock);
signal(rack_has_cakes);

You first have to unlock the mutex and then signal the other thread. Since signal command is condition variable wait() and signal() commands are thread safe, you should not worry about releasing the lock before.

But this step is very important as it give the other thread a chance to lock the mutex.

How exactly does the wait function work wrt to condition variables

Background

Question

2 Answers2

Linked