Lecture 13: Condition variables and threading recap
Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.
Traffic Lights
Let’s say we want to run a simulation of cars stopping/continuing at a traffic light.
The light starts off as red, and six cars arrive at the intersection, spaced one second apart:
void TrafficLightSimulation::run() {
    cout << "[main thread] light starts off as red" << endl;
    currentColor = Color::Red;
    vector<thread> cars;
    for (int i = 0; i < 6; i++) {
        cars.push_back(thread([i, this]{
            carThread(i);
        }));
        // Wait a second before making the next car arrive at the intersection
        sleep(1);
    }
    ...
}
A traffic light thread changes the light to green after 3 seconds, then back to red after another 3 seconds:
void TrafficLightSimulation::trafficLightThread() {
    sleep(3);
    currentColor = Color::Green;
    cout << oslock << "[traffic light] changed color to green" << endl << osunlock;
    // TODO: wake up any cars that were waiting for the light to become green
    sleep(3);
    currentColor = Color::Red;
    cout << oslock << "[traffic light] changed color to red" << endl << osunlock;
}
The cars wait for the light to turn green, then proceed through the intersection:
void TrafficLightSimulation::carThread(int carId) {
    if (currentColor == Color::Red) {
        // TODO: wait for the light to turn green
    }
    cout << oslock << "[car " << carId << "] sees green light, continuing"
         << endl << osunlock;
}
Choosing a synchronization primitive
How should we make the cars wait for the light to turn green?
This certainly isn’t a case where we would want to use a mutex, because this isn’t a situation where we want only one thread to do something at a time. That also rules out using a semaphore in the rate-limiting pattern.
This also doesn’t seem like any kind of handoff between threads: it’s not like one thread is generating things that another thread should consume. We could do something hacky like put a ball in the bucket for every car that is waiting at the intersection, but how many balls is that? We would need to do more complicated things to keep track of how many cars there are, and that itself would require more complex synchronization.
Instead, we can use a new synchronization primitive to make life easier here.
Condition variables
A condition variable allows you to wait for an arbitrary condition to happen. At its core, a condition variable is just a list of waiting threads:
- If you call cv.wait(), you’re put to sleep and added to the waiting list.
- cv.notify_one() picks an arbitrary thread from the waiting list and adds it to the ready queue, so that thread can start running again.
- cv.notify_all() wakes up all waiting threads, moving all threads from the waiting list to the ready queue.
Note that these methods can be called many times in a program. notify_all simply moves all threads from the waiting list to the ready queue. We can have a set of threads wait on the condition variable, then call notify_all to unblock all those threads. Later, more threads can wait on the CV, and then notify_all can be called again to unblock those threads.
CVs in the Traffic Lights code
We can add wait and notify_all calls to the car and traffic light threads (cplayground):
class TrafficLightSimulation {
private:
    Color currentColor;
    condition_variable_any waitForGreen;
public:
    void run();
    void trafficLightThread();
    void carThread(int carId);
};
void TrafficLightSimulation::trafficLightThread() {
    sleep(3);
    currentColor = Color::Green;
    cout << oslock << "[traffic light] changed color to green" << endl << osunlock;
    waitForGreen.notify_all();
    sleep(3);
    currentColor = Color::Red;
    cout << oslock << "[traffic light] changed color to red" << endl << osunlock;
}

void TrafficLightSimulation::carThread(int carId) {
    if (currentColor == Color::Red) {
        waitForGreen.wait();
    }
    cout << oslock << "[car " << carId << "] sees green light, continuing"
         << endl << osunlock;
}
Whenever a car sees the light is red, it calls waitForGreen.wait() to add itself to the list of waiting threads. When the light turns green, the traffic light calls waitForGreen.notify_all(), waking up any waiting threads (moving them from the list of waiting threads in waitForGreen to the ready queue, so they can start executing again).
Avoiding race conditions
This code has a data race: currentColor could be concurrently accessed/modified by the traffic light thread and the car threads. We need to address that by locking a mutex before accessing currentColor (cplayground):
class TrafficLightSimulation {
private:
    Color currentColor;
    mutex currentColorLock;
    condition_variable_any waitForGreen;
public:
    void run();
    void trafficLightThread();
    void carThread(int carId);
};
void TrafficLightSimulation::trafficLightThread() {
    sleep(3);
    {
        lock_guard<mutex> lg(currentColorLock);
        currentColor = Color::Green;
    }
    cout << oslock << "[traffic light] changed color to green" << endl << osunlock;
    waitForGreen.notify_all();
    sleep(3);
    {
        lock_guard<mutex> lg(currentColorLock);
        currentColor = Color::Red;
    }
    cout << oslock << "[traffic light] changed color to red" << endl << osunlock;
}

void TrafficLightSimulation::carThread(int carId) {
    lock_guard<mutex> lg(currentColorLock);
    if (currentColor == Color::Red) {
        waitForGreen.wait();
    }
    cout << oslock << "[car " << carId << "] sees green light, continuing"
         << endl << osunlock;
}
However, this ends up creating deadlock. When a car stops at a red light, it holds currentColorLock while calling waitForGreen.wait(). This means that after 3 seconds, the traffic light thread will get stuck attempting to acquire the mutex, and it will never be able to call notify_all(), which means the car will never be woken up.
No problem! We can simply unlock the lock before going to sleep:
void TrafficLightSimulation::carThread(int carId) {
    lock_guard<mutex> lg(currentColorLock);
    if (currentColor == Color::Red) {
        // Unlock the lock so that other threads can get it while we're sleeping
        currentColorLock.unlock();
        waitForGreen.wait();
        // Acquire the lock again, so that when the lock_guard destructor
        // unlocks, it isn't double-unlocking
        currentColorLock.lock();
    }
    cout << oslock << "[car " << carId << "] sees green light, continuing"
         << endl << osunlock;
}
But this creates another race condition that leads to deadlock. Let’s imagine:
- A car arrives at the red light. It goes into the if statement and calls currentColorLock.unlock(), but before it gets a chance to wait on the condition variable, it reaches the end of its time slice and is pulled off the CPU by the OS scheduler.
- The traffic light thread starts running and changes the light to green, calling notify_all to wake up any waiting cars.
- The original car gets back on the CPU and calls waitForGreen.wait(), adding itself to the list of waiting threads and going to sleep. But it is never woken up, because the traffic light thread already called notify_all, and doesn’t call it again later in the program.
To avoid this race condition, the condition_variable_any::wait function takes a mutex as a parameter, which it unlocks as it goes to sleep, then re-acquires when waking up. It unlocks the lock and goes to sleep atomically, without possibility of interruption from other threads that might cause the above race condition. (This is implemented in the kernel; it’s not possible to execute two lines of code atomically like this in userspace code.)
void TrafficLightSimulation::carThread(int carId) {
    lock_guard<mutex> lg(currentColorLock);
    if (currentColor == Color::Red) {
        waitForGreen.wait(currentColorLock);
        // implementation does:
        // * currentColorLock.unlock()
        // * go to sleep until notified
        // * currentColorLock.lock()
        // with no possibility of being interrupted in between unlocking
        // and going to sleep
    }
    cout << oslock << "[car " << carId << "] sees green light, continuing"
         << endl << osunlock;
}
There is one last race condition that we need to deal with here. Let’s imagine this scenario happens:
- A car sees the light is red. It calls waitForGreen.wait(currentColorLock), releasing the currentColorLock mutex and going to sleep. Life is good.
- The traffic light thread changes the light to green and wakes up any waiting threads. The car thread is moved to the ready queue, but doesn’t start executing right away.
- The system happens to be extremely heavily loaded at this time, and the car thread doesn’t actually get a chance to run for a few seconds.
- The traffic light thread runs after 3 seconds, changing the light back to red.
- The car thread finally gets a chance to run. Having awoken from the condition_variable_any::wait() call, it proceeds to print “car sees green light, continuing” and speed through the intersection. But the light is now red! That’s not good.
To prevent situations like these, cv.wait() calls should always be in a while loop that ensures the condition is still true before moving on. It may take some time between getting unblocked by a notify_all call and actually getting on the CPU, and we need to ensure the condition remains true before doing anything that relies on it still being true (e.g. speeding through the intersection).
The final code, with a while loop and currentColorLock passed to cv::wait, looks as follows (cplayground):
void TrafficLightSimulation::carThread(int carId) {
    lock_guard<mutex> lg(currentColorLock);
    while (currentColor == Color::Red) {
        waitForGreen.wait(currentColorLock);
    }
    cout << oslock << "[car " << carId << "] sees green light, continuing"
         << endl << osunlock;
}
Implementing a semaphore
Recall that you can imagine a semaphore like a bucket of balls, where threads can call wait to take a ball from the bucket (waiting for one to be added if there are none) and can call signal to add balls to the bucket (waking up any thread that was waiting for one to be added).
Semaphores are not data structures, and there is nothing actually stored in the bucket. We only need to store the number of “things” in the bucket, so that we can wait when the bucket is empty, and increment this counter whenever adding anything to the bucket. A basic implementation can start off as something like this: (cplayground)
class semaphore {
private:
    int count;
public:
    semaphore(int initialCount = 0): count(initialCount) {}
    void wait();
    void signal();
};

void semaphore::wait() {
    // Wait for there to be something in the bucket
    // Take the thing in the bucket
    count--;
}

void semaphore::signal() {
    // Add something to the bucket.
    count++;
    // If we just went from empty -> nonempty, wake up any
    // threads that were waiting for something to be added
}
First, we need to add a mutex to avoid data races when accessing/modifying count. Then, we should use a condition variable to make semaphore::wait wait when there are no balls in the bucket, and to make semaphore::signal wake up any waiting threads once a ball is added (cplayground):
class semaphore {
private:
    int count;
    mutex countLock;
    condition_variable_any bucketNonempty;
public:
    semaphore(int initialCount = 0): count(initialCount) {}
    void wait();
    void signal();
};
void semaphore::wait() {
    // Wait for there to be something in the bucket
    lock_guard<mutex> lg(countLock);
    while (count == 0) {
        bucketNonempty.wait(countLock);
    }
    // Take the thing in the bucket
    count--;
}

void semaphore::signal() {
    // Add something to the bucket
    lock_guard<mutex> lg(countLock);
    count++;
    // If we just went from empty -> nonempty, wake up any
    // threads that were waiting for something to be added
    if (count == 1) {
        bucketNonempty.notify_all();
    }
}
Multithreading recap
There are three multithreading synchronization primitives we have covered:
- Mutexes are used to ensure that only one thread is doing something at a time. This is most commonly used to prevent data races by ensuring that only one thread is accessing a variable at a time.
- Semaphores are conceptually a bucket of balls. They are commonly used in two ways:
- To ensure that only n things are doing something at a time. You can think of the semaphore as a bucket of mutexes, or a bucket of permission slips: the semaphore is initialized with n permission slips. Before doing something, a thread must get a permission slip from the bucket, and it puts it back after it’s done.
- To coordinate handoff between threads. Commonly, the semaphore will be initialized as an empty bucket. When a thread produces something that another thread needs, it puts a ball in the bucket to indicate this. Before processing data, a consumer will wait on the semaphore to wait until data is available to be processed.
- Condition variables are used to wait for something to happen. They are the most general and flexible synchronization primitive, but also the hardest to use correctly.
Why care about race conditions?
It’s easy to look at synchronization bugs and think eh, that has such an incredibly low chance of happening… why should I care? There are three reasons I would give to that question:
- If your software is used by enough people, your race condition will manifest. Low probability events happen given enough events. “Given the scale that Twitter is at, a one-in-a-million chance happens 500 times a day.” (Del Harvey, 2014)
- Bugs are subject to compounding effects. A big application never has just one bug. If an application has 1000 components, and 1 in 10 components has a bug, then there are 100 bugs for users to experience, and even if each has only a 0.1% chance of happening, a user will experience at least one of them with probability 1 − 0.999^100 ≈ 9.5%.
- Race conditions are the absolute worst kind of bug to investigate and fix. They’re commonly Heisenbugs, which are bugs that seem to disappear when you’re looking for them. This gets even worse when you imagine a codebase maintained over a long time by many people: you might change one part of the code, which incidentally changes the scheduling/timing of the code, suddenly triggering a race condition that existed all along in a completely unrelated part of the codebase.
The fallacy of “benign data races”
A data race is always a problem, even if you cannot think of a specific way it might cause your code to misbehave.
This code spawns two threads. One thread periodically increments a counter, and the other periodically prints out the value of the counter:
int main() {
    int counter = 0;
    thread t1([&counter]{
        for (int i = 0; i < 20; i++) {
            randomSleep();  // helper from the lecture code that sleeps a short, random amount of time
            counter++;
        }
    });
    thread t2([&counter]{
        for (int i = 0; i < 20; i++) {
            cout << "Current counter value: " << counter << endl;
            sleep(1);
        }
    });
    t1.join();
    t2.join();
    return 0;
}
Many might think that this code is fine (worst case, if the value is changed while t2 is printing it, then t2 will just print out an old value, and the updated value will be printed next time t2 prints). However, this code is totally broken. The code might work fine today, but it might not run fine tomorrow. Data races are undefined behavior, meaning anything could happen; the above code might print 0 every single time, or it could crash, or it could even theoretically start executing arbitrary code, all depending on how your libraries, compiler, and CPU were designed. If you change the compiler version, change some optimization flags, change whether a variable is allocated on the stack or the heap, or change anything else that doesn’t seem related, everything could suddenly break for no apparent reason.
Code that just happens to work because of how your current compiler or CPU work is not good code. I guarantee you that many companies are currently struggling to debug bizarre bugs in their code when they compile for the new Apple M1 Macs. The aarch64 architecture used in these Macs has a much more relaxed memory model than x86, so code that relied on the implementation details of x86 without using explicit synchronization is suddenly completely broken when compiling for aarch64.
More reading on this subject: