Assessment 2 Practice

Below, you can find links to practice materials - these are full practice exams, but we have noted the questions relevant to the topics on this assessment. Understand that we are trying to give you a sense of what some CS110 exam problems have looked like in the past, but we may not replicate the structure of these practice midterms in the actual assessment. In particular, some of the material on past exams is different from the material from this quarter, and these exams were written for different environments (perhaps closed book, written for longer times - for instance, the final exams were usually 3 hours). Also note that some of the problems from the practice exams have been cannibalized to contribute to the section handouts.

Practice Midterm 1: PDF | Solutions
- Problems 2, 3 and 4d all cover multiprocessing
Practice Midterm 2: PDF | Solutions
- Problems 1 and 2g-j all cover multiprocessing
Practice Midterm 3: PDF | Solutions
- Problems 1, 2c-e and 2g all cover multiprocessing
Practice Midterm 4: PDF | Solutions
- Problems 1, 2, 3c-e all cover multiprocessing
Practice Final 1: PDF | Solutions
- Problem 1 covers multiprocessing
- Problems 2 and 3 cover multithreading
Practice Final 2: PDF | Solutions
- Problems 1 and 2a cover multiprocessing
- Problem 3 and 4c cover multithreading (except 3 uses notify_one, which can be notify_all)
Practice Final 3: PDF | Solutions
- Problems 1, 4b and 4d cover multiprocessing
- Problems 2 and 4f (ignoring notify_one question) cover multithreading

Spring 2019 Midterm, Selected Questions

This midterm was not available in PDF format, so we've included the relevant questions below, all covering multiprocessing.

Question 1c/1d

For this problem, you will write a program that takes stdin and sends the input to two other programs (provided on the command line) as their input. For example, here is a run using sort and wc:

$ ./split-output /usr/bin/sort /usr/bin/wc
this
is
not
sorted   <--user typed ctrl-D after this line 
is
      4       4      19
not
sorted
this

Notice that the output of the two programs is interleaved -- your program will not handle de-interleaving the output, but you will discuss it in Part (d) of this problem.

For the following questions, assume you have a function available, dualEcho, that reads from fdIn and writes to both fdOut1 and fdOut2 until fdIn closes. If either fdOut1 or fdOut2 has a value of -1, it does not write any data to that file descriptor.

/* Function: dualEcho
 * ------------------
 * Writes all the input from fdIn to both fdOut1 and fdOut2
 * @param fdIn - the file descriptor to read from (until EOF)
 * @param fdOut1 - the first output file descriptor (ignore if -1)
 * @param fdOut2 - the second output file descriptor (ignore if -1)
 */
void dualEcho(int fdIn, int fdOut1, int fdOut2);

c) Complete the main function, which starts two programs and sends stdin to both, using the provided dualEcho function. You should not change stdout of either program.

int main(int argc, char *argv[]) {
    if (argc < 3) {
        printf("Usage:\n\t %s prog1 prog2\n", argv[0]);
        return 0;
    }
    char *progA[] = {argv[1], NULL};
    char *progB[] = {argv[2], NULL};

    // your code here

    return 0;
}

int fdsSupplyA[2];
int fdsSupplyB[2];
pid_t pids[2];
pipe2(fdsSupplyA, O_CLOEXEC);
pids[0] = fork();
if (pids[0] == 0) { // child
    // for supply
    // child will read from pipe
    close(fdsSupplyA[1]); // no need to write    
    // duplicate STDIN
    dup2(fdsSupplyA[0], STDIN_FILENO);
    close(fdsSupplyA[0]); // no need for this any more
    execvp(progA[0], progA);
    printf("Execvp for progA failed!\n");
    exit(0);
}
pipe2(fdsSupplyB, O_CLOEXEC);
pids[1] = fork();
if (pids[1] == 0) { // child
    // for supply
    // child will read from pipe
    close(fdsSupplyB[1]); // no need to write   
    // duplicate STDIN
    dup2(fdsSupplyB[0], STDIN_FILENO);
    close(fdsSupplyB[0]); // no need for this any more
    execvp(progB[0], progB);
    printf("Execvp for progB failed!\n");
    exit(0);
}
// parent will write to supply
close(fdsSupplyA[0]); // no need to read
close(fdsSupplyB[0]); // no need to read
dualEcho(STDIN_FILENO, fdsSupplyA[1], fdsSupplyB[1]);
close(fdsSupplyA[1]);
close(fdsSupplyB[1]);
waitpid(pids[0], NULL, 0);
waitpid(pids[1], NULL, 0);

d) As noted in the original question, your program does not handle de-interleaving the two outputs (i.e., one program does not print all of its output before the other program). Let's assume we actually want program A to print all of its data before program B prints its data.

One potential solution (let's call it "Solution A") would be to create two additional pipes and map stdout of each program to the write end of the pipes. Then, your program could read from the read end of each pipe, first reading program A and writing all of its output to the terminal stdout, and then repeating for program B.

Another potential solution ("Solution B") might have the second child (the one that will execvp program B) raise a SIGSTOP on itself before the execvp, and then have the parent process waitpid for program A to finish before sending a SIGCONT signal to program B.

It turns out that pipes have a maximum internal buffer (64KB on current Linux versions), such that write calls will block when a write pipe buffer is at its capacity, and would not unblock until some data has been read from the read buffer. In 100 words or less, answer the following question: Would either Solution A or Solution B presented above work by modifying our program as described? If both would work, explain which option would be better. If only one would work, explain why it succeeds and the other doesn't. If neither would work, explain the underlying problem that affects both. Again, assume you can not predict how much data will be presented to stdin for your program. [2 points]

Neither works. For solution A: If a child produces more than 64KB output its output pipe fills up. The child’s next write() blocks (until parent consumes). Since child is blocked, it cannot read() from its stdin and its pipe buffer fills causing parent to block on write(). Since parent writes its full stdin to the children before consuming from the children's output pipes, parent will get hung indefinitely on write(). For solution B: The self-stopped program will not read from it's stdin. Eventually it's pipe buffer will fill up and then the parent's write() calls will block causing the parent process to hang indefinitely.

Question 2

Consider the following C program and its execution. Assume all processes run to completion, all system and printf calls succeed, and that all calls to printf are atomic. Assume nothing about scheduling or time slice durations.

static pid_t pid; // global
void signalHandler(int signal) {
    if (pid == 0) {
        printf("child!\n");
    } else {
        printf("parent!\n");
        exit(0);
    }
}

int main(int argc, char *argv[]) {
    signal(SIGUSR1, signalHandler);
    for (int i = 0; i < 3; i++) {
        pid = fork();
        if (pid == 0) { // child
            if (i % 2 == 1) {
                raise(SIGUSR1);
            } else {
                kill(getppid(), SIGUSR1);
            }
            return 0;
        }
    }
    printf("I made it here!\n");
    for (int i = 0; i < 3; i++) {
        waitpid(-1, NULL, 0);
    }
    printf("And I made it here!\n");
    return 0;
}

a) Write down all possible outputs from the program.

Six possible outputs:
*******************
I made it here! child!
parent!
*******************
parent! child!
******************
child! parent!
******************
child!
I made it here! parent!
*****************
I made it here! parent!
child!
******************
parent!
******************

b) If the exit 0 line was removed from the signal handler, what are two significant differences that you would always see in the output that you did not see in part (a)?

If exit(0) was removed, there would be another “parent!” printed, and there would also be a “And I made it here!” printed. 

Question 3

Believe it or not, sorting is still an active research field, and computer scientists develop new sorting algorithms frequently. You saw merge sort in lab. Some sorting algorithms are better than others. For this problem, you will write sleep sort, a sorting algorithm literally first posted on an online internet forum.

Here is how the sleep sort algorithm works: the function loops through an unsorted vector of integers, forks a process, and sleeps, in seconds, equal to the integer itself. After the sleep is complete, it prints the number to stdout. Convince yourself that this will, indeed, print out a sorted list of integers, and then convince yourself that it is an incredibly inefficient way to sort a list of integers.

a) Write the sleepsort function below. Ensure that the function properly waits for its children to complete before returning.

void sleepsort(vector<int> numbers) {

void sleepsort(vector<int> numbers) {
    for (int n : numbers) {
        pid_t pid = fork();
        if (pid == 0) {
            sleep(n);
            //usleep(n);
            printf("%d\n", n);
            exit(0);
        } 
    }
    for (size_t i=0; i < numbers.size(); i++) {
        waitpid(-1, NULL, 0);
    } 
}

b) sleep sort is sometimes referred to as "one giant race condition that happens to do something useful". In 100 words or less, explain what is meant by this phrase.

This is a giant race condition because all of the sleeps are happening at the same time, and except for the fact that they are different amounts of seconds, you don’t really know when each one will stop first. Because of the granularity of the sleep() value in seconds, the program will work as expected.

c) A clever online forum member commented that you could use usleep, which sleeps for microseconds instead of seconds, to speed up the sort by a factor of a million, instantly. It would be the world's best performance increase resulting from a single-character change in a program. In practice, though, this doesn't work. In 100 words or less: what would cause this amazing speedup to break the sort?

Possible reasons for usleep() to make the program fail:

The looping will take more than some number of microseconds, and for small integers, there just won’t be enough granularity to have distinct sleep times
Even if the microsecond timing is perfect, there are still very tight race conditions, and things like printf will take enough microseconds to destroy timing granularity.

Question 5

When you type ctrl-C while running a program, a SIGINT ("Interrupt") signal is sent to the program, which usually terminates the program. This is not particularly nice, and the program might be in the middle of something critical. Using a signal handler, we can capture SIGINT, and then set a flag that lets the program know that the user wants to terminate it. If the program checks the flag often, it can handle the termination, but do it gracefully, after it has closed everything down. Here is an example:

static bool okay_to_quit = false;

void sigint_handler(int sig) {
    okay_to_quit = true;
}

int main(int argc, char *argv[]) {
    signal(SIGINT, sigint_handler);
    int fds = open("importantFile.txt", O_WRONLY | O_CREAT | O_EXCL, 0644);
    while (true) {
        printf("I'm doing something important!\n");
        dprintf(fds,"Important data.\n");
        sleep(1);
        if (okay_to_quit) break;
    }
    printf("Safely Quitting!\n");
    close(fds);

    return 0;
}

We've now got a relatively safe way to handle SIGINT. But, what if we want to make things even safer? Sometimes a user can accidentally type ctrl-C. But users will very rarely type two ctrl-Cs in a row, one after the other. Modify the signal handler presented above to only set okay_to_quit to true if the user types ctrl-C twice in a 500ms time frame. Assume you have the following function:

/* Function: get_ms_time
 * This function returns the current time in milliseconds.
 * A duration of time in milliseconds can be calculated by
 * calling this function twice, and subtracting the first
 * value from the second.
 */
int get_ms_time();

We have provided you with two static function variables, which persist between calls to the function. Recall from CS107 that a static function variable is declared once and then retains its value for the duration of the program. If the function is called again, the value is the same as it was the last time the function was run.

/* Function: sigint_handler(int sig)
 * Sets the global okay_to_quit variable to true if
 * a user types ctrl-C twice within kTimeThreshold
 * number of milliseconds.
 */
 void sigint_handler(int sig) {
    const int kTimeThreshold = 500; // ms

    // first_sigint: variable to indicate that we have seen
    // the first of two SIGINTs
    static bool first_sigint = false;

    // start: variable to hold the starter time between calls to this function
    static int start;

    // your code here
}
 }

Robust solution (will handle three in a row if the last two are within 500ms of each other):

void sigint_handler(int sig) {
    // start: variable to hold the start time between calls to this function
    static int start;
    if (!first_sigint) {
        first_sigint = true;
        start = get_ms_time();
    } else {
        int end = get_ms_time();
        int duration = (end - start);
        if (duration < kTimeThreshold) {
            okay_to_quit = true;
        } else {
            start = end; 
        }
    }
}

Other solution (will capture two in a row, but not three):

void sigint_handler(int sig) {
    // start: variable to hold the start time between calls to this function
    static int start;
    if (!first_sigint) {
        first_sigint = true;
        start = get_ms_time();
    } else {
        int duration = (get_ms_time() - start);
        first_sigint = false;
        if (duration < kTimeThreshold) {
            okay_to_quit = true;
        }
    }
}

Spring 2019 Final Exam, Selected Questions

This exam was not available in PDF format, so we've included the relevant questions below, along with the topic covered. Solutions to the problems can be found in this PDF.

Question 2 (Multiprocessing)

One of the limitations of using signal handlers to signal processes is that there is no information passed when a signal handler is invoked. Or is there? The function signature for a signal handler is as follows:

typedef void (*sighandler_t)(int);

In other words, all of our signal handlers have had the following form:

void sighandler(int sig) { ... } // sig is the signal number

Therefore, with a little bit of ingenuity, we can use the signal information to help drive the logic of the signal handler, even though there is no shared memory between processes. We will leverage the fact that there are two signals, SIGUSR1 and SIGUSR2 that are for any use our program chooses, and we will use the same signal handler for both signals. If the signal passed in is SIGUSR1 then the logic of the handler will be based on that information, and likewise if the signal is SIGUSR2, we can have different logic.

For this problem, create a "process circle" of length n, where the parent and n-1 children are in the circle. You can think of the circle as a circular linked list, if you'd like. The circle is built such that the first child signals the parent, the second child signals the first child, etc., and the parent signals the last child:

When a SIGUSR1 is sent to any of the processes, that process should print out its PID, an arrow (->), and then signal the next process in the circle. The printing stops after the last process in the circle has printed. In other words, you have to avoid an infinite loop. For example, if process 1 above was signaled with kill(1236, SIGUSR1), the following should print:

1236->1235->1234->1237->

Write the signal handler, usrHandler, and the circle creation function, createProcessCircle.

Notes:

Because of the infinite loop problem, you have to figure out how to ensure that the looping stops after all of the nodes in the circle have printed. This is where the two different signals will be helpful. In other words, how can you tell one node that it will need to stop if it is signaled a second time, and how can you ensure that the other nodes don't stop on the first time?
You may use static variables inside the signal handler to retain state, but you are not allowed to create other global variables.
The parent and each child should end up in a while(true) {...} loop, with a proper sigsuspend command to keep the process off the processor when not receiving signals. You don't need to block any signals, but you should use a sigsuspend properly in the while(true) {... } loop to avoid busy waiting. In other words: once the children have all been launched, they go into a non-busy waiting state, only to be woken up by the signal handler. The parent also ends up in a while(true) {...} loop, and also waits in a non-busy way.
The example above should be able to be run multiple times, with different processes in the circle. In other words, you should be able to generate a circle diagram as above as many times as you wish (although the printing must be finished before trying a different starting process). See the diagram below, which shows two windows. In the larger window, the process circle program has been launched. In the smaller window, we have sent the SIGUSR1 signal three times, generating the output in the larger window:

#include <unistd.h>
#include <stdio.h>
#include <sys/wait.h>
#include <errno.h>
#include "exit-utils.h"  // exitIf, exitUnless
#include "sleep-utils.h"
#include <stdbool.h>

static pid_t prevPid; // global

static void usrHandler(int sig) {
    // YOUR CODE HERE
}

void createProcessCircle(int circleSize) {
    // YOUR CODE HERE
}

int main(int argc, char *argv[]) {
    if (argc != 2) {
        printf("usage:\n\t%s circleSize\n",argv[0]);
        return -1; 
    }
    signal(SIGUSR1, usrHandler);
    signal(SIGUSR2, usrHandler);

    printf("parent pid: %d\n", getpid());
    createProcessCircle(atoi(argv[1]));
    return 0;
}

Question 3 (Multithreading)

You're writing a simulation for moving necessary supplies to support a Mars colony. There are technicians who organize the supplies into a container and load it onto a rocket, astronouts who fly the rockets full of containers to Mars, and martians who unpack the containers. At the beginning of the simulation, one thread is spawned for each technician and one for each astronaut. Threads for the martians are spawned as needed (this process is described below). There is only one launchpad, which is used to load the cargo, and the astronouts (who fly their own rockets) must wait for the launch pad to become available.

A technician is responsible for organizing just one container and getting it onto a rocket (in other words, they perform the loading, as well). First, they must round up an empty container. There are a fixed number of containers available for all technicians to use. Once the technician has a container, they organize it with stuff and then load it. To load, a rocket needs to be at the launch pad and have space for another container. The technician who puts the final container on the rocket tells the astronaut to go. Once their container is loaded, the technician's work is done (and the thread exits).

Each astronaut thread is responsible for making one trip to pick up containers, taking them to Mars, and unloading. The astronaut first "refuels" the rocket and then goes to pick up containers. There is only one launch pad, so only one rocket can be loading at once. The astronaut hangs out while the technicians pile on the containers. Each rocket has a capacity that is established when created. The rocket is considered loaded when it is full to its capacity or it is the last rocket and there are no more containers to load. After the rocket is loaded, the astronaut "flys" the rocket to Mars. On Mars, the astronaut takes each container off the rocket and dispatches a separate martian thread to unpack it. Once the martians are dispatched, the astronaut waits for all spawned martian threads to finish, and only when that happens is the astronaut done (and the astronaut thread exits).

Each martian is responsible for "unpacking" one container. Once the container is unpacked, it is empty and should be made available to the packers who are in need of empty containers. (Assume that each container has a corresponding container that has already made the round trip and as a container is finished on Mars, one is immediately ready back on Earth). Several rockets can be unloaded simultaneously, and martians can work in parallel. Once the martian unpacks their box, their job is done (and their thread exits).

Here is the starting main function for the packing simulation.

const static int kNumContainersToMove = 300;
const static int kNumEmptyContainers = 70;

int main(int argc, char *argv[]) {
  RandomGenerator rgen; 
  vector<thread> threads;
  int totalRocketCapacity = 0;

  for (int i = 0; i < kNumContaineresToMove; i++) {
     // create technician threads
     threads.push_back(thread(technician));
  }

  while (totalRocketCapacity < kNumContainersToMove) { 
    // create astronaut threads
    int thisRocketHolds = rgen.getNextInt(10, kNumEmptyContainers);
    totalRocketCapacity += thisRocketHolds;
    threads.push_back(thread(astronaut, thisRocketHolds));
  }

  for (thread& t: threads) t.join();
  return 0;
}

// simulation functions already written for you
static void organize(); // for technician, to fill container with stuff
static void refuel();   // for astronaut, to prepare rocket
static void fly();      // for astronaut, fly to Mars
static void unpack();   // for martian, to unload container contents

Assume the above helpers are already written and are thread-safe, and you can just call them when you need to. Your job will be to write the technician, astronaut, and martian functions to properly synchronize the different activities and efficiently share the common resources.

Declare your global variables, mutexes, and semaphores, and then implement the technician, astronaut, and martian functions. Do not busy wait anywhere!

// declare your global variables here
// YOUR CODE HERE

void technician() {
    // YOUR CODE HERE
}

void astronaut(int capacity) {
    // YOUR CODE HERE
}

void martian() {
    // YOUR CODE HERE
}

Problem 4 (Multithreading a/c/d and General b)

Answer the four questions below. Limit your answers to 150 words or less.

a) Generally, there are two different ways we use semaphores (throughout lecture and the assignments, you have seen semaphores used in many different scenarios, but the way we use semaphores in these two scenarios falls into one of these two usage patterns). Describe these two use cases, and be explicit about how they differ.

b) Describe the difference between an "I/O Bound" program and a "CPU Bound" program. If we were to speed up the CPU on our computer, but it did not speed up the problem we were trying to solve with a program, would the program be I/O Bound or CPU Bound?

c) Our own solution for the ThreadPool class includes "lazy" initialization of worker threads. This means that we do not actually launch any worker threads until they are needed. So, for example, if we create a 64-thread ThreadPool (e.g., ThreadPool tp(64) ), and then we only scheduled ten threads to work, we would only actually create ten worker threads, at the time they are scheduled. As another example, suppose we had written the Farm problem with threads instead of processes. We could launch all farm workers at the beginning (non-lazy), or we could only launch a worker when it is needed, up to a maximum amount of total worker threads allowed (lazy). Explain the benefit of lazy thread initialization, and explain one downside to lazy thread initialization.

d) The following is the general pattern for using a condition variable:

m.lock();
cv.wait(m, []{return condition > 0});
m.unlock();

Explain two reasons why a mutex is necessary for a condition variable to work properly.