Lab Handout 4: assign3 Redux and Threads
The lab checkoff sheet for all students can be found right here.
Mid-Quarter Evaluation
Before starting lab this week, we ask that you take 10-15 minutes to individually fill out the CS110 mid-quarter evaluation. This evaluation is important in letting the Teaching Team know what they're doing well and what they can improve. The evaluation is anonymous, and we appreciate any feedback that you have.
To fill out the mid-quarter evaluation, please visit this link.
Get Started
Go ahead and clone the lab4 folder and run make; this folder contains a working implementation of problem 4. While you may not get to problem 4 during section, it's useful to play with if you have time.
git clone /usr/class/cs110/repos/lab4/shared lab4
Problem 1: assign3 Redux
est. 30min.
Here is a collection of short-answer questions about subprocess, trace, and farm. These questions are here to force you to think big picture and understand the systems concepts we feel are important.
- Your implementation of subprocess required two pipes — one to foster a parent-to-child communication channel, and a second to foster a child-to-parent communication channel. Clearly explain why a single pipe shouldn't be used to support both communication channels.
- You've seen dprintf in lecture and in the assign3 handout, and it might have contributed to your farm implementation. Explain why there's a dprintf function, but there's no analogous dscanf function. Hint: think about why dprintf(fd, "%s %d\n", str, i) would be easy to manage whereas dscanf(fd, "%s %d\n", str, &i) wouldn't be. Read the first few lines of the man pages for the traditional fprintf and fscanf functions to understand how they operate.
- Consider the implementation of spawnAllWorkers below. Even though it rarely causes problems, the line in bold italics technically contains a race condition. Briefly describe the race condition, and explain how to fix it.
struct worker {
  worker() {}
  worker(char *argv[]) : sp(subprocess(argv, true, false)), available(false) {}
  subprocess_t sp;
  bool available;
};

static const size_t kNumCPUs = sysconf(_SC_NPROCESSORS_ONLN);
static vector<worker> workers(kNumCPUs);
static size_t numWorkersAvailable;
static const char *kWorkerArguments[] = { "./factor.py", "--self-halting", NULL };

static void spawnAllWorkers() {
  cout << "There are this many CPUs: " << kNumCPUs << ", numbered 0 through "
       << kNumCPUs - 1 << "." << endl;
  for (size_t i = 0; i < workers.size(); i++) {
    workers[i] = worker(const_cast<char **>(kWorkerArguments)); // the line in bold italics
    assignToCPU(workers[i].sp.pid, i); // implementation omitted, irrelevant
  }
}

int main(int argc, char *argv[]) {
  signal(SIGCHLD, markWorkersAsAvailable); // markWorkersAsAvailable is correct
  spawnAllWorkers();
  // other functions, assume all correct
  return 0;
}
- While implementing the farm program, you were expected to implement a getAvailableWorker function to effectively block until at least one worker was available. One possible solution could rely on a helper function called waitForAvailableWorker, which we present below.
static sigset_t waitForAvailableWorker() {
  sigset_t existing, additions;
  sigemptyset(&additions);
  sigaddset(&additions, SIGCHLD);
  sigprocmask(SIG_BLOCK, &additions, &existing);
  while (numWorkersAvailable == 0) sigsuspend(&existing);
  return existing;
}

static size_t getAvailableWorker() {
  sigset_t existing = waitForAvailableWorker();
  size_t i;
  for (i = 0; !workers[i].available; i++);
  assert(i < workers.size());
  numWorkersAvailable--;        // first of the two lines in bold
  workers[i].available = false; // second of the two lines in bold
  sigprocmask(SIG_SETMASK, &existing, NULL); // restore original block set
  return i;
}
- After analyzing this possible solution, answer the following questions:
  - Assuming no signals are blocked at the time waitForAvailableWorker is called, clearly identify when SIGCHLD is blocked and when it is not.
  - Had we accidentally passed in &additions to the sigsuspend call instead of &existing, the farm could have deadlocked. Explain why.
  - Had we accidentally omitted the sigaddset call and not blocked SIGCHLD, farm could have deadlocked. Explain how.
  - In past quarters, we have seen some implementations that lift the block on SIGCHLD before the two lines in bold instead of after. As it turns out, executing numWorkersAvailable-- after the block is lifted can cause problems, but executing workers[i].available = false actually can't. Explain why the placement of the -- is more sensitive to race conditions than the Boolean assignment.
- Someone proposes an alternate solution using the pause function instead, and provides the alternate implementation below of waitForAvailableWorker. The zero-argument pause function doesn't alter signal masks like sigsuspend does; it simply halts execution until the process receives any signal whatsoever and any installed signal handler has fully executed. This is conceptually simpler and more easily explained than the version that relies on sigsuspend, but it's flawed in a way the previous solution is not. Describe the problem and why sigsuspend is necessary.
static sigset_t waitForAvailableWorker() {
  sigset_t existing, mask;
  sigemptyset(&mask);
  sigaddset(&mask, SIGCHLD);
  sigprocmask(SIG_BLOCK, &mask, &existing);
  while (numWorkersAvailable == 0) {
    sigprocmask(SIG_UNBLOCK, &mask, NULL);
    pause();
    sigprocmask(SIG_BLOCK, &mask, NULL);
  }
  return existing;
}
- Your implementation of trace relied on ptrace's ability to read system call arguments from registers via the PTRACE_PEEKUSER command. When a system call argument was a C string, you needed to rely on repeated calls to ptrace and the PTRACE_PEEKDATA option to pull in characters, eight bytes at a time, until a zero byte was included. At that point, the entire \0-terminated C string could be printed. Was this more complicated than it needed to be? If, after all, the argument register contains the base address of a \0-terminated character array, why can't you just << the char * to cout and rely on cout to print the C string?
Problem 2: Threads vs. Processes
est. 20min.
Threads and processes have much in common, but are also different in several key ways, which make each one better suited for certain use cases. The questions below explore more about multiprocessing and multithreading in the big picture.
- What does it mean when we say that a process has a private address space?
- What are the advantages of a private address space?
- What are the disadvantages of a private address space?
- What programming directives have we used in prior assignments and discussion section handouts to circumvent address space privacy?
- In what cases do the processes whose private address spaces are being publicized have any say in the matter?
- When architecting a larger program like farm or stsh that relies on multiprocessing, what did we need to do to exchange information across process boundaries?
- Can a process be used to execute multiple executables? Restated, can it execvp twice to run multiple programs?
- Threads are often called lightweight processes. In what sense are they processes? And why the lightweight distinction?
- Threads are often called virtual processes as well. In what sense are threads an example of virtualization?
- Threads running within the same process all share the same address space. What are the advantages and disadvantages of allowing threads to access pretty much all of virtual memory?
Each thread within a larger process is given a thread id, also called a tid. In fact, the thread id concept is just an extension of the process id. For singly threaded processes, the pid and the main thread's tid are precisely the same. If a process with pid 12345 creates three additional threads beyond the main thread (and no other processes or threads within other processes are created), then the tid of the main thread would be 12345, and the thread ids of the three other threads would be 12346, 12347, and 12348.
- What are the advantages of leveraging the pid abstraction for thread ids?
- What happens if you pass a thread id that isn't a process id to waitpid?
- What happens if you pass a thread id to sched_setaffinity?
? - What are the advantages of requiring that a thread always be assigned to the same CPU?
In some situations, the decision to rely on multithreading instead of multiprocessing is dictated solely by whether the code to be run apart from the main thread is available in executable or in library form. But in other situations, we have a choice.
- Why might you prefer multithreading over multiprocessing if both are reasonably good options?
- Why might you prefer multiprocessing over multithreading if both are reasonably good options?
- What happens if a thread within a larger process calls fork?
- What happens if a thread within a larger process calls execvp?
Problem 3: Threads and Mutexes
est. 15min.
Here are some more short answer questions about the specifics of threads and the directives that help threads communicate.
- Is i-- thread safe on a single-core machine? Why or why not?
- What is busy waiting? Is it ever a good idea? Does your answer to the good-idea question depend on the number of CPUs your multithreaded application has access to?
- When passing mutexes around as parameters, why must we be careful to pass them by reference?
- What's the multiprocessing equivalent of the mutex?
Problem 4: Multithreaded quicksort
if time
quicksort is an efficient, divide-and-conquer sorting algorithm whose traditional implementation looks like this:
static void quicksort(vector<int>& numbers, ssize_t start, ssize_t finish) {
if (start >= finish) return;
ssize_t mid = partition(numbers, start, finish);
quicksort(numbers, start, mid - 1);
quicksort(numbers, mid + 1, finish);
}
static void quicksort(vector<int>& numbers) {
quicksort(numbers, 0, numbers.size() - 1);
}
The details of how partition works aren't important. All you need to know is that a call to partition(numbers, start, finish) reorders the elements between numbers[start] and numbers[finish], inclusive, so that numbers residing within indices start through mid - 1, inclusive, are less than or equal to the number at index mid, and that all numbers residing in indices mid + 1 through finish, inclusive, are greater than or equal to the number at index mid. As a result of this reorganization, we know that, once partition returns, the number residing at index mid actually belongs there in the final sort.
What’s super neat is that the two recursive calls to quicksort
can execute in parallel, since the sequences they operate on don’t overlap. In fact, to make sure you get some practice with C++ threads right away, you're going to cannibalize the above implementation so that each call to quicksort
spawns off threads to recursively sort the front and back portions at the same time.
- Descend into your clone of the shared lab4 directory, and execute the sequential quicksort executable to confirm that it runs and passes with flying colors. Then examine the file quicksort.cc to confirm your understanding of quicksort. You can ignore the details of the partition routine and just trust that it works, but ensure you believe in the recursive substructure of the three-argument quicksort function.
- Now implement the aggressiveQuicksort function, which is more or less the same as the sequential quicksort, except that each of the two recursive calls runs in an independent, parallel thread. Create standalone threads without concern for any system thread count limits. Ensure that any call to aggressiveQuicksort returns only after its recursively spawned threads finish. Test your implementation to verify it works as intended by typing ./quicksort --aggressive on the command line.
- Tinker with the value of kNumElements (initially set to 128) to see how high you can make it before you exceed the number of threads allowed to coexist in a single process. You don't need to surface an exact number; a ballpark figure is just fine.
- Leveraging your aggressiveQuicksort implementation, implement the recursive conservativeQuicksort function so it's just as parallel, but the second recursive call isn't run within a new thread; instead, it's run within the same thread of execution as the caller. Test your implementation to verify it works as intended by typing ./quicksort --conservative on the command line.
- Time each of the three versions using the time utility, as you might have done in assignment 3 while testing farm. Are the running times of the parallel versions lower or higher than the sequential version's? Are the running times what you expect? Explain.
Icons by Piotr Kwiatkowski