Lab Solution 4: assign3 Redux and Threads
Get Started
Clone the lab starter code by using the command below. This command creates a lab4
directory containing the project files.
git clone /afs/ir/class/cs110/repos/lab4/shared lab4
Next, pull up the online lab checkoff and have it open in a browser so you can jot things down as you go.
Problem 1: assign3
Redux
est. 30min.
Here is a collection of short-answer questions about `subprocess`, `trace`, and `farm`. These questions are here to force you to think big picture and understand the systems concepts we feel are important.
- Your implementation of `subprocess` required two pipes — one to foster a parent-to-child communication channel, and a second to foster a child-to-parent communication channel. Clearly explain why a single pipe shouldn't be used to support both communication channels.
- You may have used `dprintf` in assign3, and it might have contributed to your `farm` implementation. Explain why there's a `dprintf` function, but no analogous `dscanf` function. Hint: think about why `dprintf(fd, "%s %d\n", str, i)` would be easy to manage whereas `dscanf(fd, "%s %d\n", str, &i)` wouldn't be. Read the first few lines of the `man` pages for the traditional `fprintf` and `fscanf` functions to understand how they operate.
- Consider the implementation of `spawnAllWorkers` below. Even though it rarely causes problems, the line assigning to `workers[i]` technically contains a race condition. Briefly describe the race condition, and explain how to fix it.
struct worker {
  worker() {}
  // The second constructor initializes its fields using an initialization list.
  worker(const char *argv[]) :
    sp(subprocess(const_cast<char **>(argv), true, false)), available(false) {}
  subprocess_t sp;
  bool available;
};

static const size_t kNumCPUs = sysconf(_SC_NPROCESSORS_ONLN);
static vector<worker> workers(kNumCPUs);
static size_t numWorkersAvailable = 0;
static const char *kWorkerArguments[] = { "./factor.py", "--self-halting", NULL };

static void spawnAllWorkers() {
  cout << "There are this many CPUs: " << kNumCPUs << ", numbered 0 through "
       << kNumCPUs - 1 << "." << endl;
  for (size_t i = 0; i < workers.size(); i++) {
    workers[i] = worker(kWorkerArguments); // <-- the race condition lives here
    assignToCPU(workers[i].sp.pid, i);     // implementation omitted, irrelevant
  }
}

int main(int argc, char *argv[]) {
  signal(SIGCHLD, markWorkersAsAvailable); // markWorkersAsAvailable is correct
  spawnAllWorkers();
  // other functions, assume all correct
  return 0;
}
- While implementing the `farm` program, you were expected to implement a `getAvailableWorker` function to effectively block until at least one worker was available. One possible solution relies on a helper function called `waitForAvailableWorker`, which we present below.
static sigset_t waitForAvailableWorker() {
  sigset_t existing, additions;
  sigemptyset(&additions);
  sigaddset(&additions, SIGCHLD);
  sigprocmask(SIG_BLOCK, &additions, &existing);
  while (numWorkersAvailable == 0) sigsuspend(&existing);
  return existing;
}

static size_t getAvailableWorker() {
  sigset_t existing = waitForAvailableWorker();
  size_t i;
  for (i = 0; !workers[i].available; i++);
  assert(i < workers.size());
  numWorkersAvailable--;
  workers[i].available = false;
  sigprocmask(SIG_SETMASK, &existing, NULL); // restore original block set
  return i;
}
- After analyzing this possible solution, answer the following questions:
  - Assuming no signals are blocked at the time `waitForAvailableWorker` is called, clearly identify when `SIGCHLD` is blocked and when it is not.
  - Had we accidentally passed `&additions` to the `sigsuspend` call instead of `&existing`, `farm` could have deadlocked. Explain why.
  - Had we accidentally omitted the `sigaddset` call and not blocked `SIGCHLD`, `farm` could have deadlocked. Explain how.
  - In past quarters, we have seen some implementations that lift the block on `SIGCHLD` before the `numWorkersAvailable--` and `workers[i].available = false` lines instead of after. As it turns out, executing `numWorkersAvailable--` after the block is lifted can cause problems, but executing `workers[i].available = false` actually can't. Explain why the placement of the `--` is more sensitive to race conditions than the Boolean assignment is.
- Someone proposes an alternate solution that uses the `pause` function instead, and provides the alternate implementation of `waitForAvailableWorker` below. The zero-argument `pause` function doesn't alter signal masks like `sigsuspend` does; it simply halts execution until the process receives any signal whatsoever and any installed signal handler has fully executed. This is conceptually simpler and more easily explained than the version that relies on `sigsuspend`, but it's flawed in a way the previous solution is not. Describe the problem and explain why `sigsuspend` is necessary.
static void waitForAvailableWorker() {
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, SIGCHLD);
sigprocmask(SIG_BLOCK, &mask, NULL);
while (numWorkersAvailable == 0) {
sigprocmask(SIG_UNBLOCK, &mask, NULL);
pause();
sigprocmask(SIG_BLOCK, &mask, NULL);
}
}
- Your implementation of `trace` relied on `ptrace`'s ability to read system call arguments from registers via the `PTRACE_PEEKUSER` command. When a system call argument was a C string, the `readString` function (shown below) needed to rely on repeated calls to `ptrace` with the `PTRACE_PEEKDATA` option to pull in characters, eight bytes at a time, until a zero byte appeared. At that point, the entire `'\0'`-terminated C string could be printed. Was this more complicated than need be? If, after all, the argument register contains the base address of a `'\0'`-terminated character array, why can't you just `<<` the `char *` to `cout` and rely on `cout` to print the C string? (Hint: what is true about the memory system with respect to two different processes running at the same time?)
static string readString(pid_t pid, unsigned long addr) {
string str;
while (true) {
long ret = ptrace(PTRACE_PEEKDATA, pid, addr);
const char *beginning = reinterpret_cast<const char *>(&ret);
const char *end = reinterpret_cast<const char *>(memchr(beginning, 0, sizeof(long)));
size_t numChars = end == NULL ? sizeof(long) : end - beginning;
str += string(beginning, beginning + numChars);
if (end != NULL) return str; // contains a '\0'
addr += sizeof(long);
}
}
Solution
- A single pipe shouldn't be used because both child and parent would need to write to the same write end (`fds[1]`), and there'd be no generic way to mark whether material in the pipe is intended for the child or the parent, so one could accidentally read what's intended for the other.
- With `dprintf`, the entire string can be constructed (and its length computed) before the underlying `write` calls. `dscanf` might need to read an extra character (e.g. the space in "1234 ") to confirm a placeholder like `%d` has been matched, and there would be no way to revert that extra read. `fscanf`, on the other hand, is framed in terms of `FILE *`s, and those address data structures that include not only the underlying descriptor, but also a cache of the most recently read block of characters, which can be consulted and reused by future calls to `fscanf` on the same `FILE *`.
- For the question about identifying the race condition: the right-hand side constructs a temporary that launches a process that may self-halt before the content of the temporary is copied into `workers[i]`. If that happens, the `SIGCHLD` handler is invoked and crawls over an array that doesn't yet contain the pid, and sadness ensues. The solution is to block `SIGCHLD` before the assignment to `workers[i]` and unblock it afterward.
- For the questions about the `waitForAvailableWorker` solution:
  - The summary is that `SIGCHLD` isn't blocked prior to the `sigprocmask` call, nor within the call to `sigsuspend`; it's blocked everywhere else. Going into more detail, the first three lines create a singleton mask containing just `SIGCHLD`, and the fourth line blocks it. The `while` loop test is evaluated while the `SIGCHLD` block is in place, which is exactly what we want, because we don't want its evaluation to be interrupted by the execution of `markWorkersAsAvailable`. If the test passes, the call to `sigsuspend` simultaneously lifts the block on `SIGCHLD` (remember: the `existing` mask is empty) and puts the process to sleep until any signal is received, at which point any installed handler (presumably the `SIGCHLD` handler, since we expect `SIGCHLD` to be the signal that comes in) executes with high priority. After the handler exits, `sigsuspend` returns while simultaneously restoring the mask that was in place before it was called, and the loop re-evaluates its test with the block on `SIGCHLD` back in place. That process repeats until the `while` loop test fails, at which point `waitForAvailableWorker` returns, leaving the block on `SIGCHLD` in place.
  - If we had passed in `&additions`, here's what could happen: `numWorkersAvailable == 0` could pass, and then `sigsuspend` would force the program to deadlock, since only `SIGCHLD` signals are capable of changing `numWorkersAvailable`, and they'd be blocked.
  - If we had omitted `sigaddset`, here's what could happen: `numWorkersAvailable == 0` passes, `farm` is swapped off the CPU, all `kNumCPUs` workers self-halt, all `kNumCPUs` `SIGCHLD`s are handled by one `SIGCHLD` handler call, `farm` descends into `sigsuspend`, and no additional `SIGCHLD`s ever arrive to wake `farm` up.
  - For the `--`, its execution could be interrupted and confused by the execution of the `++` within the `SIGCHLD` handler. But when the Boolean assignment is executed, the relevant worker is halted, so an interruption by the `SIGCHLD` handler would hop over `workers[i]`, as its pid can't possibly be returned by the handler's `waitpid` call until `workers[i]` is continued.
- For the `pause` implementation, the program can deadlock. Here's how: after `SIGCHLD` is unblocked but before `pause` gets called, all workers become available, `numWorkersAvailable` reaches its maximum value, the main execution flow descends into `pause`, and no more signals are ever sent to wake it up.
- For reading strings in `trace`, the register of interest contains the address of a C string in the tracee's virtual address space, but `operator<<(ostream& os, const char *str)` prints the C string at that address in the tracer's virtual address space.
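To make the two-pipe answer concrete, here's a minimal, hypothetical sketch (not the assignment's actual `subprocess` implementation) in which each direction of communication gets its own dedicated pipe:

```cpp
#include <cctype>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Parent writes to the child over toChild; the child replies over toParent.
// A single pipe couldn't serve both directions: both processes would read
// from the same descriptor and could steal each other's data.
std::string roundtrip(const std::string& msg) {
  int toChild[2], toParent[2];
  pipe(toChild);
  pipe(toParent);
  pid_t pid = fork();
  if (pid == 0) {                       // child: uppercase whatever arrives
    close(toChild[1]);
    close(toParent[0]);
    char buf[64];
    ssize_t n = read(toChild[0], buf, sizeof(buf));
    for (ssize_t i = 0; i < n; i++) buf[i] = toupper((unsigned char)buf[i]);
    write(toParent[1], buf, n);
    _exit(0);
  }
  close(toChild[0]);                    // parent: close the unused ends
  close(toParent[1]);
  write(toChild[1], msg.data(), msg.size());
  close(toChild[1]);
  char buf[64];
  ssize_t n = read(toParent[0], buf, sizeof(buf));
  close(toParent[0]);
  waitpid(pid, NULL, 0);
  return std::string(buf, n);
}
```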
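The block/`sigsuspend`/unblock choreography described above can also be boiled down to a self-contained toy (hypothetical helper names; the real `farm` tracks a vector of workers instead of a single counter):

```cpp
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t childrenReaped = 0;

static void reapChild(int sig) {
  (void)sig;
  while (waitpid(-1, NULL, WNOHANG) > 0) childrenReaped++;
}

// Block SIGCHLD, fork a short-lived child, then sigsuspend until the
// handler has run. The block closes the race window between testing
// childrenReaped and going to sleep.
int waitForOneChild() {
  signal(SIGCHLD, reapChild);
  sigset_t additions, existing;
  sigemptyset(&additions);
  sigaddset(&additions, SIGCHLD);
  sigprocmask(SIG_BLOCK, &additions, &existing);
  if (fork() == 0) _exit(0);                         // child halts immediately
  while (childrenReaped == 0) sigsuspend(&existing); // atomically unblock + sleep
  sigprocmask(SIG_SETMASK, &existing, NULL);         // restore the original mask
  return childrenReaped;
}
```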
Problem 2: Threads vs. Processes
est. 20min.
Threads and processes have much in common, but are also different in several key ways, which make each one better suited for certain use cases. The questions below explore more about multiprocessing and multithreading in the big picture.
- What does it mean when we say that a process has a private address space?
- What are the advantages of a private address space?
- What are the disadvantages of a private address space?
- What programming directives have we used in prior assignments and discussion section handouts to circumvent address space privacy?
- In what cases do the processes whose private address spaces are being publicized have any say in the matter?
- When architecting a larger program like `farm` or `stsh` that relies on multiprocessing, what did we need to do to exchange information across process boundaries?
- Can a process be used to execute multiple executables? Restated, can it `execvp` twice to run multiple programs?
- Threads are often called lightweight processes. In what sense are they processes? And why the lightweight distinction?
- Threads are often called virtual processes as well. In what sense are threads an example of virtualization?
- Threads running within the same process all share the same address space. What are the advantages and disadvantages of allowing threads to access pretty much all of virtual memory?
Each thread within a larger process is given a thread id, also called a tid. In fact, the thread id concept is just an extension of the process id. For singly threaded processes, the pid and the main thread's tid are precisely the same. If a process with pid 12345 creates three additional threads beyond the main thread (and no other processes or threads within other processes are created), then the tid of the main thread would be 12345, and the thread ids of the three other threads would be 12346, 12347, and 12348.
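The pid/tid relationship described above can be observed directly on Linux. A minimal sketch (using the raw `gettid` syscall, since older glibc versions lack a wrapper):

```cpp
#include <sys/syscall.h>
#include <unistd.h>

// On Linux, a thread id is fetched via the gettid syscall. For the main
// thread of any process, the tid equals the process's pid.
long currentThreadId() {
  return syscall(SYS_gettid);
}
```

Spawning additional threads and printing `currentThreadId()` from each would show the consecutive-id behavior the paragraph describes (scheduling permitting).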
- What are the advantages of leveraging the pid abstraction for thread ids?
- What happens if you pass a thread id that isn't a process id to `waitpid`?
- What happens if you pass a thread id to `sched_setaffinity`?
- What are the advantages of requiring that a thread always be assigned to the same CPU?
In some situations, the decision to rely on multithreading instead of multiprocessing is dictated solely by whether the code to be run apart from the main thread is available in executable or in library form. But in other situations, we have a choice.
- Why might you prefer multithreading over multiprocessing if both are reasonably good options?
- Why might you prefer multiprocessing over multithreading if both are reasonably good options?
- What happens if a thread within a larger process calls fork?
- What happens if a thread within a larger process calls execvp?
Solution
- When we say that a process has a private address space, it means that its range of addresses is, by default, private to the owning process, and that it's impossible for another arbitrary, unprivileged process to access any of it.
- The chief advantage of a private address space is precisely that it's private: most other programs can't accidentally or intentionally examine another process's data.
- The chief disadvantage is also that it's private: address space privacy makes it exceptionally difficult to exchange data with other processes, and works against any effort to breezily distribute work over multiple CPUs using multiprocessing.
- To circumvent address space privacy, we used `ptrace` in Assignment 3 to allow one process to examine pretty much everything in a second one. Memory maps (via `mmap` and `munmap`) were used in the parallel `mergesort` section example to share large data structures between many processes. Signals and pipes have been used to foster communication between multiple processes in many assignments and section examples.
- Regarding a say in the matter, an individual process has very little protection against `ptrace`, which is why it's never a good idea to store sensitive data in the clear within a process, since everything is discoverable. System administrators can disable `ptrace` for a specific set of executables, or for everything, but they rarely do that for development machines like the myths, rices, or oats. For more information about how system administrators can limit the use of `ptrace` system-wide, check out this link. Processes also have little say about pipes, since pipes are generally set up and used to rewire standard input, standard output, and standard error before `execvp` is used to bring a new executable into the space of running processes. For signals, if you don't like them, you can install `SIG_IGN` to ignore most of them and/or block them for the lifetime of the executable. The only signals that can't be ignored or fully blocked are `SIGKILL` and `SIGSTOP`. Memory-mapped segments can be forced onto child processes, but those segments are pretty much discarded the instant a virtual address space is cannibalized by an `execvp` call.
- `farm` relied on pipes to forward data to each of the workers, and it relied on `SIGTSTP`, `SIGCHLD`, `SIGCONT`, `SIGKILL`, and a `SIGCHLD` handler to manage synchronization issues. And in some sense, `farm` relied on `execvp`'s string arguments to influence what each of the workers should be running. `stsh` relies on pretty much the same things, albeit for different reasons: signals and signal handlers to manage job control, terminal control transfer to be clear about who's responding to keyboard events and any signals associated with them, and pipes to foster communication between neighboring processes in a pipeline.
- In general, a process cannot be used to execute multiple executables. A program like `stsh` can fork off additional processes, each of which `execvp`s, but that's not the same thing. The bottom line is that process spaces can't be used to run multiple executables; they can only be transformed once.
- Each thread of execution runs as if it's a miniature program. Threads have their own stack, and they have access to globals, dynamic memory allocation, system calls, and so forth. They're relatively lightweight compared to true processes, because you don't need to create a new process when creating a thread. The thread manager cordons off a portion of the full stack segment for the new thread's stack, but otherwise the thread piggybacks off the text, heap, and data segments previously established by `fork` and `execvp`. (For more info, check out the man page for `clone`.)
- Virtualization is an abstraction mechanism used to make a single resource appear to be many, or to make many resources appear to be one. In this case, the process is subdivided to house many lightweight processes, so that one process is made to look like many.
- Threads sharing memory is a double-edged sword, as you'll learn with Assignments 5 and 6. Because all threads have access to the same virtual address space, it's relatively trivial to share information. However, because two or more threads might want to perform surgery on a shared piece of data, directives must be implanted into thread routines to ensure data and resources are shared without inspiring data corruption via race conditions, or deadlock via resource contention. That's a very difficult thing to consistently get right.
- For leveraging the PID abstraction for thread ids, to the extent that the OS can view threads as processes, the OS can provide services on a per-thread basis instead of just a per-process one.
- For passing a non-pid thread id to `waitpid`: this is poorly documented, but the takeaway is that `waitpid` does not consider it an error if a thread id is passed in. `waitpid`'s man page includes some information about how it interacts with threads.
- For `sched_setaffinity`, it actually works, and informs the OS that the identified thread should only be run on the specified CPUs.
- The advantage of requiring that a thread always be assigned to the same CPU is that each CPU typically has a dedicated L1 cache that stores data fetched by instructions run on that CPU. CPU-specific L1 caches are more likely to retain content relevant to the thread if the same thread keeps coming back instead of being arbitrarily assigned to other CPUs.
- We might prefer multithreading over multiprocessing for ease of data exchange and code sharing.
- We might prefer multiprocessing over multithreading for protection, privacy, and insulation from other processes' errors.
- If a thread calls `fork` on Linux, the process space is cloned, but the only surviving thread in the clone is the one that called `fork`. In general, you don't want one thread of many to call `fork` unless you have an exquisite reason to do so.
- If a thread calls `execvp`, it transforms the entire process to run a new executable. The fact that the pre-`execvp` process involved threading is irrelevant once `execvp` is invoked. It's that drastic.
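As a concrete follow-up to the `sched_setaffinity` discussion, here's one plausible implementation of the `assignToCPU` helper that `spawnAllWorkers` leaves unimplemented. This is a hypothetical sketch, not the course's actual version (and note that `cpu_set_t` macros require `_GNU_SOURCE`, which `g++` defines by default):

```cpp
#include <sched.h>
#include <sys/types.h>

// Restrict the identified thread (or process) to a single CPU.
// A tid of 0 means "the calling thread". Returns 0 on success.
int assignToCPU(pid_t tid, int cpu) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  return sched_setaffinity(tid, sizeof(set), &set);
}
```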
Problem 3: Threads and Mutexes
if time
Here are some more short answer questions about the specifics of threads and the directives that help threads communicate.
- Is `i--` thread safe on a single-core machine? Why or why not?
- What is busy waiting? Is it ever a good idea? Does your answer to the good-idea question depend on the number of CPUs your multithreaded application has access to?
- Why must we be careful to pass mutexes around as reference parameters?
- What's the multiprocessing equivalent of the `mutex`?
- What's the multiprocessing equivalent of `thread::join()`?
Solution
- `i--` is not thread safe if it expands to two or more assembly code instructions, as it does on x86-64. For a local variable not already in a register, the generated code might look like this:
mov (%rbp), %rax
sub $1, %rax
mov %rax, (%rbp)
If the thread gets swapped off the processor after the first or second of those three instructions and another thread is scheduled and changes `i`, the original thread will eventually resume and continue on with stale data. Some architectures can do an in-memory decrement (or, more broadly, a memory-to-memory instruction) in a single instruction, so `i--` could compile to one assembly instruction on some architectures if `i` is a global variable, and on a single core only one thing can happen at a time. But in general, you write code to be portable and architecture-agnostic, so you should operate as if `i--`, even where `i` is a global, is thread-unsafe.
- Busy waiting is consuming processor time while waiting for some condition to change, as with `while (numWorkersAvailable == 0) {;}`. On a single-CPU machine, it makes sense for a process or thread that would otherwise busy wait to yield the processor, since the condition can't possibly change until some other thread gets the processor and modifies the variables the condition depends on. That's what `sigsuspend` (for processes) and `condition_variable_any::wait` (for threads, which we'll see later) are for. In the few scenarios where multithreaded code runs on a machine with multiple CPUs, it's okay to busy wait if and only if you know with high probability that another thread is running at the same time (on another CPU) and will soon invert the condition that's causing the first thread to spin.
- We must be careful to pass mutexes by reference because passing by value makes a copy of the lock, and then the threads (or functions) no longer share a single constraint as we originally intended. For example, if we have a lock guarding a variable shared with the main thread, we want each thread to get a reference to the single lock we made, not its own copy, since each thread having its own lock defeats the point of limiting access to one thread at a time!
- For the multiprocessing equivalent of the `mutex`: it's not a perfect parallel, but if we had to choose one, it'd be signal masks. We used signal masks to remove the possibility of interruption by a signal handler, which is the only thing that can otherwise introduce a race condition into a sequential program. (Another answer, though not something we talk about in CS110, is the `flock` function; if you're curious, investigate it and think about how it could be used to prevent interprocess race conditions on shared memory-mapped segments.)
- For the multiprocessing equivalent of `thread::join()`: it's likely `waitpid`. Both allow us to block until a process or thread finishes, although `join` always acts on a single thread (whereas `waitpid` can say "wait for any child to finish").
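The `i--` discussion above has a standard remedy worth seeing in code: `std::atomic<int>` makes each decrement indivisible, so no updates are lost even when threads run on multiple cores. A minimal sketch (hypothetical function name):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// i-- on a plain shared int is a read-modify-write and can lose updates;
// std::atomic<int> guarantees each decrement happens indivisibly.
int decrementAcrossThreads(int total, int numThreads) {
  std::atomic<int> i(total);
  auto work = [&i](int n) { for (int k = 0; k < n; k++) i--; };
  std::vector<std::thread> threads;
  for (int t = 0; t < numThreads; t++)
    threads.emplace_back(work, total / numThreads);
  for (std::thread& t : threads) t.join();
  return i.load();  // exactly 0 when numThreads divides total
}
```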
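The pass-mutexes-by-reference point can also be made concrete. In the sketch below (hypothetical function name), every thread shares the one mutex via `std::ref`; `std::mutex` is non-copyable, so an accidental pass-by-value wouldn't even compile, which is the language protecting you here:

```cpp
#include <functional>  // std::ref
#include <mutex>
#include <thread>
#include <vector>

// All threads increment one counter, guarded by one shared mutex.
int countWithMutex(int numThreads, int incrementsPerThread) {
  int counter = 0;
  std::mutex m;
  auto work = [](int n, int& counter, std::mutex& m) {
    for (int i = 0; i < n; i++) {
      std::lock_guard<std::mutex> lg(m);  // one thread in here at a time
      counter++;
    }
  };
  std::vector<std::thread> threads;
  for (int t = 0; t < numThreads; t++)
    threads.emplace_back(work, incrementsPerThread,
                         std::ref(counter), std::ref(m));
  for (std::thread& t : threads) t.join();
  return counter;
}
```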
Problem 4: Multithreaded quicksort
if time
`quicksort` is an efficient, divide-and-conquer sorting algorithm whose traditional implementation looks like this:
static void quicksort(vector<int>& numbers, ssize_t start, ssize_t finish) {
if (start >= finish) return;
ssize_t mid = partition(numbers, start, finish);
quicksort(numbers, start, mid - 1);
quicksort(numbers, mid + 1, finish);
}
static void quicksort(vector<int>& numbers) {
quicksort(numbers, 0, numbers.size() - 1);
}
The details of how `partition` works aren't important. All you need to know is that a call to `partition(numbers, start, finish)` reorders the elements between `numbers[start]` and `numbers[finish]`, inclusive, so that the numbers residing within indices `start` through `mid - 1`, inclusive, are less than or equal to the number at index `mid`, and all numbers residing within indices `mid + 1` through `finish`, inclusive, are greater than or equal to the number at index `mid`. As a result of this reorganization, we know that, once `partition` returns, the number residing at index `mid` actually belongs there in the final sort.
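If you're curious what such a `partition` might look like, here's one common scheme (Lomuto, with the last element as the pivot). This is a hypothetical sketch; the lab's actual implementation may differ:

```cpp
#include <sys/types.h>  // ssize_t
#include <utility>      // std::swap
#include <vector>

// Reorders numbers[start..finish] around numbers[finish] as the pivot and
// returns the pivot's final index: everything to its left is <= the pivot,
// everything to its right is >= it.
ssize_t partition(std::vector<int>& numbers, ssize_t start, ssize_t finish) {
  int pivot = numbers[finish];
  ssize_t boundary = start;
  for (ssize_t i = start; i < finish; i++)
    if (numbers[i] <= pivot) std::swap(numbers[i], numbers[boundary++]);
  std::swap(numbers[boundary], numbers[finish]);
  return boundary;  // the pivot's final resting place
}
```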
What’s super neat is that the two recursive calls to `quicksort` can execute in parallel, since the sequences they operate on don’t overlap. In fact, to make sure you get some practice with C++ threads right away, you're going to cannibalize the above implementation so that each call to `quicksort` spawns off threads to recursively sort the front and back portions at the same time.
- Descend into your clone of the shared `lab4` directory, and execute the sequential `quicksort` executable to confirm that it runs and passes with flying colors. Then examine the file `quicksort.cc` to confirm your understanding of `quicksort`. You can ignore the details of the `partition` routine and just trust that it works, but ensure you believe in the recursive substructure of the three-argument `quicksort` function.
- Now implement the `aggressiveQuicksort` function, which is more or less the same as the sequential `quicksort`, except that the two recursive calls run in independent, parallel threads. Create standalone threads without concern for any system limits on thread count. Ensure that any call to `aggressiveQuicksort` returns only after its recursively spawned threads finish. Test your implementation to verify it works as intended by typing `./quicksort --aggressive` on the command line.
- Tinker with the value of `kNumElements` (initially set to 128) to see how high you can make it before you exceed the number of threads allowed to coexist in a single process. You don't need to surface an exact number; a ballpark figure is just fine.
- Leveraging your `aggressiveQuicksort` implementation, implement the recursive `conservativeQuicksort` function so it's just as parallel, except that the second recursive call isn't run within a new thread; instead, it's run within the same thread of execution as the caller. Test your implementation to verify it works as intended by typing `./quicksort --conservative` on the command line.
- Time each of the three versions using the `time` utility, as you might have done in Assignment 3 while testing `farm`. Are the running times of the parallel versions lower or higher than the sequential version's? Are the running times what you expect? Explain.
Solution
static void aggressiveQuicksort(vector<int>& numbers, ssize_t start, ssize_t finish) {
if (start >= finish) return;
ssize_t mid = partition(numbers, start, finish);
thread front(aggressiveQuicksort, ref(numbers), start, mid - 1);
thread back(aggressiveQuicksort, ref(numbers), mid + 1, finish);
front.join();
back.join();
}
static void aggressiveQuicksort(vector<int>& numbers) {
aggressiveQuicksort(numbers, 0, numbers.size() - 1);
}
Note that the `ref` is needed to make clear that a reference to `numbers` (not a copy) is passed as the first argument to the `aggressiveQuicksort` thread routine. You'd think the compiler could examine the prototype of the thread routine being installed, but it doesn't. It only looks at the prototype of the `thread` constructor, which relies on the C++ equivalent of `...` (a variadic template) to accept zero or more arguments beyond the thread routine's name.
- While tinkering with `kNumElements`, we tried 2500 and everything seemed to work without crashing. When we went with 5000, the test application aborted with an error message including "terminate called after throwing an instance of 'std::system_error'", which is the abort message you'll generally see when too many threads (and therefore too many subdivisions of the finite stack segment into thread stacks) exist at any one time. 4500 led to a crash, and so did 4250. With 4000, it succeeded, so the threshold is probably somewhere between 4000 and 4250 elements. In reality, the number of threads permitted to exist at any one time is just above 2000 (in our own testing, it seems to be 2041). Note that `quicksort`ing an array of 4000 numbers doesn't require that 2000 threads exist at any one moment, even if the total number of threads created over the lifetime of the sort is much greater than 2000; some threads die before other threads are created.
// A hybrid of sequential and aggressive
static void conservativeQuicksort(vector<int>& numbers, ssize_t start, ssize_t finish) {
if (start >= finish) return;
ssize_t mid = partition(numbers, start, finish);
thread front(conservativeQuicksort, ref(numbers), start, mid - 1);
conservativeQuicksort(numbers, mid + 1, finish);
front.join();
}
static void conservativeQuicksort(vector<int>& numbers) {
conservativeQuicksort(numbers, 0, numbers.size() - 1);
}
When timing, the results may surprise you and make you question whether we should ever use threading:
myth61:~/lab4$ echo $SHELL
/bin/bash
myth61:~/lab4$ time ./quicksort --sequential
Trial #1000: SUCCEEDED!
real 0m0.025s
user 0m0.020s
sys 0m0.004s
myth61:~/lab4$ time ./quicksort --aggressive
Trial #1000: SUCCEEDED!
real 0m14.698s
user 0m0.968s
sys 0m37.452s
myth61:~/lab4$ time ./quicksort --conservative
Trial #1000: SUCCEEDED!
real 0m7.271s
user 0m0.400s
sys 0m18.424s
The issue here is that quicksort is CPU-bound, which is systems-speak for saying that the vast majority of the algorithm's work is computational and requires CPU time to add, compare, move, and so forth. By introducing threading, we do enlist all eight of the myth's cores. But we also introduce an enormous amount of overhead (thread creation, context switches between threads, and so forth), so that the more parallelism we introduce, the longer the algorithm takes.

In general, you rarely want the number of threads doing CPU-bound work in parallel to exceed the number of cores (or maybe twice the number of cores). The work that needs to be done on any one CPU is the same regardless of whether one thread does it or 100 threads do it. Rest assured that we'll soon rely on threading to manage I/O- or network-bound algorithms, where the majority of the time is spent sleeping (off the CPU) waiting for I/O events. Even in those scenarios, you rarely want the number of threads to exceed a small multiple of your CPU and core count. So forgive this extreme example where we create as many threads as we want; in general, we don't and won't do that. We allow it only during the initial window where we're just learning about threads.
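The thread-budget advice above suggests a natural middle ground between the aggressive and conservative versions: only spawn new threads down to a fixed recursion depth (yielding roughly 2^depth threads), then finish sequentially. This is a sketch under our own assumptions, not part of the lab's required solution, and it bundles a hypothetical Lomuto `partition` so it's self-contained:

```cpp
#include <functional>   // std::ref
#include <sys/types.h>  // ssize_t
#include <thread>
#include <utility>      // std::swap
#include <vector>

// Hypothetical partition (Lomuto, last element as pivot); the lab's may differ.
static ssize_t partition(std::vector<int>& v, ssize_t start, ssize_t finish) {
  int pivot = v[finish];
  ssize_t b = start;
  for (ssize_t i = start; i < finish; i++)
    if (v[i] <= pivot) std::swap(v[i], v[b++]);
  std::swap(v[b], v[finish]);
  return b;
}

// Spawn a new thread only while depth > 0, then fall back to plain recursion,
// capping the total thread count at roughly 2^depth.
static void boundedQuicksort(std::vector<int>& v, ssize_t start,
                             ssize_t finish, int depth) {
  if (start >= finish) return;
  ssize_t mid = partition(v, start, finish);
  if (depth > 0) {
    std::thread front(boundedQuicksort, std::ref(v), start, mid - 1, depth - 1);
    boundedQuicksort(v, mid + 1, finish, depth - 1);
    front.join();
  } else {
    boundedQuicksort(v, start, mid - 1, 0);
    boundedQuicksort(v, mid + 1, finish, 0);
  }
}
```

With `depth` chosen near log2 of the core count, the parallel version stands a chance of actually beating the sequential one on large inputs.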
Checkoff Questions Solutions
- Problem 1a: Explain why a fully functional version of spawnAllWorkers needs to block SIGCHLD if it's to be 100% reliable. We need to account for the possibility that a worker immediately starts working and halts once we spawn it but before we record information about it into our workers vector. In general, we have to consider whether it's problematic if the signal handler interrupts us anywhere in this function.
- Problem 1b: Assuming no signals are blocked at the time the correct version of `waitForAvailableWorker` is called, clearly identify when `SIGCHLD` is blocked and when it is not. The summary is that `SIGCHLD` isn't blocked prior to the `sigprocmask` call, nor within the call to `sigsuspend`. It's blocked everywhere else.
- Problem 2a: Threads running within the same process all share the same address space. What are the advantages and disadvantages of allowing threads to access pretty much all of virtual memory? Advantages include ease of data sharing and coordination, and less work to create an entirely new virtual memory space. Disadvantages include race conditions and corruption when accessing shared data.
- Problem 2b: What are the advantages of requiring a thread always be assigned to the same CPU? The advantages of requiring a thread always be assigned to the same CPU are that each CPU typically has a dedicated L1 cache that stores data fetched by instructions run on that CPU. CPU-specific L1 caches are more likely to retain content relevant to the thread if the same thread keeps coming back instead of being arbitrarily assigned to other CPUs.