Assignment 3: All Things Multiprocessing
Due: Fri Feb 12 11:59 pm
Late submissions accepted until Sun Feb 14 11:59 pm
This assignment was designed by Jerry Cain
We've progressed through a good amount of material with multiprocessing, pipes, and interprocess communication. Rather than building one large program, your next task is to code up a few different things with the idea that you'll learn more by tackling multiple problems and leveraging your understanding of the material in multiple domains. All of these programs should be coded directly within a single repository, which you can get by typing the following:
git clone /usr/class/cs110/repos/assign3/$USER assign3
This project contains sample solutions in the samples folder. When in doubt about expected behavior, try running the sample solutions!
There are four problems in total, and by the end of lecture 7, you'll be outfitted with everything you need to know to tackle the first three - lecture 8 is needed to work on the final problem. We recommend you complete the programs in the listed order.
Debugging Tips
Here are some tips for effective debugging and testing. Revisit these as you work!
inspect-fds.py
The provided inspect-fds.py script in the samples folder can help you visualize file descriptors in parent and child processes of your programs, and help debug issues with pipe, dup2, etc. particularly on the first two parts of the assignment. The script can output information about the open file descriptors for any currently-running program on the same myth machine. Here's how to use it:
- Run the program you want to inspect in GDB, and have it pause (e.g. via breakpoint) at the location at which you'd like to inspect file descriptors. Alternatively, add a call to sleep() where you'd like the program to stall so you can inspect its file descriptors.
- Open another terminal window and log into the same myth machine you are running your program on (e.g. ssh SUNET@myth56.stanford.edu).
- From your assignment directory, run python samples/inspect-fds.py <PROGRAM NAME OR PID>. E.g. if you want to inspect a currently-running pipeline-test program, you could run python samples/inspect-fds.py pipeline-test.
- In the output, pipes are labeled and color-coded, so you can identify the read and write ends of the same pipe.
Visualize with Cplayground
This awesome tool is created and maintained by CA Ryan Eberhardt
Cplayground is an online environment where you can run C/C++ programs and investigate their behavior. It's particularly useful for visualizing file descriptors and pipes. When you are working on pipeline or subprocess, consider using Cplayground to visualize open file descriptors and pipes (pipeline playground here, subprocess playground here). In particular, you can use the debugger to pause and step through the program and see how the file descriptors change. To use the debugger, set breakpoints (click the line numbers) where you want to pause. A good approach is to set a breakpoint on fork(), as well as the line after (so that the child process also has a breakpoint just after the fork). Then, run the program, and click the "Open Files" tab. You can step through the code using the "next line" button that appears in the editor, and you'll see the Open Files diagram change! For instance, here is a screenshot of what the diagram should look like by the end of pipeline. And here is a screenshot of what the diagram should look like by the end of subprocess(true, true).
Custom Tests for trace and farm
You can create a custom tests file with additional tests to run via sanitycheck. These are optional, but they can significantly help with testing! To do this, create a new text file (e.g. called custom_tests). Each line of this file can contain one command to test, which must start with either trace or farm (these are the only two programs that can be tested via sanitycheck). E.g. a sample custom tests file for trace might look like the following:
trace samples/simple-test1
trace samples/simple-test2
trace samples/simple-test3
trace samples/simple-test4
trace samples/simple-test5
To have sanitycheck run your custom tests file instead of the default sanitycheck tests, run tools/sanitycheck <FILENAME> (e.g. tools/sanitycheck custom_tests).
This is useful because, particularly for trace, sanitycheck does some "smart diffing", e.g. counting two parameter addresses as equal even if their values differ across runs. This isn't easily possible if you manually diff the outputs yourself. If you want to use a custom tests command where input is piped in, you should first save the input to a file and then use a custom tests command like farm < nums.txt. Also, unfortunately custom test runs do not account for the nondeterministic orderings of print statements for farm.
Short Etude (Op. 25, No. 9): Implementing pipeline in C
Note: this is a standalone problem that is not a part of any subsequent problems in this assignment (though its concepts return on assignment 4)
Tip: For the first two problems, we recommend checking out pipe's lesser-known sibling, pipe2. You can use pipe2 to create a pipe just as pipe does, but if you pass the O_CLOEXEC flag when creating the pipe, those pipe file descriptors will be automatically closed if the process is consumed by execvp. (In particular: if a process is not consumed by execvp, then you must still close the pipe file descriptors yourself.) Type man pipe2 at the prompt and search for O_CLOEXEC for more details; a minimal sketch follows.
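Here's a minimal sketch of that idea (error checking omitted, consistent with the ground rules below; on Linux, pipe2 requires _GNU_SOURCE):

#include <fcntl.h>   // for O_CLOEXEC
#include <unistd.h>  // for pipe2

int fds[2];
pipe2(fds, O_CLOEXEC);  // fds[0] is the read end, fds[1] is the write end
// a successful execvp closes both descriptors automatically; any process that
// keeps running (including one whose execvp fails) must still close them itself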
In pipeline.c, implement the pipeline function, which accepts two argument vectors and creates two child processes where the standard output of the first is directed to the standard input of the second. Here's the required function signature:
void pipeline(char *argv1[], char *argv2[], pid_t pids[]);
You can assume that all calls to pipeline are well-formed and will work as expected. In other words, argv1 and argv2 are each valid, NULL-terminated argument vectors, pids is the base address of an array of length two, and all calls to pipe, dup2, close, execvp, and so forth succeed so that you needn't do any error checking whatsoever (although you may if you think it'll help you arrive at a working solution more quickly). pipeline should return without waiting for either of the child processes to finish (i.e. your pipeline implementation should not call waitpid anywhere), and pipeline should put the ids of the daisy-chained processes into pids[0] and pids[1]. Also, ensure that the two processes are running in parallel as much as possible, so that pipeline({"sleep", "10", NULL}, {"sleep", "10", NULL}, pids) takes about 10 seconds instead of 20.
You can use pipeline-test.c (compiles to pipeline-test) for testing. This file is small, so you should add many more tests of your own to prove that your pipeline is coded to specification.
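For example, a quick test along the following lines (the argument vectors here are just an illustration) exercises both the plumbing and the requirement that pipeline return without reaping:

char *argv1[] = {(char *)"cat", (char *)"pipeline.c", NULL};
char *argv2[] = {(char *)"wc", (char *)"-l", NULL};
pid_t pids[2];
pipeline(argv1, argv2, pids);  // should return right away, without waiting
waitpid(pids[0], NULL, 0);     // the caller is responsible for reaping
waitpid(pids[1], NULL, 0);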
Short Etude (Op. 25, No. 8): Implementing subprocess in C++
Note: this function is used as part of problems 3 and 4, so you must complete it (and test thoroughly!) before working on later problems.
Implement the subprocess function in subprocess.cc, which is similar to subprocess from lecture but even more flexible. The subprocess.h file contains the following important information:
static const int kNotInUse = -1;
struct subprocess_t {
pid_t pid;
int supplyfd;
int ingestfd;
};
/**
* Function: subprocess
* --------------------
* Creates a new process running the executable identified via argv[0].
*
* argv: the NULL-terminated argument vector that should be passed
* to the new process's main function
* supplyChildInput: true if the parent process would like to pipe content
* to the new process's stdin, false otherwise
* ingestChildOutput: true if the parent would like the child's stdout to be
* pushed to the parent, false otherwise
*/
subprocess_t subprocess(char *argv[],
bool supplyChildInput,
bool ingestChildOutput) throw (SubprocessException);
Read through the subprocess.h documentation to see how this new subprocess should work. If any of the system calls needed to implement your subprocess routine fail (either in the parent or in the child), you should throw a SubprocessException with an actionable error message. Inspect the subprocess-exceptions.h file for the full definition.
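For instance, a failed pipe creation inside subprocess might be surfaced along these lines (a hedged sketch; consult subprocess-exceptions.h for the exact constructor):

int fds[2];
if (pipe2(fds, O_CLOEXEC) == -1)
  throw SubprocessException("subprocess: failed to create pipe");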
You can test with the provided subprocess-test.cc program, which we highly encourage you to add to. You'll want to add your own tests to ensure that all the (true, true), (true, false), (false, true), and (false, false) combinations for (supplyChildInput, ingestChildOutput) all work as expected. When looking at subprocess-test.cc, you'll also get a little bit of a reminder how try/catch blocks work.
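For a caller's-eye view, here's a hedged sketch of the (true, true) combination, modeled on the style of subprocess-test.cc (the command and strings are illustrative):

char *command[] = {(char *)"sort", NULL};
subprocess_t sp = subprocess(command, true, true);
dprintf(sp.supplyfd, "banana\napple\ncherry\n");
close(sp.supplyfd);  // EOF on the child's stdin lets sort emit its output
char buffer[128];
ssize_t count;
while ((count = read(sp.ingestfd, buffer, sizeof(buffer))) > 0)
  write(STDOUT_FILENO, buffer, count);
close(sp.ingestfd);
waitpid(sp.pid, NULL, 0);  // reap the child once its output is drained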
Note that this implementation is C++; the two later exercises are also C++ and must link against your subprocess implementation. We will be relying primarily on C++ going forward in CS110, since C++ also provides better support for strings and generics than C, as well as native support for some threading and concurrency directives used in later weeks. However, your C++ implementation of subprocess will look as it would have in pure C, except that you're throwing C++ exceptions to identify errors.
Long Etude (Op. 10, No. 4): Implementing trace in C++
Note: trace is substantially more challenging than pipeline and subprocess, so it requires a detailed specification. To ensure you have everything you need to successfully implement a fully operational trace, we provide a good amount of reading with lots of code. Don't be intimidated by the length of this section. We're just taking care to ensure that everything is spelled out as clearly as possible. Read everything carefully to ensure you're comfortable with the exercise!
The trace utility--like the strace utility seen in section--is a diagnostic tool used to monitor the execution of a command. The trace program spawns off one child process and "traces" (monitors) it as it executes whatever executable it is told to run. The tracer (parent) forces the tracee (child) to periodically STOP (pause)--specifically, whenever it is about to enter a system call (such as open, close, etc.) and whenever it is about to return from one. At each of these stops, we can extract information from the child about that system call (on entry, which call it is; on exit, what it returned--all via the tracee's registers) and print it. Then we resume the child process's execution. By repeatedly doing exactly this (freeze, extract, resume, repeat), the tracer accumulates a series of CPU snapshots as the tracee oscillates in and out of its system calls. Our trace program has two "modes" that provide different levels of detail about this system call information. When run in simple mode, trace prints system call opcodes and raw return values. When run in full mode, trace prints the name of each system call, its arguments, and its return value, with additional error information about system calls that fail.
Consider, for example, the following nonsense program (drawn from simple-test5.cc in your assign3 repo).
int main(int argc, char *argv[]) {
write(STDOUT_FILENO, "12345\n", 6);
int fd = open(__FILE__, O_RDONLY);
write(fd, "12345\n", 6);
close(fd);
read(fd, NULL, 64);
close(/* bogusfd = */ 1000);
return 0;
}
A simple trace of the command ./simple-test5 might look like this:
Simple Trace Example Output
$ ./trace --simple ./simple-test5
syscall(59) = 0
syscall(12) = 14434304
// many lines omitted for brevity
syscall(1) = 12345
6
syscall(2) = 3
syscall(1) = -9
syscall(3) = 0
syscall(0) = -9
syscall(3) = -9
syscall(231) = <no return>
Program exited normally with status 0
$
It may look like a bunch of random numbers, but the numbers in parentheses are system call opcodes (59 is the opcode for execve, 12 is for brk, 1 is for write, 2 is for open, 3 is for close, 0 is for read, and 231 is exit_group) and the numbers after the equals signs are return values (that 6 is the number of characters just published by write, the -9's communicate write's, read's, and close's inability to function when handed closed, incompatible, or otherwise bogus file descriptors, and exit_group never returns).
When run in simple mode, these CPU snapshots are printed without much translation. When run in full mode, trace publishes much more detailed information - the register values are translated to full system call names, strongly typed argument lists (e.g., ints, C strings, pointers), and properly interpreted return values:
Full Trace Example Output
$ ./trace ./simple-test5
execve("./simple-test5", 0x7fff725831e0, 0x7fff725831f0) = 0
brk(NULL) = 0xd4b000
// many lines omitted for brevity
write(1, "12345
", 6) = 12345
6
open("simple-test5.cc", 0, 6) = 3
write(3, "12345
", 5) = -1 EBADF (Bad file descriptor)
close(3) = 0
read(3, NULL, 64) = -1 EBADF (Bad file descriptor)
close(1000) = -1 EBADF (Bad file descriptor)
exit_group(0) = <no return>
Program exited normally with status 0
$
Notice that negative return values are always printed as -1. When the return value is -1, the value of errno is printed after it in #define constant form (e.g. EBADF), followed by a specific error message in parentheses (e.g. "Bad file descriptor"). If the return value is nonnegative, then that return value is simply printed and errno is ignored as irrelevant.
Getting Started
You will first implement the "simple" trace mode, and then follow with the implementation of "full" mode. But how do we write code to manipulate the execution of another process? We rely on a special function--itself a system call, as it turns out--called ptrace. A quick glance at ptrace's man page provides the following:
The ptrace system call provides a means by which one process (the "tracer") may observe and control the execution of another process (the "tracee"), and examine and change the tracee's memory and registers. It is primarily used to implement breakpoint debugging and system call tracing.
The full prototype of ptrace looks like this:
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
The first argument to ptrace is always some constant (e.g. PTRACE_TRACEME, PTRACE_SYSCALL, PTRACE_PEEKUSER, etc.), and the choice of constant determines ptrace's behavior and how many additional arguments are needed.
The code we start you out with does just enough to illustrate how one process can manipulate a second using ptrace. When the initial version of trace.cc is compiled and invoked to profile simple-test5, we see it just prints out information about the first system call:
$ ./trace --simple ./simple-test5
syscall(59) = 0
$
You will build on this, of course, but you can use this code to see how to print out the first of the many CPU snapshots you'll ultimately need to take.
Starter Code Walkthrough
Before writing any code, make sure you truly understand what every line in the starter code does. Let's walk through the starter code in main, starting with the first few lines (note: you can ignore the provided readString function for now, that will become relevant when you start implementing full trace):
int main(int argc, char *argv[]) {
bool simple = false, rebuild = false;
int numFlags = processCommandLineFlags(simple, rebuild, argv);
if (argc - numFlags == 1) {
cout << "Nothing to trace... exiting." << endl;
return 0;
}
The starter version ignores whatever simple and rebuild are set to, even though the code you write will eventually rely on them. The implementation of processCommandLineFlags resides in trace-options.cc, and that implementation parses just enough of the full command line to figure out how many flags (e.g. --simple and --rebuild) sit in between ./trace and the command to be traced. processCommandLineFlags accepts simple and rebuild by reference, updating each independently depending on what command line flags are supplied. Its return value is captured in a variable called numFlags, and that return value shapes how execvp is invoked in the code that follows.
The next several lines spawn off a child process that ultimately executes the command of interest:
pid_t pid = fork();
if (pid == 0) {
ptrace(PTRACE_TRACEME);
raise(SIGSTOP);
execvp(argv[numFlags + 1], argv + numFlags + 1);
return 0;
}
A new process is created via fork, and the child process:
- calls ptrace(PTRACE_TRACEME) to inform the OS that it's content to be monitored by the parent process,
- calls raise(SIGSTOP) and awaits parental permission to continue, and
- calls execvp(argv[numFlags + 1], argv + numFlags + 1) to transform the child process to run the command to be profiled.
We're used to seeing argv[0] and argv as the two arguments to execvp, but here argv[0] is ./trace. execvp's first argument needs to be the name of the executable to be monitored, and we get to it by skipping past trace's own name and flags. For example, if trace is invoked as ./trace --simple ./simple-test5, then numFlags is 1 and argv[numFlags + 1] is ./simple-test5. Provided the execvp succeeds, the child process is consumed by the new executable and proceeds through its main function, unaware that it'll be halted every time it makes a system call.
The return 0 at the end is relevant in the event that argv[numFlags + 1] names an executable that either doesn't exist or can't be invoked because of permission issues. We need to ensure the child process ends in the event that execvp fails, else its execution will flow into code designed for the tracer, not the tracee.
Meanwhile, the tracer circumvents the code specific to the child and executes the following lines:
int status;
waitpid(pid, &status, 0);
assert(WIFSTOPPED(status));
ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESYSGOOD);
The parent (tracer) process:
- calls waitpid to halt until the child process has granted permission to be traced and self-halted,
- calls assert to confirm that the child self-halted, and
- calls ptrace to instruct the operating system to set bit 7 of the signal number--i.e., to deliver SIGTRAP | 0x80--whenever a system call trap is executed.
Next, the tracer tells the tracee to proceed until it's just about to execute a system call:
while (true) {
ptrace(PTRACE_SYSCALL, pid, 0, 0);
waitpid(pid, &status, 0);
if (WIFSTOPPED(status) && (WSTOPSIG(status) == (SIGTRAP | 0x80))) {
int num = ptrace(PTRACE_PEEKUSER, pid, ORIG_RAX * sizeof(long));
cout << "syscall(" << num << ") = " << flush;
break;
}
}
Here's a breakdown of what each line within the while loop does:
- ptrace(PTRACE_SYSCALL, pid, 0, 0) continues a stopped tracee until it enters a system call (or is otherwise signaled). Note of interest: when using waitpid on a tracee, it always acts as though you specified WUNTRACED via the third parameter, even if you pass in 0.
- The waitpid(pid, &status, 0) call blocks the tracer until the tracee halts.
- If the tracee stops because it's entering a system call, then WIFSTOPPED(status) will certainly produce true, and the tracee will have been signaled with SIGTRAP | 0x80, as per the above discussion of what ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESYSGOOD) does. If either of the two tests &&'ed together fails, then the tracee halted for some other reason, and the tracer resumes the tracee.
- Eventually, the tracee stops because it's entering its first system call--that is, the tracee stops just after the system call opcode has been placed in %rax, any additional arguments have been placed in %rdi, %rsi, %rdx, %r10, %r8, and %r9, as needed, and the syscall instruction has been executed. The ptrace(PTRACE_PEEKUSER, pid, ORIG_RAX * sizeof(long)) call is another flavor of ptrace, and this one extracts the value in %rax just as the tracee was forced off the CPU. (For reasons we won't go into, the value of %rax is clobbered by the user-to-kernel mode transition, but a pseudo-register %orig_rax preserves the value, specifically for system call traces like the one we're managing here.)
- For now, we are just going to get the system call opcode--we're ignoring system call parameters. We extract the opcode from %orig_rax and print it with cout << "syscall(" << num << ") = " << flush.
At this point, we have printed information about what system call is being made. Next, we want to print out what it returns.
while (true) {
ptrace(PTRACE_SYSCALL, pid, 0, 0);
waitpid(pid, &status, 0);
if (WIFSTOPPED(status) && (WSTOPSIG(status) == (SIGTRAP | 0x80))) {
long ret = ptrace(PTRACE_PEEKUSER, pid, RAX * sizeof(long));
cout << ret << endl;
break;
}
}
This loop is extremely similar to the first loop. We tell the tracee to resume, and wait for it to halt again, at which point we know it has just exited the system call. Then we extract the return value from %rax (which hasn't been clobbered--so RAX is correct, not ORIG_RAX), and print it out. Note that ret is a long instead of an int; system call return values can be pointers, so all 64 bits matter. The system call opcode, however, is always small enough to fit in an int, which is why we go with an int instead of a long in the first while loop.
The rest of the starter code is placeholder and should be removed as you develop a fully operational trace. It exists simply to kill the tracee and wait for it to die before allowing the tracer to return.
kill(pid, SIGKILL);
waitpid(pid, &status, 0);
assert(WIFSIGNALED(status));
return 0;
That's why the starter code prints information about the first system call, but none of the others.
Simple Trace
Your first task is to modify and build on the starter code as needed to implement "simple" mode for trace, which prints out the full sequence of system call opcodes and return values, as shown in the very first sample output above for ./trace --simple ./simple-test5. The key challenge is that you have no idea how many system calls a monitored process makes, so you'll need to update the code to repeatedly halt on system call enter, exit, enter, exit, and so forth, until you notice that the tracee truly exits in the WIFEXITED sense. As part of this, you should combine the two while loops in the starter code into one, and decompose it into a function that is called from two places (the only reason we replicated the code in the starter trace.cc file was to simplify the discussion of how it works); see the sketch below.
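One way that helper might be shaped (a hedged skeleton, not a required decomposition; the name is illustrative):

static bool waitUntilSyscallBoundary(pid_t pid, int& status) {
  while (true) {
    ptrace(PTRACE_SYSCALL, pid, 0, 0);  // resume until the next syscall trap
    waitpid(pid, &status, 0);
    if (WIFEXITED(status)) return false;  // tracee is truly gone
    if (WIFSTOPPED(status) && (WSTOPSIG(status) == (SIGTRAP | 0x80))) return true;
  }
}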
Here are some other important notes:
- You may assume that the traced programs are never delivered any non-trap signals and never execute any signal handlers. It wouldn't be that much more difficult to support arbitrary signals and signal handlers, but it'd just be more code that wouldn't add much.
- System calls don't predictably return the same values with each execution, and one system call's return value may be passed as an argument to another system call down the line. The sanitycheck and autograder tools are sensitive to all this--at least for the system calls relevant to all of the test programs we've exposed--and use regular expressions that match and replace parameters and return values that are permitted, and even likely, to vary from test run to test run.
- Make sure that everything up through and including the equals sign (e.g. syscall(231) = ) is printed and flushed to the console before you allow the tracee to continue. Doing so will help align your output with that of the sample executable, and will make the sanitycheck and autograder tools happy. This is important when the system calls themselves print to the console (e.g. write(STDOUT_FILENO, "12345\n", 6)), so that the system call output interleaves with trace's output in a predictable way.
Test simple mode thoroughly before moving on - you will add to this code in the next part, full trace.
Full Trace
Next, you will further modify this code to support "full" mode for trace; the overall architecture of the program is largely the same, but more information is printed, as shown in the second sample output above for ./trace ./simple-test5. There are three main new features: you will print system call names instead of opcodes, you will print each call's parameters, and you will print type-correct return values along with appropriate error codes.
Before you continue...
This next part relies on a provided set of files, trace-system-calls.h/.cc, that gather and provide information about system call function signatures. While you don't need to worry about the implementation, note that it uses your subprocess function to gather information about these signatures, so you should verify that all is good to go before proceeding! One way to do this is via the provided trace-system-calls-test.cc, which just prints out the results of calling the functions in these files. Run make and then trace-system-calls-test, and compare its output to the correct output in samples/trace-system-calls-test-correct.txt to see if the functions are working correctly with your subprocess implementation. (The output should list system call numbers, names, and signatures for most (but not all) functions.) If it doesn't match, make sure to revisit your subprocess function before proceeding.
Function Names
How do you convert system call opcodes (e.g. 59) to system call names (e.g. execve)? We provide a function compileSystemCallData, documented in trace-system-calls.h, that populates two maps of data, the first of which contains opcode-to-name information and is guaranteed to include every legitimate opcode as a key. Each key maps to the corresponding system call name, so that 0 maps to read, 1 maps to write, 2 maps to open, 3 maps to close, and so forth. You can rely on the information in this map--which you should only construct if you're running in full mode--to convert the 0's to reads and the 59's to execves, etc., as in the lookup sketch below. (Optionally, if you're interested, you can look inside the implementation of compileSystemCallData within trace-system-calls.cc, and you'll see that the map is built by parsing /usr/include/x86_64-linux-gnu/asm/unistd_64.h.)
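Once populated, the translation is a plain map lookup; a tiny sketch (the map's real name and type are declared in trace-system-calls.h, so treat these declarations as placeholders):

std::map<int, std::string> systemCallNumbers;  // hypothetical name; populated by compileSystemCallData
// ...
cout << systemCallNumbers[opcode] << "(";      // e.g. opcode 59 prints as "execve("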
Function Arguments
How do you print argument lists like those contributing to execve("./simple-test3", 0x7ffdd1531690, 0x7ffdd15316a0)? The same compileSystemCallData function populates a second map which contains system call signature information. We can use this map to determine that, say, execve takes a C string and two pointers, access takes a C string followed by an integer, and open takes a C string and two integers. The number of arguments in the signature determines how many of the registers %rdi, %rsi, %rdx, %r10, %r8, and %r9 are relevant. The entries in the map also convey how the data in each of those registers--extracted using ptrace and PTRACE_PEEKUSER--should be interpreted. All arguments are ints, pointers (which should be printed with a leading 0x, unless it's the zero pointer, in which case it should be printed as NULL), or the base address of a '\0'-terminated character array, aka a C string. If the type is unknown, you should print <unknown>. Thus, the long returned by ptrace needs to be truncated to an int, converted to a C++ string, or reinterpreted as a void * before printing it. One twist is that because the C strings reside in the tracee's virtual address space, we need to use ptrace and PTRACE_PEEKDATA to extract the sequence of characters in chunks until we pull in a chunk that includes a '\0'. The starter code includes a complete implementation of a readString function that you can use as-is to read in a C string from the tracee's virtual address space.
Note that some system call signature information isn't easily extracted from the Linux source code, so in some cases, the system call signature map populated may not include a system call (e.g. arch_prctl comes to mind). If the prototype information is missing from the map, print <signature-information-missing> in place of a parameter list, as with arch_prctl(<signature-information-missing>).
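To make the chunking idea concrete, here's a hedged sketch of what a readString-style helper does (the provided readString is the authoritative version; this name and shape are illustrative):

static std::string peekString(pid_t pid, unsigned long addr) {
  std::string str;
  while (true) {
    long word = ptrace(PTRACE_PEEKDATA, pid, addr);  // pulls in eight bytes
    const char *bytes = reinterpret_cast<const char *>(&word);
    for (size_t i = 0; i < sizeof(long); i++) {
      if (bytes[i] == '\0') return str;  // hit the terminator
      str += bytes[i];
    }
    addr += sizeof(long);  // advance to the next chunk
  }
}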
Function Return Value
How do you print out return value information according to its type, along with a corresponding error code, if any? If the return value was from...
- brk or mmap: print that value as a 64-bit pointer, with a leading 0x
- NOT brk or mmap, and the raw return value is...
  - nonnegative: print that value as an int
  - negative: print it as -1, and after that synthesize and print the equivalent errno #define constant (e.g. EBADF or ENOENT) and the corresponding error string, which is produced by a call to strerror. Note that the real errno global hasn't been set just yet; that comes a few instructions after the system call exits. But you can easily compute what errno will soon be (and what should be printed by trace) by taking the absolute value of the raw return value and using that.
For negative values, how do you convert the computed errno values to the synthesized values? We provide a function compileSystemCallErrorStrings, documented in trace-error-constants.h, that populates a map of data containing errno-to-name information (again, the errno value is the absolute value of the raw return value). A raw return value of -2, for instance, becomes "ENOENT", and a raw return value of -9 (which we see a bunch in the simple trace of simple-test5) becomes "EBADF". (Optionally, if you're interested, you can look inside the implementation within trace-error-constants.cc). Then, you can use strerror, which also takes an errno value, to get the corresponding error message; e.g. strerror(2) and strerror(9) return "No such file or directory" and "Bad file descriptor", respectively. Check out the man page for strerror for more information.
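Putting those pieces together, the negative-return branch might look roughly like this (a hedged sketch; errorConstants stands in for whatever you name the map populated by compileSystemCallErrorStrings):

long retval = ptrace(PTRACE_PEEKUSER, pid, RAX * sizeof(long));
if (retval < 0) {
  int err = -retval;  // what errno will soon become
  cout << "-1 " << errorConstants[err] << " (" << strerror(err) << ")" << endl;
}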
If a system call is entered but the tracee finishes running before that call returns (as with exit_group above), you can stop looping and print <no return>.
With these pieces in place, your trace executable should have a fully-functional "full" mode!
ptrace Constants Reference
The full list of ptrace constants we use in our own solution is presented right here:
- PTRACE_TRACEME: Used by the tracee to state its willingness to be manipulated and inspected by its parent. No additional arguments are required, so a simple call to ptrace(PTRACE_TRACEME) does the trick.
- PTRACE_SYSCALL: ptrace(PTRACE_SYSCALL, pid, 0, 0) instructs a stopped tracee to continue executing as usual until it either exits, is signaled, is caught entering a system call, or is caught exiting one. The tracer relies on waitpid to block until the tracee stops or exits.
- PTRACE_SETOPTIONS: ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESYSGOOD) instructs the kernel to set bit 7 of the wait status to be 1 for all SIGTRAPs associated with a tracee's system calls.
- PTRACE_PEEKUSER: Used by the tracer to inspect and extract the contents of a tracee's register at the time of the call. Only the first three arguments are needed, and any value passed through data is ignored. A call to ptrace(PTRACE_PEEKUSER, pid, RSI * sizeof(long)), for instance, returns the contents of the tracee's %rsi register at the time of the call (provided the supplied pid is the process id of the tracee, of course). There are constants for all registers (RAX, RSI, RBX, RSP, etc.), and the third argument is supposed to be scaled by the size of a word on that processor (which is, by definition, the size of a long).
- PTRACE_PEEKDATA: Used by the tracer to inspect and extract the word of data residing at the specified location within the tracee's virtual address space. A call to ptrace(PTRACE_PEEKDATA, pid, 0x7fa59a8b0000) would return the eight bytes residing at address 0x7fa59a8b0000 within the tracee's virtual address space, and a call to ptrace(PTRACE_PEEKDATA, pid, ptrace(PTRACE_PEEKUSER, pid, RDI * sizeof(long))) would return the eight bytes residing at another address, one which itself resides in %rdi. If you know the contents of a register is an address interpretable as the base address of a '\0'-terminated C string, you can collect all of the characters of that string via a sequence of PTRACE_PEEKDATA calls, as the provided readString implementation does.
Final Notes
- compileSystemCallData uses your subprocess implementation to construct the required maps, so you'll need to make sure you get that working before you can expect the support libraries to work.
- The very first time you run trace, you should expect it to take a while to read in all of the prototype information for the Linux kernel source tree. All of the prototype information is cached in a local file after that (the cache file will sit in your repo with the name .trace_signatures.txt), so trace will fire up much more quickly the second time. Should you want to rebuild the prototype cache for any reason (e.g. you learn your subprocess implementation is borked and allowed a bogus .trace_signatures.txt file to be generated), you can invoke trace with the --rebuild flag, as with ./trace --rebuild ./simple-test5.
- This particular program certainly relies on signals, but no exposed signal handler functions are needed in your solution. You'll get practice with signal handlers on the final Assignment 3 problem, and even more practice with Assignment 4.
- The return value of trace is always the return value of the tracee. Running the sample executable on simple-test3 should make it clear what we're expecting. Note that you're responsible for printing the last line ("Program exited normally..."). If an error occurs that causes trace to terminate early, it's ok to terminate with a different status code.
- You may assume that the child tracee process may only fail due to a signal from the parent process, and will not fail for any other reason.
- You don't need to do comprehensive error checking like in subprocess, but you may if it helps improve the robustness of your solution or catch functionality issues as you're working.
Scherzo No. 4: Implementing farm in C++
Your final challenge is to harness the power of a computer's multiple cores to manage a collection of executables, each running in parallel to contribute its share to a larger result. For the purposes of this problem, we're going to contrive a scenario where the computation of interest--the prime factorization of arbitrarily large numbers--is complex enough that some factorizations take multiple seconds or even minutes to compute. The factorization algorithm itself isn't the focus here, save for the fact that it's potentially time consuming, and that should we need to compute multiple prime factorizations, we should leverage the computing resources of our trusty myth cluster to multiprocess and generate output more quickly.
Consider the following Python program called factor.py:
import os, signal, sys, time  # (the factorization helper is also defined in factor.py)

self_halting = len(sys.argv) > 1 and sys.argv[1] == '--self-halting'
pid = os.getpid()
while True:
if self_halting: os.kill(pid, signal.SIGSTOP)
try: num = int(input())
    except EOFError: break
start = time.time()
response = factorization(num)
stop = time.time()
print('%s [pid: %d, time: %g seconds]' % (response, pid, stop - start))
You really don't need to know Python to understand how it works, because every line of this particular program has a clear C or C++ analog. The primary things we'll point out are:
- Python's print operates just like C's printf (and it's even process-safe)
- input reads and returns a single line of text from standard input, blocking indefinitely until a line is supplied (chomping the '\n') or until end-of-file is detected
- factorization is something we wrote; it takes an integer (e.g. 12345678) and returns the prime factorization (e.g. 12345678 = 2 * 3 * 3 * 47 * 14593) as a string. You'll see it when you open up factor.py.
- The os.kill line prompts the script to stop itself (but only if the script is invoked with the --self-halting flag) and wait to be resumed via SIGCONT
The following should convince you our script does what you'd expect (this output was from using the bash shell, but with the formatting cleaned up just a bit, so it looks prettier):
$ printf "1234567\n12345678\n" | ./factor.py
1234567 = 127 * 9721 [pid: 28598, time: 0.0561171 seconds]
12345678 = 2 * 3 * 3 * 47 * 14593 [pid: 28598, time: 0.512921 seconds]
$ time printf "1234567\n12345678\n123456789\n1234567890\n" | ./factor.py
1234567 = 127 * 9721 [pid: 28601, time: 0.0521989 seconds]
12345678 = 2 * 3 * 3 * 47 * 14593 [pid: 28601, time: 0.517912 seconds]
123456789 = 3 * 3 * 3607 * 3803 [pid: 28601, time: 5.18094 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 28601, time: 51.763 seconds]
real 0m57.535s
user 0m57.516s
sys 0m0.004s
$ printf "1001\n10001\n" | ./factor.py --self-halting
$ kill -CONT %1
1001 = 7 * 11 * 13 [pid: 28625, time: 0.000285149 seconds]
$ kill -CONT %1
10001 = 73 * 137 [pid: 28625, time: 0.00222802 seconds]
$ kill -CONT %1
$ kill -CONT %1
-bash: kill: (28624) - No such process
$ time printf "123456789\n123456789\n" | ./factor.py
123456789 = 3 * 3 * 3607 * 3803 [pid: 28631, time: 5.1199 seconds]
123456789 = 3 * 3 * 3607 * 3803 [pid: 28631, time: 5.1183 seconds]
real 0m10.260s
user 0m10.248s
sys 0m0.008s
$
This last test may look silly, but it certainly verifies that one process is performing the same factorization twice, in sequence, so that the overall running time is roughly twice the time it takes to compute the factorization the first time (no caching here, so the second factorization does it all over again).
Our factorization function runs in O(n) time, so it's very slow for some large inputs. Should you need to compute the prime factorizations of many large numbers, the factor.py script would get the job done, but it may take a while. If, however, you're ssh'ed into a machine that has multiple processors and/or multiple cores (each of the myths has eight cores), you can write a program that manages several processes running factor.py concurrently and tracks which processes are idle and which processes are deep in thoughtful number theory. That’s precisely what we’re going to do.
You're going to write a C++ program called farm that runs on the myths and leverages the fact that you have eight cores at your fingertips. farm will spawn several workers (one for each core, each running a self-halting instance of factor.py), read an unbounded number of positive integers (one per line, no error checking required), forward each integer on to an idle worker (blocking until one or more workers is ready to read the number), and allow all of the workers to cooperatively publish all prime factorizations to standard output (without worrying about the order in which they're printed). To illustrate how farm should work, check out the following test case:
$ time printf "1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n" | ./farm
There are this many CPUs: 8, numbered 0 through 7.
Worker 4245 is set to run on CPU 0.
Worker 4246 is set to run on CPU 1.
Worker 4247 is set to run on CPU 2.
Worker 4248 is set to run on CPU 3.
Worker 4249 is set to run on CPU 4.
Worker 4250 is set to run on CPU 5.
Worker 4251 is set to run on CPU 6.
Worker 4252 is set to run on CPU 7.
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4249, time: 95.5286 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4252, time: 95.5527 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4245, time: 95.5824 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4247, time: 95.5851 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4248, time: 95.6578 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4250, time: 95.6627 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4251, time: 95.6666 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4246, time: 96.254 seconds]
real 1m36.285s
user 12m42.668s
sys 0m0.128s
$
Note that each of eight processes took about the same amount of time to compute the identical prime factorization, but because each was assigned to a different core, the real (aka perceived) time is basically the time it took to compute the factorization just once. How's that for parallelism!
Note that prime factorizations aren't required to be published in order--that makes this all a little easier--and repeat requests for the same prime factorization are all computed from scratch.
Your farm.cc implementation will make use of the following struct, global constants, and global variables:
struct worker {
worker() {}
// This is defining a constructor for worker using something called an initialization list.
worker(char *argv[]) : sp(subprocess(argv, true, false)), available(false) {}
subprocess_t sp;
bool available;
};
static const size_t kNumCPUs = sysconf(_SC_NPROCESSORS_ONLN);
static vector<worker> workers(kNumCPUs);
static size_t numWorkersAvailable;
The main function we give you includes stubs for all of the helper functions that decompose it, and that main function looks like this:
int main(int argc, char *argv[]) {
signal(SIGCHLD, markWorkersAsAvailable);
spawnAllWorkers();
broadcastNumbersToWorkers();
waitForAllWorkers();
closeAllWorkers();
return 0;
}
This final problem can be tricky, but it's perfectly manageable provided you follow this road map using the provided functions with the provided signatures:
- First implement spawnAllWorkers, which spawns a self-halting factor.py process for each core and updates the global workers vector so that each worker contains the relevant subprocess_t, allowing farm.cc to monitor it and pipe numbers to it. You can assign a process to always execute on a particular core by leveraging functionality outlined in the CPU_SET and sched_setaffinity man pages (i.e. type in man CPU_SET to learn about the cpu_set_t type, the CPU_ZERO and CPU_SET macros, and the sched_setaffinity function); see the affinity sketch after this road map.
- Implement the markWorkersAsAvailable handler, which gets invoked to handle all pending SIGCHLD signals. Call waitpid to surface the pid of the child that recently self-halted, and mark it as available.
- Implement a getAvailableWorker helper function, which you'll use to decompose the broadcastNumbersToWorkers function in the next step. You should never busy-wait. Instead, investigate sigsuspend (by typing man sigsuspend) as a way of blocking indefinitely until at least one worker is available; see the signal-handling sketch after this road map.
- Flesh out the implementation of broadcastNumbersToWorkers. We're giving you a tiny hint here--broadcastNumbersToWorkers keeps on looping until EOF is detected. You can restart a stopped process by sending it a SIGCONT signal.
static void broadcastNumbersToWorkers() {
while (true) {
string line;
getline(cin, line);
if (cin.fail()) break;
size_t endpos;
/* long long num = */ stoll(line, &endpos);
if (endpos != line.size()) break;
// you shouldn't need all that many lines of additional code
}
}
- Implement waitForAllWorkers, which does more or less what it says--it waits for all workers to self-halt and become available.
- Last but not least, implement the closeAllWorkers routine to uninstall the SIGCHLD handler and restore the default (investigate the SIG_DFL constant), coax all workers to exit by having them recognize that the file they are reading from has ended, and then wait for them all to exit.
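As promised in the road map, here's a hedged sketch of the core-pinning idea for spawnAllWorkers (the helper name is illustrative; cpu_set_t, CPU_ZERO, CPU_SET, and sched_setaffinity are the real players):

#include <sched.h>

static void assignToCPU(pid_t pid, size_t cpu) {
  cpu_set_t set;
  CPU_ZERO(&set);      // start with an empty set of allowed CPUs
  CPU_SET(cpu, &set);  // permit exactly one core
  sched_setaffinity(pid, sizeof(set), &set);  // pin the worker to that core
}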
Nothing you write should orphan any memory.
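And here's the signal-handling sketch referenced above: a hedged take on markWorkersAsAvailable and the sigsuspend pattern inside getAvailableWorker (the bookkeeping details are illustrative, not prescriptive):

static void markWorkersAsAvailable(int sig) {
  pid_t pid;
  int status;
  // reap every pending notification; WUNTRACED surfaces self-halted children
  while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
    if (!WIFSTOPPED(status)) continue;
    for (worker& w: workers) {
      if (w.sp.pid == pid) { w.available = true; numWorkersAvailable++; }
    }
  }
}

static size_t getAvailableWorker() {
  sigset_t additions, existing;
  sigemptyset(&additions);
  sigaddset(&additions, SIGCHLD);
  sigprocmask(SIG_BLOCK, &additions, &existing);  // avoid a missed-signal race
  while (numWorkersAvailable == 0) sigsuspend(&existing);  // atomically waits
  // ... choose an available worker, mark it busy, decrement the count ...
  sigprocmask(SIG_UNBLOCK, &additions, NULL);
  return 0;  // placeholder: return the chosen worker's index
}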
Submitting and Grading
Once you are finished working and have saved all your changes, submit by running the submit tool in the tools folder: ./tools/submit. We recommend you do a trial submit in advance of the deadline to allow time to work through any snags. You may submit as many times as you would like; we will grade the latest submission. Submitting a stable but unpolished/unfinished version is like an insurance policy: if the unexpected happens and you miss the deadline to submit your final version, this previous submission will earn points. Without a submission, we cannot grade your work.
We will grade your assignment based on a combination of test results and code quality. Your assignments will be rigorously tested using the tests we expose via sanitycheck plus a whole set of others. We reserve the right to add tests and change point values if during grading we find some features aren't being exercised or properly rewarded by the below distribution, but we're fairly certain the breakdown presented below will be a very good approximation regardless.
Clean Build (2 points)
Pipeline Tests (20 points)
- Ensure that the pipeline implementation works with exposed pipeline tests and test strategies discussed in the handout: 6 points
- Ensure that the pipeline implementation works with private grading tests not exposed in the handout or by sanitycheck: 14 points
Subprocess Tests (20 points)
- Ensure that the subprocess implementation works with exposed subprocess tests and test strategies: 10 points
- Ensure that the subprocess implementation works with private grading tests not exposed in the handout or by sanitycheck: 10 points
Trace Tests (40 points)
- Ensure that trace works in simple mode on exposed simple-test1: 10 points
- Ensure that details associated with simple and full trace work as expected: 10 points
- Ensure that trace works on much larger tracee programs not exposed by sanitycheck or by the handout: 20 points
Farm Tests (28 points)
- Ensure that farm properly factors all numbers and distributes numbers across workers in order to maximize parallelism: 25 points
- Ensure that farm works with one edge case scenario that's technically valid input: 3 points
Icons by Piotr Kwiatkowski