Assignment 2: All Things Multiprocessing

This handout was adapted from Jerry Cain’s Winter 2019 offering.

We’ve progressed through a good amount of material involving processes and process communication. Rather than building one large program, I’d like you to code up a few different things with the idea that you’ll learn more by tackling multiple problems and leveraging your understanding of the material in multiple domains. All of these programs should be coded directly within a single repository, which you can get by typing the following:

git clone /usr/class/cs110/repos/assign2/$USER assign2

There are three problems in total; the first two exercise your understanding of pipes, and the last one also requires an understanding of signals.

Due Date: Friday, July 12, 2019 at 11:59pm.

Related lecture material: Lectures 4 - 7

Problem 1: Implementing a pipeline in C

Your first task is to implement the pipeline function. This pipeline function accepts two argument vectors, and assuming both vectors are legit, spawns off twin processes with the added bonus that the standard output of the first is directed to the standard input of the second. Here’s the interface you’re coding to:

void pipeline(char *argv1[], char *argv2[], pid_t pids[]);

For simplicity, you can assume that all calls to pipeline are well-formed and will work as expected. In other words, argv1 and argv2 are each valid, NULL-terminated argument vectors, and pids is the base address of an array of length two. Further assume that all calls to pipe, dup2, close, execvp, and so forth succeed, so you needn’t do any error checking whatsoever. pipeline should return without waiting for either of the child processes to finish (i.e. your pipeline implementation should not call waitpid anywhere), and the ids of the daisy-chained processes should be dropped into pids[0] and pids[1]. Also, ensure that the two processes run in parallel as much as possible, so that pipeline({"sleep", "10", NULL}, {"sleep", "10", NULL}, pids) takes about 10 seconds instead of 20.

You should place your implementation of pipeline in pipeline.c. There is no sanitycheck for pipeline, but you can rely on pipeline-test.c and the pipeline-test executable it compiles to exercise your implementation. The pipeline-test.c test harness I start you off with is small, so you should add more tests of your own to prove that your pipeline is coded to specification. (We will not grade your pipeline-test.c – this is purely for your own testing purposes.)

Note that this first problem is standalone and doesn’t contribute to anything else that follows (although the concept of a pipeline will come back in Assignment 3).

Problem 2: Implementing subprocess in C++

Your next task is to implement an even more flexible subprocess than that implemented in lecture. The most important part of the subprocess.h interface file is right here:

static const int kNotInUse = -1;

struct subprocess_t {
  pid_t pid;
  int supplyfd;
  int ingestfd;
};

/**
 * Function: subprocess
 * --------------------
 * Creates a new process running the executable identified via argv[0].
 *   argv: the NULL-terminated argument vector that should be passed
 *         to the new process's main function
 *   supplyChildInput: true if the parent process would like to pipe content
 *                     to the new process's stdin, false otherwise
 *   ingestChildOutput: true if the parent would like the child's stdout to be
 *                      pushed to the parent, false otherwise
 */
subprocess_t subprocess(char *argv[],
                        bool supplyChildInput,
                        bool ingestChildOutput) throw (SubprocessException);

Read through the subprocess.h documentation to see how this new subprocess should work, and place your implementation in subprocess.cc. Should any of the system calls needed to implement your subprocess routine fail (either in the parent or in the child), you should throw a SubprocessException around an actionable error message. Inspect the subprocess-exceptions.h file for the full, inlined definition.

Use the test harness supplied by subprocess-test.cc to exercise your implementation, and by all means add to the file to ensure that your implementation is bulletproof. When looking at subprocess-test.cc, you’ll also get a little bit of a reminder of how try/catch blocks work. You’ll want to add your own tests to ensure that the (true, true), (true, false), (false, true), and (false, false) combinations for (supplyChildInput, ingestChildOutput) all work as expected.

Note that your implementation here is formally C++, since the larger exercise that follows this one is also to be written in C++, and it needs to link against your subprocess implementation without drama. We’re switching to C++ pretty much from this problem forward, because C++ provides better support for strings and generics than C does. C++ also provides native support for some threading and concurrency directives we’ll be relying on in a few weeks, and I’d rather ease you into the language now than do so when we branch into the multithreading topic. Truth be told, your C++ implementation of subprocess will look as it would have in pure C, save for the fact that you’re throwing C++ exceptions to identify errors.

Your fully functional subprocess routine is used by the starter code I’ve given you for the final exercise (the one requiring you to implement the prime factorization farm). Do not start farm until you are confident in your subprocess implementation.

One trick: you need to investigate pipe's lesser-known sibling, pipe2. In particular, you should use pipe2 to create a pipe just as pipe does, while further ensuring that all pipe endpoints are automatically closed when a process calls execvp, by passing O_CLOEXEC as the second argument to pipe2. Type man pipe2 at the prompt and search for O_CLOEXEC for more details.

Problem 3: Implementing farm in C++

Your final challenge is to harness the power of a computer’s multiple cores to manage a collection of executables, each running in parallel to contribute its share to a larger result. For the purposes of this problem, we’re going to contrive a scenario where the computation of interest – the prime factorization of arbitrarily large numbers – is complex enough that some factorizations take multiple seconds or even minutes to compute. The factorization algorithm itself isn’t the focus here, save for the fact that it’s potentially time consuming, and that should we need to compute multiple prime factorizations, we should leverage the computing resources of our trusty myth cluster to multiprocess and generate output more quickly.

Consider the following Python program, called factor.py:

self_halting = len(sys.argv) > 1 and sys.argv[1] == '--self-halting'
pid = os.getpid()
while True:
    if self_halting: os.kill(pid, signal.SIGSTOP)
    try: num = int(raw_input())  # raw_input blocks, eventually returning a single line from stdin
    except EOFError: break       # raw_input raises an EOFError when EOF is detected
    start = time.time()
    response = factorization(num)
    stop = time.time()
    print '%s [pid: %d, time: %g seconds]' % (response, pid, stop - start)

You really don’t need to know Python to understand how it works, because every line of this particular program has a clear C or C++ analog.

The following should convince you our script does what you’d expect:

$ printf "1234567\n12345678\n" | ./factor.py
1234567 = 127 * 9721 [pid: 28598, time: 0.0561171 seconds]
12345678 = 2 * 3 * 3 * 47 * 14593 [pid: 28598, time: 0.512921 seconds]
$ time printf "1234567\n12345678\n123456789\n1234567890\n" | ./factor.py
1234567 = 127 * 9721 [pid: 28601, time: 0.0521989 seconds]
12345678 = 2 * 3 * 3 * 47 * 14593 [pid: 28601, time: 0.517912 seconds]
123456789 = 3 * 3 * 3607 * 3803 [pid: 28601, time: 5.18094 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 28601, time: 51.763 seconds]
real    0m57.535s
user    0m57.516s
sys     0m0.004s
$ printf "1001\n10001\n" | ./factor.py --self-halting &
$ kill -CONT %1
1001 = 7 * 11 * 13 [pid: 28625, time: 0.000285149 seconds]
$ kill -CONT %1
10001 = 73 * 137 [pid: 28625, time: 0.00222802 seconds]
$ kill -CONT %1
$ kill -CONT %1
-bash: kill: (28624) - No such process
$ time printf "123456789\n123456789\n" | ./factor.py
123456789 = 3 * 3 * 3607 * 3803 [pid: 28631, time: 5.1199 seconds]
123456789 = 3 * 3 * 3607 * 3803 [pid: 28631, time: 5.1183 seconds]
real    0m10.260s
user    0m10.248s
sys     0m0.008s

This last test may look silly, but it certainly verifies that one process is performing the same factorization twice, in sequence, so that the overall running time is roughly twice the time it takes to compute the factorization the first time (no caching here, so the second factorization does it all over again).

My factorization function runs in O(n) time, so it’s very slow for some large inputs. Should you need to compute the prime factorizations of many large numbers, the factor.py script would get the job done, but it may take a while. If, however, you’re ssh‘ed into a machine that has multiple processors and/or multiple cores (each of the myths has eight cores), you can write a program that manages several processes running factor.py and tracks which processes are idle and which processes are deep in thoughtful number theory.

You’re going to write a program – a C++ program called farm – that can run on the myths to leverage the fact that you have eight cores at your fingertips. farm will spawn several workers – one for each core, each running a self-halting instance of factor.py – read an unbounded number of positive integers (one per line, no error checking required), forward each integer on to an idle worker (blocking until one or more workers is ready to read the number), and allow all of the workers to cooperatively publish all prime factorizations to standard output (without worrying about the order in which they’re printed). To illustrate how farm should work, check out the following test case:

$ time printf "1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n" | ./farm
There are this many CPUs: 8, numbered 0 through 7.
Worker 4245 is set to run on CPU 0.
Worker 4246 is set to run on CPU 1.
Worker 4247 is set to run on CPU 2.
Worker 4248 is set to run on CPU 3.
Worker 4249 is set to run on CPU 4.
Worker 4250 is set to run on CPU 5.
Worker 4251 is set to run on CPU 6.
Worker 4252 is set to run on CPU 7.
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4249, time: 95.5286 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4252, time: 95.5527 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4245, time: 95.5824 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4247, time: 95.5851 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4248, time: 95.6578 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4250, time: 95.6627 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4251, time: 95.6666 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4246, time: 96.254 seconds]
real    1m36.285s
user    12m42.668s
sys     0m0.128s

Note that each of the eight processes took about the same amount of time to compute the identical prime factorization, but because each was assigned to a different core, the real (aka perceived) time is basically the time it took to compute the factorization just once. How’s that for parallelism!

Note that prime factorizations aren’t required to be published in order – that makes this all a little easier – and repeat requests for the same prime factorization are all computed from scratch.

Your implementation will make use of the following C++ record, global constants, and global variables:

struct worker {
  worker() {}
  worker(char *argv[]) : sp(subprocess(argv, true, false)), available(false) {}
  subprocess_t sp;
  bool available;
};

static const size_t kNumCPUs = sysconf(_SC_NPROCESSORS_ONLN);
static vector<worker> workers(kNumCPUs);
static size_t numWorkersAvailable;

The main function we give you includes stubs for all of the helper functions that decompose it, and that main function looks like this:

int main(int argc, char *argv[]) {
  signal(SIGCHLD, markWorkersAsAvailable);
  spawnAllWorkers();
  broadcastNumbersToWorkers();
  waitForAllWorkers();
  closeAllWorkers();
  return 0;
}

This final problem can be tricky, but it’s perfectly manageable provided you decompose your solution around the helper stubs we supply. For instance, here’s the start of broadcastNumbersToWorkers:

static void broadcastNumbersToWorkers() {
  while (true) {
    string line;
    getline(cin, line);
    if (cin.fail()) break;
    size_t endpos;
    /* long long num = */ stoll(line, &endpos);
    if (endpos != line.size()) break;
    // you shouldn't need all that many lines of additional code
  }
}

Your implementation should not make any invalid memory accesses or cause any segfaults, and nothing you write should orphan any memory.

Submitting your work

Once you’re done, you should run ./tools/sanitycheck on all of your work as you normally would and then run ./tools/submit.

Extra credit

If you submitted the Week 2 survey by Sunday, July 7, 11:59pm, we will give you one additional point towards your functionality score. You'll need to use your Stanford email address to access the form.