Section 3: Multiprocessing

Sections Thu Oct 20 to Fri Oct 21

This section handout is based on problems by Jerry Cain. Compiled by Bharat Khandelwal, with modifications by Nick Troccoli.

Learning Goals

During this section, you will:

  1. get practice with programs that spawn child processes
  2. become more familiar with how to use pipes to redirect output
  3. build on the subprocess example from lecture to implement output redirection

Get Started

Clone the section starter code by using the command below. This command creates a section3 directory containing the project files.

git clone /afs/ir/class/cs111/repos/lab3/shared section3

Next, pull up the online section checkoff and have it open in a browser so you can jot things down as you go.

1. Subprocess, Take 2

Let's implement an upgraded version of the subprocess function from lecture that also allows us to capture the child process's standard output in addition to piping to its standard input. Here's the function signature:

subprocess_t subprocess(char *argv[], bool supplyChildInput, bool ingestChildOutput);

The function takes in the argv array for the child process, and whether we should rewire its STDIN and/or STDOUT. The return value is a struct with the following fields:

struct subprocess_t {
  pid_t pid;
  int supplyfd;  // fd to write to child's STDIN
  int ingestfd;  // fd to read from child's STDOUT
};

The subprocess function spawns a child process to run the given command. If supplyChildInput is true, then subprocess connects a pipe to the child's STDIN, and supplyfd should contain a file descriptor that the caller can use to write to the child's STDIN; this is the same functionality as the subprocess function from lecture. If ingestChildOutput is true, then subprocess connects a pipe to the child's STDOUT, and ingestfd should contain a file descriptor that the caller can use to read from the child's STDOUT. If both are true, then two pipes are created: one connected to the child's STDIN, and another connected to its STDOUT. If either supplyfd or ingestfd isn't used, it should be returned with the value kNotInUse. (As a concrete example, the subprocess example from lecture is equivalent to calling this function with supplyChildInput as true and ingestChildOutput as false.)

You may assume all system calls succeed. Be careful to close all file descriptors! (What would go wrong if you don’t close some of them?)
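As a warm-up for the output half of this problem, here is a minimal sketch of the pipe, fork, and dup2 pattern for reading a child's STDOUT. It is not the subprocess function itself: captureOutput is a hypothetical helper that is not part of the starter code, and it reads the output itself rather than handing a descriptor back to the caller. Pay attention to where each end of the pipe gets closed.

```cpp
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not in the starter code): runs the command in argv
// and returns everything the child wrote to its STDOUT.
std::string captureOutput(char *argv[]) {
  int fds[2];
  if (pipe(fds) != 0) return "";   // fds[0] = read end, fds[1] = write end

  pid_t pid = fork();
  if (pid == 0) {
    close(fds[0]);                 // the child never reads from the pipe
    dup2(fds[1], STDOUT_FILENO);   // the child's STDOUT now feeds the pipe
    close(fds[1]);                 // the copy at STDOUT_FILENO is enough
    execvp(argv[0], argv);
    _exit(1);                      // reached only if execvp fails
  }

  close(fds[1]);                   // if the parent kept this open, the
                                   // read loop below would never see EOF
  std::string output;
  char buffer[256];
  ssize_t count;
  while ((count = read(fds[0], buffer, sizeof(buffer))) > 0) {
    output.append(buffer, count);
  }
  close(fds[0]);
  waitpid(pid, NULL, 0);           // reap the child
  return output;
}
```

Calling captureOutput with an argv of { "echo", "hello", NULL } should return the string "hello\n". In the real subprocess function the parent would not read here; it would store the read end in ingestfd and leave the reading to the caller.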

Write your implementation in subprocess.cc. We've provided a subprocess-test.cc program you can use for testing; try running tools/sanitycheck to run the tests provided for this problem.

2. Timeout

Write a program called timeout that launches a specified command and allows it to run for up to n seconds before terminating it. It can be run like the following:

./timeout <n> <command>

For instance, we could run

./timeout 5 sleep 3

which should let sleep run to completion, so that timeout finishes after 3 seconds. However, if we run

./timeout 5 sleep 10

then timeout should terminate the child process and finish after 5 seconds.

If the command finishes within n seconds, then timeout should return the exit status of the command. If it does not, then timeout should return an exit status of 124.

One creative way to implement this is to spawn two child processes: one runs the specified command, and the other sleeps for n seconds. We wait for whichever child finishes first, check which one it was, terminate the other one, and return the appropriate exit status.
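The "wait for whichever child finishes first" step can be sketched with waitpid, which accepts -1 as the PID to mean "any child". The helper below is hypothetical and deliberately simplified: both children just sleep, whereas in timeout one of them would run the command with execvp.

```cpp
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks two children that sleep for the given number
// of seconds and reports which one finished first (1 or 2).
int firstToFinish(unsigned int firstSeconds, unsigned int secondSeconds) {
  pid_t first = fork();
  if (first == 0) {
    sleep(firstSeconds);
    _exit(0);
  }
  pid_t second = fork();
  if (second == 0) {
    sleep(secondSeconds);
    _exit(0);
  }

  // A PID of -1 makes waitpid return as soon as any one child has finished.
  pid_t finished = waitpid(-1, NULL, 0);

  // Terminate and reap the slower child so it does not linger.
  pid_t other = (finished == first) ? second : first;
  kill(other, SIGKILL);
  waitpid(other, NULL, 0);

  return (finished == first) ? 1 : 2;
}
```

For example, firstToFinish(0, 5) should return 1 almost immediately, since the second child is terminated rather than allowed to sleep for its full 5 seconds.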

To terminate another process, we can use the kill system call, like this:

kill(pid, SIGKILL)

This will terminate the process with the given PID.
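A terminated child still needs to be reaped with waitpid, and the status that waitpid reports distinguishes a child that was killed from one that exited on its own. The sketch below uses a hypothetical helper, killSleepingChild, in which a long sleep stands in for a command that runs too long.

```cpp
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that would sleep for a long time,
// kills it right away, and returns the wait status reported for it.
int killSleepingChild() {
  pid_t pid = fork();
  if (pid == 0) {
    sleep(60);               // stands in for a command that runs too long
    _exit(0);
  }
  kill(pid, SIGKILL);        // send the signal that terminates the child
  int status;
  waitpid(pid, &status, 0);  // still required: reaps the terminated child
  return status;
}
```

For the returned status, WIFSIGNALED(status) is true and WTERMSIG(status) equals SIGKILL. For a child that finishes on its own, WIFEXITED(status) is true instead, and WEXITSTATUS(status) gives its exit status.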

To sleep, we can use the sleep function and specify the number of seconds to sleep (atoi can help convert a string to a number):

sleep(n)

Write your implementation in timeout.cc.

Extra Problems

You may not have time to cover these problems during section, but they're great additional practice with multiprocessing and pipes!

3. Incorrect Redirection

The publish program takes an arbitrary number of filenames as arguments and attempts to write the date and time (via the date executable that ships with all versions of Unix and Linux) to each of them. publish is built from the following source:

static void publish(const char *name) {
  printf("Publishing date and time to file named \"%s\".\n", name);
  int outfile = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  dup2(outfile, STDOUT_FILENO);
  close(outfile);
  if (fork() > 0) return;
  char *argv[] = { (char *)"date", NULL };
  execvp(argv[0], argv);
}

int main(int argc, char *argv[]) {
  for (size_t i = 1; i < argc; i++) publish(argv[i]);
  return 0;
}

The intention is for the program to print something like this as it executes:

./publish one two three four
Publishing date and time to file named "one".
Publishing date and time to file named "two".
Publishing date and time to file named "three".
Publishing date and time to file named "four".

However, that's not what happens.

What text is actually printed to standard output?

What does each of the four files contain?

How should the program be rewritten so that it works as intended?

4. Copy-on-write

The fork system call creates a new process with an independent address space, where the contents of the parent's address space are replicated (in a sense, memcpy'ed) into the address space of the clone. If, however, a copy-on-write implementation strategy is adopted, then parent and child initially share the same underlying memory, and the two address spaces split apart piecemeal only as one process makes changes that shouldn't be reflected in the other. Most operating systems adopt a copy-on-write approach, even though it's more difficult to implement.
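Whichever strategy the operating system uses, the behavior a program observes is the same: a write in one process is never visible in the other. The sketch below illustrates that guarantee with a hypothetical helper, parentValueAfterChildWrite. It cannot show whether copy-on-write is in use, only that the two address spaces behave independently.

```cpp
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns the parent's copy of a variable after a
// child process has overwritten its own copy.
int parentValueAfterChildWrite() {
  int value = 111;
  pid_t pid = fork();
  if (pid == 0) {
    value = 999;   // under copy-on-write, a write like this is what would
                   // prompt a private copy of the affected memory
    _exit(0);
  }
  waitpid(pid, NULL, 0);   // the child has finished by now
  return value;            // the parent's copy is untouched
}
```

The function returns 111: the child's assignment changes only the child's view of value.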

Given how we’ve seen fork used in class so far (commonly paired with execvp), why does the copy-on-write approach make more sense?