Section 3: Multiprocessing

Sections Wed Feb 01 to Sat Feb 04

This section handout contains problems by Jerry Cain and Nick Troccoli. Compiled by Bharat Khandelwal, with modifications by Nick Troccoli.

Learning Goals

During this section, you will:

get practice with programs that spawn child processes
become more familiar with how to use pipes
get hands-on practice understanding the behavior of file descriptor tables and the open file table

Get Started

Clone the section starter code by using the command below. This command creates a section3 directory containing the project files.

git clone /afs/ir/class/cs111/repos/lab3/shared section3

Next, pull up the online section checkoff and have it open in a browser so you can jot things down as you go.

1. Timeout

Write a program called timeout that launches a specified command and allows it to run for up to n seconds before terminating it. It can be run like the following:

./timeout <n> <command>

For instance, we could run

./timeout 5 sleep 3

which should let sleep run to completion and terminate after 3 seconds. However, if we run

./timeout 5 sleep 10

then timeout should terminate the child process and finish after 5 seconds.

If the command finishes, then timeout should return the exit status of the command. If the command does not finish, then timeout should return an exit status of 124.
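As a minimal sketch of how a parent can recover a child's exit status, the snippet below forks a child, waits for it, and decodes the status with the WIFEXITED and WEXITSTATUS macros. The function name exitStatusOfChild is hypothetical, and the child simply calls _exit as a stand-in for running a real command.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that exits with `code`, then returns
// the exit status the parent observes (or -1 if anything goes wrong).
int exitStatusOfChild(int code) {
    pid_t pid = fork();
    if (pid < 0) return -1;          // fork failed
    if (pid == 0) _exit(code);       // child: stand-in for a real command

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    // WIFEXITED is true only if the child ended by exiting normally.
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling exitStatusOfChild(3) from the parent yields 3, the value the child passed to _exit.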

One creative way to implement this is to spawn two child processes. One will run the specified command, and the other will sleep for n seconds. We wait for either child to finish, check which one it was, terminate the other one, and return an appropriate exit status.
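The "wait for whichever child finishes first" step can be sketched with waitpid and a PID argument of -1, which blocks until any child exits and returns that child's PID. This is only an illustration under simplifying assumptions: both children just sleep (neither runs a command), the names spawnSleeper and whichFinishesFirst are hypothetical, and the slower child is waited for rather than terminated.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that sleeps for `seconds` and exits.
// Returns the child's PID to the parent, or -1 if fork failed.
static pid_t spawnSleeper(unsigned int seconds) {
    pid_t pid = fork();
    if (pid == 0) {
        sleep(seconds);
        _exit(0);
    }
    return pid;
}

// Returns 1 if the first child finished first, 2 if the second did,
// and -1 on error.
int whichFinishesFirst(unsigned int firstSecs, unsigned int secondSecs) {
    pid_t first = spawnSleeper(firstSecs);
    if (first < 0) return -1;
    pid_t second = spawnSleeper(secondSecs);
    if (second < 0) {
        waitpid(first, nullptr, 0);   // reap the child we did create
        return -1;
    }

    // A PID of -1 means "any child": this returns as soon as one exits.
    pid_t finished = waitpid(-1, nullptr, 0);
    if (finished != first && finished != second) return -1;

    // Reap the slower child too so that no zombie process is left behind.
    waitpid(finished == first ? second : first, nullptr, 0);
    return finished == first ? 1 : 2;
}
```

Comparing the PID returned by waitpid against the two saved PIDs is what tells the parent which child won the race.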

To terminate another process, we can use the kill system call, like this:

kill(pid, SIGKILL)

This sends the SIGKILL signal to the process with the given PID, which terminates it.
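Here is a small sketch of kill in context, assuming a child that would otherwise sleep for a minute. The function name killSleepingChild is hypothetical. After sending the signal, the parent still calls waitpid to reap the child, and the WIFSIGNALED and WTERMSIG macros report which signal ended it.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a long-sleeping child, terminates it with
// SIGKILL, and returns the number of the signal that ended it
// (or -1 if anything goes wrong).
int killSleepingChild() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        sleep(60);                    // child: would run for a minute
        _exit(0);
    }

    if (kill(pid, SIGKILL) != 0) return -1;   // parent: terminate the child

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;   // reap the child
    // WIFSIGNALED is true when the child was ended by a signal.
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The function returns SIGKILL here, and it returns right away instead of after the child's 60 seconds of sleep.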

To sleep, we can use the sleep library function and specify the number of seconds to sleep (atoi can help parse a string to a number):

sleep(n)

Q1: How could we implement this program? Write your implementation in timeout.cc.

2. Pipes and File Descriptors

In lecture, we saw how each process has its own file descriptor table, and file descriptors are really indexes into this table. Moreover, the data about each open resource is not stored in the file descriptor table itself, but rather in a single "open file table" shared by all processes, and file descriptor table entries are pointers to entries in this open file table.
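To make the two-level structure concrete, here is a sketch within a single process. It assumes a temporary file created with mkstemp, and the names readBytes and dupVersusOpen are hypothetical. A descriptor created with dup points at the same open file table entry as the original (so it shares the file offset), while a second call to open creates a separate entry with its own offset.

```cpp
#include <fcntl.h>
#include <stdlib.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: reads up to `count` bytes from `fd` as a string.
static std::string readBytes(int fd, size_t count) {
    std::string buf(count, '\0');
    ssize_t n = read(fd, &buf[0], count);
    buf.resize(n > 0 ? static_cast<size_t>(n) : 0);
    return buf;
}

// Returns "<bytes read via dup'ed fd>,<bytes read via separately opened fd>"
// after the original descriptor has already consumed the first three bytes.
std::string dupVersusOpen() {
    char path[] = "/tmp/fd-demo-XXXXXX";
    int fd = mkstemp(path);               // first open file table entry
    if (fd < 0) return "";
    if (write(fd, "abcdefghi", 9) != 9 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        unlink(path);
        return "";
    }

    int copy = dup(fd);                   // new descriptor, same entry
    int fresh = open(path, O_RDONLY);     // new descriptor, new entry

    readBytes(fd, 3);                     // moves the shared offset to 3
    std::string result = readBytes(copy, 3) + "," + readBytes(fresh, 3);

    close(fd);
    close(copy);
    close(fresh);
    unlink(path);
    return result;                        // "def,abc"
}
```

The dup'ed descriptor continues where the original left off, while the separately opened descriptor starts from the beginning of the file.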

This structure is important because it explains the behavior we see on fork, when file descriptors are duplicated from the parent to the child process. We may think that "duplicated" means the child gets its own copies that won't interfere with those in the parent. What actually happens is that the child gets a copy of the parent's file descriptor table, and because the entries in that copy point to the same open file table entries as the parent's, they refer to the same underlying resources. This is why, if the parent opens a file and then calls fork, the child reading from that file will affect the parent's reading from that file as well: both share one file offset. It's also why a pipe can be shared between a parent and child to communicate.
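The effect of a child's read on the parent can be sketched as follows. This is an illustration that is separate from the starter code: the function name parentReadAfterChild and the temporary file are assumptions. The parent opens a file containing "abcdef" and forks, the child reads three bytes, and the parent then reads from the same descriptor.

```cpp
#include <stdlib.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns the bytes the parent reads after the child
// has already read three bytes through its copy of the descriptor.
std::string parentReadAfterChild() {
    char path[] = "/tmp/fork-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return "";
    unlink(path);   // the file itself lives on until its descriptors close
    if (write(fd, "abcdef", 6) != 6 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return "";
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return "";
    }
    if (pid == 0) {
        char skipped[3];
        ssize_t n = read(fd, skipped, sizeof(skipped));  // moves shared offset
        close(fd);
        _exit(n == 3 ? 0 : 1);
    }

    waitpid(pid, nullptr, 0);             // ensure the child has read first
    char buf[3];
    ssize_t n = read(fd, buf, sizeof(buf));
    close(fd);
    return std::string(buf, n > 0 ? static_cast<size_t>(n) : 0);  // "def"
}
```

The parent sees "def" rather than "abc" because the child's read advanced the offset stored in the open file table entry that both processes reference.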

Let's see the file descriptor table and open file table in action, and get more practice with file descriptors and pipes in parent and child processes. To do this, we'll use a sample program open-file-table.cc in the starter project.

Q2: Read through the code to understand the behavior of this program. When you're ready, compile the program with make and run it. What is the output for this program? Will it be the same every time you run it?

Q3: Imagine that we paused both the parent and child at the line immediately following fork. What would the current reference count be for the open file? Why is that?

Now imagine that the child process resumes and runs to completion. Then, the parent process resumes as well.

Q4: After what line does the open file table entry for our open file go away? Why?

Now let's imagine we made some modifications to this program. Let's say we created an (unused and not-properly-closed) pipe in the parent process before it calls fork by adding the following lines right before the fork call:

int fds[2];
pipe(fds);

Q5: If we paused execution of both the parent and child on the line immediately after the fork, what would the reference count be of the open file table entry for the read end of the pipe? The write end? How many times would we need to close file descriptors to properly close this pipe in the parent and child?

Q6: Now imagine that we moved these two lines to be immediately after (not before!) the fork call. If we paused both the parent and child right after they both created the pipe, what file descriptors and open file table entries would be present as a result of these lines? Why is this different than when this code was executed prior to the fork call?

3. (If time) Copy-on-write

The fork system call creates a new process with an independent address space, where the contents of the parent's address space are replicated (in a sense, memcpy'ed) into the address space of the clone. If, however, a copy-on-write implementation strategy is adopted, then parent and child initially share the same underlying memory, and the two only piecemeal split into independent copies as one process makes changes that shouldn't be reflected in the other. Most operating systems adopt a copy-on-write approach, even though it's more difficult to implement.
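The behavior a program can observe is the same under either strategy, and the sketch below shows it: a write made by the child is not visible in the parent. The function name valuesAfterChildWrite is hypothetical. Whether the kernel copied memory eagerly at fork or lazily on the child's first write cannot be detected from code like this.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <utility>

// Hypothetical helper: returns {value seen by parent, value seen by child}
// after the child overwrites its own copy of a local variable.
std::pair<int, int> valuesAfterChildWrite() {
    int value = 1;
    pid_t pid = fork();
    if (pid < 0) return {-1, -1};
    if (pid == 0) {
        value = 2;          // under copy-on-write, this write is the moment
                            // the child would get its own private copy
        _exit(value);       // report the child's view via its exit status
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        return {value, -1};
    }
    return {value, WEXITSTATUS(status)};   // parent still sees 1, child saw 2
}
```

The parent's value stays 1 even though the child stored 2 in the same variable, which is the independence that fork guarantees regardless of how the copying is implemented.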

Q7: Given how we’ve seen fork used in class so far (commonly paired with execvp), why does the copy-on-write approach make more sense?