Section 3: Multiprocessing

NOTE: this website is out of date. This is the course web site from a past quarter. If you are a current student taking the course, you should visit the current class web site instead. If the current website is not yet visible by going to cs111.stanford.edu, it may be accessible by visiting this link until the new page is mounted at this address. Please be advised that courses' policies change with each new quarter and instructor, and any information on this out-of-date page may not apply to you.

Sections Wed Jan 31 to Sat Feb 03

This section handout contains problems by Jerry Cain and Nick Troccoli. Compiled by Bharat Khandelwal, with modifications by Nick Troccoli.

Learning Goals

During this section, you will:

  1. get practice with programs that spawn child processes
  2. become more familiar with how to use pipes
  3. get hands-on practice understanding the behavior of file descriptor tables and the open file table

Get Started

Clone the section starter code by using the command below. This command creates a section3 directory containing the project files.

git clone /afs/ir/class/cs111/repos/lab3/shared section3

Next, pull up the online section checkoff and have it open in a browser so you can jot things down as you go.

1. Timeout

Write a program called timeout that launches a specified command and allows it to run for up to n seconds before terminating it. It can be run like the following:

./timeout <n> <command>

For instance, we could run

./timeout 5 sleep 3

which should let sleep run to completion and terminate after 3 seconds. However, if we run

./timeout 5 sleep 10

Then timeout should finish after 5 seconds and terminate the child process.

If the command finishes, then timeout should return the exit status of the command. If the command does not finish, then timeout should return an exit status of 124.

One creative way to implement this is to spawn two child processes. One will run the specified command, and the other will sleep for n seconds. We wait for the first one to finish, check which one finished first, terminate the other one, and return an appropriate exit status.

To terminate another process, we can use the builtin kill function, like this:

kill(pid, SIGKILL)

This will terminate the process with the given PID.

To sleep, we can use the builtin sleep function and specify the number of seconds to sleep (atoi can help parse a string to a number):

sleep(n)

Q1: How could we implement this program? Write your implementation in timeout.cc.

2. (Time-dependent) Pipes and File Descriptors

On assign3, the pipe system call will allow us to implement pipelines between processes, where the output of one process is fed as the input to the next. In lecture, we saw how the behavior of the file descriptor table and open file table enable pipes to be shared across processes. Here are some key details:

  • each process has its own file descriptor table, and file descriptors are really indexes into this table.
  • the data about each open resource is not stored in the file descriptor table itself, but rather in a single "open file table" shared by all processes, and file descriptor table entries are pointers to entries in this open file table.
  • on fork, the child gets a copy of the parent's file descriptor table; and because this is pointing to the same open file table entries as the parent, it refers to the same underlying resources.
  • Therefore, if the parent opens a file and then calls fork, the child reading from that file will impact the parent's reading through that file as well. It's also why a pipe can be shared between a parent and child to communicate.

Let's see this in action, and get more practice with file descriptors and pipes in parent and child processes. To do this, we'll use a sample program open-file-table.cc in the starter project.

Q2: Read through the code to understand the behavior of this program. When you're ready, compile the program with make and run it. What is the output for this program? Will it be the same every time you run it?

Q3: Imagine that we paused both the parent and child at the line immediately following fork. What would the current reference count be for the open file? Why is that?

Now imagine that the child process resumes and runs to completion. Then, the parent process resumes as well.

Q4: After what line does the open file table entry for our open file go away? Why?

Now let's imagine we made some modifications to this program. Let's say we created an (unused and not-properly-closed) pipe in the parent process before it calls fork by adding the following lines right before the fork call:

int fds[2];
pipe(fds);

Q5: If we paused execution of both the parent and child on the line immediately after the fork, what would the reference count be of the open file table entry for the read end of the pipe? The write end? How many times would we need to close file descriptors to properly close this pipe in the parent and child?

Q6: Now imagine that we moved these two lines to be immediately after (not before!) the fork call. If we paused both the parent and child right after they both created the pipe, what file descriptors and open file table entries would be present as a result of these lines? Why is this different than when this code was executed prior to the fork call?

Bonus: CPlayground live execution

CPlayground is an online code sandbox that allows you to step through and observe program execution, including what the open file table and file descriptor tables look like at any given time. The site sometimes has the occasional hiccup, but feel free to give it a try:

Open Playground

You can add a breakpoint on any line by clicking in the code margin on the left (for instance, try adding a breakpoint on line 41, the line immediately following fork). Then click "Debug" in the top-left. A debug view should appear in the bottom right showing active processes (2 total) and open files. (note: the "open files" view shows the file descriptor tables at the top, the open file table below it, and below that a third layer called the "vnode table" - you can ignore this vnode table, as it's not something we'll be worrying about). Click "Open Files" to view the current open files. There will be separate inline controls for each process; click the "Play" button to resume that process, or the "step" button (the icon that looks like an arrow between boxes) to step through a process's execution. As you do, the visualization of the program on the right will change.

3. [If Time] Copy-on-write

The fork system call creates a new process with an independent address space, where the contents of the parent's address space are replicated—in a sense, memcpy'ed—into the address space of the clone. If, however, a copy-on-write implementation strategy is adopted, then both parent and child share the same address space and only start to piecemeal split into two independent address spaces as one process makes changes that shouldn't be reflected in the other. In general, most operating systems adopt a copy-on-write approach, even though it's more difficult to implement.

Q7: Given how we’ve seen fork used in class so far (commonly paired with execvp), why does the copy-on-write approach make more sense?