Section 3: Multiprocessing

Sections Wed Jan 29 to Sun Feb 02

This section handout contains problems by Jerry Cain and Nick Troccoli. Compiled by Bharat Khandelwal, with modifications by Nick Troccoli.

Learning Goals

During this section, you will:

get practice with programs that spawn child processes
build on the parent-child-pipe example from lecture to practice using pipes and dup2
become more familiar with the behavior of pipes and sharing a pipe between processes

Get Started

Clone the section starter code by using the command below. This command creates a section3 directory containing the project files.

git clone /afs/ir/class/cs111/repos/lab3/shared section3

Next, pull up the online section checkoff and have it open in a browser so you can jot things down as you go.

1. [Main Problem] Timeout

Write a program called timeout that launches a specified command and allows it to run for up to n seconds before terminating it. It can be run like the following:

./timeout <n> <command>

For instance, we could run

./timeout 5 sleep 3

which should let sleep run to completion and terminate after 3 seconds. However, if we run

./timeout 5 sleep 10

then timeout should terminate the child process after 5 seconds and finish.

If the command finishes, then timeout should return the exit status of the command. If the command does not finish, then timeout should return an exit status of 124.

One creative way to implement this is to spawn two child processes. One will run the specified command, and the other will sleep for n seconds. We wait for either child to finish, check which one finished first, terminate the other one, and return an appropriate exit status.

To terminate another process, we can use the kill system call, like this:

kill(pid, SIGKILL)

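This sends the SIGKILL signal to the process with the given PID, which terminates it immediately. If that process is our child, we should still call waitpid on it afterward so that it is cleaned up.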

To sleep, we can use the sleep library function and specify the number of seconds to sleep (atoi can help parse a string to a number):

sleep(n)

Q1: How could we implement this program? Write your implementation in timeout.cc.

2. [Remaining Time] Pipes and `dup2`

The pipe.cc file includes the same parent-child pipe program shown in lecture, but modifies the parent code to send a message to the child by rewiring its STDOUT instead of explicitly writing to the pipe write end. We'll use this program to further explore pipes and see an example of using dup2. pipe.cc is built from the following source:

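#include <stdio.h>     // printf, setbuf
#include <string.h>    // strlen
#include <sys/wait.h>  // waitpid
#include <unistd.h>    // pipe, fork, read, close, dup2

static const char * kPipeMessage = "Hello, this message is coming through a pipe.";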

int main(int argc, char *argv[]) {
    /* This is a patch fix for an issue where printf doesn't 
     * immediately print the output.  It tells printf to 
     * print immediately, rather than buffering the output.
     */
    setbuf(stdout, NULL);

    int fds[2]; 
    pipe(fds); 
    size_t bytesSent = strlen(kPipeMessage) + 1; 
    pid_t pidOrZero = fork(); 
    // Child only reads from pipe (assume everything is read)
    if (pidOrZero == 0) {
        close(fds[1]);
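        char buffer[bytesSent];
        // printf in the parent does not send the trailing '\0', so zero the buffer first
        memset(buffer, 0, sizeof(buffer));
        read(fds[0], buffer, sizeof(buffer));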
        close(fds[0]); 
        printf("Message from parent: %s\n", buffer); 
        return 0; 
    } 
    // Parent only writes to pipe, via printf
    close(fds[0]); 
    dup2(fds[1], STDOUT_FILENO);
    close(fds[1]);
    printf("%s", kPipeMessage);
    waitpid(pidOrZero, NULL, 0); 
    return 0; 
}

First, pull up the file in your terminal and examine the program to understand its behavior. Then work through the following questions.

Q2: What is the dup2 call doing in the parent code? How does this still end up sending a message to the child process?

Q3: Try modifying the code to move the pipe creation to directly after the fork call. What happens? Why?

Q4: Why do we have the close(fds[1]) line after the call to dup2? Why are we still able to write a message to the pipe after that line executes?

Q5: Try adding a sleep(2) call to the parent code immediately before the printf, so that the child will be essentially guaranteed to reach the read call before the parent writes anything. Does this cause any issues? Why or why not?

3. [Extra] Copy-on-write

The fork system call creates a new process with an independent address space, where the contents of the parent's address space are replicated (in a sense, memcpy'ed) into the address space of the clone. If, however, a copy-on-write implementation strategy is adopted, then parent and child initially share the same underlying memory, and the two address spaces split apart piecemeal, only as one process makes changes that shouldn't be reflected in the other. In general, most operating systems adopt a copy-on-write approach, even though it's more difficult to implement.

Q6: Given how we’ve seen fork used in class so far (commonly paired with execvp), why does the copy-on-write approach make more sense?

4. [Extra - requires Fri. material] Pipes and File Descriptors

On assign3, the pipe system call will allow us to implement pipelines between processes, where the output of one process is fed as the input to the next. In lecture, we saw how the behavior of the file descriptor table and open file table enable pipes to be shared across processes. Here are some key details:

each process has its own file descriptor table, and file descriptors are really indexes into this table.
the data about each open resource is not stored in the file descriptor table itself, but rather in a single "open file table" shared by all processes, and file descriptor table entries are pointers to entries in this open file table.
on fork, the child gets a copy of the parent's file descriptor table; because the copied entries point to the same open file table entries as the parent's, they refer to the same underlying resources.

Therefore, if the parent opens a file and then calls fork, a read by the child from that file also advances the position that the parent reads from. It's also why a pipe can be shared between a parent and child to communicate.

Let's see this in action, and get more practice with file descriptors and pipes in parent and child processes. To do this, we'll use a sample program open-file-table.cc in the starter project.

Q7: Read through the code to understand the behavior of this program. When you're ready, compile the program with make and run it. What is the output for this program? Will it be the same every time you run it?

Q8: Imagine that we paused both the parent and child at the line immediately following fork. What would the current reference count be for the open file? Why is that?

Now imagine that the child process resumes and runs to completion. Then, the parent process resumes as well.

Q9: After what line does the open file table entry for our open file go away? Why?

Now let's imagine we made some modifications to this program. Let's say we created an (unused and not-properly-closed) pipe in the parent process before it calls fork by adding the following lines right before the fork call:

int fds[2];
pipe(fds);

Q10: If we paused execution of both the parent and child on the line immediately after the fork, what would the reference count be of the open file table entry for the read end of the pipe? The write end? How many times would we need to close file descriptors to properly close this pipe in the parent and child?

Q11: Now imagine that we moved these two lines to be immediately after (not before!) the fork call. If we paused both the parent and child right after they both created the pipe, what file descriptors and open file table entries would be present as a result of these lines? Why is this different from when this code was executed prior to the fork call?

Bonus: CPlayground live execution

CPlayground is an online code sandbox that allows you to step through and observe program execution, including what the open file table and file descriptor tables look like at any given time. The site has the occasional hiccup, but feel free to give it a try:

Open Playground

You can add a breakpoint on any line by clicking in the code margin on the left (for instance, try adding a breakpoint on line 41, the line immediately following fork). Then click "Debug" in the top-left. A debug view should appear in the bottom right showing active processes (2 total) and open files. Click "Open Files" to view the current open files. This view shows the file descriptor tables at the top, the open file table below them, and below that a third layer called the "vnode table", which you can ignore, as it's not something we'll be worrying about. There will be separate inline controls for each process: click the "Play" button to resume that process, or the "step" button (the icon that looks like an arrow between boxes) to step through that process's execution. As you do, the visualization of the program on the right will change.