Slide 1: Lecture 04: Files, Memory, and Processes

Principles of Computer Systems

Winter 2020

Stanford University

Computer Science Department

Lecturer: Chris Gregg and

                      Nick Troccoli

PDF of this presentation

Slide 2: Recap of Lectures 1-3

Image:

This diagram shows a vertical list of connections from userspace processes down to harder. Specifically: userspace processes<--system calls-->operating system<--vnode interface-->file system (ext4fs, s6fs, btfs)<--block interface-->device driver (SATA, iSCSI)<--hardare interface-->hardware (PCI-E, DMA, RDMA, etc.)

End of Image

Slide 3: Today's Lecture in 3 Parts

Image:

same image as previous

End of Image

Slide 4: File Descriptor Table and File Descriptors

Image:

This diagram shows the "Process Control Blocks", handled by the OS. Specifically, it shows the "descriptor table for process ID 1000", "descriptor table for process ID 1001", and "descriptor table for process ID 1002". Each sub-diagram has an array from 0 to 9.

End of Image

Slide 5: Creating and Using File Descriptors

Image:

(same as previous)

End of Image

Slide 6: File Descriptor vs. File Table Entries

Image:

Almost the same as previous, with the addition of an expansion of one of the elements in process 1001. The expansion has an entry that reads: "mode: r, cursor: 0, refcount: 1, vnode: (blank)"

End of Image

Slide 7: File Descriptors vs. File Table Entries Example

$ ./main 1> log.txt 2> log.txt
$ ./main 1> log.txt 2>&1

Opens log.txt twice (two file table entries)

Opens log.txt once, two descriptors for same file table entry

// file: testfd.c
#include <stdio.h>
#include <unistd.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char* error = "One plus one is\ntwo.\n";
    const char* msg   = "One plus two is\n";

    write(2, error, strlen(error));
    write(1, msg, strlen(msg));
    return 0;
}

Slide 8: File Descriptors vs. File Table Entries Example

1

2

pos: 0

pos: 0

log.txt

fd table

file table

vnode

1

2

pos: 0

log.txt

fd table

file table

vnode

cgregg@myth60:$ ./testfd 1> log.txt 2> log.txt
cgregg@myth60:$ cat log.txt
One plus two is
two.
cgregg@myth60:$
cgregg@myth60:$ ./testfd 1> log.txt 2>&1
cgregg@myth60:$ cat log.txt
One plus one is
two.
One plus two is
cgregg@myth60:$

Slide 9: File Table Details

Image:

Almost the same as the previous diagram, except that there are many descriptor table entries. The idea is that file descriptior 0 from each descriptor table points to the same open file table entry, corresponding to STDIN. Likewise for descriptor 1 (all point to the same open file table entry, for STDOUT, which has a reference count of 3, because there are three files that have STDOUT open), and for descriptor 2 (STDERR). Each descriptor table also has some entries (above 2) that point to open files that are not shared between other processes.

End of Image

Slide 10: vnodes

Image:

Almost the same as the last image, except that one of the vnode entries is further expanded to show the details in the "vnode table". In particular: "type: regfile, refcount: 1, fnptrs ***, inode: 0644; 8:23pm; cgregg"

End of Image

Slide 11: File Decriptors -> File Table -> vnode Table

  • There is one system-wide vnode table for the same reason there is one system-wide open file table. Independent file sessions reading from the same file don't need independent copies of the vnode. They can all alias the same one.
  • Image:

    Almost the same as the last diagram, except now many of the open file table vnode entries are expanded for the vnode table.

    End of Image

    Slide 12: UNIX File Abstractions Summary

    Image:

    Same diagram as earlier: This diagram shows a vertical list of connections from userspace processes down to harder. Specifically: userspace processes<--system calls-->operating system<--vnode interface-->file system (ext4fs, s6fs, btfs)<--block interface-->device driver (SATA, iSCSI)<--hardare interface-->hardware (PCI-E, DMA, RDMA, etc.)

    End of Image

    Slide 13: Questions about files in UNIX?

    Slide 14: Memory mapped files

    Slide 15: Process Address Spaces

    Image:

    This diagram shows main memory layout as a large, vertical rectangle, going from address 0x0 to address 0xFFFFFFFFFFFFFFF

    End of Image

    Slide 16: Memory Regions in a Process

    Image:

    This image is another vertical memory diagram, with the segment details, as follows, from top to bottom (starting in the middle): "user stack, shared libraries, heap, bss, data, rodata, code"

    End of Image

    Slide 17: Memory Mapped Files

    0xFFFFFFFFFFFFFFFF

    0x0

    libc.so

    bash

    heap

    stack

    libdl.so

    data

    Slide 18: Memory Mapped Files and the Buffer Cache

    libc.so

    libc.so

    Process A

    Process B

    Buffer Cache

    Slide 19: Memory Mapped Files Summary

    libc.so

    libc.so

    Process A

    Process B

    Buffer Cache

    Slide 20: Questions about mmap()?

    Slide 21: Introduction to Multiprocessing

    In the CS curriculum so far, your programs have operated in a single process, meaning, basically, that one program was running your code. The operating system gives your program the illusion that it was the only thing running, and that was that.

     

    Now, we are going to move into the realm of multiprocessing, where you control more than one process at a time with your programs. You will ask the OS, “do these things concurrently”, and it will.

    Image:

    An original IBM PC, circa 1981. It has a single processor and can only run one process at a time.

    End of Image

    Slide 22: Process

    // file: getpidEx.c
    #include<stdio.h>
    #include<stdlib.h>
    #include <unistd.h> // getpid
    
    int main(int argc, char **argv)
    {
        pid_t pid = getpid();
        printf("My process id: %d\n",pid);
        return 0;
    }
    cgregg@myth57$ ./getpidEx
    My process id: 7526

    Slide 23: fork()

    Image:

    An image of a fork in the road, showing a path that breaks into two paths going in different directions.

    End of Image

    Slide 24: Fork Return Value

    int main(int argc, char *argv[]) {
      printf("Greetings from process %d! (parent %d)\n", getpid(), getppid());
      pid_t pid = fork();
      assert(pid >= 0);
      printf("Bye-bye from process %d! (parent %d)\n", getpid(), getppid());
      return 0;
    }

    Slide 25: Fork in Action!

    myth60$ ./basic-fork 
    Greetings from process 29686! (parent 29351)
    Bye-bye from process 29686! (parent 29351)
    Bye-bye from process 29687! (parent 29686)
    
    myth60$ ./basic-fork 
    Greetings from process 29688! (parent 29351)
    Bye-bye from process 29688! (parent 29351)
    Bye-bye from process 29689! (parent 29688
    int main(int argc, char *argv[]) {
      printf("Greetings from process %d! (parent %d)\n", getpid(), getppid());
      pid_t pid = fork();
      assert(pid >= 0);
      printf("Bye-bye from process %d! (parent %d)\n", getpid(), getppid());
      return 0;
    }

    Slide 26: Debugging Multiprocess Programs

    Slide 27: A Tree of Forks

    static const char const *kTrail = "abcd";
    int main(int argc, char *argv[]) {
      size_t trailLength = strlen(kTrail);
      for (size_t i = 0; i < trailLength; i++) {
        printf("%c\n", kTrail[i]);
        pid_t pid = fork();
        assert(pid >= 0);
      }
      return 0;
    }

    Slide 28: A Tree of Forks, Continued

    myth60$ ./fork-puzzle 
    a
    b
    c
    b
    d
    c
    d
    c
    c
    d
    d
    d
    d
    d
    d
    myth60$
    myth60$ ./fork-puzzle
    a
    b
    b
    c
    d
    c
    d
    c
    d
    d
    c
    d
    myth60$ d 
    d
    d

    Slide 29: Where and When Do We Fork?

    Slide 30: Managing Your Progeny

    Slide 31: Managing Your Progeny

    Slide 32: Example of waitpid()

    int main(int argc, char *argv[]) {
      printf("Before.\n");
      pid_t pid = fork();
      printf("After.\n");
      if (pid == 0) {
        printf("I am the child, and the parent will wait up for me.\n");
        return 110; // contrived exit status
      } else {
        int status;
        waitpid(pid, &status, 0)
        if (WIFEXITED(status)) {
          printf("Child exited with status %d.\n", WEXITSTATUS(status));
        } else {
          printf("Child terminated abnormally.\n");
        }
        return 0;
      }
     }

    Slide 33: Synchronizing Between Parent and Child Using waitpid()

    myth60$ ./separate 
    Before.
    After.
    After.
    I am the child, and the parent will wait up for me.
    Child exited with status 110.
    myth60$

    Slide 34: How Deep is the Copy?

    int main(int argc, char *argv[]) {
      printf("I'm unique and just get printed once.\n");
      bool parent = fork() != 0;
      if ((random() % 2 == 0) == parent) sleep(1); // force exactly one of the two to sleep
      if (parent) waitpid(pid, NULL, 0); // parent shouldn't exit until child has finished
      printf("I get printed twice (this one is being printed from the %s).\n", 
             parent  ? "parent" : "child");
      return 0;
    }

    Slide 35: Multiple Children

    int main(int argc, char *argv[]) {
      for (size_t i = 0; i < 8; i++) {
        if (fork() == 0) exit(110 + i);
      } 
      while (true) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);
        if (pid == -1) { assert(errno == ECHILD); break; }
        if (WIFEXITED(status)) {
          printf("Child %d exited: status %d\n", pid, WEXITSTATUS(status));
        } else {
          printf("Child %d exited abnormally.\n", pid);
        }
      }
      return 0;
    }
    

    Slide 36: Calling waitpid() on Multiple Children

    Slide 37: What's the difference?

    Slide 38: Questions about fork()?

    Slide 39: Lecture Review