Lab Solution 1: File Systems and System Calls
This lab was designed by Jerry Cain.
The first and last exercises are problem set-esque questions that could easily appear on a midterm or final exam (or, in your case, a self-assessment). In fact, all of the questions asked in Problem 3 were on previous midterms and finals. The middle problem is an experiment that'll require you fire up your laptop and run some programs and development tools.
Lab Overview
Your weekly lab is a chance to experiment and explore, ask and answer questions, and get hands-on practice in a supported environment. We provide a set of lab exercises that revisit topics from recent lectures/readings and prepare you to succeed at the upcoming assignment.
Lab is collaborative! We're all in this together! We will work together on the exercises. The entire group is one learning community working together to advance the knowledge and mastery of everyone. Stuck on an issue? Ask for help. Have an insight? Please share!
To track lab participation, we have an online checkoff form for you to fill out as you work. Lab is not a race to find answers to exactly and only the checkoff questions-- the checkoff questions are used only to record attendance and get a read on how far you got. Lab credit is awarded based on your sincere participation for the full lab period. Your other rewards for investing in lab are to further practice your skills, work together to resolve open questions, satisfy your curiosity, and reach a place of understanding and mastery. The combination of active exploration, give and take with your peers, and the guidance of the TA makes lab time awesome. We hope you enjoy it!
Get Started
Clone the lab starter code by using the command below. This command creates a lab1
directory containing the project files.
git clone /afs/ir/class/cs110/repos/lab1/shared lab1
Next, pull up the online lab checkoff and have it open in a browser so you can jot things down as you go.
Exercises
1) Direct, Singly Indirect, and Doubly Indirect Block Numbers
Assume blocks are 512 bytes in size, block numbers are four-byte int
s, and that inodes include space for 6 block numbers. The first three are direct block numbers, the next two are singly indirect block numbers, and the final one is a doubly indirect block number.
- What's the maximum file size?
- Maximum file size is (3 * 512) + (2 * 128 * 512) + (128 * 128 * 512), or 8,521,216 bytes. (The first term is the direct blocks, the second term is the singly-indirect blocks, and the third term is the doubly-indirect block)
- How large does a file need to be before the relevant inode requires the first singly indirect block number be used?
- 3 * 512 + 1, or 1,537 bytes. (The +1 is the first byte that cannot fit)
- How large does a file need to be before the relevant inode requires the first doubly indirect block number be used?
- 3 * 512 + 2 * 128 * 512 + 1, or 132,609 bytes. (The +1 is the first byte that cannot fit)
- Draw as detailed an inode as you can if it's to represent a regular file that's 2049 bytes in size.
- The first three block numbers would identify three, saturated payload blocks, and those three payload blocks would store the first 1,536 bytes.
- The fourth block number—a singly, indirect one—would lead to a block of 512 bytes, although only the first eight bytes will be used. The first four bytes would identify the fourth payload block—itself saturated—to store 512 bytes of payload. The next four bytes would identify a fifth payload block where only the first byte is used.
2) valgrind
and orphaned file descriptors
Here's a very short exercise to enhance your understanding of valgrind
and what it can do for you. To get started, type the following in to create a local copy of the repository you'll play with for this problem:
myth$ git clone /usr/class/cs110/repos/lab1/shared lab1
myth$ cd lab1
myth$ make
Now open the file and trace through the code to keep tabs on what file descriptors are created, properly closed, and orphaned.
With this information, try tracing through the program to better understand the file descriptors that are created, closed, and left open. In particular, with the knowledge that open
will use the lowest unused file descriptor, calculate the actual file descriptor numbers. Then run valgrind ./open-fds
to confirm that there aren't any memory leaks or errors (how could there be?), but then run valgrind --track-fds=yes ./open-fds
to get information about the file descriptors that were (intentionally) left open.
Without changing the logic of the program, insert as many close statements as necessary so that all file descriptors (including 0, 1, and 2) are properly donated back. (In general, you do not have to close file descriptors 0, 1, and 2, but for this problem you should.)
- The key for this exercise is that you need a
close
statement for every open descriptor identified by valgrind
, and you should be careful to never close
a previously closed descriptor. Here's an annotated copy of theopen-fds.cc
file.
// fd1 = 3, since 0,1,2 already taken (now 0,1,2,3 taken)
int fd1 = open("open-fds.cc", O_RDONLY);
// STDIN (0) is closed (now 1,2,3 taken)
close(STDIN_FILENO);
// fd2 = 0, since 0 is available (now 0,1,2,3 taken)
int fd2 = open("open-fds.cc", O_RDONLY);
// fd3 = 4, since 0,1,2,3 already taken (now 0,1,2,3,4 taken)
int fd3 = open("Makefile", O_RDONLY);
// 0 is closed (now 1,2,3,4 taken)
close(fd2);
// 3 is closed (now 1,2,4 taken)
close(fd1);
// fd1 = 0, since 0 available (now 0,1,2,4 taken)
fd1 = open("Makefile", O_RDONLY);
// STDOUT (1) is closed (now 0,2,4 taken)
close(STDOUT_FILENO);
// fd4 = 1, since 1 is avaiable (now 0,1,2,4 taken)
int fd4 = open("open-fds.cc", O_RDONLY);
// fd1 (0) is still open
// fd4 (1) is still open
// STDERR_FILENO (2) is still open
// fd3 (4) is still open
return fd3 + fd4;
Here's a modified version of the program that closes all file descriptors correctly:
int fd1 = open("open-fds.cc", O_RDONLY);
close(STDIN_FILENO);
int fd2 = open("open-fds.cc", O_RDONLY);
int fd3 = open("Makefile", O_RDONLY);
close(fd2);
close(fd1);
fd1 = open("Makefile", O_RDONLY);
close(STDOUT_FILENO);
int fd4 = open("open-fds.cc", O_RDONLY);
// Added
close(fd1);
close(fd4);
close(STDERR_FILENO);
close(fd3);
return fd3 + fd4;
3) Short Answer Questions
Provide clear answers and/or illustrations for each of the short answer questions below. Each of these questions is either drawn from old exams or based on old exam questions. Questions like this will certainly appear on assessments.
-
The
dup
system call accepts a valid file descriptor, claims a new, previously unused file descriptor, configures that new descriptor to alias the same file session as the incoming one, and then returns it. Briefly outline what happens to the relevant file entry table and vnode table entries as a result ofdup
being called. (Readman dup
if you'd like, though don't worry about error scenarios).- The vnode table entry is left alone, but we claim a new file descriptor and its entry is set to address the same entry in the open file table as the incoming one, and the reference count within that open file table entry would be incremented by one.
-
Now consider the prototype for the
link
system call (peruseman link
). A successful call tolink
updates the file system so the file identified byoldpath
is also identified bynewpath
. Oncelink
returns, it’s impossible to tell which name was created first. (To be clear,newpath
isn’t just a symbolic link, since it could eventually be the only name for the file.) In the context of the file system discussed in lecture and/or the file system discussed in Section 2.5 of the secondary textbook (S&K), explain howlink
might be implemented.- Resolve
oldpath
to its inode number, append new directory entry to sequence of existing directory entries wherenewpath
resides. Fill in thatnewpath
entry with the last component of newpath
(e.g. the part after its last "/", meaning just the filename itself, not the whole path), and fill in that entry's inumber with the inumber of old path. Finally, increment the reference count within the inode itself to be clear the file has one more name.
- Resolve
-
Explain what happens when you type
cd .././../.
at the shell prompt. Frame your explanation in terms of the file system described in Section 2.5 of the secondary textbook, and the fact that the inode number of the current working directory is the only relevant global variable maintained by your shell.- Using the current working directory's inode number, search the current working directory's payload for
..
, then set the inumber of the current working directory to the inumber associated with..
; repeat for.
, then..
, and then.
- Using the current working directory's inode number, search the current working directory's payload for
-
All modern file systems allow symbolic links to exist as shortcuts for longer absolute and relative paths (e.g.
search_soln
might be a symbolic link for/usr/class/cs110/samples/assign1/search_soln
, andtests.txt
might be a symbolic link for./mytests/tests.txt
). Explain how the absolute pathname resolution process we discussed in lecture would need to change to resolve absolute pathnames to inode numbers when some of the pathname components might be symbolic links.- The absolute pathname resolution process (implemented as
pathname_lookup
in assign2) should piecemeal tokenize pathnames as usual. When a component (e.g.tests.txt
) is identified as a symlink, we must prepend its expansion (meaning the place it symlinks to) to the unprocessed set of tokens. Then, we either continue parsing from the inumber of the symlink's parent for relative symbolic link paths, or restart from inode 1 for absolute paths. Here are two examples to better understand this:- Relative Path: let's say we are trying to resolve a path
/usr/class/cs110/sym/file.txt
, andsym
is a relative symbolic link to../extra/other/
. This means we start parsing the path at the root directory, then go tousr
, thenclass
, thencs110
, and then we get tosym
, which is a symbolic link. Because it is a symbolic link, we prepend it to the unprocessed set of tokens for our path, meaning now we must resolve../extra/other/file.txt
. Because it is relative, we keep parsing this path from where we currently are - incs110
. Thus, we go to..
(up a directory, toclass
), then we go toextra
, thenother
, and then finally tofile.txt
, where we were trying to get to. - Absolute Path: let's say we are trying to resolve a path
/usr/class/cs110/sym/file.txt
, andsym
is an absolute symbolic link to/afs/ir/users
. This means we start parsing the path at the root directory, then go tousr
, thenclass
, thencs110
, and then we get tosym
, which is a symbolic link. Because it is a symbolic link, we prepend it to the unprocessed set of tokens for our path, meaning now we must resolve/afs/ir/users/file.txt
. Because it is absolute, we must resume parsing this path from the root directory. Thus, we go to inode 1, then we go toafs
, thenir
, thenusers
, and then finally tofile.txt
, where we were trying to get to.
- Relative Path: let's say we are trying to resolve a path
- The absolute pathname resolution process (implemented as
Checkoff Questions
-
[Problem 1] If the inode were extended to include a space for 7 block numbers instead of 6, and the 7th block number was a triply indirect block number, what would be maximum file size be now?
- Maximum file size is (3 * 512) + (2 * 128 * 512) + (128 * 128 * 512) + (128 * 128 * 128 * 512), or 1,082,263,040 bytes (roughly 1GB). (The first term is the direct blocks, the second term is the singly-indirect blocks, the third term is the doubly-indirect block, and the fourth term is the triply indirect block)
-
[Problem 2] How many calls to the
close
function did you need to insert?- 4
-
[Problem 3.3] Explain what happens when you type
cd .././../.
at the shell prompt. Frame your explanation in terms of the file system described in Section 2.5 of the secondary textbook, and the fact that the inode number of the current working directory is the only relevant global variable maintained by your shell.- Using the current working directory's inode number, search the current working directory's payload for
..
, then set the inumber of the current working directory to the inumber associated with..
; repeat for.
, then..
, and then.
- Using the current working directory's inode number, search the current working directory's payload for