Thanks to Mendel Rosenblum for this wonderful assignment. This is a real system, people!
This handout was adapted from Jerry Cain’s Spring 2018 offering.
New Due Date: Thursday, July 4, 2019 at 12:00 pm (noon). NO LATE DAYS ALLOWED.
Related lecture material: Lectures 1 - 3
An archaeological expedition in Murray Hill, NJ has uncovered some magnetic disk drives from the mid-1970s. After considerable effort, the dig’s technical expert read the contents of the drives, and for each disk drive, she created a local file to be an exact bitwise copy of the disk’s data.
The archaeologists determined the data on the disks was stored using the Version 6 Unix file system. Sadly, none of the archaeologists made it through CS110, so they haven’t been able to read the image contents. Your task is to write a program that understands the Unix v6 file system to extract the file system data.
The key systems principle we are trying to drive home this week is that of layering: one way to manage complexity in systems is to break it down into layers that sit on top of each other. There are several layers involved in the Unix v6 filesystem:
/usr/class/cs110/hello.txtto the pathname layer, it will utilize the filename layer beneath it, looping through the components of that full path, until it finds
Any software accessing the filesystem will sit on top of the file and pathname layers. If we’re trying to write an application that does meaningful work, we almost certainly don’t want to be bogged up thinking about the organization of bytes on disk, so we leave that to the aforementioned layers beneath us.
In this assignment, you will implement the latter 4 of these layers. By the end of the assignment, your program will be able to list and read files from a disk formatted with the Unix v6 filesystem.
This assignment is usually given for 1 week, but I am giving you 1.5 weeks so that you have time to get accustomed to working with the development environment and to brush up on your C skills. Get help early if you need it, and don’t get too used to this pacing; future assignments will only be out for 1 week.
Also, expect that it will take some time to figure out the starter code before you can actually start doing anything. We throw a lot of files at you, and it isn’t trivial to figure out what’s going on and what’s expected of you. This may be frustrating, but we want to push you to get better at reading code and finding your way around an existing system; such skills are critical if you want to work on systems comprised of hundreds of thousands or millions of lines of code. Take time to read through all the starter code and to figure out how the files fit together. Give yourself at least an hour before expecting yourself to be productive, and you’ll be less frustrated.
There are several files in the starter code that you will need to jump back and forth between, and future assignments will also involve work with many files. It pays to spend time becoming more proficient with your tools (particularly your terminal emulator and editor) so that you can work efficiently and don’t need to restart vim every time you want to switch to a different file. I’m also working on writing a handout with some misc tips and resources for working with your tools; I will post a link on the course homepage, as well as here, when I’m finished.
All coding should be done on the
myth cluster machines (accessible by running
ssh firstname.lastname@example.org), as that’s where we’ll be testing your
submission. (If you’ve never used
myth before, see essential resources from
CS 107. To get your own copy of the starter code,
you should clone the master git repository we’ve set up for you by typing:
git clone /usr/class/cs110/repos/assign1/$USER assign1
Doing so will create an
assign1 directory within your own space, and you can
descend into that directory to land on the collection of files you’ll be
modifying to arrive at a working product.
You can run the project even before you’ve written any code (though, of course, it won’t produce the correct output).
To build the project,
cd into your local repository and type
To test your work,
cd into your local repository and type
./diskimageaccess <options> <diskimagePath>
<diskimagePath> should be the path to one of the test disks located in
./samples/testdisks/. We currently have three test disks:
diskimageaccess recognizes two flags as valid
-i: test the inode and file layers
-p: test the filename and pathname layers
For example, to run both the inode and filename tests on the basic disk, you could run
./diskimageaccess -ip ./samples/testdisks/basicDiskImage
You can visually compare your output to that generated by my own solution by typing
./samples/diskimageaccess_soln <options> <diskimagePath>
The expected output for running
diskimageaccess on each disk image
stored in the
X.gold file inside the
testdisks directory. We are leveraging
sanitycheck tool, which you may be familiar with from CS 107. In particular, if you type
at the command prompt while in your
assign1 directory, the
script will exercise your implementation in precisely the same way our own
grading scripts will. We won’t always expose all of the functionality tests,
but we are for Assignment 1. (Read: if your sanitycheck passes all tests and
runs clean under
valgrind, you will have a 100% for the functionality
component of this assignment. That won’t necessarily be the case in future
When time comes for you to submit, simply type
Submit early and submit often! You can resubmit as many times as you would like before the deadline. We will only grade your most recent submission.
This section offers an explanation of each file in the starter code. See “Suggested Implementation Order” below for a plan of attack once you have oriented yourself with the codebase.
The files we provide you fall into three categories.
The Version 6 Unix header files (
These define structs that you will be using quite frequently. The structs are explained in the “Header Files and Structures” section below. You shouldn’t need to modify these files.
The test harness (
These files provide the infrastructure for building your code and running our tests against it. You are welcome to modify these files for testing, if you would like, but you shouldn’t need to.
File system module (
The test code we give you interfaces with the file system using a layered API
that we’ve already designed. For each of the layers that you’ll need to
implement for this assignment, there is a header file that defines the
interface to the layer and a corresponding
.c file that should contain the
The layers are:
Block layer (
This defines and implements the interface for reading and writing sectors (note that the words block and sector are used interchangeably) on the disk image. We give you an implementation of this layer that should be sufficient for this assignment, so you don’t need to modify these files.
inode layer (
This defines and implements the interface for reading the file system’s inodes. This includes the ability to look up inodes by inumber and to get the block/sector number of the nth block of the inode’s data. You will be modifying these files.
File layer (
This defines and implements the interface for reading blocks of data from a file by specifying its inumber. This is one of the two layers our tests explicitly exercise. You will be modifying these files.
Filename layer (
This defines and implements the interface for implementing Unix directories on top of files. Its primary function is to get information about a single directory entry. You will be modifying these files.
Pathname layer (
This defines and implements the interface to look up a file by its absolute pathname. This is the second of the two layers we explicitly test. You will be modifying these files.
You are welcome to modify any of the files we give you if you would like. When
making changes in the test harness, do so in a backward compatible way so that
the testing scripts we give you continue to work. In other words, it’s fine to
add testing and debugging code, but make sure that when we run your program
-p, its output has the same format with or without your
changes, so our automated tests won’t break.
Before you try to write any code, you’ll want to familiarize yourself with the details of the Unix V6 file system by reading Section 2.5 of the Saltzer and Kaashoek textbook. You may also find it helpful to read and understand how the test script works.
Section 2.5 of the Salzer and Kaashoek book contains most of what you need to know about the Unix v6 file system in order to do this assignment. You may be able to complete this assignment with just the material I presented in lecture and by referencing my lecture notes, but S&K spells things out in more detail. The information below is supplementary to the textbook, so it assumes you’ve already read and understand the material there.
In the starter code we’ve provided C
structs corresponding to the file
system’s on-disk data structures. These
structs have the same layout in
memory as the structures have on disk. They include:
struct filsys (
Corresponds to the superblock of the file system. This is a slightly modified copy of the header file from Version 6 Unix.
struct inode (
Corresponds to a single inode. Again this comes from Version 6 of Unix, with some small modifications.
struct direntv6 (
Corresponds to a directory entry. This was copied from section 2.5 in the textbook.
unixfilesystem.h contains a description of the file system
layout, including the sector address of the superblock and of the start of the
Back in the 1970s, storage space – both on disk and in main memory – was at a
premium. As a result, the Unix v6 file system goes to lengths to reduce the
size of data it stores. You’ll notice that many integer values in the
structs we provided are stored using only 16 bits, rather than today’s more
standard 32 or 64. (In our code we use the type
get a 16-bit integer, but back in the ‘70s, the C
int type was 16 bits wide.)
In another space-saving move, the designers of the file system stored the
inode’s size field as a 24-bit integer. There’s no 24-bit integer type in C,
so we represent this value using two fields in the
which contains the least-significant 16 bits of the value, and
contains the most-significant 8 bits of the value. We provide a function
inode.c that assembles these two fields into a normal C
integer for you.
Since there is no inode with an inumber of 0, the designers of the file system
decided not to waste the 32 bytes of disk space to store it. The first inode
in the first inode block has inumber 1; this inode corresponds to the root
directory for the file system. (See
unixfilesystem.h for details.)
Be careful not to assume that the first inode has an inumber of 0! Off-by-one errors are the worst.
The 16-bit integer
i_mode in the
inode struct isn’t really a number;
rather, the individual bits of the field indicate various properties of the
#defines which describe what each bit means.
For instance, we say an inode is allocated if it points to an existing file.
The most-significant bit (i.e. bit 15, if we call the least significant bit 0) of
i_mode indicates whether the inode
is allocated or not. So the C expression
(i_mode & IALLOC) == 0 is
the inode is unallocated and
false otherwise. (Recall that in C, any number that starts with a zero represents octal instead of the default decimal value.)
Similarly, bit 12 of
i_mode indicates whether the file uses the large file
mapping scheme. So if
(i_mode & ILARG) != 0 , then the inode’s
fields point at indirect and doubly-indirect blocks rather than directly at the
Bits 14 and 13 form a 2-bit wide field specifying the type of file. This field
is 0 for regular files and 2 (i.e. binary 10, or the constant
directories. So the expression
(i_mode & IFMT) == IFDIR is
true if the
inode is a directory, and
The starter code contains a number of unfinished functions. We suggest you attack them in the following order:
file.c. After this step, your code should pass the inode level functionality tests.
pathname.c. Refer to Section 2.5.6 of the Salzer and Kaashoek book for this part. Afterward, all the pathname tests should pass.
We will grade your assignment based on a combination of test results and code quality.
valgrindreports: 6 points
First, we’ll run your code on a set of test disk images and check that your program gives the expected output. You have access to all of the images we’ll use for grading and the correct output for each of them.
If this were a real archaeological dig, you would have been given only the disk image and a computer. But to make things easier, we’ve provided you with starter code that tries to read all the files on the image and outputs checksums of the inode and file contents. We’ve also computed these checksums using a working implementation of the assignment. If your checksums match ours, then your code is correctly reading the data off the disk.
Finally, we’ll read through your code and give you comments on style and
overall quality. We’ll look for places where you inappropriately break the
layering of the file system or make other mistakes that don’t necessarily cause
your program to generate the wrong output. We’ll also look for common C
programming errors such as memory leaks and use of the heap where stack
allocation is more appropriate, or places when you used
void* pointers when
you could have used stronger types. Finally, we’ll check that your code is easy
to read: It should be well-formatted, organized, and be easy to follow.
If you submitted the Week 1 survey by Sunday, June 30, 11:59pm, we will give you one additional point towards your functionality score. You'll need to use your Stanford email address to access the form.