Assignment 1: Reading Unix V6 Filesystems

NOTE: this website is out of date. This is the course web site from a past quarter, Fall 2023. If you are a current student taking the course, you should visit the current class web site instead. If the current website is not yet visible by going to cs111.stanford.edu, it may be accessible by visiting this link until the new page is mounted at this address. Please be advised that courses' policies change with each new quarter and instructor, and any information on this out-of-date page may not apply to you.

Due: Thu Oct 12 11:59 pm
Late submissions accepted until Sat Oct 14 11:59 pm

Mendel Rosenblum created this assignment. Improvements (especially to the writeup) have been contributed by Jerry Cain, Chris Gregg, John Ousterhout, and Nick Troccoli.

Unix V6 was an early version of the Unix operating system, which was the predecessor of Linux. Unix V6 was released in 1975 and was the first version of Unix to see widespread usage outside Bell Labs. It was a beautiful piece of software: simple and very small (less than 64 KBytes of code!), yet with many of the most powerful features found in modern Linux systems. For this assignment, we have created images of Unix V6 disks in a format virtually identical to how they would have appeared in 1975. Your task for this assignment is to recreate enough of the V6 file system code to read files from the disk. You will work in C for this assignment, since that was the language used for Unix V6.

Learning Goals

Understand the different layers of the Unix V6 filesystem and how they work together to access files.
Learn how to navigate a substantial code base, adding new features and functions, and testing their behavior.
Appreciate the complexity and trade-offs of implementing a filesystem.
See how layering can be used to encapsulate knowledge in a modular fashion.

[NEW FOR ASSIGN1] Assignment Support: Starting with assign1, we are not able to look at code when providing help in Helper Hours. This doesn't mean we can't help at the code level with questions - but we aren't able to look over assignment code when answering questions. We are happy to look over example code such as code from lecture or section, talk through concepts or bugs, debugging strategies, and more!

Overview

A key design principle we saw with filesystems is that of layering: one way to manage complexity in systems is to break it down into layers that sit on top of each other. There are several layers involved in the Unix v6 filesystem, from lowest to highest:

The block layer manages the details of communication with the disk, reading and writing individual sectors. This layer sits underneath almost every filesystem operation, and above this layer, we don’t want to have to think about the nitty-gritty of talking to the hardware.
The inode layer supplies higher layers with metadata about files. When we need to get a specific inode or know which block number is storing a particular portion of a file, the inode layer tells us that. Above this layer, we don’t want to think about the mechanics of retrieving or updating metadata (which isn’t simple, considering inodes are smaller than sectors).
The file layer supplies higher layers with actual file contents. We request some number of bytes from a file at a particular offset, and the file layer populates a buffer with that information.
The directory layer allows us to find a file within a directory. Given a filename and a directory presumably containing that file, this layer figures out what inode stores the information for that file.
Lastly, the pathname layer implements full absolute path lookups, starting from the root directory. If you hand /usr/class/cs111/hello.txt to the pathname layer, it will utilize the directory layer beneath it, looping through the components of that full path, until it finds hello.txt.

Any user software accessing the filesystem will sit on top of the directory and pathname layers. If we’re trying to write an application that does meaningful work, we almost certainly don’t want to be bogged down thinking about the organization of bytes on disk, so we leave that to the aforementioned layers beneath us.

For this assignment, we give you a fully implemented block layer, and you’ll flesh out the others. By the end of the assignment, your program will be able to list and read files from a disk formatted with the Unix v6 filesystem!

Starter Code

To get started on the assignment, clone the starter project using the command

git clone /usr/class/cs111/repos/assign1/$USER assign1

This line creates a new folder called assign1 in your current working directory containing a copy of the starter code.

The starter project contains the following files listed below. There are a lot of provided files in this assignment, and part of the challenge is learning to navigate them, follow the logic, and add code where appropriate. Expect that it will take some time to figure out the starter code before you can actually start doing anything. This may be frustrating, but we want to push you to get better at reading code and finding your way around an existing system; such skills are critical if you want to work on systems comprised of hundreds of thousands or millions of lines of code. Here are some tips to help with this navigation:

consult the documentation above and further down in the spec
consult the comments throughout the provided files, particularly in the .h files for functions you must implement
skim through the test program code (diskimageaccess.c) to see how your functions are tested
use grep to search the files for definitions or unknown types; for example, let's say you forget where struct inode is defined and want to see its definition. The format for grep is grep <pattern> <file(s) to search>. So if you wanted to search everything in the assignment folder for the struct inode definition, you might do grep "struct inode {" * to search all files in the current folder (that's the asterisk) for the text "struct inode {". This is also helpful to search for where a function is called. So if you wanted to search for where file_getblock is called, you might do grep "file_getblock(" *.

View Full Starter File List

diskimg.[ch]: files for the block layer, which is fully implemented for you
inode.[ch]: files for the inode layer, which you will modify
file.[ch]: files for the file layer, which you will modify
directory.[ch]: files for the directory layer, which you will modify
pathname.[ch]: files for the pathname layer, which you will modify
diskimageaccess.c, chksumfile.[ch], unixfilesystems.[ch], Makefile: files for the provided diskimageaccess test program and the Makefile for compiling.
direntv6.h: contains the definition of struct direntv6, which represents a single directory entry.
filsys.h: contains the definition of struct filsys, which represents the filesystem superblock. This is a slightly modified copy of the header file from Version 6 Unix. It also contains several useful constants. You can largely ignore the struct and all of the fields defined inside - it is included for the purposes of the testing framework, but none of the code you write needs to deal with the struct.
ino.h: contains the definition of struct inode, which represents a single inode. This comes from Version 6 of Unix, with some small modifications.
unixfilesystem.[ch]: contains the definition of struct unixfilesystem, which represents the filesystem as a whole. It also contains constants specifying the file system layout, including the sector address of the superblock and of the start of the inode region.
custom_tests: a file where you will add custom tests that you can run through sanitycheck
samples: a symbolic link to the shared directory for this assignment. It contains:

SANITY.ini: a file to configure and run Sanity Check. You can ignore this.
diskimageaccess_soln: executable solution for the provided test program. Use this program to verify correct behavior for these programs.
disk_images: contains "basic", "depthFile", "dirFnameSize" and "basicExtended" disk images for testing, along with corresponding ".structure" files describing the file system structure (what files/directories are in them and what their inode numbers are).

tools: contains symbolic links to the sanitycheck and submit programs for testing and submitting your work.

You should only need to modify files for the 4 layers you will implement, along with your custom_tests file.

Testing

The diskimageaccess program provides various ways to test your code incrementally and as a whole. You can run it in GDB for debugging, since it calls your filesystem layer functions. ./diskimageaccess -h shows the various testing options - there are many provided mechanisms to test your code, so take a moment to read through what's available:

Usage: ./diskimageaccess <diskimagePath> <function> <arg1>...<argn>
<diskimagePath> is the path to a disk image file.
<function> is one of the assignment functions, e.g. inode_indexlookup or file_getblock.
<argX> specifies arguments to test that function.  You can run pre-provided tests or
specify any arguments you'd like that are passed directly to the function to test its output.
  inode_iget:
    - specify "test1" as arg to test inode_iget on all inodes on the disk
    - otherwise, specify the inode number to test with
  inode_indexlookup:
    - specify "test1" as arg to test the first block of a small file
    - specify "test2" as arg to test the first block of a large file
    - specify "test3" as arg to test the last block of a doubly-indirect large file
    - specify "test4" as arg to test inode_indexlookup on all files on the disk
    - otherwise, specify the inode number followed by the fileBlockIndex to test with
  file_getblock:
    - specify "test1" as arg to test the first block of a small file
    - specify "test2" as arg to test the first block of a large file
    - specify "test3" as arg to test the last block of a doubly-indirect large file
    - specify "test4" as arg to test file_getblock on all files on the disk
    - otherwise, specify the inode number followed by the fileBlockIndex to test with
  directory_findname:
    - specify "test1" as arg to test a dirent that's not found in a partially-filled block
    - specify "test2" as arg to test a dirent with a length-14 name
    - specify "test3" as arg to test directory_findname on all files on the disk
    - otherwise, specify the directory inumber followed by the entry name to test with
  pathname_lookup:
    - specify "test1" as arg to test a path that's not found
    - specify "test2" as arg to test pathname_lookup on all files on the disk
    - otherwise, specify the absolute path to test with

diskimagePath should be the path to one of the test disks located in samples/disk_images. We currently have four test disks: basicDiskImage, depthFileDiskImage, dirFnameSizeDiskImage and basicDiskImageExtended.

diskimageaccess allows you to test each function in two ways:

using pre-written tests, or
specifying any arguments you'd like to the function to check that it behaves correctly

For instance, if you want to test inode_indexlookup, you can run one of the pre-written tests like this:

./diskimageaccess samples/disk_images/basicDiskImageExtended inode_indexlookup test1

This will output the results of the pre-written test, which you can compare manually to the sample solution in samples/diskimageaccess_soln, or compare automatically using sanitycheck/custom_tests.

Alternatively, you can test any call to inode_indexlookup you'd like by specifying custom parameters, like this:

./diskimageaccess samples/disk_images/basicDiskImageExtended inode_indexlookup 1 0

This will call inode_indexlookup(1, 0) and print out the result, which you can similarly compare to the sample solution. This custom arguments support is particularly useful for debugging; if you isolate a specific set of arguments that expose a bug, you can run it under GDB like this:

gdb diskimageaccess
...
(gdb) b inode_indexlookup
(gdb) r samples/disk_images/basicDiskImageExtended inode_indexlookup 1 0

To know what inode numbers or values to use to test different disk images, check out the .structure files in the samples/ folder; they show the contents (inumbers, file names) of each disk image.

On this assignment, one testing requirement is that you add at least THREE additional tests to custom_tests using this custom arguments support. In other words, you should add at least three tests that run diskimageaccess with various custom arguments as shown in the example immediately above with inode_indexlookup 1 0. Add a comment above each one (start the line with '#') documenting what that test is verifying.

Error Handling

Real systems (and especially operating systems) must deal with a variety of error conditions, and reporting errors can make it much easier to find and fix issues. Most of the filesystem functions can return errors. For example, diskimg_readsector can return an error (-1 return value) if there is a problem reading the disk, directory_findname will return an error if it cannot find the given name in the directory, and inode_indexlookup should return an error if the specified fileBlockIndex is too large. You must check for errors in two ways: First, you must detect appropriate error conditions in all of the code you write and return -1 when errors occur (e.g., your implementation of directory_findname must return -1 if the given name cannot be found). Second, you must check for errors returned by any of the functions you invoke, such as diskimg_readsector. When a child function returns an error, the parent function must usually return an error as well.

The header files for each layer specify more details about the required error checking for each function. In addition, before returning an error from any function, you must print a descriptive message to stderr describing the problem that occurred, using fprintf(stderr, ...) E.g.

fprintf(stderr, "inode_indexlookup(fileBlockIndex=%d) error calling diskimg_readsector.\n",
  fileBlockIndex);
return -1;

Be precise: for example, "file_getblock couldn't read disk block 47" is a much more useful message than "oops!" (sadly, messages with about as much useful information as "oops!" occur distressingly often in production code...). Your error messages should include any available information that makes it easier to understand the problem, such as the number of a disk block that couldn't be read, the name of a file that couldn't be found, and so on.

Important note: to match the behavior of the sample solution, don't print an error message in directory_findname if an entry cannot be found; just return -1. Additionally, while your error messages do not necessarily need to duplicate those generated by our sample solution (with the exception of sanity check test 50, which does require exact matching), they must contain appropriate information, and matching the message from the sample solution may be the easiest way to go. If you write any custom tests that trigger error messages, the error messages must match exactly for the test to pass.

Working On The Assignment

Your job is to complete the implementation of 4 layers of the filesystem: the "inode layer", "file layer", "directory layer", and "pathname layer". Each of these layers has 2 corresponding files, and layers build on each other. You will need to modify the .c files for each layer. We also provide a fully-implemented "block layer" in diskimg.[ch] that provides the code for reading and writing sectors (note that the words block and sector are used interchangeably here) on the disk image.

This spec is long, but written incrementally - you can work as you read through in steps. However, check out our few additional tips at the end as you work.

Note: back in the 1970s, storage space — both on disk and in main memory — was at a premium. As a result, the Unix v6 file system goes to lengths to reduce the size of data it stores. You'll notice that many integer values in the structs we provided are stored using only 16 bits, rather than today's more standard 32 or 64. (In our code we use the type int16_t from stdint.h to get a 16-bit integer, but back in the '70s, the C int type was 16 bits wide.)

As you implement them, consider what functions from lower layers you have already implemented and how they can help! We recommend implementing them in the order specified below.

As you work, we highly recommend testing incrementally and ensuring each function is working before moving on. Sanity check can be helpful at certain points, but make sure to run the diskimageaccess program outside of sanity check to investigate behavior further.

(Provided) Block layer (`diskimg.h` and `diskimg.c`)

The lowest layer of the file system provides facilities for reading and writing individual disk blocks. In this project, the block layer reads and writes blocks from a disk image file rather than an actual disk, but the API is the same as it would be for an actual disk. The starter kit provides you with a complete implementation of this layer; you will use the function diskimg_readsector in your code (see the comments in diskimg.h for more information). You do not need to use diskimg_writesector.

When calling diskimg_readsector, make sure to do error-checking by checking if its return value isn't equal to DISKIMG_SECTOR_SIZE! This can help catch issues, since it returns -1 on error and should otherwise always return DISKIMG_SECTOR_SIZE (512) as the number of bytes read.

Inode layer (`inode.h`, `inode.c`)

This layer provides functions for finding inodes on disk and also for using the information in inodes to translate from logical block indexes within a file to physical block/sector numbers on disk. You will implement the functions inode_iget and inode_indexlookup. These functions take in a struct unixfilesystem * parameter that specifies info about the disk being read from. The "handle" within that struct is needed to call diskimg_readsector. Remember: the first inode in the first inode block has inumber 1! See unixfilesystem.h for details and helpful constants. Make sure to check out the documentation in inode.h for what the functions must do.

`inode_iget`

This function reads the data for the inode with the specified inumber from the filesystem and fills in an inode struct with that information.

Testing Recommendation

In addition to sanity check tests 10-12, another way to test is to pick an inode from the .structure file for a disk image and test it vs. the sample solution (try custom_tests!) to see if the inode is read in properly.

`inode_indexlookup`

This function returns the block number of the file data block at the specified index. What this means is that, if we take just the blocks for this file holding actual file data (e.g. ignoring indirect blocks) and index them sequentially, the fileBlockIndex parameter to this function specifies one of those indexes, and we must return the block/sector number at that index. In other words, the returned block number will always be for a block that stores actual file data. As a concrete example, let's say a file is 1,536 (512 * 3) bytes big. This means it takes up 3 blocks of data. Let's say furthermore that the block numbers it is using are 35, 65 and 21. This means in the inode it stores these three block numbers, and if you fetched blocks 35, 65 and 21 from disk you would have data for the entire file. But for inode_indexlookup, fileBlockIndex would be either 0, 1 or 2 - it's specifying the index into the file's data of the block to get, and it's your job to return the disk block number it corresponds to. For instance, if fileBlockIndex was 0, you would return 35. If fileBlockIndex was 1, you would return 65, etc. For a large file, let's say fileBlockIndex was 256; that would mean you should return the first direct block number contained in the second singly-indirect block (since the 256th block number for the file's data would be in the second singly-indirect block).

This implementation needs to manage direct, singly indirect, and doubly indirect block numbers. Translating for direct block numbers is straightforward (as we did in the example above), but translating for indirect block numbers is hard to get right.

Tips:

We suggest implementing this function first just for direct blocks; once you do this, you should pass Sanity check test 20 (depthFileDiskImage) as that is the only disk image that doesn't contain any large files.
Next, we suggest implementing this function only handling singly-indirect blocks; in other words, assume large files will take up at most 7 singly-indirect blocks, and ignore the 8th doubly-indirect one. Once you do this, you should pass Sanity check test 21 (dirFnameSizeDiskImage) as that disk image doesn't contain any files bigger than 7 * 256 * 512 bytes. Remember that indirect block numbers identify other blocks that contain 256 two-bytes integers (best managed via the uint16_t type).
Finally, we suggest implementing the case where you need to examine the 8th doubly-indirect block; once you do this, all inode_indexlookup tests should pass.
Make sure to share and unify code as much as possible between the singly-indirect and doubly-indirect block cases! Since doubly indirect block numbers lead to singly indirect block numbers, which lead to block numbers. Make sure whatever code you write to translate singly indirect block numbers also contributes (without cutting and pasting or otherwise copying it) to the process used to ultimately convert doubly indirect block numbers.
use the i_mode field of the inode struct to check if the file uses the large or small mapping scheme. The i_mode is a 16-bit integer whose individual bits indicate various properties of the inode. ino.h contains #defines that describe what each bit means. Specifically, bit 12 of i_mode indicates whether the file uses the large file mapping scheme. So if (i_mode & ILARG) != 0, then the inode's i_addr fields point at indirect and doubly-indirect blocks rather than directly at the data blocks.

Testing Recommendation

In addition to sanity check tests 20-23, there are also extra tests included in custom_tests. Another way to test this function in isolation is to choose an inode number from the .structure file that is either a small or large file depending on what format you'd like to test. Then try fetching each of its file blocks (0, 1, 2,...) via diskimageaccess custom arguments and verify that each of the numbers is correct (try custom_tests!).

File layer (`file.h` and `file.c`)

This layer provides a function file_getblock that reads a block of data from a file, given the file's i-number and the desired block within the file. You will write the implementation for this function. Make sure to check out the documentation in file.h for what the function must do.

file_getblock's job is to read the file data block at the specified index. In other words, whereas inode_indexlookup finds which block number contains the data, this function actually reads in that data. This function returns the number of meaningful bytes in a block/sector that contribute to the payload of some file (be it a regular file or a directory). In other words, it returns the number of bytes in that block that are actually used to store data for the file. As an example, if the file size happens to be a multiple of 512, then this function always returns 512. Alternatively, if a file isn’t a multiple of 512 and we have to get the payload stored in the very last block, you should expect to return a number that’s less than 512 (because only a part of what’s stored in the block is relevant).

Note: in a space-saving move, the designers of the file system stored the inode's size field as a 24-bit integer. There's no 24-bit integer type in C, so we represent this value using two fields in the inode struct: i_size1, which contains the least-significant 16 bits of the value, and i_size0, which contains the most-significant 8 bits of the value. We provide a function inode_getsize in inode.c that assembles these two fields into a normal C integer for you. This may come in handy in file_getblock!

Testing Recommendation

In addition to sanity check tests 30-32, there are also extra tests included in custom_tests. If your checksums aren't matching, identify the inode that is outputting incorrectly and check the .output file for more information about it. Pay close attention to whether that inode represents a small or large file; that could give you a hint as to where your logic might be going wrong! Then you can use diskimageaccess custom arguments to investigate further.

Directory layer (`directory.h` and `directory.c`)

This layer encapsulates knowledge about the structure of directories. You will write the function directory_findname, which reads a directory file to find the directory entry with a given name, if there is one. Make sure to check out the documentation in directory.h for what the function must do.

This function's job is to get the directory entry (defined as struct direntv6) from the given directory that matches the given name. As a hint, your implementation should make use of strncmp instead of strcmp, because some of the stored directory names aren’t '\0'-terminated. They are, however, guaranteed to be of length 14 or less. Check out the man page for strncmp and strcmp so you understand the difference.

Testing Recommendation

In addition to sanity check tests 40-42, another way to test this function in isolation is to open the .structure file for the test disk and choose a file or folder to test with. Note both its name, as well as the inode number of its parent directory. Then test with diskimageaccess custom arguments, specifying the parent inode number as dirinumber and the chosen name as name.

Pathname layer (`pathname.h`, `pathname.c`)

This layer provides a function pathname_lookup which looks up an absolute path (you can assume it is absolute, meaning starts with a "/") and returns the i-number for the file specified by the path. You will implement this function. strsep (a better alternative to the similar but poorly-designed strtok) will come in handy here! We encourage you to implement this function iteratively (though recursion is fine provided it is clean and tidy). Make sure to check out the documentation in pathname.h for what the function must do.

Testing Recommendation

In addition to sanity check tests 50-53 (51-53 walk the entire file system directory tree, starting at the root directory, and print out all of the paths found), if your inode information isn't matching, try tracing the lookup by hand to make sure that the correct step is taken each time (e.g. / to /foo, /foo to /foo/CVS, etc.). The .output output files show the inumbers for various path depths, so if you are looking up /a/b/c you can check that your lookup for /a, /a/b and /a/b/c are all correct. You can also use diskimageacces to test the pathname and directory layer functions directly with any custom arguments.

Submitting and Grading

Once you are finished working and have saved all your changes, submit by running the submit tool in the tools folder: ./tools/submit. We recommend you do a trial submit in advance of the deadline to allow time to work through any snags. You may submit as many times as you would like; we will grade the latest submission. Submitting a stable but unpolished/unfinished is like an insurance policy. If the unexpected happens and you miss the deadline to submit your final version, this previous submit will earn points. Without a submission, we cannot grade your work.

We will grade your assignment with the following criteria:

Code test results: we will run the sanity check tests as well as potential additional test cases to confirm your functions work as intended and implement the required error checking. You have access to all of the disk images we'll use for grading.
Custom tests: we will verify that you've added at least 3 custom tests using diskimageaccess custom arguments and that each of them has a comment above them describing the test
Clean build and clean valgrind reports (eg. no memory leaks or errors)
Code style: we will manually review your code and give feedback on coding style and design (see tips below for some style guidelines!)

General Tips

Layering

Try to avoid "breaking the layering" of the file system where possible. You want to aim to have a higher layer build on the layer below it as much as possible, rather than bypassing it and manually calling functions lower down - though in some places calling a function from a lower down layer (like diskimg_readsector) may be appropriate.

Constants

Make sure to use provided constants, and define your own constants, where appropriate. For instance, use INODE_START_SECTOR and ROOT_INUMBER in unixfilesystem.h instead of hardcoding numbers such as 2 and 1 in your code. There are calculations throughout your code, and your goal should be to make them as easy to follow as possible.

Buffers, Types and Casting

When creating a "buffer" (space to store data), carefully consider the type of the buffer. In places like inode_iget and directory_findname, we know what type of data we are reading in, so you should not use void * and casting, and instead declare the type as specifically as you can - i.e. instead of an array of chars, use an array of struct inodes or some other type. Only go with void *, manual pointer arithmetic, and memcpy when you don’t know the data types that are being stored, copied, and passed around. In the case of inode_iget, for instance, you know the sectors you’re examining contain arrays of struct inodes. Similarly, your implementation of directory_findname should declare an array of struct direntv6 instead of a character array of length 512, and you should be able to take advantage of the fact that an indirect block contains an array of 256 two-byte integers (uint16_t).

Memory Allocation

You should not need to use dynamic memory allocation on this assignment (i.e. malloc/realloc/free).

Learning Goals

Overview

Starter Code

Testing

Error Handling

Working On The Assignment

(Provided) Block layer (diskimg.h and diskimg.c)

Inode layer (inode.h, inode.c)

inode_iget

Testing Recommendation

inode_indexlookup

Testing Recommendation

File layer (file.h and file.c)

Testing Recommendation

Directory layer (directory.h and directory.c)

Testing Recommendation

Pathname layer (pathname.h, pathname.c)

Testing Recommendation

Submitting and Grading

General Tips

Layering

Constants

Buffers, Types and Casting

Memory Allocation

(Provided) Block layer (`diskimg.h` and `diskimg.c`)

Inode layer (`inode.h`, `inode.c`)

`inode_iget`

`inode_indexlookup`

File layer (`file.h` and `file.c`)

Directory layer (`directory.h` and `directory.c`)

Pathname layer (`pathname.h`, `pathname.c`)