Due: Fri Jan 23 11:59 pm
Late submissions accepted until Sun Jan 25 11:59 pm
Mendel Rosenblum created this assignment. Improvements (especially to the writeup) have been contributed by Jerry Cain, Chris Gregg, John Ousterhout, and Nick Troccoli.
Unix V6 was an early version of the Unix operating system, which was the predecessor of Linux. Unix V6 was released in 1975 and was the first version of Unix to see widespread usage outside Bell Labs. It was a beautiful piece of software: simple and very small (less than 64 KBytes of code!), yet with many of the most powerful features found in modern Linux systems. For this assignment, we have created images of Unix V6 disks in a format virtually identical to how they would have appeared in 1975. Your task for this assignment is to recreate enough of the V6 file system code to read files from the disk. You will work in C for this assignment, since that was the language used for Unix V6.
Learning Goals
- Understand the different layers of the Unix V6 filesystem and how they work together to access files.
- Learn how to navigate a substantial code base, adding new features and functions, and testing their behavior.
- Appreciate the complexity and trade-offs of implementing a filesystem.
- See how layering can be used to encapsulate knowledge in a modular fashion.
Overview
A key design principle we saw with filesystems is that of layering: one way to manage complexity in systems is to break it down into layers that sit on top of each other. There are several layers involved in the Unix v6 filesystem, from lowest to highest:
- The block layer manages the details of communication with the disk, reading and writing individual sectors. This layer sits underneath almost every filesystem operation, and above this layer, we don’t want to have to think about the nitty-gritty of talking to the hardware.
- The inode layer supplies higher layers with metadata about files. When we need to get a specific inode or know which block number is storing a particular portion of a file, the inode layer tells us that. Above this layer, we don’t want to think about the mechanics of retrieving or updating metadata (which isn’t simple, considering inodes are smaller than sectors).
- The file layer supplies higher layers with actual file contents. We request some number of bytes from a file at a particular offset, and the file layer populates a buffer with that information.
- The directory layer allows us to find a file within a directory. Given a filename and a directory presumably containing that file, this layer figures out what inode stores the information for that file.
- Lastly, the pathname layer implements full absolute path lookups, starting from the root directory. If you hand
/usr/class/cs111/hello.txtto the pathname layer, it will utilize the directory layer beneath it, looping through the components of that full path, until it findshello.txt.
Any user software accessing the filesystem will sit on top of the directory and pathname layers. If we’re trying to write an application that does meaningful work, we almost certainly don’t want to be bogged down thinking about the organization of bytes on disk, so we leave that to the aforementioned layers beneath us.
For this assignment, we give you a fully implemented block layer, and you’ll flesh out the others. By the end of the assignment, your program will be able to list and read files from a disk formatted with the Unix v6 filesystem!
Starter Code / Testing / Debugging
To get started on the assignment, clone the starter project using the command
git clone /usr/class/cs111/repos/assign1/$USER assign1
This line creates a new folder called assign1 in your current working directory containing a copy of the starter code.
The starter project contains the following files listed below. There are a lot of provided files in this assignment, and part of the challenge is learning to navigate them, follow the logic, and add code where appropriate. Expect that it will take some time to figure out the starter code before you can actually start doing anything. This may be frustrating, but we want to push you to get better at reading code and finding your way around an existing system; such skills are critical if you want to work on systems comprised of hundreds of thousands or millions of lines of code. Here are some tips to help with this navigation:
- consult the documentation above and further down in the spec
- consult the comments throughout the provided files, particularly in the
.hfiles for functions you must implement - skim through the test program code (
diskimageaccess.c) to see how your functions are tested - use
grepto search the files for definitions or unknown types; for example, let's say you forget wherestruct inodeis defined and want to see its definition. The format forgrepisgrep <pattern> <file(s) to search>. So if you wanted to search everything in the assignment folder for thestruct inodedefinition, you might dogrep "struct inode {" *to search all files in the current folder (that's the asterisk) for the text "struct inode {". This is also helpful to search for where a function is called. So if you wanted to search for wherefile_getblockis called, you might dogrep "file_getblock(" *.
diskimg.[ch]: files for the block layer, which is fully implemented for youinode.[ch]: files for the inode layer, which you will modifyfile.[ch]: files for the file layer, which you will modifydirectory.[ch]: files for the directory layer, which you will modifypathname.[ch]: files for the pathname layer, which you will modifydiskimageaccess.c,chksumfile.[ch],unixfilesystems.[ch],Makefile: files for the provideddiskimageaccesstest program and the Makefile for compiling.direntv6.h: contains the definition ofstruct direntv6, which represents a single directory entry.filsys.h: contains the definition ofstruct filsys, which represents the filesystem superblock. This is a slightly modified copy of the header file from Version 6 Unix. It also contains several useful constants. You can largely ignore the struct and all of the fields defined inside - it is included for the purposes of the testing framework, but none of the code you write needs to deal with the struct.ino.h: contains the definition ofstruct inode, which represents a single inode. This comes from Version 6 of Unix, with some small modifications.unixfilesystem.[ch]: contains the definition ofstruct unixfilesystem, which represents the filesystem as a whole. It also contains constants specifying the file system layout, including the sector address of the superblock and of the start of the inode region.custom_tests: a file where you will add custom tests that you can run through sanitychecksamples: a symbolic link to the shared directory for this assignment. It contains:SANITY.ini: a file to configure and run Sanity Check. You can ignore this.diskimageaccess_soln: executable solution for the provided test program. Use this program to verify correct behavior for these programs.disk_images: contains "basic", "depthFile", "dirFnameSize" and "basicExtended" disk images for testing, along with corresponding ".structure" files describing the file system structure (what files/directories are in them and what their inode numbers are).tools: contains symbolic links to thesanitycheckandsubmitprograms for testing and submitting your work.
You should only need to modify files for the 4 layers you will implement, along with your custom_tests file.
Testing
The diskimageaccess program provides various ways to test your code incrementally and as a whole. You can run it in GDB for debugging, since it calls your filesystem layer functions. Run ./diskimageaccess -h to view all ways to test your code - there are many testing options, so take a moment to read through what's available!
View diskimageaccess -h output
Usage: ./diskimageaccess <options?> <diskimagePath> <function> <arg1>...<argn>
<options?> is optionally one of:
-h Print this message and exit.
--help Print this message and exit.
--redirect-err Redirect stderr to a file so it won't appear in
program output; a generic message is printed if
the file is non-empty after running the test(s).
Used to check whether an error messsage is printed,
without being sensitive to the exact message text.
<diskimagePath> is the path to a disk image file
(e.g. ones in samples/disk_images).
<function> is one of the assignment functions, e.g.
inode_indexlookup or file_getblock.
<argX> specifies arguments to test that function. You can run
pre-provided tests or specify any arguments
you'd like that are passed directly to the
function to test its output.
Here are arg options for each function you can test:
inode_iget:
- specify "test1" as arg to test inode_iget
on all inodes on the disk
- otherwise, specify the inode number to test
inode_indexlookup:
- specify "test1" as arg to test the first
block of a small file
- specify "test2" as arg to test the first
block of a large file
- specify "test3" as arg to test the last
block of a doubly-indirect large file
- specify "test4" as arg to test
inode_indexlookup on all files on the disk
- otherwise, specify the inode number followed
by the fileBlockIndex to test with
file_getblock:
- specify "test1" as arg to test the first
block of a small file
- specify "test2" as arg to test the first
block of a large file
- specify "test3" as arg to test the last
block of a doubly-indirect large file
- specify "test4" as arg to test
file_getblock on all files on the disk
- otherwise, specify the inode number followed
by the fileBlockIndex to test with
directory_findname:
- specify "test1" as arg to test a dirent
that's not found in a partially-filled block
- specify "test2" as arg to test a dirent
with a length-14 name
- specify "test3" as arg to test
directory_findname on all files on the disk
- otherwise, specify the directory inumber
followed by the entry name to test with
pathname_lookup:
- specify "test1" as arg to test a path
that's not found
- specify "test2" as arg to test
pathname_lookup on all files on the disk
- otherwise, specify the absolute path
to test with
When you run a test, you must specify a "disk image" from the samples/disk_images folder.
diskimageaccess allows you to test each function in two ways:
- using pre-written tests, or
- specifying any arguments you'd like to the function to check that it behaves correctly
For instance, if you want to test inode_indexlookup, you can run one of the pre-written tests like this:
./diskimageaccess samples/disk_images/basicDiskImageExtended inode_indexlookup test1
This will output the results of the pre-written test, which you can compare manually to the sample solution in samples/diskimageaccess_soln, or compare automatically using sanitycheck/custom_tests.
Alternatively, you can test any call to inode_indexlookup you'd like by specifying custom parameters, like this:
./diskimageaccess samples/disk_images/basicDiskImageExtended inode_indexlookup 1 0
This will call inode_indexlookup(1, 0) and print out the result, which you can similarly compare to the sample solution.
To know what inode numbers or values to use to test different disk images, check out the .structure files in the samples/ folder; they show the contents (inumbers, file names) of each disk image.
If you want to test a case that prints an error message, since the error messages do not have to match exactly, include the --redirect-err flag. This flag writes any error messages to a file instead of printing them, and checks if anything was written to the file. This way, the test can pass as long as an error message is printed, even if the message doesn't match exactly.
On this assignment, one testing requirement is that you add at least THREE additional tests to custom_tests using this custom arguments support. In other words, you should add at least three tests that run diskimageaccess with various custom arguments as shown in the example immediately above with inode_indexlookup 1 0. Add a comment above each one (start the line with '#') documenting what that test is verifying.
Debugging
Here are some steps that might help to isolate bugs:
- Gather info about the inode being tested. Run the sample solution on
inode_igetto print out information about a specific inode, like this:samples/diskimageaccess_soln samples/disk_images/SOME_DISK inode_iget SOME_INUMBER. You can see details such as whether it is used/unused, whether it is a directory or file, whether it is small or large, and what itsi_addrvalues are. This information can help better understand why a given test case may be failing. (for instance, if this test is the only one failing, but it is also the only large inode that uses its doubly-indirect block, that could indicate an issue with logic for doubly-indirect blocks specifically). - View the full output outside of sanitycheck and
diffit. if the output is being truncated, or you are having trouble identifying where your output differs, save your and the sample solution's outputs to separate files and usediffto compare them, like this:
YOUR_COMMAND > my_output.txt
SOLN_COMMAND > sample_output.txt
diff my_output.txt sample_output.txt
- Isolate a specific set of arguments for a function that causes the issue, and run in GDB.
diskimageaccesslets you test any function with any arguments. You can see the sample solution output to know what it should report, and then investigate your implementation further under GDB like this:
gdb diskimageaccess
...
(gdb) b inode_indexlookup
(gdb) r samples/disk_images/basicDiskImageExtended inode_indexlookup 1 0
Working On The Assignment
Your job is to complete the implementation of 4 layers of the filesystem: the "inode layer", "file layer", "directory layer", and "pathname layer". Each of these layers has 2 corresponding files, and layers build on each other. You will need to modify the .c files for each layer. We also provide a fully-implemented "block layer" in diskimg.[ch] that provides the code for reading and writing sectors (note that the words block and sector are used interchangeably here) on the disk image.
This spec is long, but written incrementally - you can work as you read through in steps. However, check out our few additional tips at the end as you work.
Note: back in the 1970s, storage space — both on disk and in main memory — was at a premium. As a result, the Unix v6 file system goes to lengths to reduce the size of data it stores. You'll notice that many integer values in the structs we provided are stored using only 16 bits, rather than today's more standard 32 or 64. (In our code we use the type int16_t from stdint.h to get a 16-bit integer, but back in the '70s, the C int type was 16 bits wide.)
As you implement them, consider what functions from lower layers you have already implemented and how they can help! We recommend implementing them in the order specified below.
As you work, we highly recommend testing incrementally and ensuring each function is working before moving on. Sanity check can be helpful at certain points, but make sure to run the diskimageaccess program outside of sanity check to investigate behavior further.
Error Handling
Real systems (and especially operating systems) must deal with a variety
of error conditions, and reporting errors can make it much easier to find and
fix issues. Most of the filesystem functions can return errors.
For example,
diskimg_readsector can return an error (-1 return value) if there is a
problem reading the disk, directory_findname will return an error
if it cannot find the given name in the directory, and inode_indexlookup
should return an error if the specified fileBlockIndex is too large. Each
layer's header file contains details about required error checking; when
in doubt, double check the listed requirements there for what you do and do not
have to check for. Here are the requirements for implementing error checking:
- Returning -1: You must detect appropriate error conditions in all of the code
you write and return -1 when errors occur (e.g., your implementation of
directory_findnamemust return -1 if the given name cannot be found). This includes if a function you call, such as a lower-layer function, returns an error; the calling function should return an error as well. - Error messages: Wherever you return -1, you must print a descriptive error message to
stderrdescribing the problem that occurred, usingfprintf(stderr, ...). If you are returning -1 because a lower-layer function returned -1, you should still print an error message even if the lower-layer function does as well. You can print tostderrlike this:
fprintf(stderr, "inode_indexlookup(fileBlockIndex=%d) error calling diskimg_readsector.\n",
fileBlockIndex);
return -1;
- One exception where you should not print an error message is if
directory_findnamefunctions properly (e.g. no errors in lower layers or other issues) but cannot find the specified name. In that case, just return -1. This is required to pass sanity check test 40. However, if errors do occur indirectory_findname, make sure to print an error message. - Your error messages should be precise: for example, "file_getblock couldn't read disk block 47" is a much more useful message than "oops!" (sadly, messages with about as much useful information as "oops!" occur distressingly often in production code...). Your error messages should include any available information that makes it easier to understand the problem, such as the number of a disk block that couldn't be read, the name of a file that couldn't be found, and so on. You are not required to match the sample solution's error messages, but when in doubt, matching them can be the easiest way to go.
(Provided) Block layer (diskimg.h and diskimg.c)
The lowest layer of the file system provides facilities for reading and
writing individual disk blocks.
In this project, the block layer reads and writes blocks from a disk
image file rather than an actual disk, but the API is the same as it
would be for an actual disk. The starter kit provides you with a complete
implementation of this layer; you will use the function
diskimg_readsector in your code (see the comments in diskimg.h for
more information). You do not need to use diskimg_writesector.
When calling diskimg_readsector, make sure to do error-checking by checking if its return value isn't equal to DISKIMG_SECTOR_SIZE! This can help catch issues, since it returns -1 on error and should otherwise always return DISKIMG_SECTOR_SIZE (512) as the number of bytes read.
Inode layer (inode.h, inode.c)
This layer provides functions for finding inodes on disk and also for using the information in inodes to translate from logical block indexes within a file to physical block/sector numbers on disk. You will implement the functions inode_iget and inode_indexlookup. These functions take in a struct unixfilesystem * parameter that specifies info about the disk being read from. The disk file
descriptor within that struct is needed to call diskimg_readsector. Remember: the first inode in the first inode block has inumber 1! See unixfilesystem.h for details and helpful constants. Make sure to check out the documentation in inode.h for what the functions must do.
inode_iget
This function reads the data for the inode with the specified inumber from the filesystem and fills in an inode struct with that information.
Make sure to implement any required error checking with appropriate error messages!
Testing Recommendation
In addition to sanity check tests 10-12, another way to test is to pick an inode from the .structure file for a disk image and test it vs. the sample solution (try custom_tests!) to see if the inode is read in properly.
inode_indexlookup
This function returns the block number of the file data block at the specified index. What this means is that, if we take just the blocks for this file holding actual file data (e.g. ignoring indirect blocks) and index them sequentially, the fileBlockIndex parameter to this function specifies one of those indexes, and we must return the block/sector number at that index. In other words, the returned block number will always be for a block that stores actual file data. As a concrete example, let's say a file is 1,536 (512 * 3) bytes big. This means it takes up 3 blocks of data. Let's say furthermore that the block numbers it is using are 35, 65 and 21. This means in the inode it stores these three block numbers, and if you fetched blocks 35, 65 and 21 from disk you would have data for the entire file. But for inode_indexlookup, fileBlockIndex would be either 0, 1 or 2 - it's specifying the index into the file's data of the block to get, and it's your job to return the disk block number it corresponds to. For instance, if fileBlockIndex was 0, you would return 35. If fileBlockIndex was 1, you would return 65, etc. For a large file, let's say fileBlockIndex was 256; that would mean you should return the first direct block number contained in the second singly-indirect block (since the 256th block number for the file's data would be in the second singly-indirect block).
This implementation needs to manage direct, singly indirect, and doubly indirect block numbers. Translating for direct block numbers is straightforward (as we did in the example above), but translating for indirect block numbers is hard to get right.
Tips:
- We suggest implementing this function first just for direct blocks; once you do this, you should pass Sanity check test 20 (
depthFileDiskImage) as that is the only disk image that doesn't contain any large files. - Next, we suggest implementing this function only handling singly-indirect blocks; in other words, assume large files will take up at most 7 singly-indirect blocks, and ignore the 8th doubly-indirect one. Once you do this, you should pass Sanity check test 21 (
dirFnameSizeDiskImage) as that disk image doesn't contain any files bigger than7 * 256 * 512bytes. Remember that indirect block numbers identify other blocks that contain 256 two-bytes integers (best managed via theuint16_ttype). - Finally, we suggest implementing the case where you need to examine the 8th doubly-indirect block; once you do this, all
inode_indexlookuptests should pass. - Make sure to share and unify code as much as possible between the singly-indirect and doubly-indirect block cases! Since doubly indirect block numbers lead to singly indirect block numbers, which lead to block numbers.
- use the
i_modefield of theinodestruct to check if the file uses the large or small mapping scheme. Thei_modeis a 16-bit integer whose individual bits indicate various properties of the inode.ino.hcontains#definesthat describe what each bit means. Specifically, bit 12 ofi_modeindicates whether the file uses the large file mapping scheme. So if(i_mode & ILARG) != 0, then the inode'si_addrfields point at indirect and doubly-indirect blocks rather than directly at the data blocks.
Testing Recommendation
In addition to sanity check tests 20-23, there are also extra tests included in custom_tests. Another way to test this function in isolation is to choose an inode number from the .structure file that is either a small or large file depending on what format you'd like to test. Then try fetching each of its file blocks (0, 1, 2,...) via diskimageaccess custom arguments and verify that each of the numbers is correct (try custom_tests!).
File layer (file.h and file.c)
This layer provides a function file_getblock that reads a block of data from a file, given the file's i-number and the desired block within the file. You will write the implementation for this function. Make sure to check out the documentation in file.h for what the function must do.
Make sure to implement any required error checking with appropriate error messages!
file_getblock's job is to read the file data block at the specified index. In other words, whereas inode_indexlookup finds which block number contains the data, this function actually reads in that data. This function returns the number of meaningful bytes in a block/sector that contribute to the payload of some file (be it a regular file or a directory). In other words, it returns the number of bytes in that block that are actually used to store data for the file. As an example, if the file size happens to be a multiple of 512, then this function always returns 512. Alternatively, if a file isn’t a multiple of 512 and we have to get the payload stored in the very last block, you should expect to return a number that’s less than 512 (because only a part of what’s stored in the block is relevant).
Note: in a space-saving move, the designers of the file system stored the inode's size field as a 24-bit integer. There's no 24-bit integer type in C, so we represent this value using two fields in the inode struct: i_size1, which contains the least-significant 16 bits of the value, and i_size0, which contains the most-significant 8 bits of the value. We provide a function inode_getsize in inode.c that assembles these two fields into a normal C integer for you. This may come in handy in file_getblock!
Testing Recommendation
In addition to sanity check tests 30-32, there are also extra tests included in custom_tests. If your checksums aren't matching, identify the inode that is outputting incorrectly and run the sample solution on inode_iget with that inode number to see more information about it. Pay close attention to whether that inode represents a small or large file; that could give you a hint as to where your logic might be going wrong! Then you can use diskimageaccess custom arguments to investigate further.
Directory layer (directory.h and directory.c)
This layer encapsulates knowledge about the structure of directories. You will write the function directory_findname, which reads a directory file to find the directory entry with a given name, if there is one. Make sure to check out the documentation in directory.h for what the function must do.
Make sure to implement any required error checking with appropriate error messages!
This function's job is to get the directory entry (defined as struct direntv6) from the given directory that matches the given name. As a hint, your implementation should make use of strncmp instead of strcmp, because some of the stored directory names aren’t '\0'-terminated. They are, however, guaranteed to be of length 14 or less. Check out the man page for strncmp and strcmp so you understand the difference.
Testing Recommendation
In addition to sanity check tests 40-42, another way to test this function in isolation is to open the .structure file for the test disk and choose a file or folder to test with. Note both its name, as well as the inode number of its parent directory. Then test with diskimageaccess custom arguments, specifying the parent inode number as dirinumber and the chosen name as name.
Pathname layer (pathname.h, pathname.c)
This layer provides a function pathname_lookup which looks up an absolute path (you can assume it is absolute, meaning starts with a "/") and returns the i-number for the file specified by the path. You will implement this function. strsep (a better alternative to the similar but poorly-designed strtok) will come in handy here! We encourage you to implement this function iteratively (though recursion is fine provided it is clean and tidy). Make sure to check out the documentation in pathname.h for what the function must do.
Make sure to implement any required error checking with appropriate error messages!
Testing Recommendation
In addition to sanity check tests 50-53 (51-53 walk the entire file system directory tree, starting at the root directory, and print out all of the paths found), if your inode information isn't matching, try tracing the lookup by hand to make sure that the correct step is taken each time (e.g. / to /foo, /foo to /foo/CVS, etc.). The .structure files show the inumbers for various path depths, so if you are looking up /a/b/c you can check that your lookup for /a, /a/b and /a/b/c are all correct. You can also use diskimageacces to test the pathname and directory layer functions directly with any custom arguments.
Submitting and Grading
Once you are finished working and have saved all your changes, submit by running the submit tool in the tools folder: ./tools/submit. We recommend you do a trial submit in advance of the deadline to allow time to work through any snags. You may submit as many times as you would like; we will grade the latest submission. Submitting a stable but unpolished/unfinished is like an insurance policy. If the unexpected happens and you miss the deadline to submit your final version, this previous submit will earn points. Without a submission, we cannot grade your work.
We will grade your assignment with the following criteria:
- Code test results: we will run the sanity check tests as well as potential additional test cases to confirm your functions work as intended and implement the required error checking. You have access to all of the disk images we'll use for grading.
- Error checking: we will confirm that all layers implement necessary error checking, with descriptive error messages
- Custom tests: we will verify that you've added at least 3 custom tests using
diskimageaccesscustom arguments and that each of them has a comment above them describing the test - Clean build and clean
valgrindreports (eg. no memory leaks or errors) - Code style: we will manually review your code and give feedback on coding style and design (see tips below for some style guidelines!)
General Tips
Layering
Try to avoid "breaking the layering" of the file system where possible. You want to aim to have a higher layer build on the layer below it as much as possible, rather than bypassing it and manually calling functions lower down - though in some places calling a function from a lower down layer (like diskimg_readsector) may be appropriate. If this is the case, just make sure that what you're trying to do cannot be accomplished by relying on a higher layer.
Constants
Make sure to use provided constants, and define your own constants, where appropriate. For instance, use INODE_START_SECTOR and ROOT_INUMBER in unixfilesystem.h instead of hardcoding numbers such as 2 and 1 in your code. There are calculations throughout your code, and your goal should be to make them as easy to follow as possible.
Buffers, Types and Casting
When creating a "buffer" (space to store data), carefully consider the type of the buffer. In places like inode_iget and directory_findname, we know what type of data we are reading in, so you should not use void * and casting, and instead declare the type as specifically as you can - i.e. instead of an array of chars, use an array of struct inodes or some other type. Only go with void *, manual pointer arithmetic, and memcpy when you don’t know the data types that are being stored, copied, and passed around. In the case of inode_iget, for instance, you know the sectors you’re examining contain arrays of struct inodes.
Memory Allocation
You should not need to use dynamic memory allocation on this assignment (i.e. malloc/realloc/free).