Section 1: File Systems

Sections Wed Jan 18 to Sat Jan 21

Based on a section handout by Jerry Cain and compiled by Parthiv Krishna, with modifications by Nick Troccoli.

Section Overview

Your weekly section is a chance to experiment and explore, ask and answer questions, and get hands-on practice in a supported environment. We provide a set of section exercises that revisit topics from recent lectures and prepare you to succeed at the upcoming assignment.

Section is collaborative! We're all in this together! We will work together on the exercises. The entire group is one learning community working together to advance the knowledge and mastery of everyone. Stuck on an issue? Ask for help. Have an insight? Please share!

To track section participation, we have an online checkoff form for you to fill out as you work. Section is not a race to find answers to exactly and only the checkoff questions; they are used only to record attendance and get a read on how far you got. Section credit is awarded based on your sincere participation for the full section period. Your other rewards for investing in section are to further practice your skills, work together to resolve open questions, satisfy your curiosity, and reach a place of understanding and mastery. The combination of active exploration, give and take with your peers, and the guidance of the TA makes section time awesome. We hope you enjoy it!

Get Started

Pull up the online section checkoff and have it open in a browser so you can jot things down as you go.

Exercises

1) Unix v6 Filesystem Overview

The Unix v6 filesystem is the case study design that you'll implement portions of on assign1. Take a few minutes to talk through each of the following terms with your group, your partner, or the whole section, and ask questions about anything you need to review!

inode
inode table
direct block numbers
singly-indirect blocks
doubly-indirect blocks
"small" vs. "large" file scheme
directory
directory entry

2) Accessing Inodes

On assign1, you'll implement the inode_iget function, which has the following signature:

int inode_iget(const struct unixfilesystem *fs, int inumber, 
        struct inode *inp);

Its job is to fetch the data for the inode with the given inumber and store it in *inp. Let's practice the process for locating an inode on disk given its inumber. Recall that inodes are stored in the inode table, which starts at sector 2, that inode numbers start at 1, and that inodes are 32 bytes each (meaning 16 of them fit in each 512-byte sector). For each of inode numbers 256 and 345, answer the following questions:

Which sector should we read in order to get that inode?
Within that sector, which index should we access in order to get that inode?
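
The lookup arithmetic can be sketched in a few lines of C. This is a minimal sketch rather than the assign1 solution: the constant and function names (INODE_START_SECTOR, INODES_PER_SECTOR, inode_sector, inode_index) are hypothetical, it assumes 512-byte sectors, and the examples below use inumbers other than the two asked about above so you can still work those out yourself.

```c
#include <assert.h>

/* Hypothetical names for illustration; the values come from the text above. */
#define INODE_START_SECTOR 2   /* the inode table begins at sector 2 */
#define INODES_PER_SECTOR  16  /* assumed 512-byte sector / 32-byte inode */

/* Sector that holds the inode with the given 1-based inumber. */
int inode_sector(int inumber) {
    assert(inumber >= 1);
    return INODE_START_SECTOR + (inumber - 1) / INODES_PER_SECTOR;
}

/* Position of that inode within its sector, in the range 0..15. */
int inode_index(int inumber) {
    assert(inumber >= 1);
    return (inumber - 1) % INODES_PER_SECTOR;
}
```

Under these assumptions, inumber 1 maps to sector 2, index 0; inumber 16 maps to sector 2, index 15; and inumber 17 maps to sector 3, index 0. Subtracting 1 before dividing is what accounts for inode numbers starting at 1 rather than 0.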

3) Accessing Payload Data

In the Unix v6 filesystem, both files and directories have payload data, which is located via direct block numbers or indirect blocks. A file's payload data is its contents (e.g. text). A directory's payload data is its directory entries. (Note that the size stored in an inode is the size of the payload data in bytes, which doesn't include the size of any indirect blocks needed to track the block numbers.)

On assign1, you'll implement the inode_indexlookup function, which has the following signature:

int inode_indexlookup(const struct unixfilesystem *fs, 
    struct inode *inp, int fileBlockIndex);

Given a pointer inp to an inode, its job is to return the block number where a given chunk of that file's (or directory's) payload data can be found. Specifically, fileBlockIndex is the index of a payload block within the file: 0 means the first payload block, 1 means the second payload block, and so on, regardless of whether those payload blocks are stored directly in the inode or in indirect blocks.

Let's say we're working with a small file. If fileBlockIndex is 0, we should return what's stored in i_addr[0]. If fileBlockIndex is 4, we should return what's stored in i_addr[4].

Now let's say we're working with a large file. If fileBlockIndex is 0, we should return the first block number stored in the block referred to by i_addr[0], since i_addr[0] is a singly-indirect block number. If fileBlockIndex is 256, we should return the first block number stored in the block referred to by i_addr[1]. Let's do some more practice (recall that block numbers are 2 bytes each, so 256 of them fit in a single 512-byte indirect block).

Imagine we have a large file. For fileBlockIndex = 2 and for fileBlockIndex = 542, which entry of i_addr names the indirect block we should fetch from disk in order to find the block number we're looking for?
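
One way to organize the large-file arithmetic is to split fileBlockIndex into a slot in i_addr and an offset within the indirect block that the slot names. The sketch below assumes 512-byte blocks and the usual Unix v6 large-file layout, in which the first seven i_addr entries are singly-indirect and the eighth is doubly-indirect. The struct and function names are hypothetical, and the code performs no disk reads, so it is not a substitute for inode_indexlookup.

```c
#include <assert.h>

#define NUMS_PER_BLOCK 256  /* assumed 512-byte block / 2-byte block number */
#define NUM_SINGLY     7    /* assumed: i_addr[0..6] are singly-indirect */

/* Hypothetical result type: where to look for a payload block number. */
struct large_lookup {
    int addr_slot;     /* which i_addr entry to follow */
    int outer_offset;  /* index into the doubly-indirect block, or -1 if unused */
    int inner_offset;  /* index into the singly-indirect block */
};

/* Pure arithmetic for a large file; no blocks are read here. */
struct large_lookup locate_large(int fileBlockIndex) {
    assert(fileBlockIndex >= 0);
    struct large_lookup r;
    /* Which singly-indirect block (counting across the whole file) covers this index. */
    int indirect = fileBlockIndex / NUMS_PER_BLOCK;
    r.inner_offset = fileBlockIndex % NUMS_PER_BLOCK;
    if (indirect < NUM_SINGLY) {
        r.addr_slot = indirect;   /* reached directly from i_addr */
        r.outer_offset = -1;
    } else {
        /* Everything past the first seven goes through the last i_addr entry. */
        assert(indirect - NUM_SINGLY < NUMS_PER_BLOCK);
        r.addr_slot = NUM_SINGLY;
        r.outer_offset = indirect - NUM_SINGLY;
    }
    return r;
}
```

Under these assumptions, fileBlockIndex 256 yields slot 1 with inner offset 0, matching the example above, and fileBlockIndex 300 yields slot 1 with inner offset 44. An index of 1792 (7 x 256) is the first one that falls through to the doubly-indirect entry: slot 7, outer offset 0, inner offset 0.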

4) (If time) Alternative Designs

The Unix v6 filesystem is just one possible design of a filesystem - there could be many variations or alternatives considered. Imagine a variation where block numbers are four-byte ints instead of two bytes, and where inodes include space for 6 block numbers, instead of 8. Moreover, imagine that inodes don't have a "small" or "large" scheme, but instead just have one scheme, regardless of size. Specifically, let's say the first three are direct block numbers, the next two are singly indirect block numbers, and the final one is a doubly indirect block number.

What's the maximum file size?
How large does a file need to be before its inode must use the first singly indirect block number?
How large does a file need to be before its inode must use the doubly indirect block number?
Draw as detailed an inode as you can if it's to represent a regular file that's 2049 bytes in size.
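
The capacity questions reduce to counting how many payload blocks each kind of block number can reach. The sketch below assumes 512-byte blocks as in Unix v6, ignores any limit imposed by the width of the inode's size field, and uses hypothetical function names. Treat it as a way to check your reasoning, not as an official answer key.

```c
#include <assert.h>

#define BLOCK_SIZE 512  /* assumed, as in Unix v6 */

/* Payload blocks reachable from an inode, given the width of a block number
   in bytes and how many direct, singly-indirect, and doubly-indirect
   block numbers the inode holds. */
long max_payload_blocks(int num_bytes, int n_direct, int n_singly, int n_doubly) {
    assert(num_bytes > 0);
    long per_block = BLOCK_SIZE / num_bytes;  /* block numbers per indirect block */
    return n_direct
         + n_singly * per_block
         + n_doubly * per_block * per_block;
}

/* Largest file, in bytes, that such an inode can describe. */
long max_file_bytes(int num_bytes, int n_direct, int n_singly, int n_doubly) {
    return max_payload_blocks(num_bytes, n_direct, n_singly, n_doubly) * BLOCK_SIZE;
}

/* Payload blocks needed to hold a file of the given size (rounds up). */
long blocks_needed(long size_bytes) {
    assert(size_bytes >= 0);
    return (size_bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
}
```

With four-byte block numbers, 128 of them fit in a block, so max_payload_blocks(4, 3, 2, 1) counts 3 + 2 x 128 + 128 x 128 = 16643 reachable blocks. A file starts using the next tier as soon as blocks_needed exceeds what the earlier tiers can reach: 3 blocks for the direct numbers alone, and max_payload_blocks(4, 3, 2, 0) = 259 blocks once both singly indirect numbers are included. For the drawing question, blocks_needed(2049) is 5.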

Unlike the Unix v6 filesystem we learned about in class, the ext2/3/4 family of filesystems (commonly used on Linux) uses variable-length directory entries:

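#include <stdint.h>

struct ext3_dir_entry {
    uint32_t inode_number;
    uint16_t name_length;   /* number of bytes in file_name */
    uint16_t file_type;
    char file_name[];       /* flexible array member, name_length bytes long */
};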

What is the benefit to designing directory entries this way? What are some drawbacks? (Consider what happens when deleting files.)
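
To make the space tradeoff concrete, the sketch below computes how many bytes one variable-length entry would occupy and lets you compare that with a fixed-size entry. It makes several assumptions: the struct is the simplified one shown above, entries are padded to a multiple of 4 bytes, a Unix v6 directory entry is 16 bytes (a 2-byte inumber plus a 14-byte name), and the names V6_DIRENT_SIZE and var_entry_size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified entry from the handout, written with a flexible array member. */
struct ext3_dir_entry {
    uint32_t inode_number;
    uint16_t name_length;
    uint16_t file_type;
    char file_name[];
};

#define V6_DIRENT_SIZE 16  /* assumed: 2-byte inumber + 14-byte name */

/* Bytes occupied by one variable-length entry whose name has the given
   length, assuming entries are padded to a 4-byte boundary. */
size_t var_entry_size(size_t name_length) {
    assert(name_length > 0);
    size_t raw = offsetof(struct ext3_dir_entry, file_name) + name_length;
    return (raw + 3) & ~(size_t)3;  /* round up to a multiple of 4 */
}
```

Under these assumptions, a 3-character name occupies 12 bytes rather than V6_DIRENT_SIZE, while a 40-character name, which would not fit in a 14-byte name field at all, occupies 48. Because entry sizes differ, the position of the i-th entry can no longer be computed by multiplication, and the gap left by a deleted entry fits only a name of the same padded size or smaller.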