File Systems
Lecture Notes for CS 140
Winter 2012
John Ousterhout
- Readings for this topic from Operating System Concepts:
Sections 10.1-10.2, Sections 11.1-11.2, Sections 11.4-11.6,
Section 12.4.
- Problems addressed by modern file systems:
- Disk Management:
- Fast access to files (minimize seeks)
- Sharing space between users
- Efficient use of disk space
- Naming: how do users select files?
- Protection: isolation between users, controlled sharing.
- Reliability: information must last safely for long periods
of time.
- File: a named collection of bytes stored on durable storage such as disk.
- File access patterns:
- Sequential: information is processed in order, one byte
after another.
- Random Access: can address any byte in the file directly
without passing through its predecessors. E.g. the data set
for demand paging, also databases.
- Keyed: search for blocks with particular values, e.g.
hash table, associative database, dictionary.
Usually provided by databases, not operating system.
File Descriptors
- How should disk sectors be used to represent the bytes
of a file?
- File descriptor: Data structure that describes a file;
stored on disk along with file data. Info in file descriptor:
- Sectors occupied by file
- File size
- Access times (last read, last write)
- Protection information (owner id, group id, etc.)
- Issues to consider:
- Most files are small (a few kilobytes or less).
- Most of the disk space is in large files.
- Many of the I/O operations are for large files.
Thus, per-file cost must be low but large files must have good
performance.
- Contiguous allocation (also called "extent-based"):
allocate files like segmented memory. Keep
a free list of unused areas of the disk. When creating a file,
make the user specify its length, allocate all the space at once.
Descriptor contains location and size.
- Advantages:
- Easy access, both sequential and random
- Simple
- Few seeks
- Drawbacks:
- Fragmentation will make it hard to use disk space
efficiently; large files may be impossible
- Hard to predict needs at file creation time
Example: IBM OS/360.
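The fragmentation drawback can be seen in a minimal sketch of first-fit extent allocation (names and numbers here are illustrative, not from OS/360):

```python
# Sketch of contiguous (extent-based) allocation: a free list of
# (start, length) extents, searched first-fit.

def allocate(free_list, nblocks):
    """First-fit: return start block of a contiguous run, or None."""
    for i, (start, length) in enumerate(free_list):
        if length >= nblocks:
            if length == nblocks:
                free_list.pop(i)          # extent consumed exactly
            else:
                free_list[i] = (start + nblocks, length - nblocks)
            return start
    return None                           # no contiguous run is big enough

# Disk with two free extents: blocks 0-9 and 20-24 (15 free blocks total).
free_list = [(0, 10), (20, 5)]
print(allocate(free_list, 10))   # 0 -- fits in the first extent
print(allocate(free_list, 6))    # None -- 5 blocks free, but not contiguous
```

After the first allocation, five blocks are still free, yet a six-block file cannot be created: that is external fragmentation.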
- Linked files: keep a linked list of all free blocks.
In file descriptor, just keep pointer to first block. Each
block of file contains pointer to next block.
- Advantages?
- Drawbacks?
Examples (more or less): TOPS-10, Xerox Alto.
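A sketch of why random access is the main drawback of linked files (the dict-based "disk" stands in for real sector reads; names are mine):

```python
# Each disk block stores a "next" pointer, so reaching logical block k
# of a file means following k links -- one disk read per link.

NIL = -1   # sentinel: no next block

def read_block(disk, first, k):
    """Return the data of logical block k; costs k+1 block reads."""
    b, reads = first, 0
    while k > 0:
        b = disk[b]["next"]   # a disk read just to learn where to go next
        reads += 1
        k -= 1
    return disk[b]["data"], reads + 1

# A 3-block file stored at physical blocks 7 -> 2 -> 5.
disk = {7: {"data": "A", "next": 2},
        2: {"data": "B", "next": 5},
        5: {"data": "C", "next": NIL}}
print(read_block(disk, 7, 2))   # ('C', 3): sequential is fine, random is O(n)
```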
- Windows FAT:
- Like linked allocation, except don't keep the links in
the blocks themselves.
- Keep the links for all files in a single table called
the File Allocation Table
- Each FAT entry holds the disk address of the next block in the file
- Special values for "last block in file", "free block"
- Originally, each FAT entry was 16 bits.
- FAT32 supports larger disks:
- Each entry holds a 28-bit cluster number
- Disk addresses refer to clusters: groups of adjacent
sectors.
- Cluster sizes 2 - 32 KBytes; fixed for any particular
disk partition.
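The FAT scheme can be sketched as follows (the sentinel values and names are mine, not the real on-disk encoding):

```python
# One table entry per cluster; an entry holds the number of the next
# cluster in the file, or a sentinel for end-of-file / free.

EOF, FREE = -1, -2

def file_clusters(fat, first):
    """Follow a file's chain through the table (kept in memory),
    not through the data blocks themselves."""
    chain, c = [], first
    while c != EOF:
        chain.append(c)
        c = fat[c]
    return chain

# 8-cluster disk; one file occupies clusters 3 -> 6 -> 4.
fat = [FREE, FREE, FREE, 6, EOF, FREE, 4, FREE]
print(file_clusters(fat, 3))   # [3, 6, 4]
```

Because the table lives in memory, walking a chain costs no extra disk reads, unlike keeping the links in the blocks themselves.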
- Indexed files: keep an array of block pointers
for each file.
- Maximum length must be declared for file when
it is created.
- Allocate array to hold pointers to all the blocks, but
don't allocate the blocks.
- Fill in the pointers dynamically as file is written.
- Advantages?
- Drawbacks?
- Multi-level indexes (4.3 BSD Unix):
- File descriptor = 14 block pointers, initially 0 ("no block").
- First 12 point to data blocks.
- Next entry points to an indirect block (contains 1024
4-byte block pointers).
- Last entry points to a doubly-indirect block.
- Maximum file length is fixed, but large.
- Indirect blocks aren't allocated until needed.
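A sketch of the resulting lookup, assuming 4-KByte blocks and 4-byte pointers (so 1024 pointers per indirect block):

```python
# Map a logical block number to the index lookups that reach it in a
# 4.3 BSD-style descriptor: 12 direct slots, one indirect, one doubly-indirect.

NDIRECT, PTRS = 12, 1024

def lookup_path(n):
    """Return which descriptor slot / index entries cover logical block n."""
    if n < NDIRECT:
        return ("direct", n)
    n -= NDIRECT
    if n < PTRS:
        return ("indirect", n)               # slot 12, then entry n
    n -= PTRS
    return ("double", n // PTRS, n % PTRS)   # slot 13, two index levels

print(lookup_path(5))      # ('direct', 5)
print(lookup_path(100))    # ('indirect', 88)
print(lookup_path(2000))   # ('double', 0, 964)

# Maximum file size: (12 + 1024 + 1024*1024) blocks of 4 KBytes -- about 4 GB.
max_blocks = NDIRECT + PTRS + PTRS * PTRS
```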
Buffer Cache
- Use part of main memory to retain recently-accessed disk
blocks.
- Blocks that are referenced frequently (e.g., indirect
blocks for large files) are usually in the cache.
- This solves the problem of slow access to large files.
- Originally, buffer caches were fixed size.
- As memories have gotten larger, so have buffer caches.
- Many systems now unify the buffer cache and the VM
page pool: any page can be used for either, based on
LRU access.
- What happens when a block in the cache is modified?
- Synchronous writes: immediately write through
to disk.
- Safe: data won't be lost if the machine crashes
- Slow: process can't continue until disk I/O completes
- May be unnecessary:
- Many small writes to the same block
- File deleted soon (e.g., temporary files)
- Delayed writes: don't immediately write to
disk:
- Wait a while (30 seconds?) in case there are more writes
to a block or the block is deleted
- Fast: writes return immediately
- Dangerous: may lose data after a system crash
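A toy write-back cache illustrating delayed writes with LRU eviction (a sketch only; real kernels track far more state, and the names are mine):

```python
from collections import OrderedDict

class BufferCache:
    """LRU cache of disk blocks; dirty blocks reach disk only on
    eviction or an explicit sync (delayed writes)."""

    def __init__(self, capacity, disk):
        self.cache = OrderedDict()   # block number -> (data, dirty)
        self.capacity, self.disk = capacity, disk

    def read(self, bno):
        if bno not in self.cache:
            self._insert(bno, self.disk[bno], dirty=False)
        self.cache.move_to_end(bno)  # mark most-recently used
        return self.cache[bno][0]

    def write(self, bno, data):
        self._insert(bno, data, dirty=True)   # delayed: no disk I/O yet
        self.cache.move_to_end(bno)

    def _insert(self, bno, data, dirty):
        self.cache[bno] = (data, dirty)
        if len(self.cache) > self.capacity:
            old, (odata, odirty) = self.cache.popitem(last=False)
            if odirty:
                self.disk[old] = odata        # write back on eviction

    def sync(self):
        for bno, (data, dirty) in self.cache.items():
            if dirty:
                self.disk[bno] = data
                self.cache[bno] = (data, False)

disk = {0: "a", 1: "b", 2: "c"}
bc = BufferCache(capacity=2, disk=disk)
bc.write(0, "A")
print(disk[0])   # 'a' -- write delayed; disk not yet updated
bc.sync()
print(disk[0])   # 'A' -- flushed
```

If the machine crashed between the write and the sync, the new data would be lost, which is exactly the danger of delayed writes.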
Free Space Management
- Managing disk free space: many early systems just used a
linked list of free blocks.
- At the beginning, the free list is sorted, so blocks in a
file can be allocated contiguously.
- Free list quickly becomes scrambled, so files are spread
all over disk.
- 4.3 BSD approach to free space: bit map:
- Keep an array of bits, one per block.
- 1 means block is free, 0 means block in use
- During allocation, search bit map for a block that's
close to the previous block of the file.
- If disk isn't full, this usually works pretty well.
- If disk is nearly full this becomes very expensive
and doesn't produce much locality.
- Solution: don't let the disk fill up!
- Pretend disk has 10% less capacity than it really has
- If disk is 90% full, tell user it's full
and don't allow any more data to be written.
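A sketch of bitmap allocation that searches outward from the previous block of the file (illustrative; 4.3 BSD's real search is organized around cylinder groups):

```python
def alloc_near(bitmap, near):
    """bitmap[i] == 1 means block i is free (4.3 BSD convention).
    Scan outward from `near` in both directions for the closest free block."""
    n = len(bitmap)
    for dist in range(n):
        for b in (near + dist, near - dist):
            if 0 <= b < n and bitmap[b] == 1:
                bitmap[b] = 0          # mark in use
                return b
    return None                        # disk full

bitmap = [0, 0, 1, 0, 1, 1, 0, 0]
print(alloc_near(bitmap, 3))   # 4 -- closest free block to block 3
print(alloc_near(bitmap, 3))   # 2 -- next closest
```

Note how the search gets more expensive as free blocks thin out: on a nearly full disk each allocation scans most of the bitmap and finds a block far from `near` anyway, which is the behavior the 90% rule avoids.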
Block Sizes
- Many early file systems (e.g. Unix) used a block size of 512 bytes
(one sector).
- Inefficient I/O: more distinct transfers, hence more seeks.
- Bulkier file descriptors: only 128 pointers in an indirect
block.
- Increase block size (e.g. 2KB clusters in FAT32)?
- 4.3BSD solution: multiple block sizes
- Large blocks are 4 KBytes; most blocks are large
- Fragments are multiples of 512 bytes, fitting
within a single large block
- The last block in a file can be a fragment.
- Bit map for free blocks is based on fragments.
- One large block can hold fragments from multiple files.
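The arithmetic behind these choices, assuming 4-byte block pointers:

```python
# A 512-byte block holds 128 pointers; a 4-KByte block holds 1024.
for block in (512, 4096):
    print(block, block // 4)

def space_used(size, block=4096, frag=512):
    """Space consumed by a file of `size` bytes when the last partial
    block can be stored as 512-byte fragments rather than rounded up
    to a whole 4-KByte block."""
    full = size // block               # whole large blocks
    tail = size - full * block         # leftover bytes
    tail_frags = -(-tail // frag)      # ceiling division
    return full * block + tail_frags * frag

print(space_used(5000))   # 5120: one 4-KByte block plus two fragments
```

Without fragments the same 5000-byte file would occupy two whole blocks (8192 bytes), so fragments recover most of the space that large blocks would otherwise waste on small files.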
Disk Scheduling
- If there are several disk I/O's waiting to be executed,
what is the best order in which to execute them?
- Goal is to minimize seek time.
- First come first served (FCFS, FIFO): simple,
but does nothing to optimize seeks.
- Shortest seek time first (SSTF):
- Choose next request that is as close as possible to
the previous one.
- Good for minimizing seeks, but can result in
starvation for some requests.
- Scan ("elevator algorithm").
- Same as SSTF except heads keep moving in one direction
across disk.
- Once the edge of the disk has been reached, either reverse
direction (the classic elevator) or seek back to the far
edge and sweep again in the same direction (the circular
variant, C-SCAN).
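The three policies can be compared on a single request queue (a simulation sketch; the cylinder numbers are made up):

```python
def fcfs(head, reqs):
    """Serve requests in arrival order; sum the seek distances."""
    total = 0
    for r in reqs:
        total += abs(r - head)
        head = r
    return total

def sstf(head, reqs):
    """Always serve the pending request closest to the head."""
    pending, total = list(reqs), 0
    while pending:
        r = min(pending, key=lambda x: abs(x - head))
        total += abs(r - head)
        head = r
        pending.remove(r)
    return total

def scan(head, reqs, max_cyl=99):
    """Elevator: sweep toward the high edge, then reverse."""
    down = [r for r in reqs if r < head]
    up = [r for r in reqs if r >= head]
    if not down:
        return (max(up) - head) if up else 0
    return (max_cyl - head) + (max_cyl - min(down))

requests = [23, 89, 12, 74, 65]   # cylinder numbers; head starts at 50
print(fcfs(50, requests), sstf(50, requests), scan(50, requests))
```

On this queue FCFS moves the head 241 cylinders, SSTF 116, and SCAN 136: SSTF seeks least, but only SCAN bounds how long a distant request can wait.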