Optional readings for this topic from Operating Systems: Principles and Practice: Chapter 11, Section 13.3 (up through page 567).
Problems addressed by modern file systems:
- Disk space management:
- Fast access to files (minimize seeks)
- Sharing space between users
- Efficient use of disk space
- Naming: how do users select files?
- Reliability: information must survive OS crashes and hardware failures.
- Protection: isolation between users, controlled sharing.
File: a named collection of bytes stored on durable storage such as disk.
File access patterns:
- Sequential: information is processed in order, one byte after another.
- Random access: can address any byte in the file directly without passing through its predecessors, e.g. the backing store for demand paging; also databases.
- Keyed (or indexed): search for blocks with particular contents, e.g. hash table, associative database, dictionary. Usually provided by databases, not operating system.
Issues to consider:
- Most files are small (a few kilobytes or less), so per-file overheads must be low.
- Most of the disk space is in large files.
- Many of the I/O operations are for large files, so performance must be good for large files.
- Files may grow unpredictably over time.
Inode: operating system data structure with information about a particular file.
- Stored on disk along with file data.
- Kept in memory when file is open.
Info in inode:
- File size
- Sectors occupied by file
- Access times (last read, last write)
- Protection information (owner id, group id, etc.)
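The per-file metadata listed above can be sketched as a toy record; all field names here are illustrative, not taken from any real kernel:

```python
from dataclasses import dataclass, field

# Toy sketch of an inode; field names are hypothetical.
@dataclass
class Inode:
    size_bytes: int = 0                          # file size
    blocks: list = field(default_factory=list)   # sectors occupied by file
    atime: float = 0.0                           # last read
    mtime: float = 0.0                           # last write
    owner_uid: int = 0                           # protection information
    group_gid: int = 0
    mode: int = 0o644                            # permission bits

ino = Inode(size_bytes=4096, blocks=[120, 121], owner_uid=1000)
```

The on-disk copy of this record is what survives between boots; the in-memory copy exists only while the file is open.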
How should disk sectors be used to represent the bytes of a file?
Contiguous allocation (also called "extent-based"):
- Allocate files like segmented memory (contiguous groups of sectors called extents).
- Inode contains number of first sector, file length in sectors.
- User must specify length when creating a file.
- Keep a free list of unused areas of the disk.
- Easy access, both sequential and random
- Few seeks for I/O
- Fragmentation will make it hard to use disk space efficiently; large files may be impossible
- Must predict needs at file creation time
- Can't extend files
Example: IBM OS/360.
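A toy first-fit allocator over a free list of extents shows both the mechanism and the fragmentation problem (all names are illustrative):

```python
# Toy contiguous ("extent-based") allocator: the free list holds
# (start, length) extents; a file gets one contiguous run of sectors.
def allocate_contiguous(free_list, n_sectors):
    """First-fit: return the starting sector, updating free_list in place."""
    for i, (start, length) in enumerate(free_list):
        if length >= n_sectors:
            if length == n_sectors:
                free_list.pop(i)          # extent consumed entirely
            else:
                free_list[i] = (start + n_sectors, length - n_sectors)
            return start
    return None  # no single extent is big enough

free = [(0, 8), (20, 100)]
print(allocate_contiguous(free, 50))  # -> 20
print(free)                           # -> [(0, 8), (70, 50)]
# Fragmentation: 58 sectors remain free, but not contiguously.
print(allocate_contiguous(free, 55))  # -> None
```

Extending a file fails the same way: the sectors just past its extent may already belong to another file.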
Linked allocation:
- Divide disk into fixed-size blocks (4096 bytes?)
- Keep a linked list of all free blocks.
- In inode, just keep pointer to first block.
- Each block of file contains pointer to next block.
- Drawbacks: sequential access is easy, but random access is slow (reaching block n requires reading the n-1 blocks before it), and a single corrupted pointer loses the rest of the file.
Examples (more or less): TOPS-10, Xerox Alto.
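A toy model makes the random-access cost concrete: every block holds a pointer to the next, so reaching an offset deep in the file means walking the whole chain (block numbers and contents below are made up):

```python
# Toy linked allocation: disk maps block number -> (next_block, data).
BLOCK_SIZE = 4096

def read_byte(disk, first_block, offset):
    """Each hop in the loop is a separate disk read: O(offset / BLOCK_SIZE)."""
    block = first_block
    for _ in range(offset // BLOCK_SIZE):
        block, _data = disk[block]       # follow the link stored in the block
    _next, data = disk[block]
    return data[offset % BLOCK_SIZE]

disk = {
    5: (9, b"a" * BLOCK_SIZE),
    9: (2, b"b" * BLOCK_SIZE),
    2: (None, b"c" * 10),
}
print(read_byte(disk, 5, 8193))  # -> 99 (ord("c")): two hops to reach block 2
```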
File allocation table (FAT):
- Like linked allocation, except don't keep the links in the blocks themselves.
- Keep the links for all files in a single table called the File Allocation Table.
- Table is memory resident during normal operation
- Each FAT entry is disk sector number of next block in file
- Special values for "last block in file", "free block"
- Directory entry stores number of first block in file
- Originally, each FAT entry was 12 or 16 bits, limiting disk size.
- FAT32 supports larger disks:
- Each entry has 28 bits of cluster number
- Disk addresses refer to clusters: groups of adjacent sectors.
- Cluster sizes 2 - 32 KBytes; fixed for any particular disk partition.
- Still used today for flash sticks, digital cameras, many embedded devices
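Since the whole table is memory resident, following a file's chain touches only memory, not the disk. A minimal sketch (cluster numbers and sentinel values are made up):

```python
# Toy FAT: fat[i] gives the next cluster after cluster i.
EOF, FREE = -1, -2   # stand-ins for "last block in file" and "free block"

fat = [FREE] * 16
# A file occupying clusters 3 -> 7 -> 4; the directory entry stores 3.
fat[3], fat[7], fat[4] = 7, 4, EOF

def file_clusters(fat, first):
    """Follow the chain starting from the directory entry's first cluster."""
    chain, c = [], first
    while c != EOF:
        chain.append(c)
        c = fat[c]           # in-memory lookup: no disk I/O per hop
    return chain

print(file_clusters(fat, 3))  # -> [3, 7, 4]
```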
Multi-level indexes (4.3 BSD Unix):
- Files divided into blocks of 4 Kbytes.
- Blocks of each file managed with multi-level arrays of block pointers.
- Inode = 14 block pointers, initially 0 ("no block").
- First 12 point to data blocks (direct blocks).
- Next entry points to an indirect block (contains 1024 4-byte block pointers).
- Last entry points to a doubly-indirect block.
- Maximum file length is fixed, but large.
- Indirect blocks aren't allocated until needed.
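The maximum file size under the scheme above follows directly from the numbers given (4 KB blocks, 4-byte pointers, 12 direct pointers, one single- and one double-indirect block):

```python
BLOCK = 4096
PTRS_PER_BLOCK = BLOCK // 4            # 1024 pointers per indirect block

direct = 12 * BLOCK                    # 48 KB via direct blocks
single = PTRS_PER_BLOCK * BLOCK        # 4 MB via the indirect block
double = PTRS_PER_BLOCK ** 2 * BLOCK   # 4 GB via the doubly-indirect block

print((direct + single + double) / 2**30)  # slightly over 4 (GiB)
```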
Block cache: use part of main memory to retain recently-accessed disk blocks.
Blocks that are referenced frequently (e.g., indirect blocks for large files) are usually in the cache.
This solves the problem of slow access to large files.
Originally, block caches were fixed size.
As memories have gotten larger, so have block caches.
Many systems now unify the block cache and the VM page pool: any page can be used for either, based on LRU access.
What happens when a block in the cache is modified?
- Synchronous writes: immediately write through
- Safe: data won't be lost if the machine crashes
- Slow: process can't continue until disk I/O completes
- Delayed writes: don't immediately write to disk
- Wait a while (30 seconds?) in case there are more writes to a block or the block is deleted
- Fast: writes return immediately
- Eliminates disk I/Os in many cases:
- Many small writes to the same block
- Some files are deleted quickly (e.g., temporary files)
- Dangerous: may lose data after a system crash
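A toy write-back cache shows where the savings come from; the flush here is triggered manually rather than by a timer, and all names are illustrative:

```python
# Toy delayed-write block cache: writes dirty an in-memory copy; flush()
# performs the actual disk I/O.
class BlockCache:
    def __init__(self, disk):
        self.disk = disk            # dict: block number -> bytes
        self.cache = {}
        self.dirty = set()

    def write(self, block, data):
        self.cache[block] = data    # returns immediately; no disk I/O yet
        self.dirty.add(block)

    def delete(self, block):
        self.cache.pop(block, None)
        self.dirty.discard(block)   # a quickly-deleted file is never written
        self.disk.pop(block, None)

    def flush(self):
        for block in self.dirty:    # many small writes collapse into one I/O
            self.disk[block] = self.cache[block]
        self.dirty.clear()

disk = {}
c = BlockCache(disk)
c.write(7, b"v1"); c.write(7, b"v2")   # two writes, at most one disk I/O
c.write(8, b"tmp"); c.delete(8)        # deleted before flush: zero disk I/O
c.flush()
print(disk)  # -> {7: b'v2'}
```

The danger is equally visible: any data still in `dirty` when the machine crashes never reaches `disk`.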
Free Space Management
Most common approach to free space management: bit map:
- Keep an array of bits, one per block.
- 1 means block is free, 0 means block in use
- During allocation, search bit map for a block that's close to the previous block of the file.
- If disk isn't full, this usually works pretty well.
- If disk is nearly full this becomes very expensive and doesn't produce much locality.
- Solution: don't let the disk fill up!
- Pretend disk has 10% less capacity than it really has
- If disk is 90% full, tell users it's full and don't allow any more data to be written.
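The allocation policy above can be sketched as a scan outward from a hint (the file's previous block), following the document's convention that a set bit means the block is free:

```python
# Toy bit-map allocator: bitmap[i] is True (1) when block i is free.
def alloc_near(bitmap, hint):
    """Find a free block as close as possible to the hint block."""
    n = len(bitmap)
    for dist in range(n):                      # widen the search radius
        for b in (hint + dist, hint - dist):
            if 0 <= b < n and bitmap[b]:
                bitmap[b] = False              # mark in use
                return b
    return None  # disk full

bitmap = [False, False, True, False, True, True]
print(alloc_near(bitmap, 3))  # -> 4 (nearest free block to block 3)
print(alloc_near(bitmap, 3))  # -> 2 (block 4 is now taken)
```

On a nearly full disk the inner scan degenerates into sweeping most of the bit map, which is exactly the expense noted above.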
Many early file systems (e.g. Unix) used a block size of 512 bytes (the size of a sector for many years).
- Inefficient I/O: more distinct transfers, hence more seeks.
- Bulkier inodes: only 128 pointers in an indirect block (pointers will occupy 1% of disk space).
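The roughly-1% figure follows from a quick calculation, assuming 4-byte block pointers as in the BSD scheme below:

```python
# With 512-byte blocks, one indirect block holds 512 // 4 = 128 pointers,
# each naming a 512-byte data block, so pointer blocks add this overhead:
overhead = 512 / (128 * 512)
print(f"{overhead:.2%}")  # -> 0.78%
```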
Increase block size (e.g. 4 KB)? Internal fragmentation then wastes space, since most files are small.
4.3BSD solution: multiple block sizes
- Large blocks are 4 KBytes; most blocks are large
- Fragments are multiples of 512 bytes, fitting within a single large block
- The last block in a file can be a fragment.
- One large block can hold fragments from multiple files.
- Bit map for free blocks is based on fragments.
In general, hard to achieve contiguous file allocation on disk when there are both large and small files.
Some newer techniques:
- Use even larger block sizes: 16 KB large blocks, 2 KB fragments
- Reallocate blocks as files grow
- Initially, allocate blocks one at a time (but probably won't be consecutive)
- When a file reaches a certain size, reallocate blocks looking for large contiguous clusters.
- Or, delay space allocation until flushing blocks from cache:
- By then, many more blocks will have been written
- Allocate cluster(s) for all known blocks
If there are several disk I/Os waiting to be executed, what is the best order in which to execute them?
- Goal is to minimize seek time.
First in first out (FIFO): simple, but does nothing to optimize seeks.
Shortest positioning time first (SPTF):
- Choose next request that is as close as possible to the previous one.
- Good for minimizing seeks, but can result in starvation for some requests.
Scan ("elevator algorithm"):
- Same as SPTF except the head keeps moving in one direction across the disk, which bounds waiting time and avoids starvation.
- In the pure elevator algorithm the head reverses direction at the edge of the disk; in the circular variant (C-SCAN), once the edge has been reached, seek to the farthest pending block and sweep in the same direction again.
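A toy comparison of the three policies on a synthetic request queue (track numbers are made up; distances count tracks of head movement, ignoring rotational delay):

```python
def seek_distance(order, start):
    """Total head movement to serve the requests in the given order."""
    total, pos = 0, start
    for t in order:
        total += abs(t - pos)
        pos = t
    return total

def sptf(requests, start):
    """Always pick the pending request closest to the current position."""
    pending, order, pos = list(requests), [], start
    while pending:
        nxt = min(pending, key=lambda t: abs(t - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

def cscan(requests, start):
    """Sweep upward, then jump back to the farthest block and sweep again."""
    above = sorted(t for t in requests if t >= start)
    below = sorted(t for t in requests if t < start)
    return above + below

reqs, head = [98, 183, 37, 122, 14, 124, 65, 67], 53
print(seek_distance(reqs, head))               # FIFO: arrival order
print(seek_distance(sptf(reqs, head), head))   # SPTF: fewest tracks
print(seek_distance(cscan(reqs, head), head))  # C-SCAN: close, no starvation
```

SPTF minimizes movement on this queue, but a steady stream of requests near the head could postpone the far-away ones indefinitely; the scan variants trade a little extra movement for fairness.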