Managing Flash Memory

Lecture Notes for CS 140
Spring 2020
John Ousterhout

Readings for this topic from Operating Systems: Principles and Practice: Section 12.2.

Solid-state (semiconductor) storage devices ("SSDs") have replaced disks in many applications (e.g., phones and other devices).
- Comparison to disk:
  - No moving parts, so more reliable
  - 100x faster access
  - More shock-resistant
  - Cost/bit 5-10x higher than disk
- Comparison to DRAM:
  - Nonvolatile: values persist even if device is powered off
  - Cost/bit 5-10x lower
  - 1000x slower
Flash memory hardware:
- Total chip capacity up to 512 Gbytes
- Storage divided into erase units (typically 256 Kbytes), which are subdivided into pages (typically 512 bytes or 4 Kbytes)
- Storage is read in units of pages
- Two kinds of writes:
  - Erase: sets all of the bits in an erase unit to 1's.
  - Write: modifies an individual page, can only clear bits to 0 (writing 1's has no effect).
  - Can write repeatedly to clear more bits.
- Wear-out: once a page has been erased many times (typically around 100,000, as low as 1,000 in some new devices) it no longer stores information reliably.
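
The erase/write rules above can be modeled in a few lines. This is only a toy sketch: the class name `EraseUnit` and its methods are made up for illustration, and the sizes are the "typical" values quoted above, not those of any particular chip.

```python
PAGE_SIZE = 512          # bytes per page (one of the typical sizes above)
PAGES_PER_UNIT = 512     # 256 Kbytes / 512 bytes


class EraseUnit:
    """Toy model of one erase unit: erase sets bits to 1, write can only clear them."""

    def __init__(self):
        self.erase_count = 0
        self.pages = [bytearray(b"\xff" * PAGE_SIZE) for _ in range(PAGES_PER_UNIT)]

    def erase(self):
        # Erase affects the whole unit: every bit becomes 1.
        for page in self.pages:
            page[:] = b"\xff" * PAGE_SIZE
        self.erase_count += 1

    def write(self, page_no, data):
        # A write can only clear bits, so the stored result is (old AND new).
        page = self.pages[page_no]
        for i, byte in enumerate(data):
            page[i] &= byte

    def read(self, page_no):
        return bytes(self.pages[page_no])


unit = EraseUnit()
unit.write(0, b"\xf0")        # clears the low 4 bits of byte 0
unit.write(0, b"\x3f")        # a second write clears two more bits
first = unit.read(0)[0]       # 0xf0 & 0x3f == 0x30
unit.write(0, b"\xff")        # writing 1's has no effect
unchanged = unit.read(0)[0]
unit.erase()                  # only an erase brings the bits back to 1
after_erase = unit.read(0)[0]
```

The `erase_count` field is there because wear-out depends on how many times a unit has been erased.
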
Typical flash memory performance:
- Read performance: 20-100 microseconds latency, 100-2500 MBytes/sec.
- Erasure time: 2 ms
- Write performance: 200 microseconds latency, 100-500 MBytes/sec.

Most flash memory devices are packaged with a flash translation layer (FTL):
- Software that manages the flash device
- Typically provides an interface like that for a disk (read and write blocks)
- Allows use with existing file system software
FTLs are interesting pieces of software, but they have several problems:
- Sacrifice performance
- Waste capacity
- Proprietary: no information available about implementation and performance pathologies
One possible approach for FTLs: direct mapped (e.g., some cheap flash sticks)
- FAT format often used
- Virtual block i is stored on page i of the flash device
- Reads are simple
- To write virtual block i:
  - Read erase unit containing page i
  - Erase the entire unit
  - Rewrite erase unit with modified page
- What's wrong with this approach?
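
A minimal sketch of the direct-mapped write path, useful for thinking about the question above. The class `DirectMappedFTL` and its counters are hypothetical, the erase unit is shrunk to 4 pages for readability, and the sketch assumes every page of the unit is rewritten after the erase.

```python
PAGES_PER_UNIT = 4   # tiny value for illustration; real units hold hundreds of pages


class DirectMappedFTL:
    def __init__(self, num_units):
        # flash[u][p] holds the contents of page p in erase unit u (None = erased).
        self.flash = [[None] * PAGES_PER_UNIT for _ in range(num_units)]
        self.erase_counts = [0] * num_units
        self.pages_written = 0

    def read(self, block):
        # Virtual block i lives on page i, so a read needs no lookup.
        unit, page = divmod(block, PAGES_PER_UNIT)
        return self.flash[unit][page]

    def write(self, block, data):
        unit, page = divmod(block, PAGES_PER_UNIT)
        saved = list(self.flash[unit])                # 1. read the erase unit
        self.flash[unit] = [None] * PAGES_PER_UNIT    # 2. erase the entire unit
        self.erase_counts[unit] += 1
        saved[page] = data                            # 3. rewrite with modified page
        self.flash[unit] = saved
        self.pages_written += PAGES_PER_UNIT          # assumption: whole unit rewritten


ftl = DirectMappedFTL(num_units=2)
for i in range(10):
    ftl.write(1, f"version {i}")
```

After ten writes to the same block, `ftl.erase_counts` and `ftl.pages_written` show how much erasing and rewriting was done and where it landed.
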
To avoid these problems:
- Separate virtual block number from physical location in flash memory
- A given virtual block can occupy different pages in flash memory over time.
Keep a block map that maps from virtual blocks to physical pages
- Reads must first look up the physical location in the block map
- For writes:
  - Find a free and erased page
  - Write virtual block to that page
  - Update block map with new location
  - Mark previous page for virtual block as no longer in use (it must be erased before it can be reused)
- This introduces additional issues
  - How to manage map (is it stored on the flash device?)
  - How to manage free space (e.g. wear leveling)
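
The read and write steps above might look roughly like this. It is a sketch under simplifying assumptions: the names (`MappedFTL`, `erased`, `garbage`) are invented, the block map is an in-memory dictionary, and picking the lowest-numbered erased page is an arbitrary policy that ignores wear leveling.

```python
class MappedFTL:
    def __init__(self, num_pages):
        self.pages = [None] * num_pages        # physical page contents
        self.erased = set(range(num_pages))    # pages that are free and erased
        self.garbage = set()                   # stale pages awaiting an erase
        self.block_map = {}                    # virtual block -> physical page

    def read(self, block):
        # Reads go through the block map first.
        return self.pages[self.block_map[block]]

    def write(self, block, data):
        if not self.erased:
            raise RuntimeError("no erased pages left; garbage collection needed")
        new_page = min(self.erased)            # find a free and erased page
        self.erased.remove(new_page)
        self.pages[new_page] = data            # write virtual block to that page
        old_page = self.block_map.get(block)
        self.block_map[block] = new_page       # update block map with new location
        if old_page is not None:
            self.garbage.add(old_page)         # previous copy is no longer in use


ftl = MappedFTL(num_pages=8)
ftl.write(5, "a")
ftl.write(5, "b")   # second write lands on a different physical page
```

Note that no erase happens on the write path; stale pages simply pile up in `garbage` until something reclaims them.
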
One approach: keep block map in memory, rebuild on startup:
- Don't store block map on flash device
- Each page on flash contains an additional header:
  - Virtual block number
  - Allocated bit (1 => free, 0 => allocated)
  - Written bit (1 => not yet written (all ones), 0 => written)
  - Garbage bit (0 => no longer in use, must be erased before reusing page)
- A-W-G bits track lifecycle of page:
  - Just erased: 1-1-1
  - About to write data: 0-1-1
  - Block successfully written: 0-0-1
  - Block deleted (new copy written elsewhere): 0-0-0
  - Why is 0-1-1 state needed?
- On startup, read entire contents of flash memory to rebuild block map (32 seconds for 8GB, 512 seconds for 128GB).
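
A rough sketch of the startup scan, assuming the headers have already been read into a list. The `Header` type and function names are hypothetical. Only pages in state 0-0-1 are treated as live; skipping 0-1-1 pages (on the guess that a crash may have left them partially written) is an assumption of this sketch, not something the notes spell out. The 250 MBytes/sec default in `scan_seconds` is inferred from the 32-second and 512-second figures above.

```python
from typing import NamedTuple, Optional


class Header(NamedTuple):
    vblock: Optional[int]   # virtual block number (None if never assigned)
    allocated: int          # 1 => free, 0 => allocated
    written: int            # 1 => not yet written, 0 => written
    garbage: int            # 0 => no longer in use


def rebuild_block_map(headers):
    """Scan every page header and keep only live, fully written blocks."""
    block_map = {}
    for page_no, h in enumerate(headers):
        if (h.allocated, h.written, h.garbage) == (0, 0, 1):
            # If two live copies of a block exist, the later page wins here;
            # a real FTL would need a better rule (not covered in this sketch).
            block_map[h.vblock] = page_no
    return block_map


def scan_seconds(capacity_gb, mb_per_sec=250):
    """Time to read the whole device, using 1 GB = 1000 MB."""
    return capacity_gb * 1000 / mb_per_sec


headers = [
    Header(7, 0, 0, 0),      # old copy of block 7, now garbage
    Header(7, 0, 0, 1),      # current copy of block 7
    Header(3, 0, 1, 1),      # write of block 3 was started but never finished
    Header(None, 1, 1, 1),   # just erased
]
block_map = rebuild_block_map(headers)
```
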
To reduce memory utilization for block map, store block map in flash, cache parts of it in memory
- Header for each flash page indicates whether that page is a data page or a map page
- Keep locations of map pages in memory (map-map)
- Scan flash on startup to re-create map-map
- During writes, must write new map page plus new data page
- Some reads may require 2 flash operations
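
One way to picture the two-level lookup is shown below. Everything here is illustrative: `TwoLevelMap`, the dictionary standing in for flash, and the 4-entry map pages are assumptions chosen to keep the example small.

```python
ENTRIES_PER_MAP_PAGE = 4   # tiny value for illustration


class TwoLevelMap:
    def __init__(self):
        self.flash = {}       # physical page number -> contents (stand-in for flash)
        self.map_map = {}     # map page index -> physical page holding that map page
        self.cache = {}       # map page index -> in-memory copy of the map page
        self.flash_reads = 0

    def _flash_read(self, page):
        self.flash_reads += 1
        return self.flash[page]

    def read(self, block):
        index, slot = divmod(block, ENTRIES_PER_MAP_PAGE)
        if index not in self.cache:
            # Cache miss: first flash operation fetches the map page.
            self.cache[index] = self._flash_read(self.map_map[index])
        data_page = self.cache[index][slot]
        return self._flash_read(data_page)    # fetch the data page itself


tm = TwoLevelMap()
tm.flash = {10: [20, 21, 22, 23], 20: "a", 21: "b", 22: "c", 23: "d"}
tm.map_map = {0: 10}          # map page 0 is stored on physical page 10
first = tm.read(1)            # map page not cached: 2 flash reads
reads_after_first = tm.flash_reads
second = tm.read(2)           # map page cached: 1 more flash read
```
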
Garbage pages accumulate in erase units, which reduces effective capacity.
Solution: garbage collection
- Find erase units with many garbage pages
- Copy live pages to a clean erase unit (update block map)
- Erase and reuse old erase unit
- Note: must always keep at least one clean erase unit to use for garbage collection!
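
The collection steps above can be sketched as follows. The page representation (`None` for erased, the string `"garbage"`, or a `(vblock, data)` tuple) and the function name are made up for this example, and the victim is simply the unit with the most garbage pages.

```python
def garbage_collect(units, block_map, clean_unit):
    """Copy live pages out of the unit with the most garbage, then erase it.

    units: list of erase units; each page is None (erased), "garbage",
           or a (vblock, data) tuple for a live page.
    block_map: vblock -> (unit, page); updated in place.
    Returns the index of the unit that was erased (the new clean unit).
    """
    candidates = [u for u in range(len(units)) if u != clean_unit]
    victim = max(candidates, key=lambda u: units[u].count("garbage"))
    dest = 0
    for page in units[victim]:
        if isinstance(page, tuple):                   # live page: copy it
            units[clean_unit][dest] = page
            block_map[page[0]] = (clean_unit, dest)   # update block map
            dest += 1
    units[victim] = [None] * len(units[victim])       # erase and reuse old unit
    return victim


units = [
    [(1, "a"), "garbage", "garbage", (2, "b")],
    [(3, "c"), (4, "d"), (5, "e"), "garbage"],
    [None, None, None, None],                         # the reserved clean unit
]
block_map = {1: (0, 0), 2: (0, 3), 3: (1, 0), 4: (1, 1), 5: (1, 2)}
victim = garbage_collect(units, block_map, clean_unit=2)
```

The erased victim becomes the next clean unit, which is why one clean unit must always be held in reserve.
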

Hard to achieve good performance and good utilization at the same time:
- If the flash device is 90% utilized, write cost increases by > 10x:
  - To get space for one new erase unit, must garbage collect 10 old erase units
  - 9 erase units' worth of pages will still be valid and must be copied
  - 1 new erase unit gets written
  - Total: 9 reads, 10 writes to write 1 new erase unit!
  - This is called write amplification
- Frequent garbage collection (e.g. because of high utilization) also wears out the device faster
- Lower utilization makes writes cheaper, but wastes space.
- Ideal situation: hot and cold data
  - Some erase units contain only data that is never modified ("cold"), so they are always full and never need to be garbage collected.
  - Other erase units contain data that is quickly overwritten; we can just wait until all of the pages have been overwritten, then reuse the erase unit.
  - There are ways to encourage such a bimodal distribution.
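
The 90% example generalizes. Assuming every erase unit is filled to the same utilization u (a simplification; real devices are not uniform), getting one erase unit of new space means collecting 1/(1-u) units and copying u/(1-u) units of live data. The function name and the use of `Fraction` for exact arithmetic are choices made for this sketch.

```python
from fractions import Fraction


def write_cost(utilization):
    """Erase-unit-sized reads and writes needed to write 1 new erase unit."""
    u = Fraction(utilization)
    free = 1 - u
    collected = 1 / free      # old erase units that must be garbage collected
    copied = u / free         # erase units' worth of live data to move
    return {"collected": collected, "reads": copied, "writes": copied + 1}


at_90 = write_cost("0.9")     # collected 10, reads 9, writes 10
at_50 = write_cost("0.5")     # collected 2, reads 1, writes 2
```
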
Wear-leveling:
- Want all erase units to be erased at about the same rate
- Use garbage collection to move data between "hot" and "cold" pages.
- Main problem is with cold erase units (hot ones turn over quickly)
- Every once in a while, garbage collect a cold erase unit (one that hasn't been erased very often) even if it has no garbage pages
- This moves the cold data to another erase unit and allows the unworn erase unit to be assigned new data, which will probably turn over more quickly.
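
One possible victim-selection policy along these lines is sketched below. The `cold_interval` parameter and the dictionary fields are hypothetical; the notes do not say how often a cold unit should be collected.

```python
def choose_victim(units, gc_number, cold_interval=10):
    """Pick the erase unit to garbage collect.

    units: list of dicts with "erase_count" and "garbage_pages".
    gc_number: how many collections have run so far.
    """
    if gc_number % cold_interval == 0:
        # Every once in a while, take the least-worn unit even if it is full.
        return min(range(len(units)), key=lambda u: units[u]["erase_count"])
    # Otherwise take the unit that frees the most space.
    return max(range(len(units)), key=lambda u: units[u]["garbage_pages"])


units = [
    {"erase_count": 900, "garbage_pages": 60},   # hot
    {"erase_count": 3, "garbage_pages": 0},      # cold, full of live data
    {"erase_count": 850, "garbage_pages": 10},   # hot
]
normal_choice = choose_victim(units, gc_number=7)
leveling_choice = choose_victim(units, gc_number=10)
```
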
Incorporating flash memory as a disk-like device with FTL is inefficient:
- Duplication:
  - OS already keeps various index structures for files
  - These are equivalent to the block map
  - If OS could manage the flash directly, it could combine the block map with file indexes
- Lack of information:
  - FTL doesn't know when OS has freed a block; only finds out when block is overwritten
  - Thus FTL may rewrite dead blocks during garbage collection!
  - Newer flash devices offer trim command that allows OS to indicate deletion (but must modify OS file systems).
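
To illustrate the information gap, here is a toy block map with a trim operation. The class and method names are invented and do not correspond to any real device interface; the point is only that, without trim, a deleted block stays in the map and looks live to the garbage collector.

```python
class TrimAwareMap:
    def __init__(self):
        self.block_map = {}    # virtual block -> physical page
        self.garbage = set()   # pages known to be dead

    def write(self, block, new_page):
        old = self.block_map.get(block)
        if old is not None:
            self.garbage.add(old)      # overwrite: FTL learns the old copy is dead
        self.block_map[block] = new_page

    def trim(self, block):
        old = self.block_map.pop(block, None)
        if old is not None:
            self.garbage.add(old)      # OS says the block is deleted

    def live_pages(self):
        # Pages the garbage collector would have to copy.
        return set(self.block_map.values())


m = TrimAwareMap()
m.write(1, 100)
m.write(2, 101)
before = m.live_pages()   # the file holding block 2 may already be deleted by the OS
m.trim(2)
after = m.live_pages()
```
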
Better long-term solution: new file systems designed just for flash memory
- Lots of interesting issues and design alternatives
- Has been explored by research teams, but there are no widely used implementations
- Need ability to bypass the FTL
- Interesting opportunity
- Organize entire file system like a log
Newest alternative: nonvolatile memory such as Intel 3D XPoint
- Reads: 300ns
- Writes: 100ns
- Capacity: 512 GB per DIMM!
- Not yet clear how to use these (a file system sacrifices most of the performance)
