Managing Flash Memory

Lecture Notes for CS 140
Spring 2020
John Ousterhout

  • Readings for this topic from Operating Systems: Principles and Practice: Section 12.2.
  • Solid-state (semiconductor) storage devices ("SSDs") have replaced disks in many applications (e.g., phones and other devices).
    • Comparison to disk:
      • No moving parts, so more reliable
      • 100x faster access
      • More shock-resistant
      • Cost/bit 5-10x higher than disk
    • Comparison to DRAM:
      • Nonvolatile: values persist even if device is powered off
      • Cost/bit 5-10x lower
      • 1000x slower
  • Flash memory hardware:
    • Total chip capacity up to 512 Gbytes
    • Storage divided into erase units (typically 256 Kbytes), which are subdivided into pages (typically 512 bytes or 4 Kbytes)
    • Storage is read in units of pages
    • Two kinds of writes:
      • Erase: sets all of the bits in an erase unit to 1's.
      • Write: modifies an individual page, can only clear bits to 0 (writing 1's has no effect).
      • Can write repeatedly to clear more bits.
    • Wear-out: once a page has been erased many times (typically around 100,000, as low as 1,000 in some new devices) it no longer stores information reliably.
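
The erase and write rules above can be modeled in a few lines. This is a minimal sketch, not a description of any real chip: the class name EraseUnit and the tiny sizes are assumptions chosen for illustration. A write is modeled as a bitwise AND with the existing contents, so it can only clear bits.

```python
PAGE_SIZE = 4          # bytes per page (tiny, for illustration only)
PAGES_PER_UNIT = 4     # pages per erase unit (also illustrative)


class EraseUnit:
    """Toy model of one erase unit (hypothetical name and layout)."""

    def __init__(self):
        # A fresh unit is treated as erased: every bit is 1.
        self.pages = [bytearray([0xFF] * PAGE_SIZE) for _ in range(PAGES_PER_UNIT)]
        self.erase_count = 0

    def erase(self):
        """Set every bit in the whole unit to 1."""
        for page in self.pages:
            page[:] = bytes([0xFF] * PAGE_SIZE)
        self.erase_count += 1

    def write(self, page_index, data):
        """Program one page: bits can only change from 1 to 0."""
        page = self.pages[page_index]
        for i, byte in enumerate(data):
            page[i] &= byte

    def read(self, page_index):
        return bytes(self.pages[page_index])


unit = EraseUnit()
unit.write(0, bytes([0b11110000] * PAGE_SIZE))   # clears the low 4 bits
unit.write(0, bytes([0b10101010] * PAGE_SIZE))   # clears more bits
first = unit.read(0)                             # 0b10100000 in every byte
unit.write(0, bytes([0xFF] * PAGE_SIZE))         # writing 1's has no effect
still = unit.read(0)
unit.erase()                                     # only erase restores the 1's
```

The erase_count field is where a wear-out limit would be checked; the sketch only counts erasures.
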
  • Typical flash memory performance:
    • Read performance: 20-100 microseconds latency, 100-2500 MBytes/sec.
    • Erasure time: 2 ms
    • Write performance: 200 microseconds latency, 100-500 MBytes/sec.
  • Most flash memory devices are packaged with a flash translation layer (FTL):
    • Software that manages the flash device
    • Typically provides an interface like that for a disk (read and write blocks)
    • Can be used with existing file system software
  • FTLs are interesting pieces of software, but they have several problems:
    • Sacrifice performance
    • Waste capacity
    • Proprietary: no information available about implementation and performance pathologies
  • One possible approach for FTLs: direct mapped (e.g., some cheap flash sticks)
    • FAT format often used
    • Virtual block i is stored on page i of the flash device
    • Reads are simple
    • To write virtual block i:
      • Read erase unit containing page i
      • Erase the entire unit
      • Rewrite erase unit with modified page
    • What's wrong with this approach?
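
One way to explore the question above is to count the work done by a direct-mapped write. The sketch below is an assumption-laden toy (the class DirectMappedFTL, its counters, and the 64-page erase unit are all hypothetical); it follows the three write steps listed above and records erasures and page writes.

```python
PAGES_PER_UNIT = 64    # e.g. a 256 Kbyte erase unit with 4 Kbyte pages


class DirectMappedFTL:
    """Toy direct-mapped FTL: virtual block i lives on physical page i."""

    def __init__(self, num_units):
        self.units = [[None] * PAGES_PER_UNIT for _ in range(num_units)]
        self.erase_counts = [0] * num_units
        self.pages_written = 0

    def read(self, block):
        unit, offset = divmod(block, PAGES_PER_UNIT)
        return self.units[unit][offset]

    def write(self, block, data):
        unit, offset = divmod(block, PAGES_PER_UNIT)
        saved = list(self.units[unit])               # read the whole erase unit
        saved[offset] = data                         # modify one page in the copy
        self.units[unit] = [None] * PAGES_PER_UNIT   # erase the entire unit
        self.erase_counts[unit] += 1
        self.units[unit] = saved                     # rewrite every page
        self.pages_written += PAGES_PER_UNIT


ftl = DirectMappedFTL(num_units=4)
for i in range(100):
    ftl.write(5, f"version {i}")
```

After 100 one-page updates to block 5, the sketch has erased unit 0 one hundred times and written 6400 pages, while the other units are untouched. It also holds the only copy of the unit in memory between the erase and the rewrite.
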
  • To avoid these problems:
    • Separate virtual block number from physical location in flash memory
    • A given virtual block can occupy different pages in flash memory over time.
  • Keep a block map that maps from virtual blocks to physical pages
    • Reads must first look up the physical location in the block map
    • For writes:
      • Find a free and erased page
      • Write virtual block to that page
      • Update block map with new location
      • Mark previous page for virtual block as no longer in use (it can be reused once its erase unit has been erased)
    • This introduces additional issues
      • How to manage map (is it stored on the flash device?)
      • How to manage free space (e.g. wear leveling)
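
The read and write paths above can be sketched as follows. This is a simplified model under stated assumptions: the class BlockMapFTL is hypothetical, the map lives in an in-memory dictionary, and erase units and garbage collection are left out entirely.

```python
class BlockMapFTL:
    """Toy FTL with a block map (hypothetical; ignores erase units)."""

    def __init__(self, num_pages):
        self.flash = [None] * num_pages          # physical pages
        self.erased = set(range(num_pages))      # free and erased pages
        self.garbage = set()                     # stale pages awaiting erase
        self.block_map = {}                      # virtual block -> physical page

    def read(self, block):
        # Reads go through the block map first.
        return self.flash[self.block_map[block]]

    def write(self, block, data):
        if not self.erased:
            raise RuntimeError("no erased pages left; garbage collection needed")
        page = self.erased.pop()                 # find a free and erased page
        self.flash[page] = data                  # write the virtual block there
        old = self.block_map.get(block)
        self.block_map[block] = page             # update the block map
        if old is not None:
            self.garbage.add(old)                # previous copy is now stale
        return page


ftl = BlockMapFTL(num_pages=8)
first = ftl.write(3, "v1")
second = ftl.write(3, "v2")
```

The two writes to virtual block 3 land on different physical pages, and the first page ends up in the garbage set rather than being overwritten in place.
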
  • One approach: keep block map in memory, rebuild on startup:
    • Don't store block map on flash device
    • Each page on flash contains an additional header:
      • Virtual block number
      • Allocated bit (1 => free, 0 => allocated)
      • Written bit (1 => not yet written (page is still all ones), 0 => written)
      • Garbage bit (0 => no longer in use, must be erased before reusing page)
    • A-W-G bits track lifecycle of page:
      • Just erased: 1-1-1
      • About to write data: 0-1-1
      • Block successfully written: 0-0-1
      • Block deleted (new copy written elsewhere): 0-0-0
      • Why is 0-1-1 state needed?
    • On startup, read entire contents of flash memory to rebuild block map (32 seconds for 8GB, 512 seconds for 128GB).
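
A minimal sketch of the header lifecycle and the startup scan follows. The names PageHeader and rebuild_block_map are hypothetical, and two choices are assumptions rather than part of the notes: a page found in state 0-1-1 is treated as an interrupted write that must be erased before reuse, and the sketch assumes at most one page per virtual block is in state 0-0-1. The scan_seconds helper assumes a scan rate of 250 MBytes/sec, which is the rate implied by the startup times quoted above.

```python
ERASED, WRITING, WRITTEN, GARBAGE = (1, 1, 1), (0, 1, 1), (0, 0, 1), (0, 0, 0)


class PageHeader:
    """Per-page header: virtual block number plus the A-W-G bits."""

    def __init__(self):
        self.block = None
        self.bits = [1, 1, 1]             # allocated, written, garbage

    def state(self):
        return tuple(self.bits)

    def clear(self, index):
        self.bits[index] = 0              # flash writes can only clear bits


def rebuild_block_map(headers):
    """Scan every header; return (block_map, free_pages, garbage_pages)."""
    block_map, free_pages, garbage_pages = {}, [], []
    for page, header in enumerate(headers):
        state = header.state()
        if state == ERASED:
            free_pages.append(page)
        elif state == WRITTEN:
            block_map[header.block] = page
        else:
            # GARBAGE, or WRITING (assumed here to be an interrupted write):
            # either way the page must be erased before it is reused.
            garbage_pages.append(page)
    return block_map, free_pages, garbage_pages


def scan_seconds(capacity_gb, mb_per_sec=250):
    """Time to read the whole device at an assumed sequential rate."""
    return capacity_gb * 1000 / mb_per_sec


headers = [PageHeader() for _ in range(4)]
headers[0].block = 7                      # old copy of block 7, later superseded
for bit in (0, 1, 2):
    headers[0].clear(bit)
headers[1].block = 7                      # current copy of block 7
headers[1].clear(0)
headers[1].clear(1)
headers[2].clear(0)                       # allocated, but the write never finished
block_map, free_pages, garbage_pages = rebuild_block_map(headers)
```
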
  • To reduce memory utilization for block map, store block map in flash, cache parts of it in memory
    • Header for each flash page indicates whether that page is a data page or a map page
    • Keep locations of map pages in memory (map-map)
    • Scan flash on startup to re-create map-map
    • During writes, must write new map page plus new data page
    • Some reads may require 2 flash operations
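
The extra flash operations can be counted with a small model. This is only a sketch: the class PagedMapFTL is hypothetical, flash is modeled as an append-only list, each map page covers 4 virtual blocks, and free space management is ignored. The map-map and the cache of map pages are ordinary dictionaries in memory.

```python
ENTRIES_PER_MAP_PAGE = 4   # virtual blocks covered by one map page (illustrative)


class PagedMapFTL:
    """Toy FTL whose block map is itself stored in flash pages."""

    def __init__(self):
        self.flash = []        # append-only list of (kind, payload) pages
        self.map_map = {}      # map page number -> flash location (in memory)
        self.cache = {}        # cached map pages: number -> {block: location}
        self.flash_reads = 0
        self.flash_writes = 0

    def _append(self, kind, payload):
        self.flash.append((kind, payload))
        self.flash_writes += 1
        return len(self.flash) - 1

    def _map_page(self, number):
        if number not in self.cache:
            if number in self.map_map:
                self.flash_reads += 1          # cache miss: fetch map page
                self.cache[number] = dict(self.flash[self.map_map[number]][1])
            else:
                self.cache[number] = {}        # map page does not exist yet
        return self.cache[number]

    def write(self, block, data):
        number = block // ENTRIES_PER_MAP_PAGE
        entries = self._map_page(number)
        entries[block] = self._append("data", data)                # new data page
        self.map_map[number] = self._append("map", dict(entries))  # new map page

    def read(self, block):
        entries = self._map_page(block // ENTRIES_PER_MAP_PAGE)
        self.flash_reads += 1
        return self.flash[entries[block]][1]


ftl = PagedMapFTL()
ftl.write(9, "hello")
writes_for_one_block = ftl.flash_writes    # data page plus map page
ftl.cache.clear()                          # simulate eviction of cached map pages
value = ftl.read(9)
reads_on_miss = ftl.flash_reads            # map page plus data page
ftl.read(9)
reads_on_hit = ftl.flash_reads - reads_on_miss
```
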
  • Garbage pages accumulate in erase units, which reduces effective capacity.
  • Solution: garbage collection
    • Find erase units with many garbage pages
    • Copy live pages to a clean erase unit (update block map)
    • Erase and reuse old erase unit
    • Note: must always keep at least one clean erase unit to use for garbage collection!
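
The steps above can be sketched as a single function. The representation is an assumption made for illustration: each erase unit is a list of pages, and a page is ("live", block), ("garbage", None), or None when erased. The victim is chosen greedily as the unit with the most garbage pages, which is one simple policy among several.

```python
def garbage_collect(units, clean_index):
    """Copy live pages from the unit with the most garbage into the clean
    unit, then erase the victim. Returns (victim, moved), where moved maps
    each virtual block to its new (unit, page) location."""
    def garbage_count(i):
        return sum(1 for p in units[i] if p is not None and p[0] == "garbage")

    candidates = [i for i in range(len(units)) if i != clean_index]
    victim = max(candidates, key=garbage_count)

    clean = units[clean_index]
    moved = {}
    slot = 0
    for page in units[victim]:
        if page is not None and page[0] == "live":
            clean[slot] = page                      # copy live page
            moved[page[1]] = (clean_index, slot)    # block map update
            slot += 1

    units[victim] = [None] * len(units[victim])     # erase the old unit
    return victim, moved


units = [
    [("live", 1), ("garbage", None), ("garbage", None), ("garbage", None)],
    [("live", 2), ("live", 3), ("garbage", None), ("live", 4)],
    [None, None, None, None],                       # the reserved clean unit
]
victim, moved = garbage_collect(units, clean_index=2)
```

After the call, the erased victim can serve as the reserved clean unit for the next collection, so one clean unit is always available.
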
  • Hard to achieve good performance and good utilization at the same time:
    • If the flash device is 90% utilized, write cost increases by > 10x:
      • To get space for one new erase unit, must garbage collect 10 old erase units (each 10% garbage)
      • 9 erase units' worth of data will still be valid and must be copied
      • 1 erase unit of new data gets written
      • Total: 9 erase units of reads and 10 of writes to write 1 erase unit of new data!
      • This is called write amplification
    • Frequent garbage collection (e.g. because of high utilization) also wears out the device faster
    • Lower utilization makes writes cheaper, but wastes space.
    • Ideal situation: hot and cold data
      • Some erase units contain only data that is never modified ("cold"), so they are always full and never need to be garbage collected.
      • Other erase units contain data that is quickly overwritten; we can just wait until all of the pages have been overwritten, then reuse the erase unit.
      • There are ways to encourage such a bimodal distribution.
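
The write amplification arithmetic above generalizes to other utilizations. The sketch below assumes every cleaned erase unit has the same utilization u, which is a simplification; the function name gc_cost is hypothetical. Freeing one erase unit of space then requires cleaning 1/(1-u) units.

```python
from fractions import Fraction


def gc_cost(utilization):
    """Return (units_cleaned, units_read, units_written) needed to write
    one erase unit of new data when every cleaned unit is equally full."""
    u = Fraction(utilization)
    units_cleaned = 1 / (1 - u)          # each cleaned unit frees (1 - u)
    live_copied = units_cleaned * u      # live data that must be copied
    return units_cleaned, live_copied, live_copied + 1


at_90 = gc_cost(Fraction(9, 10))   # 10 units cleaned, 9 read, 10 written
at_50 = gc_cost(Fraction(1, 2))    # 2 units cleaned, 1 read, 2 written
```

Exact fractions are used so the 90% case reproduces the 9 reads and 10 writes above without floating-point rounding.
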
  • Wear-leveling:
    • Want all erase units to be erased at about the same rate
    • Use garbage collection to move data between "hot" and "cold" pages.
    • Main problem is with cold erase units (hot ones turn over quickly)
    • Every once in a while, garbage collect a cold erase unit (one that hasn't been erased very often) even if it has no garbage pages
    • This moves the cold data to another erase unit and allows the unworn erase unit to be assigned new data, which will probably turn over more quickly.
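
One possible trigger for this kind of forced collection is sketched below. The function name and the threshold parameter are assumptions, not something the notes specify: the least-erased unit is selected only when its erase count lags the most-erased unit by more than the threshold.

```python
def pick_wear_leveling_victim(erase_counts, threshold):
    """Return the index of the least-erased unit if it lags the most-erased
    unit by more than `threshold` erasures; otherwise return None."""
    coldest = min(range(len(erase_counts)), key=erase_counts.__getitem__)
    if max(erase_counts) - erase_counts[coldest] > threshold:
        return coldest
    return None


uneven = pick_wear_leveling_victim([120, 5, 118, 97], threshold=100)
even = pick_wear_leveling_victim([120, 60, 118, 97], threshold=100)
```

In the first call, unit 1 has been erased only 5 times and is selected; in the second, wear is close enough that no forced collection happens.
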
  • Incorporating flash memory as a disk-like device with FTL is inefficient:
    • Duplication:
      • OS already keeps various index structures for files
      • These are equivalent to the block map
      • If OS could manage the flash directly, it could combine the block map with file indexes
    • Lack of information:
      • FTL doesn't know when OS has freed a block; only finds out when block is overwritten
      • Thus FTL may rewrite dead blocks during garbage collection!
      • Newer flash devices offer trim command that allows OS to indicate deletion (but must modify OS file systems).
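
The effect of the missing information can be illustrated with a toy block map. The class TrimAwareMap and its methods are hypothetical and do not correspond to any real device interface; trim here simply removes a map entry so that the page counts as garbage instead of live data.

```python
class TrimAwareMap:
    """Toy block map that distinguishes live pages from garbage."""

    def __init__(self):
        self.block_map = {}     # virtual block -> physical page
        self.garbage = set()

    def write(self, block, page):
        old = self.block_map.get(block)
        if old is not None:
            self.garbage.add(old)       # FTL learns of a dead page on overwrite
        self.block_map[block] = page

    def trim(self, block):
        page = self.block_map.pop(block, None)
        if page is not None:
            self.garbage.add(page)      # OS says the block is no longer needed

    def live_pages(self):
        """Pages that garbage collection would have to copy."""
        return set(self.block_map.values())


m = TrimAwareMap()
for b in range(4):
    m.write(b, b)
before = m.live_pages()     # the OS deleted blocks 2 and 3, but the FTL cannot tell
m.trim(2)
m.trim(3)
after = m.live_pages()
```

Before the trim calls, garbage collection would copy all four pages, including the two that the file system has already freed.
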
  • Better long-term solution: new file systems designed just for flash memory
    • Lots of interesting issues and design alternatives
    • Has been explored by research teams, but no widely-used implementations
    • Need ability to bypass the FTL
    • Interesting opportunity
    • Organize entire file system like a log
  • Newest alternative: nonvolatile memory such as Intel 3D XPoint
    • Reads: 300ns
    • Writes: 100ns
    • Capacity: 512 GB per DIMM!
    • Not yet clear how to use these (a file system sacrifices most of the performance)