Virtual Memory
Lecture Notes for CS 140
Spring 2019
John Ousterhout
- Readings for this topic from Operating Systems: Principles and Practice:
Chapter 8.
- How can one memory be shared among several concurrent
processes?
- Single-tasking (no sharing):
- Highest memory holds OS.
- Process is allocated memory starting at 0, up to the OS area.
- Examples: early batch monitors where only one job
ran at a time. A misbehaving job could corrupt the OS,
which an operator would then reboot. Some early
personal computers were similar.
- Goals for sharing memory:
- Multitasking: allow multiple processes to be
memory-resident at once.
- Transparency: no process should need to be aware that
memory is shared. Each process must run correctly regardless
of the number and locations of the other processes.
- Isolation: processes mustn't be able to corrupt each other.
- Efficiency (both of CPU and memory) shouldn't be
degraded badly by sharing.
- Load-time relocation:
- Highest memory holds OS.
- First process loaded at 0; others fill empty spaces.
- When a process is loaded, relocate it so that it can run
in its allocated memory area, similar to linking:
- Linker outputs relocation records in executable files
- Similar to information in object files: indicates which
locations contain memory addresses
- OS modifies addresses when it loads process (add base address)
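The relocation step can be sketched in Python (a toy loader with an invented record format, purely for illustration):

```python
def load_with_relocation(image, reloc_offsets, base):
    """Copy a program image into its allocated region, patching every
    word that the linker flagged as containing an address.
    image: list of words, linked as if loaded at address 0.
    reloc_offsets: offsets of words that hold addresses (toy format)."""
    loaded = list(image)
    for off in reloc_offsets:
        loaded[off] += base          # add base address to embedded address
    return loaded

# Word 2 holds a pointer to word 0; after loading at 0x4000 it must
# point at 0x4000 (16384).
print(load_with_relocation([10, 20, 0], [2], 0x4000))  # [10, 20, 16384]
```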
- What are the problems with this approach?
Dynamic Memory Relocation
- Instead of relocating a
program statically when it is loaded, add hardware (a memory
management unit, or MMU) that translates
addresses dynamically during every memory reference.
- Each address generated by a thread (called a
virtual address) is translated in hardware to a
physical address. This happens during every
memory reference.
- Results in two views of memory, called address spaces:
- Virtual address space is what the program sees
- Physical address space is the actual allocation of memory
Base and Bound Relocation
- Two hardware registers:
- Base: physical address corresponding to virtual address 0.
- Bound: highest allowable virtual address.
- On each memory reference:
- Compare the virtual address to the bound register; trap if it is greater.
- Add virtual address to base to produce physical address.
- Each process appears to have a completely private memory
whose size is determined by the bound register.
- Processes are isolated from each other and OS.
- No address relocation is necessary when a process is loaded.
- Each process has its own base and bound values, which are
saved in the process control block.
- OS runs with relocation disabled, so it can access all
of memory (a bit in the processor status word controls
relocation).
- Must prevent users from turning off relocation or
modifying the base and bound registers (another bit
in PSW for user/kernel mode).
- Problem: how does OS regain control once it has given it up?
- Base & bound is cheap (only 2 hardware registers) and
fast: the add and compare can be done in parallel.
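The per-reference check can be simulated in a few lines (a sketch, not real hardware; the class and method names are invented):

```python
class BaseBoundMMU:
    def __init__(self, base, bound):
        self.base = base      # physical address corresponding to virtual 0
        self.bound = bound    # highest allowable virtual address

    def translate(self, vaddr):
        # In hardware the compare and the add happen in parallel.
        if vaddr > self.bound:
            raise MemoryError("trap: virtual address out of bounds")
        return self.base + vaddr

mmu = BaseBoundMMU(base=0x8000, bound=0x0fff)
print(hex(mmu.translate(0x123)))   # 0x8123
```

On a context switch the OS would simply load the saved base and bound values from the next process's PCB.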
- What's wrong with base and bound relocation?
Multiple segments
- Each process is split among several variable-size areas
of memory, called segments.
- E.g. one segment for code, one segment for data and heap, one
segment for stack.
- Segment map holds the bases and bounds for all
the segments of a process, plus protection bit for each
segment: read-write versus read-only.
- Memory mapping procedure consists of table lookup + add +
compare.
- Each memory reference must indicate a segment number
and offset:
- Top bits of address select segment, low bits the offset.
- Example: PDP-10 with high and low segments selected by
high-order address bit.
- Or, segment can be selected implicitly by the instruction
(e.g. code vs. data, stack vs. data, or x86 prefixes).
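Putting the pieces together, the lookup + compare + add sequence might look like this (a toy model with 2-bit segment numbers; the layout is invented for illustration):

```python
SEG_BITS, OFF_BITS = 2, 14   # toy 16-bit addresses: 4 segments of 16 KB

def translate_seg(seg_map, vaddr, write=False):
    seg = vaddr >> OFF_BITS                   # top bits select segment
    off = vaddr & ((1 << OFF_BITS) - 1)       # low bits are the offset
    base, bound, writable = seg_map[seg]      # table lookup
    if off > bound:                           # compare
        raise MemoryError("trap: offset beyond segment bound")
    if write and not writable:
        raise MemoryError("trap: write to read-only segment")
    return base + off                         # add

seg_map = {0: (0x10000, 0x1fff, False),      # code, read-only
           1: (0x40000, 0x0fff, True),       # data/heap
           3: (0x7c000, 0x3fff, True)}       # stack
print(hex(translate_seg(seg_map, (1 << OFF_BITS) | 0x042, write=True)))
# -> 0x40042
```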
- Advantage of segmentation: flexibility
- Manage each segment separately:
- Grow and shrink independently
- Swap to disk
- Can share segments between processes (e.g., shared code).
- Can move segments to compact memory and eliminate
fragmentation.
- What's wrong with segmentation?
Paging
- Divide virtual and physical memory into fixed-size chunks
called pages. The most common size is 4 Kbytes.
- For each process, a page map defines the base
address of each of that process' pages along with
read-only and "present" bits.
- Page map stored in contiguous memory (with base
register in hardware).
- Translation process: page number always comes
directly from the address. Since page size is a power
of two, no comparison or addition is necessary. Just
do table lookup and bit concatenation.
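In Python the translation is just a dictionary lookup plus bit manipulation (a sketch; the page map entry format is invented):

```python
PAGE_BITS = 12                      # 4-Kbyte pages
PAGE_MASK = (1 << PAGE_BITS) - 1

def translate_page(page_map, vaddr):
    vpn = vaddr >> PAGE_BITS        # page number comes directly from address
    offset = vaddr & PAGE_MASK      # offset passes through unchanged
    entry = page_map[vpn]           # table lookup (no compare, no add)
    if not entry["present"]:
        raise MemoryError("page fault")
    # concatenate the physical page number with the offset
    return (entry["ppn"] << PAGE_BITS) | offset

page_map = {2: {"ppn": 0x1a3, "present": True, "read_only": False}}
print(hex(translate_page(page_map, 0x2abc)))   # 0x1a3abc
```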
- Easy to allocate: keep a free list of available pages
and grab the first one. Easy to swap since all pages
are the same size, which usually matches the disk
block size.
- Flexible management: Can represent a segment with a
collection of pages, starting on any page boundary.
- Problem: for modern machines, page maps can be very
large:
- Consider the x86-64 addressing architecture: 48-bit
addresses and 4096-byte pages leave 36 bits of page
number, so a single-level page map would need 2^36
entries (512 Gbytes at 8 bytes per entry).
- Ideally, each page map should fit in a page.
- Most processes are small, so most page map entries
are unused.
- Even large processes use their address space sparsely
(e.g., code at the bottom, stack at the top)
- Solution: multi-level page maps. Intel x86-64
addressing architecture:
- 64-bit virtual addresses, but only the lower 48 bits
are actually used.
- 4 Kbyte pages: low-order 12 bits of virtual address
hold offset within page.
- 4 levels of page map, each indexed with 9 bits of virtual
address.
- Each page map fits in one page (page map entries are 8 bytes).
- Can omit empty page maps.
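Extracting the four 9-bit indices from an address is pure bit manipulation; a sketch:

```python
def x86_64_indices(vaddr):
    """Split a 48-bit x86-64 virtual address into four 9-bit page-map
    indices (level 4 down to level 1) and a 12-bit page offset."""
    offset = vaddr & 0xfff
    indices = [(vaddr >> shift) & 0x1ff        # 9 bits per level
               for shift in (39, 30, 21, 12)]  # 4*9 + 12 = 48 bits
    return indices, offset

idx, off = x86_64_indices(0x7f0012345678)
# idx == [0xfe, 0x00, 0x91, 0x145], off == 0x678
```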
- Next problem: page maps are too large to load into fast
memory in the MMU.
- Page maps kept in main memory
- Relocation unit holds base address for top-level page map
- With x86-64 architecture, must make 4 memory references
to translate a virtual address!
Translation Lookaside Buffers (TLBs)
- Solution to page translation overhead: create a small hardware
cache of recent translations.
- Each cache entry stores the page number portion of a virtual
address (36 bits for x86-64) and the corresponding physical
page number (40 bits for x86-64).
- Typical TLB sizes: 64-2048 entries.
- On each memory reference, compare the page number from the
virtual address with the virtual page numbers in every
TLB entry (in parallel).
- If there is a match, use the corresponding physical page
number.
- If no match, perform the full address translation and save
the information in the TLB (replace one of the existing
entries).
- TLB "hit rates" typically 95% or more.
- TLB complications:
- When context switching, must invalidate all of the entries
in the TLB (mappings will be different for the next process).
Chip hardware does this automatically when the page map
base register is changed.
- If virtual memory mappings change for the current process
(e.g. a page is moved), must invalidate the affected TLB
entries. A special hardware instruction does this
(invlpg on x86).
Miscellaneous Topics
- How does the operating system get information from user
memory? E.g. I/O buffers, parameter blocks. Note that the user
passes the OS a virtual address.
- In some systems the OS just runs unmapped:
- OS reads page maps and translates user
addresses in software.
- Addresses that are contiguous in the virtual address space
may not be contiguous physically. Thus I/O operations may
have to be split up into multiple blocks.
- Most newer systems include kernel and user memory in the same
virtual address space, but kernel memory is not accessible
in user mode (a special bit in page map entries controls this).
This makes life easier for the kernel, although it doesn't
solve the I/O problem.
- Another issue with paging: internal fragmentation.
- Can't allocate partial pages, so for small chunks of
information only part of the page will be used
- Result: wasted space at the ends of some pages
- Not much of a problem in today's systems:
- The objects (such as code or stack) tend to be
much larger than a page.
- Percentage wasted space from fragmentation is small.
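A quick back-of-the-envelope check of that claim, assuming 4-Kbyte pages:

```python
PAGE = 4096

def wasted(size):
    # Whole pages are allocated, so the tail of the last page is unused.
    return (-size) % PAGE

for size in (100, 4096, 1_000_000):
    print(size, wasted(size))
# 100 bytes wastes 3996; 4096 wastes 0; a 1 MB object wastes
# only 3520 bytes (about 0.35%).
```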
- What happens if page sizes grow?