Linkers and Dynamic Linking

Lecture Notes for CS 140
Spring 2020
John Ousterhout

  • Readings for this topic from Operating Systems: Principles and Practice: none.
  • When a process is running, what does its memory look like? A collection of regions called sections (or segments). Basic memory layout for Linux and other Unix systems:
    • Code (or "text" in Unix terminology): starts at location 0
    • Data: starts immediately above code, grows upward
    • Stack: starts at highest address, grows downward
  • System components that take part in managing a process's memory:
    • Compiler and assembler:
      • Generate one object file for each source file, containing the information for that source file.
      • Information is incomplete, since each source file generally references some things defined in other source files.
    • Linker:
      • Combines all of the object files for one program into a single executable file.
      • Linker output is complete and self-sufficient.
    • Operating system:
      • Loads executable files into memory.
      • Allows several different processes to share memory at once.
      • Provides facilities for processes to get more memory after they've started running.
    • Run-time library:
      • Works together with OS to provide dynamic allocation routines, such as malloc and free in C.
  • Linkers (or Linkage Editors, ld in Unix, LINK on Windows): combine many separate pieces of a program, re-organize storage allocation. Typically invoked invisibly by compilers.
  • Three functions of a linker:
    • Combine all the pieces of a program.
    • Figure out a new memory organization so that all the pieces fit together (combine like sections).
    • Touch up addresses so that the program can run under the new memory organization.
  • Result: a runnable program stored in a new object file called an executable.
  • Problems linker must solve:
    • Assembler doesn't know where the things it's assembling will eventually go in memory.
      • Assembler assumes that each section starts at address zero and lets the linker rearrange them.
    • Assembler doesn't know addresses of external objects when assembling files separately. E.g. where is the printf routine?
      • Assembler just puts zero in the object file for each unresolved address.
  • Each object file consists of:
    • Sections, each holding a distinct kind of information.
      • Typical sections: code ("text") and data.
      • For each section, the object file contains the size and assumed starting address of the section, plus initial contents, if any.
    • Symbol table: name and current location of each procedure or variable (except stack variables).
    • Relocation records: information about addresses referenced in this object file that the linker must adjust once it knows the final memory allocation.
    • Additional information for debugging (e.g. map from line numbers in the source file to location in the code section).
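
As a rough illustration of the pieces listed above, the sketch below models one object file as plain Python data. The class names (ObjectFile, Section, Symbol, Relocation), the field names, and the sample contents are all hypothetical; this is not a real object-file format such as ELF. Each section assumes a starting address of zero, and the unresolved reference to printf is stored as a zero placeholder with a relocation record pointing at it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Section:
    name: str        # e.g. "text" or "data"
    start: int       # assumed starting address (the assembler assumes 0)
    contents: list   # initial contents, one entry per word in this toy model

    @property
    def size(self):
        return len(self.contents)

@dataclass
class Symbol:
    name: str
    section: Optional[str]  # None means "not defined in this file" (external)
    offset: int             # current location, relative to the section start

@dataclass
class Relocation:
    section: str   # section that contains the address to adjust
    offset: int    # position of that address within the section
    symbol: str    # symbol whose final address belongs there

@dataclass
class ObjectFile:
    sections: dict      # section name -> Section
    symbols: list       # list of Symbol
    relocations: list   # list of Relocation

# Hypothetical object file for a main.c that calls printf.
main_o = ObjectFile(
    sections={
        # "call" and "ret" stand in for instructions; the 0 is the
        # placeholder the assembler leaves for printf's unknown address.
        "text": Section("text", 0, ["call", 0, "ret"]),
        "data": Section("data", 0, [72, 105]),
    },
    symbols=[
        Symbol("main", "text", 0),
        Symbol("msg", "data", 0),
        Symbol("printf", None, 0),
    ],
    relocations=[Relocation("text", 1, "printf")],
)

def undefined_symbols(obj):
    """Names this object file references but does not define."""
    return [s.name for s in obj.symbols if s.section is None]
```

Calling undefined_symbols(main_o) lists the names the linker would have to find in other object files or libraries.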
  • Linker executes in three passes:
    • Pass 1: read in section sizes, compute final memory layout.
    • Pass 2: read in all symbols, create complete symbol table in memory.
    • Pass 3: read in section and relocation information, update addresses, write out new file.
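
The three passes can be sketched as a toy linker in Python. This is a minimal sketch under stated assumptions, not how ld is actually implemented: object files are plain dicts, memory is a list of words, every relocation is simply overwritten with the symbol's absolute address, and only "text" and "data" sections exist. The function name link, the dict keys, and the numeric "opcodes" are made up for illustration.

```python
def link(objects, section_order=("text", "data")):
    """Toy three-pass linker.

    Each object is a dict with hypothetical keys:
      'sections': {section name: list of words}
      'symbols':  {symbol name: (section name, offset)}
      'relocs':   [(section name, offset, symbol name)]
    """
    # Pass 1: read section sizes and compute the final layout,
    # placing like sections from all files next to each other.
    base = {}   # (object index, section name) -> final start address
    addr = 0
    for sec in section_order:
        for i, obj in enumerate(objects):
            base[(i, sec)] = addr
            addr += len(obj["sections"].get(sec, []))

    # Pass 2: read all symbols and build one complete symbol table
    # that holds final addresses.
    symtab = {}
    for i, obj in enumerate(objects):
        for name, (sec, off) in obj["symbols"].items():
            if name in symtab:
                raise ValueError(f"duplicate symbol: {name}")
            symtab[name] = base[(i, sec)] + off

    # Pass 3: copy section contents, update addresses named by the
    # relocation records, and produce the output image.
    image = [0] * addr
    for i, obj in enumerate(objects):
        for sec, words in obj["sections"].items():
            start = base[(i, sec)]
            image[start:start + len(words)] = words
        for sec, off, name in obj["relocs"]:
            if name not in symtab:
                raise ValueError(f"undefined symbol: {name}")
            image[base[(i, sec)] + off] = symtab[name]
    return image, symtab

# Two hypothetical object files. 100, 101, 102 are pretend opcodes;
# each 0 is a placeholder left by the assembler for an unknown address.
main_o = {
    "sections": {"text": [100, 0, 101], "data": [7]},
    "symbols": {"main": ("text", 0), "counter": ("data", 0)},
    "relocs": [("text", 1, "helper")],
}
util_o = {
    "sections": {"text": [102, 0, 101]},
    "symbols": {"helper": ("text", 0)},
    "relocs": [("text", 1, "counter")],
}

image, symtab = link([main_o, util_o])
```

In this example the two text sections are placed at addresses 0 and 3 and the data section at 6, so the placeholder in main_o becomes 3 (the address of helper) and the placeholder in util_o becomes 6 (the address of counter).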

Dynamic Linking

  • Originally all programs were linked statically, as described above:
    • Each program complete
    • All references resolved
  • Since the late 1980s, most systems have supported shared libraries and dynamic linking:
    • For common library packages, only keep a single copy in memory, shared by all processes.
    • The address where a library is loaded isn't known until run time, so references must be resolved dynamically, when the program runs.
  • One way of implementing dynamic linking: jump table.
    • If any of the files being linked are shared libraries, the linker doesn't actually include the shared library code in the final program. Instead, it includes three things that implement dynamic linking:
      • Jump table: an array in which each entry is a single machine instruction containing an unconditional branch (jump).
        • For each function in a shared library used by the program, there is one entry in the jump table that will jump to the beginning of that function.
      • Shared library metadata: for each shared library used by the program, the names of the functions needed from that library, and corresponding locations in the jump table.
      • Dynamic loader: small library package invoked at startup to fill in the jump table.
    • For relocation records referring to functions in the shared library, the linker substitutes the address of the jump table entry: when the function is invoked, the caller will "call" the jump table entry, which redirects the call to the real function.
    • Initially, all jump table entries jump to zero (unresolved).
    • When the program starts up, the dynamic loader is invoked:
      • It invokes the OS's mmap function to load each shared library into memory.
      • It reads the symbol tables from the libraries.
      • It fills in the jump table with the correct address for each function in a shared library (info is in symbol table).
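
The jump-table scheme above can be simulated in a few lines of Python. This is only a sketch under stated assumptions: addresses are plain integers, "memory" is a dict from address to a Python callable, and the load address is passed in by hand rather than chosen by mmap. The names ToyProcess, dynamic_load, libmath, square, and negate are hypothetical.

```python
class ToyProcess:
    def __init__(self, needed):
        # Shared library metadata: library name -> {function name: jump table slot}
        self.needed = needed
        n_slots = sum(len(funcs) for funcs in needed.values())
        # Initially every jump table entry jumps to zero (unresolved).
        self.jump_table = [0] * n_slots
        # address -> callable; stands in for code loaded into memory.
        self.memory = {}

    def call_slot(self, slot, *args):
        """'Call' a jump table entry, which redirects to the real function."""
        target = self.jump_table[slot]
        if target == 0:
            raise RuntimeError(f"jump table slot {slot} is unresolved")
        return self.memory[target](*args)

def dynamic_load(proc, libraries, load_addresses):
    """Toy dynamic loader, run once at program startup."""
    for lib_name, funcs in proc.needed.items():
        lib = libraries[lib_name]          # symbol table: name -> (offset, code)
        base = load_addresses[lib_name]    # stands in for the address mmap returns
        # "Load" the library: place each function at base + offset.
        for offset, code in lib.values():
            proc.memory[base + offset] = code
        # Fill in the jump table using the library's symbol table.
        for fname, slot in funcs.items():
            offset, _ = lib[fname]
            proc.jump_table[slot] = base + offset

# A hypothetical shared library with two functions at fixed offsets.
libmath = {
    "square": (0x10, lambda x: x * x),
    "negate": (0x40, lambda x: -x),
}

proc = ToyProcess(needed={"libmath": {"square": 0, "negate": 1}})
dynamic_load(proc, {"libmath": libmath}, {"libmath": 0x7000})
result = proc.call_slot(0, 5)   # calls square through jump table slot 0
```

The program itself only ever refers to slot numbers, so the library can be loaded at a different address on each run; only the jump table contents change.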