Linkers and Dynamic Linking

Lecture Notes for CS 140
Winter 2012
John Ousterhout

  • Readings for this topic from Operating System Concepts: none.
  • When a process is running, what does its memory look like? A collection of regions called sections. Basic memory layout for Linux and other Unix systems:
    • Code (or "text" in Unix terminology): starts at location 0
    • Data: starts immediately above code, grows upward
    • Stack: starts at highest address, grows downward
  • System components that take part in managing a process's memory:
    • Compiler and assembler:
      • Generate one object file for each source code file containing information for that source file.
      • Information is incomplete, since each source file generally references some things defined in other source files.
    • Linker:
      • Combines all of the object files for one program into a single object file.
      • Linker output is complete and self-sufficient.
    • Operating system:
      • Loads object files into memory.
      • Allows several different processes to share memory at once.
      • Provides facilities for processes to get more memory after they've started running.
    • Run-time library:
      • Works together with OS to provide dynamic allocation routines, such as malloc and free in C.
  • Linkers (or Linkage Editors, ld in Unix, LINK on Windows): combine many separate pieces of a program, re-organize storage allocation. Typically invoked invisibly by compilers.
  • Three functions of a linker:
    • Collect all the pieces of a program.
    • Figure out a new memory organization so that all the pieces fit together (combine like sections).
    • Touch up addresses so that the program can run under the new memory organization.
  • Result: a runnable program stored in a new object file called an executable.
  • Problems linker must solve:
    • Assembler doesn't know addresses of external objects when assembling files separately. E.g. where is printf routine?
      • Assembler just puts zero in the object file for each unknown address
    • Assembler doesn't know where the things it's assembling will go in memory
      • Assume that things start at address zero, let linker re-arrange.
  • Each object file consists of:
    • Sections, each holding a distinct kind of information.
      • Typical sections: code ("text") and data.
      • For each section, object file contains size and current location of the section, plus initial contents, if any
    • Symbol table: name and current location of each variable or procedure that can potentially be referenced from other object files.
    • Relocation records: information about addresses referenced in this object file that the linker must adjust once it knows the final memory allocation.
    • Additional information for debugging (e.g. map from line numbers in the source file to location in the code section).
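The pieces of an object file listed above can be pictured as a few C structures. This is only a sketch: the type and field names (section, symbol, reloc, object_file, find_symbol) are hypothetical and chosen for illustration, and real formats such as ELF or COFF differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of one object file (illustration only). */
enum section_kind { SEC_TEXT, SEC_DATA, SEC_COUNT };

struct section {
    uint32_t size;            /* bytes in this section */
    uint32_t addr;            /* current location (0 before linking) */
    const uint8_t *contents;  /* initial contents, or NULL if none */
};

struct symbol {
    const char *name;         /* e.g. "sin" */
    enum section_kind sec;    /* section the symbol lives in */
    uint32_t offset;          /* current location within that section */
    int defined;              /* 0 if referenced here but defined elsewhere */
};

struct reloc {
    enum section_kind sec;    /* section holding the value to adjust */
    uint32_t offset;          /* where in that section the value sits */
    uint32_t size;            /* size of the value in bytes */
    int symbol;               /* index into the symbol table */
};

struct object_file {
    struct section sections[SEC_COUNT];
    struct symbol *symbols;
    size_t nsymbols;
    struct reloc *relocs;
    size_t nrelocs;
};

/* Look up a symbol by name; returns its index, or -1 if it is absent. */
int find_symbol(const struct object_file *obj, const char *name) {
    assert(obj != NULL && name != NULL);
    for (size_t i = 0; i < obj->nsymbols; i++) {
        if (strcmp(obj->symbols[i].name, name) == 0)
            return (int)i;
    }
    return -1;
}
```

A linker would call something like find_symbol on every input file to learn which file defines each name that another file only references.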
  • Example files:
    main.c:
    extern double sin(double x);
    extern int printf(char *fmt, ...), scanf(char *fmt, ...);
    
    int main() {
      double x, result;
      printf("Type number: ");
      scanf("%lf", &x);   /* %lf because x is a double */
      result = sin(x);
      printf("Sine is %f\n",
            result);
      return 0;
    }
    

    stdio.c:
    int printf(char *fmt, ...) {
      ...
    }
    int scanf(char *fmt, ...) {
      ...
    }
    

    math.c:
    double sin(double x) {
      static double res, lastx;
      if (x != lastx) {
        lastx = x;
        ... compute sin(x) ...
      }
      return res;
    }
    
  • Linker executes in two passes:
    • Pass 1: read in section sizes, compute final memory layout. Also, read in all symbols, create complete symbol table in memory.
    • Pass 2: read in section and relocation information, update addresses, write out new file.
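The layout step of pass 1 can be sketched as follows. This is a minimal illustration, assuming only two kinds of sections (code and data), code placed at address 0, and no alignment padding; the names input_file and assign_addresses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct input_file {
    uint32_t text_size, data_size;   /* read from the object file */
    uint32_t text_base, data_base;   /* filled in by pass 1 */
};

/* Pass 1, layout part only: combine like sections by placing all code
   sections back to back starting at address 0, then all data sections
   immediately after the code. Returns the total size of the result. */
uint32_t assign_addresses(struct input_file *files, size_t n) {
    assert(files != NULL || n == 0);
    uint32_t next = 0;
    for (size_t i = 0; i < n; i++) {      /* code sections first */
        files[i].text_base = next;
        next += files[i].text_size;
    }
    for (size_t i = 0; i < n; i++) {      /* then data sections */
        files[i].data_base = next;
        next += files[i].data_size;
    }
    return next;
}
```

Once every section has a base address, the final address of each symbol is its section's base plus its offset within that section, which is what pass 2 needs in order to update addresses.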
  • Relocation records:
    • Address and size of the value to be relocated
    • Symbol that determines amount of relocation
    • How to relocate:
      • Overwrite with final address of symbol
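      • Add final address of symbol to current contents; used for accessing a field of a record:
        x = y.q;
        
        y is an external symbol, but the offset of q within y is known from a header file, so the assembler stores that offset and the linker adds the final address of y to it.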
      • Add difference between final and original addresses of symbol to current contents
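The three ways of relocating can be sketched in C as below. This assumes every relocated value is a 32-bit word in the host's byte order, and for brevity it stores the symbol's original and final addresses directly in the record instead of looking them up in a symbol table; the names reloc_kind, reloc and apply_reloc are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum reloc_kind {
    REL_OVERWRITE,   /* overwrite with final address of symbol */
    REL_ADD_FINAL,   /* add final address of symbol to current contents */
    REL_ADD_DELTA    /* add (final - original) address to current contents */
};

struct reloc {
    uint32_t offset;        /* where the value sits in the section image */
    enum reloc_kind kind;
    uint32_t sym_original;  /* symbol address as the assembler saw it */
    uint32_t sym_final;     /* symbol address computed in pass 1 */
};

/* Pass 2 for a single relocation record: patch one 32-bit value. */
void apply_reloc(uint8_t *image, size_t image_size, const struct reloc *r) {
    uint32_t value;
    assert(r->offset + sizeof value <= image_size);
    memcpy(&value, image + r->offset, sizeof value);
    switch (r->kind) {
    case REL_OVERWRITE:
        value = r->sym_final;
        break;
    case REL_ADD_FINAL:
        value += r->sym_final;
        break;
    case REL_ADD_DELTA:
        value += r->sym_final - r->sym_original;
        break;
    }
    memcpy(image + r->offset, &value, sizeof value);
}
```

For the x = y.q case, the word being patched initially holds the offset of q, so REL_ADD_FINAL turns it into the final address of y plus that offset.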

Dynamic Linking

  • Originally all programs were linked statically, as described above:
    • All external references fully resolved
    • Each program complete
  • Since the late 1980s most systems have supported shared libraries and dynamic linking:
    • For common library packages, only keep a single copy in memory, shared by all processes.
    • Don't know where library is loaded until runtime; must resolve references dynamically, when program runs.
  • One way of implementing dynamic linking: jump table.
    • Jump table: an array in which each entry is a single machine instruction containing an unconditional branch (jump).
    • For each function in a shared library used by the program, there is one entry in the jump table that will jump to the beginning of that function.
    • If one of the files being linked is a shared library, the linker doesn't actually include the shared library code in the final program. Instead it creates a jump table with slots for all of the functions that are used from that library.
    • For relocation records referring to functions in the shared library, the linker substitutes the address of the jump table entry: when the function is invoked, the caller will "call" the jump table entry, which redirects the call to the real function.
    • When the program starts up, the shared library is loaded into memory and the jump table addresses are adjusted to reflect the load location.
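The jump table idea can be imitated in portable C with an array of function pointers. This is only an analogy: a real jump table holds jump instructions rather than pointers, and the library here is simulated by two ordinary functions. All names (lib_add, lib_mul, jump_table, load_library, call_via_table) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for functions that would live in a shared library. */
static int lib_add(int a, int b) { return a + b; }
static int lib_mul(int a, int b) { return a * b; }

typedef int (*binop_fn)(int, int);

enum { SLOT_ADD, SLOT_MUL, SLOT_COUNT };

/* The "jump table": one slot per library function the program uses.
   The slots are empty until the library has been loaded. */
static binop_fn jump_table[SLOT_COUNT];

/* Models program startup: once the load location of the library is
   known, each slot is adjusted to point at the real function. */
void load_library(void) {
    jump_table[SLOT_ADD] = lib_add;
    jump_table[SLOT_MUL] = lib_mul;
}

/* Callers were linked against a slot, not against the function itself,
   so every call is redirected through the table. */
int call_via_table(int slot, int a, int b) {
    assert(slot >= 0 && slot < SLOT_COUNT && jump_table[slot] != NULL);
    return jump_table[slot](a, b);
}
```

The program's own code never changes when the library moves; only the table entries do, which is why the linker can finish its work before the load location is known.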