Linkers and Dynamic Linking

Lecture Notes for CS 140
Spring 2014
John Ousterhout

Readings for this topic from Operating Systems: Principles and Practice: none.

When a process is running, what does its memory look like? A collection of regions called sections. Basic memory layout for Linux and other Unix systems:
- Code (or "text" in Unix terminology): starts at location 0
- Data: starts immediately above code, grows upward
- Stack: starts at highest address, grows downward

System components that take part in managing a process's memory:
- Compiler and assembler:
  - Generate one object file per source file, containing the information for that source file.
  - Information is incomplete, since each source file generally references some things defined in other source files.
- Linker:
  - Combines all of the object files for one program into a single object file.
  - Linker output is complete and self-sufficient.
- Operating system:
  - Loads object files into memory.
  - Allows several different processes to share memory at once.
  - Provides facilities for processes to get more memory after they've started running.
- Run-time library:
  - Works together with OS to provide dynamic allocation routines, such as malloc and free in C.

Linkers (or Linkage Editors, ld in Unix, LINK on Windows): combine many separate pieces of a program, re-organize storage allocation. Typically invoked invisibly by compilers.

Three functions of a linker:
- Combine all the pieces of a program.
- Figure out a new memory organization so that all the pieces fit together (combine like sections).
- Touch up addresses so that the program can run under the new memory organization.
Result: a runnable program stored in a new object file called an executable.

Problems linker must solve:
- Assembler doesn't know addresses of external objects when assembling files separately. E.g. where is printf routine?
  - Assembler just puts zero in the object file for each unknown address.
- Assembler doesn't know where the things it's assembling will go in memory
  - Assume that things start at address zero, let linker re-arrange.

Each object file consists of:
- Sections, each holding a distinct kind of information.
  - Typical sections: code ("text") and data.
  - For each section, the object file contains the size and current location of the section, plus its initial contents, if any.
- Symbol table: name and current location of each variable or procedure that can be referenced in other object files.
- Relocation records: information about addresses referenced in this object file that the linker must adjust once it knows the final memory allocation.
- Additional information for debugging (e.g. map from line numbers in the source file to location in the code section).
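
The pieces of an object file listed above can be modeled with a few small records. This is only an illustrative sketch, not a real object file format: the class names, the file name main.o, and the word values are all made up for the example. It shows the assembler's two conventions: every section starts at address zero, and the unknown address of printf is stored as zero with a relocation record pointing at it.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    name: str        # e.g. "text" or "data"
    start: int       # current location; the assembler assumes 0
    contents: list   # initial contents, one integer per word

    @property
    def size(self):
        return len(self.contents)


@dataclass
class Relocation:
    section: str     # section that contains the address to adjust
    offset: int      # word within that section holding the address
    symbol: str      # symbol whose final address belongs there


@dataclass
class ObjectFile:
    name: str
    sections: dict = field(default_factory=dict)      # name -> Section
    symbols: dict = field(default_factory=dict)       # name -> (section, offset)
    relocations: list = field(default_factory=list)   # Relocation records


# Hypothetical object file for a source file that defines main and calls printf.
main_o = ObjectFile(name="main.o")
# Word 1 of the code should hold the address of printf; it is unknown, so it is 0.
main_o.sections["text"] = Section("text", 0, [0x10, 0, 0x20])
main_o.sections["data"] = Section("data", 0, [42])
main_o.symbols["main"] = ("text", 0)               # defined here, visible to other files
main_o.relocations.append(Relocation("text", 1, "printf"))
```

Note that printf appears only in a relocation record, not in the symbol table: this file references it but does not define it.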

Linker executes in three passes:
- Pass 1: read in section sizes, compute final memory layout.
- Pass 2: read in all symbols, create complete symbol table in memory.
- Pass 3: read in section and relocation information, update addresses, write out new file.
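
The three passes can be sketched as a toy linker. This is a simplified illustration under several assumptions, not how ld is implemented: object files are plain dictionaries, memory is a list of words, only "text" and "data" sections exist, and a relocation simply overwrites a word with the symbol's final address. The function name link and the sample files are hypothetical.

```python
def link(object_files):
    """Toy three-pass linker.

    Each object file is a dict with keys:
      'sections': section name -> list of words (assumed to start at address 0)
      'symbols':  symbol name -> (section name, offset)
      'relocs':   list of (section name, offset, symbol name)
    """
    section_order = ["text", "data"]   # assumption: only these two sections exist

    # Pass 1: read section sizes and compute the final memory layout.
    # Like sections are combined: all code first, then all data.
    base = {}        # (file index, section name) -> final start address
    address = 0
    for sec in section_order:
        for i, obj in enumerate(object_files):
            base[(i, sec)] = address
            address += len(obj["sections"].get(sec, []))

    # Pass 2: read all symbols and build the complete symbol table.
    symtab = {}
    for i, obj in enumerate(object_files):
        for name, (sec, offset) in obj["symbols"].items():
            if name in symtab:
                raise ValueError(f"duplicate symbol: {name}")
            symtab[name] = base[(i, sec)] + offset

    # Pass 3: copy section contents to their final locations and update addresses.
    image = [0] * address
    for i, obj in enumerate(object_files):
        for sec, words in obj["sections"].items():
            start = base[(i, sec)]
            image[start:start + len(words)] = words
        for sec, offset, name in obj["relocs"]:
            if name not in symtab:
                raise ValueError(f"undefined symbol: {name}")
            image[base[(i, sec)] + offset] = symtab[name]
    return image, symtab


# Two hypothetical object files; zeros mark addresses the assembler did not know.
main_o = {
    "sections": {"text": [1, 0, 2], "data": [7]},
    "symbols": {"main": ("text", 0)},
    "relocs": [("text", 1, "printf")],
}
lib_o = {
    "sections": {"text": [9, 9], "data": [5, 0]},
    "symbols": {"printf": ("text", 0), "buf": ("data", 0)},
    "relocs": [("data", 1, "buf")],
}

image, symtab = link([main_o, lib_o])
# printf ends up at address 3, right after the three words of main's code.
```

In this example the two code sections occupy addresses 0 through 4 and the two data sections occupy addresses 5 through 7, so the placeholder in main's code is patched to 3 (the final address of printf).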

Dynamic Linking

Originally all programs were linked statically, as described above:
- All external references fully resolved
- Each program complete

Since the late 1980s, most systems have supported shared libraries and dynamic linking:
- For common library packages, only keep a single copy in memory, shared by all processes.
- Don't know where library is loaded until runtime; must resolve references dynamically, when program runs.

One way of implementing dynamic linking: jump table.
- If any of the files being linked are shared libraries, the linker doesn't actually include the shared library code in the final program. Instead, it includes two things that implement dynamic linking:
  - Jump table: an array in which each entry is a single machine instruction containing an unconditional branch (jump).
    - For each function in a shared library used by the program, there is one entry in the jump table that will jump to the beginning of that function.
  - Dynamic loader: library package invoked at startup to fill in the jump table.
- For relocation records referring to functions in the shared library, the linker substitutes the address of the jump table entry: when the function is invoked, the caller will "call" the jump table entry, which redirects the call to the real function.
- Initially, all jump table entries jump to zero (unresolved).
- When the program starts up, the dynamic loader is invoked:
  - It invokes the OS mmap function to load each shared library into memory.
  - It fills in the jump table with the correct address for each function in a shared library.
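
The jump table mechanism can be imitated in a few lines. This is a loose analogy under stated assumptions, not real loader code: Python functions stand in for machine-code entry points, a dictionary stands in for a shared library that has been mapped into memory, and the names SHARED_LIB, JUMP_SLOTS, call, and dynamic_loader are all hypothetical.

```python
class Unresolved(Exception):
    """Raised when a jump table entry has not been filled in yet."""


def unresolved(*args):
    # Stands in for "jump to zero": the entry does not point anywhere useful.
    raise Unresolved("jump table entry not filled in")


# Hypothetical shared library: function name -> implementation.
# Stands in for library code loaded into memory at run time.
SHARED_LIB = {
    "square": lambda x: x * x,
    "negate": lambda x: -x,
}

# Produced by the static linker: one slot per library function the program uses.
JUMP_SLOTS = {"square": 0, "negate": 1}
jump_table = [unresolved] * len(JUMP_SLOTS)   # initially all entries are unresolved


def call(name, *args):
    """The program 'calls' the jump table entry, which redirects to the real function."""
    return jump_table[JUMP_SLOTS[name]](*args)


def dynamic_loader(library):
    """Invoked at startup: fill each entry with the location of the real function."""
    for name, slot in JUMP_SLOTS.items():
        jump_table[slot] = library[name]


dynamic_loader(SHARED_LIB)     # must run before any library function is called
result = call("square", 3)     # goes through slot 0 to the library's square
```

The program's own code never changes after linking: it always calls the same slot, and only the contents of the table depend on where the library ended up.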
