Linkers and Dynamic Linking
Optional readings for this topic from Operating Systems: Principles and Practice: none.
When a process is running, what does its memory look like? A collection of regions called sections (or segments). Basic memory layout for Linux and other Unix systems:
- Code (or "text" in Unix terminology): starts at address 0
- Data: starts immediately above code, grows upward
- Stack: starts at highest address, grows downward
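The layout above can be modeled in a few lines. This is only a sketch of the simplified picture in these notes; the function name `layout` and the sizes used are made up for illustration, and real systems place regions differently.

```python
def layout(code_size, data_size, top_of_memory):
    """Place the three regions according to the simplified model above."""
    code_start = 0                          # code starts at address 0
    data_start = code_start + code_size     # data starts immediately above code
    data_end = data_start + data_size       # data grows upward from here
    stack_start = top_of_memory             # stack starts at the highest address
    if data_end > stack_start:
        raise ValueError("code and data do not fit below the stack")
    return {
        "code": code_start,
        "data": data_start,
        "stack": stack_start,
        "free": stack_start - data_end,     # gap that data and stack grow into
    }


regions = layout(code_size=0x4000, data_size=0x1000, top_of_memory=0x10000)
for name, value in regions.items():
    print(f"{name:5s} {value:#x}")
```

The "free" entry is the space between the top of the data and the stack, which shrinks as either one grows toward the other.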
System components that take part in managing a process's memory:
- Compiler and assembler:
- Generate one object file for each source code file containing information for that source file.
- Information is incomplete, since each source file generally references some things defined in other source files.
- Linker:
- Combines all of the object files for one program into a single executable file.
- Linker output is complete and self-sufficient.
- Operating system:
- Loads executable files into memory.
- Allows several different processes to share memory at once.
- Provides facilities for processes to get more memory after they've started running.
- Run-time library:
- Works together with OS to provide dynamic allocation routines, such as
Linkers (or Linkage Editors, ld in Unix, LINK on Windows): combine many separate pieces of a program, re-organize storage allocation. Typically invoked invisibly by compilers.
Three functions of a linker:
- Combine all the pieces of a program.
- Figure out a memory organization so that all the pieces fit together (combine like sections).
- Touch up references so that the program can run under the new memory organization.
Result: a runnable program stored in a new object file called an executable.
Problems linker must solve:
- Assembler doesn't know where the things it's assembling will eventually go in memory
- Assume that each section starts at address zero, let linker re-arrange.
- Assembler doesn't know addresses of external objects when assembling files separately. E.g. where is
- Assembler just puts zero in the object file for each unresolved reference
Each object file consists of:
- Sections, each holding a distinct kind of information.
- Typical sections: code ("text") and data.
- For each section, object file contains size and assumed starting address of the section, plus initial contents, if any
- Symbol table: name and current location of each procedure or variable (except stack variables)
- Unresolved references: information about addresses in this object file that the linker must adjust once it knows the final memory allocation of the thing the address should point to.
- Additional information for debugging (e.g. map from line numbers in the source file to location in the code section).
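As an illustration of the pieces listed above, here is a minimal sketch of an object file as a data structure. Everything in it is hypothetical: the class names, the opcodes `CALL` and `RET`, and the symbols `main`, `counter`, and `helper` are invented for the example and do not correspond to any real object file format. Debugging information is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    name: str            # e.g. "text" or "data"
    assumed_start: int   # the assembler assumes every section starts at 0
    contents: list       # initial contents, one integer per word

    @property
    def size(self):
        return len(self.contents)


@dataclass
class Relocation:
    section: str   # section holding the address that must be adjusted
    offset: int    # position of that address within the section
    symbol: str    # name of the thing the address should point to


@dataclass
class ObjectFile:
    sections: dict = field(default_factory=dict)
    symbols: dict = field(default_factory=dict)       # name -> (section, offset)
    relocations: list = field(default_factory=list)   # unresolved references


CALL, RET = 0xC0, 0xC9   # made-up opcodes

# Object file for a source file that defines main and counter, and calls
# helper, which is defined in some other source file.
main_o = ObjectFile(
    sections={
        "text": Section("text", 0, [CALL, 0, RET]),   # the 0 is a placeholder
        "data": Section("data", 0, [42]),
    },
    symbols={"main": ("text", 0), "counter": ("data", 0)},
    relocations=[Relocation("text", 1, "helper")],
)

for r in main_o.relocations:
    word = main_o.sections[r.section].contents[r.offset]
    print(f"{r.section}+{r.offset} -> {r.symbol} (currently {word})")
```

The relocation entry records where the placeholder zero sits, so the linker can overwrite it once it knows the final address of `helper`.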
Linker executes in three passes:
- Pass 1: read in section sizes, compute memory layout.
- Pass 2: read in all symbols, create complete symbol table in linker's memory.
- Pass 3: read in section and unresolved references, update addresses, write out executable file.
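The three passes can be sketched as a toy linker. This is a simplified model under stated assumptions, not how ld is implemented: object files are plain dictionaries, memory is a list of words, only "text" and "data" sections exist, and the function name `link` and the sample symbols are made up for the example.

```python
def link(object_files):
    """Toy three-pass linker.

    Each object file is a dict with keys:
      'sections': section name -> list of words (only 'text' and 'data')
      'symbols':  symbol name -> (section name, offset)
      'relocs':   list of (section name, offset, symbol name)
    """
    order = ["text", "data"]   # combine like sections: all text, then all data

    # Pass 1: read section sizes and compute the memory layout.
    base = {}                  # (file index, section name) -> final start address
    address = 0
    for sec in order:
        for i, obj in enumerate(object_files):
            base[(i, sec)] = address
            address += len(obj["sections"].get(sec, []))

    # Pass 2: build the complete symbol table with final addresses.
    symtab = {}
    for i, obj in enumerate(object_files):
        for name, (sec, offset) in obj["symbols"].items():
            if name in symtab:
                raise ValueError(f"duplicate symbol: {name}")
            symtab[name] = base[(i, sec)] + offset

    # Pass 3: copy section contents, patch unresolved references.
    image = [0] * address
    for i, obj in enumerate(object_files):
        for sec, words in obj["sections"].items():
            start = base[(i, sec)]
            image[start:start + len(words)] = words
        for sec, offset, name in obj["relocs"]:
            if name not in symtab:
                raise ValueError(f"undefined symbol: {name}")
            image[base[(i, sec)] + offset] = symtab[name]
    return image, symtab


main_o = {
    "sections": {"text": ["call", 0, "ret"], "data": [7]},
    "symbols": {"main": ("text", 0), "counter": ("data", 0)},
    "relocs": [("text", 1, "helper")],
}
helper_o = {
    "sections": {"text": ["load", 0, "ret"]},
    "symbols": {"helper": ("text", 0)},
    "relocs": [("text", 1, "counter")],
}

image, symtab = link([main_o, helper_o])
print(symtab)   # {'main': 0, 'counter': 6, 'helper': 3}
print(image)    # ['call', 3, 'ret', 'load', 6, 'ret', 7]
```

Both placeholder zeros have been replaced: the call in `main` now targets address 3, where `helper` landed, and the load in `helper` targets address 6, where `counter` landed after the two text sections were combined.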
Originally all programs were linked statically, as described above:
- Each program complete
- All references resolved
Since the late 1980s, most systems have supported shared libraries and dynamic linking:
- For common library packages, only keep a single copy in memory, shared by all processes.
- Don't know where library is loaded until runtime; must resolve references dynamically, when program runs.
One way of implementing dynamic linking: jump table.
- If any of the files being linked are shared libraries, the linker doesn't actually include the shared library code in the final program. Instead, it includes two things that implement dynamic linking:
- Jump table: an array in which each entry corresponds to one symbol in a shared library:
- Name of a function (e.g.
- Name of shared library file containing function
- A machine instruction containing an unconditional branch (jump) that will jump to the beginning of that function.
- Dynamic loader: small library package invoked at startup to fill in the jump table.
- For unresolved references to functions in the shared library, the linker substitutes the address of the jump table instruction: when the function is invoked, the caller will "call" the jump table entry, which redirects the call to the real function.
- Initially, all jump table entries jump to zero (unresolved).
- When the program starts up, the dynamic linker is invoked; it iterates over all jump table entries:
- Invoke the mmap kernel call to load the shared library into memory (if it hasn't already been loaded)
- Read the symbol tables from the shared library to find the symbol
- Modify the jump table instruction to hold the correct address for the function.
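The jump table scheme can be modeled as follows. This is a sketch only: a Python callable stands in for the branch instruction, a dictionary lookup stands in for mmap and symbol table reading, and the library names `libgreet.so` and `libmath.so`, their functions, and the class and function names are all hypothetical.

```python
# Hypothetical stand-ins for shared library files: file name -> symbol table.
LIBRARIES_ON_DISK = {
    "libgreet.so": {"greet": lambda name: f"hello, {name}"},
    "libmath.so": {"square": lambda x: x * x},
}


class JumpTableEntry:
    def __init__(self, function_name, library_file):
        self.function_name = function_name   # name of the function
        self.library_file = library_file     # shared library file containing it
        self.target = None                   # "jump to zero": not yet resolved

    def __call__(self, *args):
        # The caller "calls" the entry, which redirects to the real function.
        if self.target is None:
            raise RuntimeError(f"unresolved jump to {self.function_name}")
        return self.target(*args)


def dynamic_loader(jump_table, loaded):
    """Fill in every jump table entry at program startup."""
    for entry in jump_table:
        # Load the library if it hasn't already been loaded (stand-in for mmap).
        if entry.library_file not in loaded:
            loaded[entry.library_file] = LIBRARIES_ON_DISK[entry.library_file]
        # Read the library's symbol table to find the symbol.
        symbols = loaded[entry.library_file]
        # Modify the entry to hold the correct target for the function.
        entry.target = symbols[entry.function_name]


jump_table = [
    JumpTableEntry("greet", "libgreet.so"),
    JumpTableEntry("square", "libmath.so"),
]
# The linker substituted these entries for the program's unresolved references.
greet, square = jump_table

loaded = {}
dynamic_loader(jump_table, loaded)
print(greet("world"))   # hello, world
print(square(7))        # 49
```

The calling code never changes: it always calls the jump table entry, and only the entry's target is rewritten once the library's location is known.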