Lab 7: Compilation tools and executables

Lab sessions Mon Feb 27 to Thu Mar 02

Lab written by Julie Zelenski

Learning goals

After this lab, you should be able to:

  1. describe the steps to build an executable from C code and the tasks performed by the preprocessor, compiler, assembler, and linker
  2. interpret the symptoms of a build error and determine how to fix it
  3. diagram the program address space

Find an open computer and somebody new to sit with. Introduce yourself and share war stories about your efforts to defuse your binary bomb.

Lab exercises

1. Get started.

Clone the starter project using the command

hg clone /afs/ir/class/cs107/repos/lab7/shared lab7

This creates the lab7 directory, which contains source files and a Makefile.

Pull up the online lab checkoff and have it open in a browser so you'll be able to jot down things as you go. At the end of the lab period, you will submit that sheet and have the TA check off your work.

2. Understanding object files.

An object file is the product of the compiler/assembler translating C source into object code. There are several Unix tools that can be used to poke around in object files, such as the disassembler objdump, your old friend from disarming the bomb. Try out the commands below to see what information they provide. Each tool has a man page you can consult for further information.
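To have something concrete to inspect, here is a minimal sketch of a hypothetical file (not part of the starter code): compile it with gcc -c and run nm and objdump -d on the resulting .o. The symbol letters in the comments are what GNU nm typically prints; exact output depends on the compiler version and flags.

```c
#include <string.h>

// Hypothetical module for poking at with nm and objdump.
int counter = 7;           // initialized global: nm typically shows D (data section)
int scratch[16];           // uninitialized global: typically B (bss), or C (common) on older gcc
static int hidden = 3;     // static global: a local symbol (lowercase letter), if kept at all

// Defined in this file, so nm typically shows T (text section).
size_t tally(const char *s) {
    counter++;
    scratch[0] = (int)strlen(s);   // strlen is referenced but not defined here: nm shows U
    return (size_t)scratch[0] + (size_t)hidden;
}
```

Note that strlen shows up as undefined in the .o even though the final program runs fine: resolving it against the C library is the linker's job, not the compiler's.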

3. The preprocessor cpp.

As the first step of compilation, the preprocessor does a variety of text-based transformations such as:

Read through the pre.c file and predict what it will look like after preprocessing. Run just the preprocessor using gcc -E pre.c, look at the output, and verify that your predictions were correct.
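The preprocessor's text transformations are easiest to see on a tiny file. This is a hypothetical example, not the contents of pre.c: run it through gcc -E and compare the output against the comments.

```c
#include <stdio.h>              // cpp replaces this line with the full text of stdio.h

#define NUM_WIDGETS 10          // object-like macro: every NUM_WIDGETS becomes 10
#define SQUARE(x) ((x) * (x))   // function-like macro: pure text substitution

/* cpp strips comments like this one before the compiler runs */

int widget_area(void) {
#ifdef DEBUG
    printf("computing area\n"); // survives preprocessing only if built with -DDEBUG
#endif
    return SQUARE(NUM_WIDGETS); // after cpp: return ((10) * (10));
}
```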

The list below identifies some of the things that might go wrong with preprocessor directives. A missing #include or wrong #define seems like a preprocessor error, but in most cases, the consequences won't show up until further downstream, and it will require sleuthing to relate the symptom back to the root cause. Edit the pre.c file to create each of the problems listed below and try to build. If the build fails, when is the problem detected and by which tool (preprocessor, compiler, linker)? Is it a hard error or just a warning? If it builds despite the problem, does the program run correctly?
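Some #define mistakes sail through every stage of the build. In this hypothetical sketch, a macro body missing its parentheses is substituted as raw text, so neither the preprocessor nor the compiler has anything to complain about, yet the arithmetic silently changes.

```c
// Hypothetical buggy and fixed versions of the same constant.
#define HALF_BAD 10 / 2        // no parentheses: pasted in as raw text
#define HALF_GOOD (10 / 2)     // parenthesized: behaves like a single value

// Expands to 100 / 10 / 2, which evaluates left to right as 5.
int bad_scaled(void) { return 100 / HALF_BAD; }

// Expands to 100 / (10 / 2), which is 20 as intended.
int good_scaled(void) { return 100 / HALF_GOOD; }
```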

4. Linking.

The relationship between the compiler and linker is one of the more misunderstood aspects of the build process. The compiler operates on a single .c file at a time and produces an object file (also referred to as a relocatable file). A .o file contains the compiled machine code for all the functions defined in the .c file, but it is not a full program until linked. The linker mashes together the object file(s) and system libraries, and in the process has to resolve cross-module references and relocate addresses to their final locations. A key task for the linker is resolving symbols: ensuring there is at least one and no more than one definition for each symbol name in the global namespace. The linker detects exactly two kinds of errors: undefined symbols and multiply-defined symbols.
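A minimal sketch of the declaration/definition split the linker relies on, using hypothetical names. It is collapsed into one file so it compiles on its own; in a real project the declarations would sit in a header and the definitions in exactly one .c file.

```c
// --- would normally live in a header, e.g. counter.h (hypothetical) ---
extern int hits;     // declaration: promises a definition exists in some object file
int bump(void);      // prototype: also only a declaration

// --- would normally live in exactly one .c file, e.g. counter.c ---
int hits = 0;        // the single definition every reference to hits resolves to
int bump(void) {
    return ++hits;
}

// If no object file defined hits, the linker would report an undefined symbol.
// If two .c files each contained "int hits = 0;", it would report a multiple definition.
// Declaring the variable static instead keeps it private to one file,
// so each file could have its own copy without a clash.
```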

5. Who detects what?

One of the most important benefits of understanding the entire tool chain is that you are in a better position to know the right fix when you hit a build error. Below are a few common build errors. First, think through how each would be handled, then try making the error and building to verify that your understanding is correct.
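One subtlety worth testing: the linker only demands a definition for symbols that are actually referenced. In this hypothetical sketch, never_defined is declared but has no definition anywhere, and the file still compiles and links because nothing calls it.

```c
int never_defined(int x);    // declared only; no definition exists anywhere

static int helper(int x) {   // defined and used: resolved within this file
    return 2 * x;
}

int use_helper(void) {
    // Changing this to "return never_defined(21);" would still compile,
    // because the prototype satisfies the compiler, but linking would fail
    // with an undefined-symbol error for never_defined.
    return helper(21);
}
```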

6. Charting the address space.

A program's address space is divided into segments: text (code), stack, global data, etc. The segments tend to be placed in predictable locations. Developing a feel for the address range used for each purpose can help you theorize about what has gone wrong when memory is out of whack. Run the addrspace program under gdb to answer these questions:

Chart the address space, label the segments, and note where gaps occur. Given a troublesome address, you can use this chart to identify whether the address lies within the stack, heap, global data, or code, which is a helpful clue when tracking down the problem. Of the entire addressable region, about what percentage appears to be in use?
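To cross-check the chart, here is a minimal sketch that records one address from each region (the names are hypothetical, not from addrspace.c). On typical x86-64 Linux the code and globals sit low, the heap a bit above them, and the stack near the top, but address-space layout randomization can shift exact values from run to run.

```c
#include <stdint.h>
#include <stdlib.h>

int global_counter = 1;                    // lives in the global data segment

typedef struct {
    uintptr_t code, global, heap, stack;
} segment_addrs;

// Records one representative address from each segment as a plain integer.
segment_addrs sample_addresses(void) {
    int local = 0;                         // on the stack
    int *block = malloc(sizeof *block);    // on the heap (small requests, typically)
    segment_addrs s = {
        .code = (uintptr_t)&sample_addresses,
        .global = (uintptr_t)&global_counter,
        .heap = (uintptr_t)block,
        .stack = (uintptr_t)&local,
    };
    free(block);                           // only the address value is kept, never dereferenced
    return s;
}
```

Printing these values in hex, or inspecting the same variables with p &name in gdb, gives a data point for each row of the chart.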

7. Optional extra exploration: preprocessor macros and inline functions

Preprocessor macros have a number of pitfalls, and we strongly discourage their use in favor of inline functions. However, you may encounter macros in others' code, and it can be useful to understand the mechanism and why macros can be problematic. Review the code in macro.c to see the definition and use of the macro MAX(x, y).
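A minimal sketch of the classic double-evaluation pitfall. The MAX definition here is the common textbook form and may differ from the one in macro.c; max_inline and the two wrapper functions are hypothetical.

```c
#define MAX(x, y) ((x) > (y) ? (x) : (y))   // assumed definition; macro.c's may differ

static inline int max_inline(int x, int y) { return x > y ? x : y; }

// Returns MAX(a++, b) and reports a's final value through a_after.
int macro_max_postinc(int a, int b, int *a_after) {
    int m = MAX(a++, b);   // expands to ((a++) > (b) ? (a++) : (b)): a++ can run twice
    *a_after = a;
    return m;
}

// Same call through the inline function: the argument is evaluated exactly once.
int inline_max_postinc(int a, int b, int *a_after) {
    int m = max_inline(a++, b);
    *a_after = a;
    return m;
}
```

With a = 5 and b = 3, the macro version yields 6 and leaves a at 7, while the inline version yields 5 and leaves a at 6. The ?: operator sequences its operands, so this is well-defined behavior, just surprising.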

8. Optional extra diversion: binary hacking

The loader is responsible for running an executable file by starting the new process and configuring its address space. The code and data segments of the address space consist of data directly mapped in from the executable file. The executable file contains object code along with string constants, global data, and possibly symbol and debugging information. If you are very careful, there are ways to directly edit an executable file to change its runtime behavior, for example, by directly modifying data in the segments that will be mapped in. To be clear, there is rarely legitimate cause to do this, but we can play around with binary hacking to better understand the contents of the executable and its relationship to the executing program. If you open the binary file in emacs and invoke M-x hexl-mode, emacs will act as a raw hex editor. We're going to experiment with editing the addrspace executable file.
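Initialized globals and string constants are stored in the executable as raw bytes, which is what makes them findable in hexl-mode. A minimal sketch with hypothetical globals chosen to be easy to spot: on a little-endian machine such as x86-64, the int below appears in the file as the bytes 78 56 34 12, and the string's ASCII bytes appear verbatim.

```c
#include <string.h>

// Hypothetical globals with values that are easy to search for in a hex dump.
int magic = 0x12345678;               // data segment: bytes 78 56 34 12 on little-endian x86-64
const char banner[] = "CS107 rules";  // read-only data: ASCII bytes stored as-is, plus a '\0'

// Copies magic's in-memory bytes, which match the bytes stored in the executable.
void magic_bytes(unsigned char out[sizeof(int)]) {
    memcpy(out, &magic, sizeof magic);
}
```

When patching such a value, overwriting bytes in place with a replacement of the same length keeps every later offset in the file where the loader expects it.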

Just for fun lab followups

Check off with TA

Before you leave, be sure to submit your checkoff sheet (in the browser) and have the lab TA come by and confirm so you will be properly credited for lab. If you don't finish everything before lab is over, we strongly encourage you to finish the remainder on your own. Double-check your progress with self check.
