Lab sessions Mon Feb 13 to Thu Feb 16
Lab written by Julie Zelenski
Learning goals
This lab is designed to give you a chance to:
- use objdump and gdb to disassemble and trace assembly code
- study the relationship between source code and its assembly translation
- reverse-engineer a little assembly back to C
Find an open computer and somebody new to sit with. Share your favorite song or album from last year and/or favorite Grammys performance.
Lab exercises
1. Get started.
Clone the lab starter project using the command
hg clone /afs/ir/class/cs107/repos/lab5/shared lab5
Have our guide to x86-64 basics and this handy one page of x86-64 in your browser for reference during lab. Bring up the online lab checkoff so you can jot down things as you go. At the end of the lab period, submit the sheet and have the TA check off your work.
2. Deadlisting with objdump
As part of the compilation process, the assembler takes in assembly instructions and encodes them into the binary form understood by the hardware. Disassembly is the reverse process that converts binary-encoded instructions back into human-readable assembly. You wrote a little disassembler in assign4. objdump is a tool that operates on object files (i.e. files containing compiled machine code). It can dig out all sorts of information from the object file, but one of the more common uses is as a disassembler. Let's try it out!
- Invoking objdump -d extracts the instructions from an object file and outputs the sequence of binary-encoded machine instructions alongside the assembly equivalent. This dump is called a deadlist ("dead" to distinguish it from the study of "live" assembly as it executes). If symbol names are present, the instructions are grouped into sequences by function name. Adding the --no-show-raw-insn flag omits the binary encoding and shows just the assembly (cuts down on clutter). If the object file was compiled with debugging information, adding the -S flag to objdump will intersperse the original C source. Use objdump -d -S --no-show-raw-insn trace to get a sample deadlist.
- The countops.py Python script in the repo reports the assembly instructions most heavily used in a given object file. Try out countops.py trace for an example. The script operates by invoking objdump to disassemble the file, tallying instructions by opcode, and reporting the top 10 most frequent. Try it out on a few executables (your reassemble or synonyms, or tools like emacs and gcc). Does the mix of assembly instructions seem to vary much by program?
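The "tally by opcode" step can be sketched in C. This is not the code of countops.py, which is a Python script and is not reproduced here. The helper name extract_opcode is hypothetical, and the line layout it expects (address, colon, tab, mnemonic, operands) is an assumption based on what objdump -d --no-show-raw-insn typically prints.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the lab repo): pull the opcode mnemonic
 * out of one line of `objdump -d --no-show-raw-insn` output, assumed to
 * look like "  400701:\tcallq  4005b8 <locals>". Returns false for lines
 * that are not instructions (blank lines, "<main>:" labels, headers). */
bool extract_opcode(const char *line, char *out, size_t outsz)
{
    const char *p = line;
    while (*p == ' ') p++;                      /* skip leading indentation */
    const char *addr = p;
    while (isxdigit((unsigned char)*p)) p++;    /* hex address */
    if (p == addr || *p != ':') return false;   /* no "addr:" prefix */
    p++;
    while (*p == ' ' || *p == '\t') p++;        /* whitespace before mnemonic */
    size_t len = strcspn(p, " \t\n");           /* mnemonic ends at whitespace */
    if (len == 0 || len >= outsz) return false; /* empty, or too long for out */
    memcpy(out, p, len);
    out[len] = '\0';
    return true;
}
```

A tally would call this on each line of the disassembly and bump a counter per distinct mnemonic. Note that a prefixed instruction such as rep stos would be counted under its prefix with this simple split.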
3. GDB commands for live assembly-level debugging
The debugger has great support for working with code at the assembly level. Load the trace program in gdb and use the gdb command start to get the program going and stopped in main. From there, try out the gdb commands listed below that allow you to poke around at the assembly level. To learn more about any gdb command, try gdb's built-in help.
- The gdb command disassemble with no argument will print the disassembled instructions for the currently executing function. You can also give an optional argument of what to disassemble, a function name or code address.
    (gdb) disassemble main
    Dump of assembler code for function main:
       0x0000000000400700 <+0>:     push   %rbx
       0x0000000000400701 <+1>:     callq  0x4005b8 <locals>
       0x0000000000400706 <+6>:     mov    %eax,%ebx
       0x0000000000400708 <+8>:     callq  0x40063e <solve>
       ...
In the disassembly as printed by gdb, the hex number in the leftmost column is the address in memory for that instruction and in angle brackets is the offset of that instruction relative to the start of the function. You may notice minor differences in presentation between the disassembled instructions as printed by gdb versus the output from objdump, e.g. use of movq instead of mov, negative signed values may display as large unsigned, and so on.
- The disassemble option /m intersperses C source with the asm. This can be helpful when trying to relate the two. (Though for more complex passages that are significantly rearranged during compilation, both may be confusing.)
    (gdb) disassemble/m main
    Dump of assembler code for function main:
    130     {
       0x0000000000400700 <+0>:     push   %rbx
    131         int a;
    132         a = locals();
       0x0000000000400701 <+1>:     callq  0x4005b8 <locals>
       0x0000000000400706 <+6>:     mov    %eax,%ebx
- You can set a breakpoint at a specific machine instruction by specifying its address (b *address) or an offset within a function (b *main+6). Note that the latter is not 6 instructions into main, but 6 bytes worth of instructions into main. Given the variable-length encoding of instructions, 6 bytes can correspond to one or several instructions. In the disassembly of main above, for example, push %rbx occupies 1 byte (offset +0) and the callq occupies 5 (offsets +1 through +5), so main+6 is the third instruction.
    (gdb) b *0x400784       break at specified address
    (gdb) b *main+6         break at instruction 6 bytes past start of main
- The gdb commands stepi and nexti allow you to single-step through assembly instructions. These are the assembly-level equivalents of the source-level step and next commands. They can be abbreviated si and ni.
    (gdb) stepi             executes next single machine instruction
    (gdb) nexti             executes next machine instruction (proceed over fn calls)
- The gdb command info reg will print the value of the integer registers and condition codes. You can refer to an individual register by name to view or change the register's value. Within gdb, a register name is prefixed with $ instead of the usual %.
    (gdb) info reg
    (gdb) p $rax            show current value in %rax register
    (gdb) set $rax = 9      change current value in %rax register
- The gdb command set disassemble on turns on assembly-level display. (This is an abbreviation of the full setting name, set disassemble-next-line on; spell out the full name if your gdb reports the short form as ambiguous.) When execution is paused/stopped, gdb usually shows you the C source line to be executed next. After setting disassemble on, it will also show the assembly instructions corresponding to the C source.
    (gdb) set disassemble on
- The tui (text user interface) we showed in lecture splits your session into panes for simultaneously viewing the C source, assembly translation, and/or current register state. The gdb command layout <argument> starts tui mode. The argument specifies which pane you want (src, asm, regs, or split). Tui mode is super-handy for tracing execution and observing what is happening with code/registers as you stepi. Occasionally, tui trips over itself and garbles the display. The gdb command refresh sometimes works to clean it up. If things get really out of hand, ctrl-x a will exit tui mode and return you to ordinary non-graphical gdb.
4. Reading and tracing assembly in GDB
Read over the C code in trace.c. Compile the program and run in gdb. Use the gdb commands from the previous exercise to set breakpoints, disassemble, stepi through the assembly, print registers, and so on to answer the following questions.
In the my_variables function:
- Where is arr being stored? How are the values in arr initialized? What happened to the strlen call on the string constant to init the last array element?
- What instructions were emitted to compute the value assigned to count? What does this tell you about the sizeof operator?
- Use the gdb command display total to set up an auto-display expression for the variable total and single-step through the function. At the start and end of the function, gdb reports that total has been <optimized out>, but during the instructions where the value is "live", its value will be shown. Use the disassembly to figure out the location where total is being kept and for what range of instructions it is live. What other way could you view the live value during execution without referencing it by name?
- Stop at the function start and use the gdb command info locals to show the local variables. Compare this list to the declarations in the C source. You'll see some variables are shown with values ("live"), some are <optimized out>, but others don't show up at all. Look at the disassembly to figure out what happened to these entirely missing variables. How does gdb respond when you ask it to print the value of one of the unlisted variables? What if you try to set its value? Step through the function repeating the info locals command to observe which variables are live at each step. Examine the disassembly to explain why there is no step at which both total and squared are live.
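To experiment with the sizeof and strlen questions in isolation, a sketch like the following can be compiled on its own and disassembled. It is not the code from trace.c: the function names, array contents, and string literal are hypothetical. The idea to check is that sizeof applied to an array is evaluated by the compiler, and that an optimizing gcc can commonly fold strlen of a string literal into a constant as well.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kind of code in my_variables. */
size_t elem_count(void)
{
    int arr[] = {10, 20, 30, 40};
    /* sizeof is resolved at compile time (16 bytes / 4 bytes per int on
     * x86-64), so no division needs to happen at runtime. */
    return sizeof(arr) / sizeof(arr[0]);
}

size_t literal_length(void)
{
    /* The argument is a compile-time constant, so gcc is free to replace
     * the call with the literal's length; check the disassembly to see
     * whether a call to strlen remains. */
    return strlen("Stanford");
}
```

Disassembling both functions in gdb and looking for a div instruction or a call to strlen is a quick way to confirm or refute the expectation.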
In the u_arith and s_arith functions:
- These functions invoke the same sequence of arithmetic operations but differ in the signedness of the operands. Carefully compare the disassembly for the two functions.
- The first three C statements using add, subtract, and multiply compile into exactly the same assembly sequence for both functions. How is it possible that these instructions do the correct thing for both unsigned and signed arithmetic?
- The branch instruction emitted for the if statement is different depending on the signedness of the operand -- why? For what values will the path taken differ due to the difference in branch? Set a breakpoint before the cmp instruction, change the value of the register being compared to one of those values, and stepi from there to verify the difference in paths taken for unsigned versus signed.
- When doing a right-shift, does gcc emit an arithmetic (sar) or logical (shr) shift? Does the choice depend on whether the type is signed or unsigned?
- To divide by 2, what instruction is used for unsigned? For signed, the assembly sequence is more complex. Trace through by hand or stepi in gdb and explain what the sequence is doing and why it differs from the unsigned calculation.
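The questions above rest on a few properties of two's complement that can be checked directly in C. The sketch below uses hypothetical helper names (none come from trace.c), assumes a 32-bit int, and relies on gcc shifting negative signed values arithmetically, which the C standard leaves implementation-defined.

```c
#include <stdbool.h>

/* Addition yields the same bit pattern whether the operands are viewed as
 * signed or unsigned, so a single add instruction can serve both.
 * The caller must pick values that do not overflow as signed ints. */
bool same_bits_add(int a, int b)
{
    return (unsigned)(a + b) == (unsigned)a + (unsigned)b;
}

/* Comparison does depend on signedness: the bit pattern of -1 is also the
 * largest unsigned value, so the two interpretations can disagree. */
bool signed_less(int a, int b)             { return a < b; }
bool unsigned_less(unsigned a, unsigned b) { return a < b; }

/* Unsigned divide by 2 is a single logical shift. */
unsigned udiv2(unsigned x) { return x >> 1; }

/* An arithmetic shift alone rounds toward negative infinity, while C
 * division rounds toward zero (-7 / 2 is -3). Adding the sign bit
 * (1 for negatives, 0 otherwise) before shifting corrects the rounding. */
int sdiv2(int x)
{
    int bias = (int)((unsigned)x >> 31);  /* 1 if x is negative, else 0 */
    return (x + bias) >> 1;               /* assumes arithmetic shift (gcc) */
}
```

Comparing the disassembly of sdiv2 against a plain x / 2 is one way to test whether this is the same idea the compiler uses for the signed case.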
In the for_loop, while_loop and dowhile_loop functions:
- First, read the C code for the three loop variants. Under which conditions are these loops expected to have the same behavior and when will they differ?
- Now examine the assembly. Two of the loops have the exact same assembly sequence -- which two? How does the assembly of the third loop differ from the other two? Why does it differ?
- Set a breakpoint on loops and change the value of the parameter n being passed to the three calls such that the loop results will differ. Continue from there and see what is printed.
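As a point of comparison while reading the assembly, here is a minimal sketch of three loop shapes that each count their own iterations. These are hypothetical bodies written for illustration; the loops in trace.c may be structured differently.

```c
/* Hypothetical loop variants; each returns how many times its body ran. */
int for_count(int n)
{
    int iterations = 0;
    int i;
    for (i = 0; i < n; i++)
        iterations++;
    return iterations;
}

int while_count(int n)
{
    int iterations = 0;
    int i = 0;
    while (i < n) {        /* test happens before the first pass */
        iterations++;
        i++;
    }
    return iterations;
}

int dowhile_count(int n)
{
    int iterations = 0;
    int i = 0;
    do {                   /* body runs once before i < n is first tested */
        iterations++;
        i++;
    } while (i < n);
    return iterations;
}
```

Calling all three with a positive n and then with n set to 0 shows where the variants agree and where they part ways, which is the same experiment the breakpoint exercise performs from inside gdb.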
5. Exploring C compilation to assembly
A fun tool for investigating C to asm is the GCC Explorer, an online "interactive compiler". (Thanks, Josh K, for sharing!) Use the link https://godbolt.org/g/fHoZ7S configured to use the myth's version of GCC (4.8.x) and the compiler flags from the CS107 makefiles. You can enter some C code, tweak it a bit, and immediately observe how those changes are reflected in the assembly. The tool is doing the same tasks you could do on myth using gcc/gdb, but in a quick exploratory context. Here are a few experiments to try:
- The lea instruction allows two adds and a multiply by constant 1, 2, 4 or 8 to be jammed into one instruction. It was designed for address arithmetic, but the math is compatible with regular integer operations and it is often used by the compiler to do an efficient add/multiply combo. Type in a simple sum(x, y) function that takes two integer arguments and returns their sum. Look at the assembly and you'll note it issued a lea instead of the expected add. Interesting! Change the function to return x + 2*y or x + 8*y - 17 and see how the lea can adapt. If you try return x + 3*y, it will no longer fit the pattern for a single lea. What does the compiler use instead?
- Multiply is a mildly expensive operation and the compiler will do its darnedest to use a combo of add, shift, or lea instead. Type in a simple scale(x) function that takes one integer and returns the argument multiplied by constant 2. What instruction does the compiler use for the computation? What about a multiply by 8 or 16 or 256? Making a special case for powers of 2 is perhaps unsurprising, but what does it do for multiply by 3 or 17 or 25? Experiment to find an integer constant C such that C*x is expressed as a true imul instruction.
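The functions below are a starting point for these experiments; paste them into the tool one at a time. The names other than sum are hypothetical. The last two functions spell out arithmetic identities that make a multiply by 17 or 25 expressible with shifts, adds, and lea-shaped terms. Whether gcc chooses these particular forms is for you to confirm in the generated assembly.

```c
/* Shapes from the lea experiment: base + index*scale + displacement. */
int sum(int x, int y)        { return x + y; }
int sum_scaled(int x, int y) { return x + 8*y - 17; }
int sum_triple(int x, int y) { return x + 3*y; }   /* 3 is not a legal lea scale */

/* 17x rewritten as 16x + x. The shift is done on unsigned values because
 * left-shifting a negative int is undefined behavior in C. */
int times17_shift_add(int x)
{
    return (int)(((unsigned)x << 4) + (unsigned)x);
}

/* 25x rewritten as 5 * (5x), where each factor of 5 has the form
 * v + 4*v that fits the lea pattern. */
int times25_two_steps(int x)
{
    int five = x + 4*x;
    return five + 4*five;
}
```

Comparing the assembly for times17_shift_add against a plain return 17*x is a quick way to see whether the compiler arrives at the same decomposition.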
6. Reverse-engineering
The program babybomb asks for input and uses it to make a call to the function mystery in hopes of getting a successful return value. What kind of input is necessary to win at this game? Let's look into this mystery! Open the mystery.s file to view the assembly, then use gdb to stepi through a call to mystery and observe its execution. Once you understand how it operates, give input to the program that will pass the test and win. There are multiple ways to win -- try to find at least two different ones. You're on your way to tackling binary bomb!
Check off with TA
Before you leave, be sure to submit your checkoff sheet and have your lab TA come by and confirm so you will be properly credited. If you don't finish everything before lab is over, we strongly encourage you to finish the remainder on your own. Double-check your progress with self check.