Lab sessions Mon May 02 to Thu May 05
Lab written by Julie Zelenski
This lab is designed to give you a chance to:
Find an open computer and somebody new to sit with. Introduce yourself and share your suggestions about how to best prep for the upcoming midterm.
Get started. Clone the lab starter project using the command
hg clone /afs/ir/class/cs107/repos/lab5/shared lab5
Have our guide to x86-64 basics and this handy one page of x86-64 open in your browser for reference during lab. Bring up the online lab checkoff so you can jot down things as you go. At the end of the lab period, submit the sheet and have the TA check off your work.
Deadlisting with objdump. As part of the compilation process, the assembler takes in assembly instructions and encodes them into the binary form understood by the hardware. Disassembly is the reverse process that converts binary-encoded instructions back into human-readable assembly. You wrote a little disassembler in assign4. objdump is a tool that operates on object files (i.e. files containing compiled machine code). It can dig out all sorts of information from the object file, but one of the more common uses is as a disassembler. Let's try it out!

objdump -d extracts the instructions from an object file and outputs the sequence of binary-encoded machine instructions alongside the assembly equivalent. This dump is called a deadlist ("dead" to distinguish it from the study of "live" assembly as it executes). If symbol names are present, the instructions are grouped into sequences by function name. Adding the --no-show-raw-insn flag omits the binary encoding and shows just the assembly (cuts down on clutter). If the object file was compiled with debugging information, adding the -S flag to objdump will intersperse the original C source. Use objdump -d -S --no-show-raw-insn trace to get a sample deadlist.

The countops.py python script in the repo reports the assembly instructions most heavily used in a given object file. Try out countops.py trace for an example. The script operates by invoking objdump to disassemble the file, tallying instructions by opcode, and reporting the top 10 most frequent. Try it out on a few executables (your reassemble or synonyms, or tools like emacs and gcc). Does the mix of assembly instructions seem to vary much by program?
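The tally that countops.py performs can also be sketched in C. This is not the script's actual code; it is a minimal sketch that assumes its input is the text output of objdump -d --no-show-raw-insn, where each instruction line is indented and looks like "  400700:\tpush   %rbx". The names mnemonic_of, tally, and report_top10 are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_OPS 512
#define MAX_NAME 32

/* One row of the (hypothetical) opcode frequency table. */
struct opcount {
    char name[MAX_NAME];
    int count;
};

/* Hypothetical helper: copies the mnemonic from one deadlist line into out.
   Returns 0 for non-instruction lines: function headers such as
   "0000000000400700 <main>:" and section banners start at column 0.
   Assumes --no-show-raw-insn (otherwise the first token would be a raw byte)
   and no -S, since interleaved source lines are not handled. */
int mnemonic_of(const char *line, char *out, size_t outsz)
{
    assert(outsz > 1);
    if (!isspace((unsigned char)line[0])) return 0;
    const char *colon = strchr(line, ':');
    if (colon == NULL) return 0;
    const char *p = colon + 1;
    while (isspace((unsigned char)*p)) p++;          /* skip tab after address */
    size_t n = 0;
    while (p[n] != '\0' && !isspace((unsigned char)p[n]) && n + 1 < outsz) n++;
    if (n == 0) return 0;
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}

/* Hypothetical helper: counts one occurrence of name; returns new table size. */
int tally(struct opcount *ops, int nops, const char *name)
{
    for (int i = 0; i < nops; i++) {
        if (strcmp(ops[i].name, name) == 0) {
            ops[i].count++;
            return nops;
        }
    }
    if (nops == MAX_OPS) return nops;                /* table full: ignore new names */
    strncpy(ops[nops].name, name, MAX_NAME - 1);
    ops[nops].name[MAX_NAME - 1] = '\0';
    ops[nops].count = 1;
    return nops + 1;
}

static int by_count_desc(const void *a, const void *b)
{
    const struct opcount *x = a, *y = b;
    return y->count - x->count;
}

/* Hypothetical driver: reads a deadlist from fp and prints the 10 most frequent mnemonics. */
void report_top10(FILE *fp)
{
    static struct opcount ops[MAX_OPS];
    int nops = 0;
    char line[512], name[MAX_NAME];
    while (fgets(line, sizeof line, fp) != NULL) {
        if (mnemonic_of(line, name, sizeof name))
            nops = tally(ops, nops, name);
    }
    qsort(ops, nops, sizeof ops[0], by_count_desc);
    for (int i = 0; i < nops && i < 10; i++)
        printf("%6d %s\n", ops[i].count, ops[i].name);
}
```

To try it, add a main that calls report_top10(stdin) and pipe a deadlist into it, e.g. objdump -d --no-show-raw-insn trace | ./tally (the executable name is also hypothetical). A linear search keeps the sketch short; a hash table would scale better to large binaries.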
Gdb commands for live assembly-level debugging. The debugger has great support for working with code at the assembly level. Load the trace program in gdb and use the gdb command start to get the program going and stopped in main. From there, try out the gdb commands listed below that allow you to poke around at the assembly level. To learn more about any gdb command, try gdb's built-in help.
The gdb command disassemble with no argument will print the disassembled instructions for the currently executing function. You can also give an optional argument of what to disassemble, either a function name or a code address.
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400700 <+0>: push %rbx
0x0000000000400701 <+1>: callq 0x4005b8 <locals>
0x0000000000400706 <+6>: mov %eax,%ebx
0x0000000000400708 <+8>: callq 0x40063e <solve>
...
In the disassembly as printed by gdb, the hex number in the leftmost column is the address in memory for that instruction, and the number in angle brackets is the offset of that instruction relative to the start of the function. You may notice minor differences in presentation between the disassembled instructions as printed by gdb versus the output from objdump, e.g. use of movq instead of mov, negative signed values displayed as large unsigned values, and so on.
The disassemble option /m intersperses C source with the asm. This can be helpful when trying to relate the two. (Though for more complex passages that are significantly rearranged during compilation, the interleaved listing may be confusing.)
(gdb) disassemble/m main
Dump of assembler code for function main:
130 {
0x0000000000400700 <+0>: push %rbx
131 int a;
132 a = locals();
0x0000000000400701 <+1>: callq 0x4005b8 <locals>
0x0000000000400706 <+6>: mov %eax,%ebx
You can set a breakpoint at a specific machine instruction by specifying its address, b *address, or an offset within a function, b *main+6. Note that the latter is not 6 instructions into main, but 6 bytes worth of instructions into main. Given the variable-length encoding of instructions, 6 bytes can correspond to one or several instructions.
(gdb) b *0x400784      break at specified address
(gdb) b *main+6        break at instruction 6 bytes past start of main
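The byte-offset rule can be worked through with the gdb listing of main shown earlier. This small sketch assumes the instruction lengths implied by those offsets (push %rbx at +0 is 1 byte, the callq at +1 is 5 bytes, the mov at +6 is 2 bytes), plus 5 bytes for the second callq, since a direct call with a 32-bit displacement encodes in 5 bytes. The names insn_at_offset and main_lengths are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Instruction lengths in bytes for the start of main in the gdb listing:
   push %rbx (+0), callq <locals> (+1), mov %eax,%ebx (+6), callq <solve> (+8).
   The last length (5) assumes a direct callq with a 32-bit displacement. */
static const int main_lengths[] = {1, 5, 2, 5};

/* Hypothetical helper: returns the index of the instruction that begins exactly
   at byte offset off, or -1 if off falls inside an instruction or past the end. */
int insn_at_offset(const int *lengths, size_t n, int off)
{
    int start = 0;                  /* byte offset where instruction i begins */
    for (size_t i = 0; i < n; i++) {
        assert(lengths[i] > 0);     /* every x86-64 instruction is at least 1 byte */
        if (start == off)
            return (int)i;
        if (start > off)
            return -1;              /* off was skipped over: it is mid-instruction */
        start += lengths[i];
    }
    return -1;
}
```

With these lengths, b *main+6 lands on the third instruction (index 2, the mov %eax,%ebx), not the seventh. An offset such as main+3 falls inside the first callq, so it is not an instruction boundary and not a sensible place for a breakpoint.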
The gdb commands stepi and nexti allow you to single-step through assembly instructions. These are the assembly-level equivalents of the source-level step and next commands. They can be abbreviated si and ni.
(gdb) stepi     executes next single machine instruction
(gdb) nexti     executes next machine instruction (proceeds over fn calls)
The gdb command info reg will print the value of the integer registers and condition codes. You can refer to an individual register by name to view or change the register's value. Within gdb, a register name is prefixed with $ instead of the usual %.
(gdb) info reg
(gdb) p $rax          show current value in %rax register
(gdb) set $rax = 9    change current value in %rax register
The gdb command set disassemble-next-line on turns on assembly-level display. When execution is paused/stopped, gdb usually shows you the C source line to be executed next. After turning this setting on, it will also show the assembly instructions corresponding to that C source line.
(gdb) set disassemble-next-line on
The tui (text user interface) we showed in lecture splits your session into panes for simultaneously viewing the C source, assembly translation, and/or current register state. The gdb command layout <argument> starts tui mode. The argument specifies which pane you want (src, asm, regs, or split). Tui mode is super-handy for tracing execution and observing what is happening with code/registers as you stepi. Occasionally, tui trips itself up and garbles the display. The gdb command refresh sometimes works to clean it up. If things get really out of hand, ctrl-x a will exit tui mode and return you to ordinary non-graphical gdb.
Reading and tracing assembly in gdb. Read over the C code in trace.c. Compile the program and run it in gdb. Use the gdb commands from the previous exercise to set breakpoints, disassemble, stepi through the assembly, print registers, and so on to answer the following questions.
In the my_variables function:
Where is arr being stored? How are the values in arr initialized? What happened to the strlen call on the string constant to init the last array element?
Look at the instructions that compute count. What does this tell you about the sizeof operator?
Use the gdb command display total to set up an auto-display expression for the variable total and single-step through the function. At the start and end of the function, gdb reports that total has been <optimized out>, but during the instructions where the value is "live", its value will be shown. Use the disassembly to figure out the location where total is being kept and for what range of instructions it is live. What other way could you view the live value during execution without referencing it by name?
Use the gdb command info locals to show the local variables. Compare this list to the declarations in the C source. You'll see some variables are shown with values ("live"), some are <optimized out>, but others don't show up at all. Look at the disassembly to figure out what happened to these entirely missing variables. How does gdb respond when you ask it to print the value of one of the unlisted variables? What if you try to set its value? Step through the function repeating the info locals command to observe which variables are live at each step. Examine the disassembly to explain why there is no step at which both total and squared are live.
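The sizeof and strlen questions can also be explored on a small stand-alone function. sample_sum below is a hypothetical stand-in, not the my_variables code from trace.c; one way to use it is to compile it with gcc (or paste it into the online explorer described later in this lab) and compare its disassembly with what you see in my_variables. The _Static_assert compiles only because sizeof of a fixed-size array is a compile-time constant.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kind of code in my_variables (not from trace.c). */
int sample_sum(void)
{
    /* The last element comes from strlen on a string constant; when optimizing,
       gcc typically evaluates this at compile time and just stores 5. */
    int arr[] = {10, 20, 30, (int)strlen("hello")};

    /* sizeof on a fixed-size array is a compile-time constant, so no division
       happens at runtime: count is simply 4 in the generated code. */
    size_t count = sizeof(arr) / sizeof(arr[0]);
    _Static_assert(sizeof(arr) / sizeof(arr[0]) == 4, "count is known at compile time");

    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += arr[i];
    return total;
}
```

With enough optimization, the compiler may fold the entire function down to returning the constant 65, which is itself a useful illustration of why some locals never appear in gdb's info locals output.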
In the u_arith and s_arith functions:
The branch for the if statement is different depending on the signedness of the operand -- why? For what values will the path taken differ due to the difference in branch? Set a breakpoint before the cmp instruction, change the value of the register being compared to one of those values, and stepi from there to verify the difference in paths taken for unsigned versus signed.
Is the right shift an arithmetic (sar) or logical (shr) shift? Does this change depending on whether the type is signed or unsigned?
Trace through the signed calculation using stepi in gdb and explain what the sequence is doing and why it differs from the unsigned calculation.
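Two behaviors behind these questions can be checked in plain C. The threshold 10 and the divisor 4 are hypothetical choices, not taken from u_arith or s_arith. The first pair of functions shows that one bit pattern can compare differently as signed versus unsigned, which is why the compiler uses signed-condition jumps (such as jg/jle) for int and unsigned-condition jumps (such as ja/jbe) for unsigned. The second shows one common reason a signed calculation needs extra instructions: signed division by a power of two must round toward zero, but sar rounds toward negative infinity, so a bias of 2^k - 1 is added to negative values before shifting.

```c
#include <stdbool.h>

/* Hypothetical threshold of 10: the same 32-bit pattern compared two ways. */
bool signed_gt(int x)        { return x > 10; }    /* signed compare */
bool unsigned_gt(unsigned x) { return x > 10u; }   /* unsigned compare */

/* Hypothetical divide-by-4 written the way compilers typically lower signed x / 4.
   Note: >> on a negative int is implementation-defined in C; gcc defines it as an
   arithmetic shift (sar), which this sketch relies on. */
int div4_via_shift(int x)
{
    int bias = (x < 0) ? 3 : 0;   /* 2^2 - 1, added only for negative x */
    return (x + bias) >> 2;       /* arithmetic shift now rounds toward zero */
}

/* Unsigned division by 4 needs no bias: a logical shift (shr) is exact. */
unsigned udiv4(unsigned x)
{
    return x >> 2;
}
```

For example, -1 compares as less than 10 when signed but as 4294967295 (greater than 10) when unsigned, and -5 >> 2 alone would give -2 while -5 / 4 in C is -1; the bias fixes that.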
In the for_loop, while_loop and dowhile_loop functions:
Set a breakpoint in loops and change the value of the parameter n being passed to the three calls such that the loop results will differ. Continue from there and see what is printed.
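A quick way to reason about which values of n make the three loop forms disagree: the hypothetical functions below (not the ones in trace.c) each count how many times their body runs. A do-while tests its condition after the body, so its assembly typically has no check before the first iteration, whereas the for and while forms must test before entering the body.

```c
/* Hypothetical loops, not copied from trace.c: each returns how many times its body ran. */
int for_count(int n)
{
    int iters = 0;
    for (int i = 0; i < n; i++)
        iters++;
    return iters;
}

int while_count(int n)
{
    int iters = 0, i = 0;
    while (i < n) {          /* tested before the first iteration */
        iters++;
        i++;
    }
    return iters;
}

int dowhile_count(int n)
{
    int iters = 0, i = 0;
    do {                     /* body always runs at least once */
        iters++;
        i++;
    } while (i < n);
    return iters;
}
```

For these sketches, any n <= 0 separates the do-while from the other two (0, 0, 1), while positive n gives identical counts.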
Exploring C compilation to assembly. A fun tool for investigating C to asm is the GCC Explorer, an online "interactive compiler". (Thanks, Josh K, for sharing!) Use the link https://godbolt.org/g/fHoZ7S, which is configured to use myth's version of GCC (4.8.x) and the compiler flags from the CS107 makefiles. You can enter some C code, tweak it a bit, and immediately observe how those changes are reflected in the assembly. The tool does the same tasks you could do on myth using gcc/gdb, but in a quick exploratory context. Here are a few experiments to try:
The lea instruction allows two adds and a multiply by constant 1, 2, 4 or 8 to be jammed into one instruction. It was designed for address arithmetic, but the math is compatible with regular integer operations, and it is often used by the compiler to do an efficient add/multiply combo. Type in a simple sum(x, y) function that takes two integer arguments and returns their sum. Look at the assembly and you'll note it issued a lea instead of the expected add. Interesting! Change the function to return x + 2*y or x + 8*y - 17 and see how the lea can adapt. If you try return x + 3*y, it will no longer fit the pattern for the lea; what does the compiler use instead?
Write a scale(x) function that takes one integer and returns the argument multiplied by constant 2. What instruction does the compiler use for the computation? What about a multiply by 8 or 16 or 256? Making a special case for powers of 2 is perhaps unsurprising, but what does it do for multiply by 3 or 17 or 25? Experiment to find an integer constant C such that C*x is expressed as a true imul instruction.
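A possible starting point for these experiments: lea_model spells out the arithmetic that lea computes for an operand of the form disp(base,index,scale), namely base + index*scale + disp, and the remaining functions are the expressions described above. sum and scale come from the text; the other names (sum2, sum8m17, sum3, scale17) are hypothetical. The exact instructions emitted depend on the compiler version and flags, so the comments say what to look for rather than promising specific output.

```c
#include <assert.h>

/* Models lea's operand arithmetic: disp(base,index,scale) = base + index*scale + disp.
   Only scale factors 1, 2, 4, and 8 are encodable. */
long lea_model(long disp, long base, long index, int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + disp;
}

/* Expressions to paste into the explorer and compare. */
int sum(int x, int y)     { return x + y; }          /* base + index*1 */
int sum2(int x, int y)    { return x + 2 * y; }      /* base + index*2 */
int sum8m17(int x, int y) { return x + 8 * y - 17; } /* base + index*8 + (-17) */
int sum3(int x, int y)    { return x + 3 * y; }      /* 3 is not a legal scale: watch what replaces it */

int scale(int x)          { return x * 2; }          /* power of 2: look for shift/add/lea */
int scale17(int x)        { return x * 17; }         /* non-power of 2: is it imul or not? */
```

Try swapping the constants in scale17 to hunt for the C that finally produces an imul.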
Reverse-engineering. The program babybomb asks for input and uses it to make a call to the function mystery in hopes of getting a successful return value. What kind of input is necessary to win at this game? Let's look into this mystery! Open the mystery.s file to view the assembly, and then use gdb to stepi through a call to mystery and observe its execution. Once you understand how it operates, give input to the program that will pass the test and win. There are multiple ways to win -- try to find at least two different ones. You're on your way to tackling the binary bomb!
Before you leave, be sure to submit your checkoff sheet and have your lab TA come by and confirm so you will be properly credited. If you don't finish everything before lab is over, we strongly encourage you to finish the remainder on your own. Double-check your progress with self check.