Lab 5: Assembly

Lab sessions Tue May 12 to Thu May 14

Lab written by Julie Zelenski, with modifications by Nick Troccoli

Pre-Lab Exercises

Before attending your lab, please work through the following exercises. We plan for the exercises to take 20-30min maximum, and the exercises will be essential to the further problems you'll work through during your lab session. Feel free to post on the discussion forum if you have questions while working!

Start by reading over the "Learning Goals" section of the lab writeup below to get familiar with the focus of this week's lab. Also glance over the checkoff sheet questions.
Read over the description of problem 1 (Assembly Code Study) and part 1 of that section (Addressing Modes). Work through the first five bullets (stop when you get to the second premade environment). If you get stuck, feel free to post on the discussion forum or jot down any thoughts or questions you have to bring with you to lab to discuss with your partner and lab TA.

Learning Goals

This lab is designed to give you a chance to:

study the relationship between C source and its assembly translation
use objdump and gdb to disassemble and trace assembly code

Once you have gotten set up with your group, share with them anything fun you did over the weekend!

Get Started

Clone the repo by using the command below to create a lab5 directory containing the project files.

git clone /afs/ir/class/cs107/repos/lab5/shared lab5

Open the lab checkoff form.

Exercises

To study C->assembly translation, we recommend you try out Compiler Explorer, a nifty online tool that provides an "interactive compiler". This link and the pre-provided links throughout this lab are pre-configured for myth's version of GCC and compiler flags from the CS107 makefiles. (To manually configure yourself: select language C, compiler version x86-64 gcc 5.4 and enter flags -Og -std=gnu99).

In Compiler Explorer, you can enter a C function, see its generated assembly, then tweak the C source and observe how those changes are reflected in the assembly instructions. You could instead make these same observations on myth using gcc and gdb, but Compiler Explorer makes it easier to do those tasks in a convenient environment for exploration. Try it out!

1) Assembly Code Study (55 minutes)

This section will get you familiar with various assembly commands, from the mov command from the earlier lecture, as well as commands for arithmetic and logic. Check out a reference for these commands here while you work.

Addressing Modes

Review the two deref functions below.

void deref_one(char *ptr, long index) {
    *ptr = '\0';
}

void deref_two(int *ptr, long index) {
    *ptr = 0;
}

Open Compiler Explorer - we've created a pre-made environment already loaded with this code here. Review the generated assembly shown in the right pane.

Take time to read through the mov instruction. What is it doing? How does that connect with the C code it represents?
There is one difference between the assembly instruction sequences of the two functions. What is the difference? Why it is different?
Edit both functions to instead assign ptr[7] to their respective values. How does this change the addressing mode using for the destination operand of the mov instruction? Do both deref functions change in the same way?
Edit both functions to now assign ptr[index] to their respective values. How does this change the addressing mode using for the destination operand of the mov instruction? Do both deref functions change in the same way?
Change the assignment statement to ptr[0] = ptr[1]. For both functions, the assembly sequence is one instruction longer. Previously, only one mov instruction was needed. This assignment statement requires two movs. Why? (Extra: with this change, you may wonder why the compiler outputs movzbl when it uses only one byte of what is moved. In this case it would likely be functionally equivalent to just move the one byte needed, but the compiler makes choices that it thinks make program execution more efficient. For example, this may help with something called instruction-level parallelism)

Now open the pre-made environment available here for the code below:

typedef struct coord {
    int x;
    int y;
} coord;

void deref_three(struct coord *ptr) {
    ptr->x = ptr->y;
}

void deref_two(int *ptr, long index) {
    ptr[0] = ptr[1];
}

Take time to read through the mov instructions. What are they doing? How does that connect with the C code they represent?
The assembly for deref_three is identical to deref_two, but the C source for the two functions seem to have nothing to do with one another! How is possible that both can generate the same assembly instructions?

Signed/Unsigned Arithmetic

Let's explore a bit with arithmetic in assembly.

The add, subl and imul instructions perform addition, subtraction and multiplication, respectively. Here is the format for these instructions:

    add src, dst       # dst += src
    sub src, dst       # dst -= src
    imul src, dst      # dst *= src

The following two functions perform the same arithmetic operation on their arguments, but those arguments differ in type (signedness).

int signed_arithmetic(int a, int b) {
    return (a - b) * a;
}

unsigned int unsigned_arithmetic(unsigned int a, unsigned int b) {
    return (a - b) * a;
}

Open the pre-made environment for the code above.

Both functions generate the same sequence of assembly instructions! The choice of two's complement representation allows the add/sub/imul instruction to work for both unsigned and signed types. Neat!
Edit both functions to add the following line at the start: a >>= b;. This performs a right shift on one of its arguments. When doing a right-shift, does gcc emit an arithmetic (sar) or logical (shr) shift? Does it matter whether the argument is signed or unsigned?
For the shift instructions, the shift amount can be either an immediate value, or the byte register %cl (and only that register!). However, the instruction interprets the contents of the %cl register in a slightly interesting way. Check out slides 36-37 of lecture 11 here. Slide 37 in particular mentions information you need here. Read over this explanation and discuss the included example with your partner. Then, take a look back at the assembly code you have been working with so far, particularly the a >>= b lines. Notice how it uses the %cl register for shifting. What is the largest shift amount that could be specified using the %cl register for an unsigned int? Why might this limit make sense?

Load Effective Address

The lea instruction is meant to help do calculations with memory addresses, but it also has helpful uses for general arithmetic. Take a few minutes to read over the description of mov vs. lea on the x86-64 Guide page. It packs a lot of computation in one instruction: two adds and a multiply by constant 1, 2, 4 or 8, and is often used by compiler to do an efficient add/multiply combo.

int combine(int x, int y) {
    return x + y;
}

Open the pre-made environment for the code above.

In the generated assembly, you'll see a lea instead of the expected add. Interesting! Take a minute to work through what that lea instruction is doing to understand why lea can be used for add in this case. What else can it do? Let's find out!
Edit the combine function to return x + 2 * y, and then return x + 8 * y - 17, and observe how a single lea instruction can also compute these more complex expressions, such as with multiplication!
Edit the function to now be return x + 47 * y and the result will no longer fit the pattern for an lea because of the multiplication factor. What sequence of assembly instructions is generated instead?

Multiply is one of the more expensive instructions and the compiler prefers cheaper alternatives where possible. Open this pre-made environment for the code below:

int scale(int x) {
    return x * 4;
}

The scale function multiplies its argument by 4. Look at its generated assembly-- there is no multiply instruction. What has the compiler used instead? (Note: the add of zero does nothing, but is outputted for some reason by the compiler) Edit the scale function to return x * 16. What assembly instruction is used now?
It is perhaps unsurprising that the compiler treats multiplication by powers of 2 as a special case given an underlying representation in binary, but it's got a few more tricks up its sleeve than just that! Edit the scale function to instead multiply its argument by a small constant that is not a power of 2, e.g. x * 3, x * 7, x * 12, x * 17, .... For each, look at the assembly and see what instructions are used. GCC sure goes out of its way to avoid multiply!
Experiment to find a small integer constant C such that return x * C is expressed as an actual imul instruction.
Note: you may encounter another rarer form of imul with 3 arguments while playing around with multiplication. These arguments are source1, source2, destination, meaning that this form multiplies the sources together and puts the result in the destination. The destination must be a register, the first source a register or memory location, and the second source an immediate value.

Division

Division is something that is also tedious for modern CPUs; for this reason, GCC avoids division when possible. Let's find out what wizardry it is willing to employ.

Open this pre-made environment containing the code below:

unsigned int unsigned_division(unsigned int x) {
    return x / 2;
}

In the generated assembly for unsigned_division, there is no div instruction in sight, which is the instruction to perform division. How then did it implement unsigned division by 2? What is the interpretation of a one operand shr instruction?
Change the function to divide by a different power of 2 instead (e.g. 4 or 8 or 64). What changes in the generated assembly?

Restore the function back to its original return x / 2 and paste in this additional function that does a signed division by 2:

int signed_division(int x) {
        return x / 2;
}

Whereas addition, subtraction, and multiplication operate equivalently on signed and unsigned integers, it's not quite so with division. If the divisor does not evenly divide the dividend, the quotient must be rounded (the terminology is DIVIDEND / DIVISOR = QUOTIENT). In the case of dividing an odd dividend by 2, the remainder is 1. Discarding the remainder (which is what unsigned divide does by shifting away the lsb) has the effect of rounding down to a lesser value. However, the rule for integer division is that quotient must be rounded toward zero, e.g. dividing 3/2 = 1 and dividing -3/2 = -1. A positive quotient is rounded down, but a negative quotient is rounded up.

What is the bit pattern for 3 and then 3 >> 1? What is the bit pattern for -3 and then -3 >> 1? (Assume arithmetic right shift) Do you see why the right shift (discard lsb) will round a positive value toward zero and rounds a negative value away from zero? This difference in rounding is the crux of why right-shift alone is insufficient if the result is negative.
Compare the assembly for unsigned_division to signed_division. The signed version has a pair of instructions (shr, add) inserted and uses sar in place of shr. First, consider that last substitution. If the number being shifted is positive, there is no difference, but arithmetic right shift versus logical on a negative number has a profound impact. What is different? Why?
Now let's dig into the shr and add instructions that were inserted in the signed version. Trace through their operation when the dividend is positive and confirm these instructions made no change in the quotient. But these instructions do have an effect when the dividend is negative. They adjust the negative dividend by a "fixup" amount before the divide (shift). This pushes the dividend to the next larger multiple of the divisor so as to get the proper rounding for a negative quotient.
The necessary fixup amount when dividing by 2 is 1, the fixup for dividing by 4 is 3, for 8 it is 7 and so on. This calculation should be reminiscent of the roundup function we studied way back in lab1! Change the signed_division function to divide by 4 instead. The fixup amount is now 3. The assembly instructions handle the fixup slightly differently than before. It computes the dividend plus fixup and uses a "conditional mov" instruction to select between using the fixed or unmodified dividend based on whether the dividend is negative. This is a new type of mov instruction in addition to the ones we've already seen - we'll discuss this more in a future lecture, but the way to read the test and cmov instructions is that if the number in %edi is not negative, it moves %edi into %eax. This undoes the fixup if the value is non-negative since fixup is not needed in that case. It's cool to see that assembly conditionally adds this fixup of 3 here to ensure the shift produces the correct result!

Optional: further explorations: Division by powers of two get special treatment, sure, but what about constants that are not powers of two?

Open this pre-made environment containing the code below:

unsigned int unsigned_div10(unsigned int x) {
    return x / 10;
}

Look at the C source and generated assembly for this function. The assembly still doesn't use a div instruction, by there is a multiply by a bizarre number: -858993459 (in hex 0xcccccccd). What sorcery is this? This mysterious transformation is effectively multiplying by 1/10 as a substitute for divide by 10. The 1/10 value is being represented as a "fixed point fraction" which is constructed similarly to the way floats are. Enter 0.1 into the float visualizer tool and look to see where that same funny 0xcccccccd number shows up in the significand bits of the float. This technique is known as reciprocal multiplication and gcc will generally convert division by a constant in this way. The math behind this is somewhat complex, but if you'd like to know more, check out Doug Jones' tutorial on reciprocal multiplication.

2) Tools (35 minutes)

Deadlisting with `objdump`

As part of the compilation process, the assembler takes in assembly instructions and encodes them into binary machine form. Disassembly is the reverse process that converts binary-encoded instructions back into human-readable assembly text. objdump is a tool that operates on object files (i.e. files containing machine instructions in binary). It can dig out all sorts of information from the object file, but one of the more common uses is as a disassembler. Let's try it out!

Invoking objdump -d extracts the instructions from an object file and outputs the sequence of binary-encoded machine instructions alongside the assembly equivalent. This dump is called a deadlist ("dead" to distinguish from the study of "live" assembly as it executes). Use make to build the lab programs and then objdump -d code to get a sample deadlist.
The ./countops.py python script reports the most heavily used assembly instructions in a given object file. Try out ./countops.py code for an example. The script uses objdump to disassemble the file, tallies instructions by opcode, and reports the top 10 most frequent. Use which to get the path to a system executable (e.g. which emacs) and then use countops.py on that path. Try this for a few executables. Does the mix of assembly instructions seem to vary much by program?

Tip #1: you can save the output of objdump to file so that you can easily view it later by doing objdump -d code > myfile.txt.
Tip #2: remember that all literal values in assembly are in hexadecimal! This is a common gotcha when reading assembly code.

GDB Commands for Live Assembly Debugging

Below we introduce a few of the gdb commands that allow you to work with code at the assembly level.

The gdb command disassemble with no argument will print the disassembled instructions for the currently executing function. You can also give an optional argument of what to disassemble, such as a function name or code address.

(gdb) disass myfn
Dump of assembler code for function myfn:
   0x000000000040051b <+0>: movl   $0x1,-0x20(%rsp)
   0x0000000000400523 <+8>: movl   $0x2,-0x1c(%rsp)
...

The hex number in the leftmost column is the address in memory for that instruction and in angle brackets is the offset of that instruction relative to the start of the function.

You can set a breakpoint at a specific assembly instruction by specifying its address b *address or an offset within a function b *myfn+8. Note that the latter is not 8 instructions into main, but 8 bytes worth of instructions into main. Given the variable-length encoding of instructions, 7 bytes can correspond to one or several instructions. If you try to set a breakpoint on an address without using the * prefix, gdb will try to interpret the address as a function name and not the address of a particular assembly instruction, so be careful!

(gdb) b *0x400570      break at specified address
(gdb) b *myfn+8        break at instruction 8 bytes into myfn()

The gdb commands stepi and nexti allow you to single-step through assembly instructions. These are the assembly-level equivalents of the source-level step and next commands. They can be abbreviated si and ni.

(gdb) stepi            executes next single instruction
(gdb) nexti            executes next instruction (step over fn calls)

The gdb command info reg will print all integer registers. You can print or set a register's value by name. Within gdb, a register name is prefixed with $ instead of the usual %.

(gdb) info reg
rax            0x4005c1 4195777
rbx            0x0  0
....
(gdb) p $rax            show current value in %rax register
(gdb) set $rax = 9      change current value in %rax register

The tui (text user interface) splits your session into panes for simultaneously viewing the C source, assembly translation, and/or current register state. The gdb command layout <argument> starts tui mode. The argument specifies which pane you want (src, asm, regs, split or next).

(gdb) layout split

Tui mode is great for tracing execution and observing what is happening with code/registers as you stepi. Occasionally, tui trips over itself and garbles the display. The gdb command refresh sometimes works to clean it up, or you can try ctrl-l. If things get really out of hand, ctrl-x a will exit tui mode and return you to ordinary non-graphical gdb.

Reading and Tracing Assembly in GDB

Let's try out all these new gdb commands!

Read over the program code.c.
Compile the program and load it into gdb. Disassemble myfn.
Use the disassembly to figure out where arr is being stored (Hint: take a look at the first mov instructions - %rsp is a pointer to the current top of the stack). How are the values in arr initialized? What happened to the strlen call on the string constant to init the last array element?
What instructions were emitted to compute the value assigned to count? What does this tell you about the sizeof operator? Is it an actual function executed by the assembly instructions?
Set a breakpoint at myfn and run the program. Use the gdb command info locals to show the local variables. Compare this list to the declarations in the C source. You'll see some variables are shown with values ("live"), some are <optimized out>, but others don't show up at all. Look at the disassembly to figure out what happened to these entirely missing variables. Step through the function repeating the info locals command to observe which variables are live at each step. Examine the disassembly to explain why there is no step at which both total and squared are live. As a hint: where do the different variables live at the assembly level? What about ones with constant values?

Check off with TA

At the end of the lab period, submit the checkoff form and ask your lab TA to approve your submission so you are properly credited for your work. It's okay if you don't completely finish all of the exercises during lab; your sincere participation for the full lab period is sufficient for credit. However, we highly encourage you to finish on your own whatever is need to solidify your knowledge. Also take a chance to reflect on what you got what from this lab and whether you feel ready for what comes next! The takeaways from lab5 should be understanding tools for disassembling object files, and using the debugger at the assembly level. The hands-on practice relating C code to its compiled assembly and reverse-engineering from assembly to C is great preparation for your next assignment. Here are some questions to verify your understanding and get you thinking further about these concepts:

Give an example of a C sequence/expression that could be compiled into a variety of different, but equivalent, assembly instruction(s).
What is the difference between sar and shr?
How is the return value of a function communicated from the callee to the caller?
What kind of clues might suggest that an lea instruction is doing an address calculation versus when it is merely arithmetic?
How can you get the value of a variable during execution that gdb reports has been <optimized out>?