Lab 6: Assembly

Lab sessions Tue May 15 to Sat May 19

Lab written by Julie Zelenski

Learning goals

This lab is designed to give you a chance to:

  1. study the relationship between C source and its assembly translation
  2. use objdump and gdb to disassemble and trace assembly code

Find an open computer and somebody new to sit with. Share your joy, angst, and relief from last week's midterm.

Get started

Clone the repo by using the command below to create a lab6 directory containing the project files.

git clone /afs/ir/class/cs107/repos/lab6/shared lab6

Open the lab checkoff form.

Lab exercises

To study C->assembly translation, we recommend you try out Compiler Explorer, a nifty online tool that provides an "interactive compiler". This link https://godbolt.org/g/J8wSw5 is pre-configured for myth's version of GCC and compiler flags from the CS107 makefiles. (To manually configure yourself: select language C,compiler version x86-64 gcc 5.4 and enter flags -Og -std=gnu99).

In Compiler Explorer, you can enter a C function, see its generated assembly, then tweak the C source and observe how those changes are reflected in the assembly instructions. You could instead make these same observations on myth using gcc and gdb, but Compiler Explorer makes it easier to do those tasks in a convenient environment for exploration. Try it out!

A note on register use: From Monday's lecture, remember which registers are used for passing values in and out of functions: the first three arguments are passed in %rdi, %rsi, and %rdx respectively and the return value is written to %rax. A reference to the e-named register accesses the lower 32-bit sub-register, e.g. %edi is the subregister of %rdi. Operations on pointers and longs (64-bit types) will use the full r-named registers, and operations on ints will use the e-named subregisters.

1) Assembly code study (70 minutes)

Addressing modes

Review the two deref functions below.

void deref_one(char *ptr, long index) 
{
    *ptr = '\0';
}

void deref_two(int *ptr, long index) 
{
    *ptr = 0;
}

Open Compiler Explorer https://godbolt.org/g/J8wSw5. Copy the above two functions and paste them into the left pane and review the generated assembly shown in the right pane.

  • There is one difference between the assembly instruction sequences of the two functions. What is the difference? Why it is different?
  • Edit both functions to instead assign ptr[7] to zero. How does this change the addressing mode using for the destination operand of the mov instruction? Do both deref functions change in the same way?
  • Edit both functions to now assign ptr[index] to zero. How does this change the addressing mode using for the destination operand of the mov instruction? Do both deref functions change in the same way?
  • Change the assignment statement to ptr[0] = ptr[1]. For both functions, the assembly sequence is one instruction longer. Previously, only one mov instruction was needed. This assignment statement requires two movs. Why?

Now add the deref_three function below to Compiler Explorer:

struct coord { int x, y; };

void deref_three(struct coord *ptr)
{
    ptr->x = ptr->y;
}
  • The assembly for deref_three is identical to deref_two, but the C source for the two functions seem to have nothing to do with one another! How is possible that both can generate the same assembly instructions?

Signed/unsigned arithmetic

The following two functions perform the same arithmetic operation on their arguments, but those arguments differ in type (signedness).

int s_arith(int a, int b) 
{
    return (a - b) * a;
}

unsigned int u_arith(unsigned int a, unsigned int b) 
{
    return (a - b) * a;
}

Copy and paste the above two functions into Compiler Explorer.

  1. Both functions generate the same sequence of assembly instructions! The choice of two's complement representation allows the add/sub/imul instruction to work for both unsigned and signed types. Neat!
  2. Edit both functions to perform a right shift on one of its arguments. When doing a right-shift, does gcc emit an arithmetic (sar) or logical (shr) shift? Does it matter whether the argument is signed or unsigned?

Load effective address

The lea instruction packs a lot of computation in one instruction: two adds and a multiply by constant 1, 2, 4 or 8. It was designed for address arithmetic, but the math is compatible with regular integer operations and it is often used by compiler to do an efficient add/multiply combo.

int combine(int x, int y) 
{
    return x + y;
}

Copy and paste the combine function into Compiler Explorer.

  • In the generated assembly, you'll see a lea instead of the expected add. Interesting! lea can be used for add, but what else can it do?
  • Edit the combine function to return x + 2*y or return x + 8*y -17 and observe how a single lea instruction can also compute these more complex expressions.
  • Edit to return x + 47*y and the result will no longer fit the pattern for an lea. What sequence of assembly instructions is generated instead?

Multiply is one of the more expensive instructions and the compiler prefers cheaper alternatives where possible. Copy and paste this function into Compiler Explore:

int scale(int x)
{
    return x * 4;
}
  • The scale function multiplies its argument by 4. Look at its generated assembly-- there is no multiply instruction. What has the compiler used instead? Edit the scale function to return x*16. What assembly instruction is used now?
  • It is perhaps unsurprising that the compiler treats multiplication by powers of 2 as a special case given an underlying representation in binary, but it's got a few more tricks up its sleeve than just that! Edit the scale function to instead multiply its argument by a small constant that is not a power of 2,e.g. x*3, x*7, x*12, x*17, .... For each, look at the assembly and see what instructions are used. Gcc sure goes out of its way to avoid multiply!
  • Experiment to find a small integer constant C such that return x*C is expressed as an actual imul instruction.

Division

A universal complaint of third graders is that doing long division is tedious; modern CPUs concur. The hard work of division is something the compiler is eager to avoid. Let's find out what wizardry it is willing to employ.

Copy this division function into Compiler Explorer:

unsigned int u_div(unsigned int x)
{
    return x / 2;
}
  • In the generated assembly for u_div, is no div instruction in sight. How then did it implement unsigned division by 2? What is the interpretation of one operand shr instruction?
  • Change the function to divide by a different power of 2 instead (e.g. 4 or 8 or 64). What changes in the generated assembly?

Restore u_div back to its original return x / 2 and paste in this additional function that does a signed division by 2:

int s_div(int x)
{
    return x / 2;
}

Whereas addition, subtraction, and multiplication operate equivalently on signed and unsigned integers, it's not quite so with division. If the divisor does not evenly divide the dividend, the quotient must be rounded. In the case of dividing an odd dividend by 2, the remainder is 1. Discarding the remainder (which is what unsigned divide does by shifting away the lsb) has the effect of rounding down to a lesser value. However, the rule for integer division is that quotient must be rounded toward zero, e.g. dividing 3/2 = 1 and dividing -3/2 = -1. A positive quotient is rounded down, but a negative quotient is rounded up.

  • What is the bit pattern for 3 and then 3 >> 1? What is the bit pattern for -3 and then -3 >> 1? (Assume arithmetic right shift) Do you see why the right shift (discard lsb) will round a positive value toward zero and rounds a negative value away from zero? This difference in rounding is the crux of why right-shift alone is insufficient if the result is negative.
  • Compare the assembly for u_div to s_div. The signed version has a pair of instructions (shr, add) inserted and uses sar in place of shr. First consider that last substitution. If the number being shifted is positive there is no difference, but arithmetic right shift versus logical on a a negative number has a profound impact. What is different? Why?
  • Now let's dig into the shr and add instructions that were inserted in the signed version. Trace through their operation when the dividend is positive and confirm these instructions made no change in the quotient. But these instructions do have an effect when the dividend is negative. They adjust the negative dividend by a "fixup" amount before the divide (shift). This pushes the dividend to the next larger multiple of the divisor so as to get the proper rounding for a negative quotient.
  • The necessary fixup amount when dividing by 2 is 1, the fixup for dividing by 4 is 3, for 8 it is 7 and so on. This calculation should be reminiscent of the roundup function we studied way back in lab1! Change the s_div function to divide by 4 instead. The fixup amount is now 3. The assembly instructions handle the fixup slightly differently than before. It computes the dividend plus fixup and uses a "conditional mov" instruction to select between using the fixed or unmodified dividend based on whether the dividend is negative. Trace through the assembly to see how this is done.
  • There is a family of conditional move instructions that parallel the branch instructions (jne -> cmovne, jle -> cmovle, and so on). The branch reads the condition codes to decide whether to take the branch or not, a conditional mov reads the condition codes to decide whether to complete the mov or ignore it.

Option further explorations for the curious: Division by powers of two get special treatment, sure, but what about constants that are not powers of two?

Add the u_div_10 function below to Compiler Explorer:

unsigned int u_div10(unsigned int x)
{
    return x/10;
}

Look at the C source and generated assembly for udiv_by_ten. The assembly still doesn't use a div instruction, by there is a multiply by a bizarre number: -858993459 (in hex 0xcccccccd). What sorcery is this? This mysterious transformation is effectively multiplying by 1/10 as a substitute for divide by 10. The 1/10 value is being represented as a "fixed point fraction" which is constructed similarly to the way floats are. Enter 0.1 into the float visualizer tool and look to see where that same funny 0xcccccccd number shows up in the significand bits of the float. This technique is known as reciprocal multiplication and gcc will generally convert division by a constant in this way. The math behind this is somewhat complex and we'll won't make a PhD thesis out of it here, but if you'd like to know more, check out Doug Jones' tutorial on reciprocal multiplication.

2) Tools (40 minutes)

Deadlisting with objdump

As part of the compilation process, the assembler takes in assembly instructions and encodes them into binary machine form. Disassembly is the reverse process that converts binary-encoded instructions back into human-readable assembly text. objdump is a tool that operates on object files (i.e. files containing machine instructions in binary). It can dig out all sorts of information from the object file, but one of the more common uses is as a disassembler. Let's try it out!

  1. Invoking objdump -d extracts the instructions from an object file and outputs the sequence of binary-encoded machine instructions alongside the assembly equivalent. This dump is called a deadlist ("dead" to distinguish from the study of "live" assembly as it executes). Use make to build the lab programs and then objdump -d code to get a sample deadlist.
  2. The ./countops.py python script reports the most heavily used assembly instructions in a given object file. Try out countops.py code for an example. The script uses objdump to disassemble the file, tallies instructions by opcode, and reports the top 10 most frequent. Use which to get the path to a system executable (e.g. which vim or emacs or gcc) and then use countops.py on that path. Try this for a few executables. Does the mix of assembly instructions seem to vary much by program?

GDB commands for live assembly debugging

Below we introduce a few of the gdb commands that allow you to work with code at the assembly level.

The gdb command disassemble with no argument will print the disassembled instructions for the currently executing function. You can also give an optional argument of what to disassemble, a function name or code address.

(gdb) disass myfn
Dump of assembler code for function myfn:
   0x000000000040051b <+0>: movl   $0x1,-0x20(%rsp)
   0x0000000000400523 <+8>: movl   $0x2,-0x1c(%rsp)
...

The hex number in the leftmost column is the address in memory for that instruction and in angle brackets is the offset of that instruction relative to the start of the function.

You can set a breakpoint at a specific assembly instruction by specifying its address b *address or an offset within a function b * myfn+8. Note that the latter is not 8 instructions into main, but 8 bytes worth of instructions into main. Given the variable-length encoding of instructions, 7 bytes can correspond to one or several instructions.

(gdb) b *0x400570      break at specified address
(gdb) b *myfn+8        break at instruction 8 bytes into myfn()

The gdb commands stepi and nexti allow you to single-step through assembly instructions. These are the assembly-level equivalents of the source-level step and next commands. They can be abbreviated si and ni.

(gdb) stepi            executes next single instruction
(gdb) nexti            executes next instruction (step over fn calls)

The gdb command info reg will print all integer registers. You can print or set a register's value by name. Within gdb, a register name is prefixed with $ instead of the usual %.

(gdb) info reg
rax            0x4005c1 4195777
rbx            0x0  0
....
(gdb) p $rax            show current value in %rax register
(gdb) set $rax = 9      change current value in %rax register

The tui (text user interface) demoed in lecture splits your session into panes for simultaneously viewing the C source, assembly translation, and/or current register state. The gdb command layout <argument> starts tui mode. The argument specifies which pane you want (src, asm, regs, split or next).

(gdb) layout split

Tui mode is great for tracing execution and observing what is happening with code/registers as you stepi. Occasionally, tui trips itself and garbles the display. The gdb command refresh sometimes works to clean it up, or you can try ctrl-l. If things get really out of hand, ctrl-x a will exit tui mode and return you to ordinary non-graphical gdb.

Reading and tracing assembly in GDB

Let's try out all these new gdb commands! Read over the program code.c. Compile the program and load into gdb.

  1. Compile the program and load into gdb. Disassemble myfn.
  2. Use the disassembly to figure out where arr is being stored. How are the values in arr initialized? What happened to the strlen call on the string constant to init the last array element?
  3. What instructions were emitted to compute the value assigned to count? What does this tell you about the sizeof operator?
  4. Set a breakpoint at myfn and run the program. Use the gdb command info locals to show the local variables. Compare this list to the declarations in the C source. You'll see some variables are shown with values ("live"), some are <optimized out>, but others don't show up at all. Look at the disassembly to figure out what happened to these entirely missing variables. Step through the function repeating the info locals command to observe which variables are live at each step. Examine the disassembly to explain why there is no step at which both total and squared are live.

Check off with TA

Before you leave, be sure to submit your checkoff sheet and have your lab TA come by and confirm so you will be properly credited. If you don't finish everything before lab is over, we strongly encourage you to finish the remainder on your own. Double-check your progress with self check.