Lab 5: Assembly Solutions

Lab written by Julie Zelenski, with modifications by Nick Troccoli

Go back to this week's lab writeup (without solutions) here.

Pre-Lab Exercises

Before attending your lab, please work through the "Pre-lab Exercise" section below. We plan for the exercise to take 20-30min maximum. Feel free to post on the discussion forum if you have questions while working!

Learning Goals

This lab is designed to give you a chance to:

study the relationship between C source and its assembly translation
use objdump and gdb to disassemble and trace assembly code

Once you have gotten set up with your group, share with them anything fun you did over the weekend!

Pre-Lab Exercise: Assembly Tools

Clone the repo by using the command below to create a lab5 directory containing the project files.

git clone /afs/ir/class/cs107/repos/lab5/shared lab5

These files are only needed for the prelab, and not for the main lab.

(a, read) Deadlisting with `objdump`

As part of the compilation process, the assembler takes in assembly instructions and encodes them into binary machine form. Disassembly is the reverse process that converts binary-encoded instructions back into human-readable assembly text. objdump is a tool that operates on object files (i.e. files containing machine instructions in binary). It can dig out all sorts of information from the object file, but one of the more common uses is as a disassembler. Let's try it out!

Invoking objdump -d extracts the instructions from an object file and outputs the sequence of binary-encoded machine instructions alongside the assembly equivalent. This dump is called a deadlist ("dead" to distinguish from the study of "live" assembly as it executes). Use make to build the lab programs and then objdump -d code to get a sample deadlist.

Tip #1: you can save the output of objdump to file so that you can easily view it later by doing objdump -d code > myfile.txt.
Tip #2: remember that all literal values in assembly on myth are in hexadecimal! This is a common gotcha when reading assembly code. (Compiler Explorer may convert to decimal, though)

(b, read) GDB Commands for Live Assembly Debugging

Below we introduce a few of the gdb commands that allow you to work with code at the assembly level.

The gdb command disassemble with no argument will print the disassembled instructions for the currently executing function. You can also give an optional argument of what to disassemble, such as a function name or code address.

(gdb) disass myfn
Dump of assembler code for function myfn:
   0x000000000040051b <+0>: movl   $0x1,-0x20(%rsp)
   0x0000000000400523 <+8>: movl   $0x2,-0x1c(%rsp)
...

The hex number in the leftmost column is the address in memory for that instruction and in angle brackets is the offset of that instruction relative to the start of the function.

You can set a breakpoint at a specific assembly instruction by specifying its address b *address or an offset within a function b *myfn+8. Note that the latter is not 8 instructions into main, but 8 bytes worth of instructions into main. Given the variable-length encoding of instructions, 7 bytes can correspond to one or several instructions. If you try to set a breakpoint on an address without using the * prefix, gdb will try to interpret the address as a function name and not the address of a particular assembly instruction, so be careful!

(gdb) b *0x400570      break at specified address
(gdb) b *myfn+8        break at instruction 8 bytes into myfn()

The gdb commands stepi and nexti allow you to single-step through assembly instructions. These are the assembly-level equivalents of the source-level step and next commands. They can be abbreviated si and ni.

(gdb) stepi            executes next single instruction
(gdb) nexti            executes next instruction (step over fn calls)

The gdb command info reg will print all integer registers. You can print or set a register's value by name. Within gdb, a register name is prefixed with $ instead of the usual %.

(gdb) info reg
rax            0x4005c1 4195777
rbx            0x0  0
....
(gdb) p $rax            show current value in %rax register
(gdb) set $rax = 9      change current value in %rax register

The tui (text user interface) splits your session into panes for simultaneously viewing the C source, assembly translation, and/or current register state. The gdb command layout <argument> starts tui mode. The argument specifies which pane you want (src, asm, regs, split or next).

(gdb) layout split

Tui mode is great for tracing execution and observing what is happening with code/registers as you stepi. Occasionally, tui trips over itself and garbles the display. The gdb command refresh sometimes works to clean it up, or you can try ctrl-l. If things get really out of hand, ctrl-x a will exit tui mode and return you to ordinary non-graphical gdb.

(c, activity) Reading and Tracing Assembly in GDB

Let's try out all these new gdb commands!

Read over the program code.c.
Compile the program and load it into gdb. Disassemble myfn.
Use the disassembly to figure out where arr is being stored (Hint: take a look at the first mov instructions - %rsp is a pointer to the current top of the stack). Q: How are the values in arr initialized? What happened to the strlen call on the string constant to init the last array element?
Array is stored on stack and initialized via a sequence of mov instructions. The strlen call is optimized out: the compiler just puts in "13". Thus, no actual strlen code will run at runtime.
Q: What instruction was emitted to compute the value assigned to count? (hint: look at the end of the assembly) What does this tell you about the sizeof operator? Is the result resolved when the program is compiled, or when the program is run?
The sizeof "call" is replaced by a constant. sizeof is always resolved by the compiler. It is a Compile Time operator, not a function, and there is no code associated with it at runtime.
Q: In the assembly, what register does it look like stores total? How about squared? How can that be? As a hint: registers can be reused for multiple variables!
Both total and squared are stored in %eax! They share the same register.

Lab Exercises

To study C->assembly translation, we recommend you try out Compiler Explorer, a nifty online tool that provides an "interactive compiler". This link and the pre-provided links throughout this lab are pre-configured for myth's version of GCC and compiler flags from the CS107 makefiles. (To manually configure yourself: select language C, compiler version x86-64 gcc 5.4 and enter flags -Og -std=gnu99).

In Compiler Explorer, you can enter a C function, see its generated assembly, then tweak the C source and observe how those changes are reflected in the assembly instructions. You could instead make these same observations on myth using gcc and gdb, but Compiler Explorer makes it easier to do those tasks in a convenient environment for exploration. Try it out!

Also check out a reference for these commands here while you work.

1) Addressing Modes (15 minutes)

(1a) From Pointer Dereferencing to `mov`

Review the two deref functions below.

void deref_one(char *ptr, long index) {
    *ptr = '\0';
}

void deref_two(int *ptr, long index) {
    *ptr = 0;
}

Open Compiler Explorer - we've created a pre-made environment already loaded with this code here. Review the generated assembly shown in the right pane.

There is one difference between the assembly instruction sequences of the two functions. Q: What is the difference? Why it is different?
The mov instruction changed from a movb to a movl because the number of bytes to store changes from 1 to 4 (b means byte and l means 4 bytes)
Edit both functions to instead assign ptr[7] to their respective values. Q: How does this change the addressing mode using for the destination operand of the mov instruction? Do both deref functions change in the same way?
The addressing mode changes from indirect to indirect w/ displacement to account for indexing further. The scale factor is in terms of bytes, depending on the type of the pointer being used. deref_one has a char * so it's 1 x 7, while deref_two has an int * so it's 4 x 7. Both change identically otherwise, just the scale factor is different.
Edit both functions to now assign ptr[index] to their respective values. Q: How does this change the addressing mode using for the destination operand of the mov instruction? Do both deref functions change in the same way?
The addressing mode changes to indirect w/ displacement for the first one, and indirect w/ displacement and scaled index for the second. This is because in both we are using index, which is the second parameter, to scale. The second version must also scale by a multiplier of 4 for the size of an int (the first example scales by a multiplier of 1, so can omit this). Both change identically otherwise, just the scaling is different.
Change the assignment statement to ptr[0] = ptr[1]. For both functions, the assembly sequence is one instruction longer. Previously, only one mov instruction was needed. This assignment statement requires two movs. Q: What special restriction on the mov instruction makes this necessary? (Extra: with this change, you may wonder why the compiler outputs movzbl when it uses only one byte of what is moved. In this case it would likely be functionally equivalent to just move the one byte needed, but the compiler makes choices that it thinks make program execution more efficient. For example, this may help with something called instruction-level parallelism)
The addressing mode changes to indirect w/ displacement for the first one, and indirect w/ displacement and scaled index for the second. This is because in both we are using index, which is the second parameter, to scale. The second version must also scale by a multiplier of 4 for the size of an int (the first example scales by a multiplier of 1, so can omit this). Both change identically otherwise, just the scaling is different.

(1b) Struct Dereferencing in Assembly

Now open the pre-made environment available here for the code below:

typedef struct coord {
    int x;
    int y;
} coord;

void deref_three(struct coord *ptr) {
    ptr->x = ptr->y;
}

void deref_two(int *ptr, long index) {
    ptr[0] = ptr[1];
}

The assembly for deref_three is identical to deref_two, but the C source for the two functions seem to have nothing to do with one another! Q: How is possible that both can generate the same assembly instructions?
Both are doing a similar transfer operation - assembly knows nothing about types! Both are locating addresses 4 bytes from a base, copying 4 bytes from there, and using them to overwrite 4 bytes at base.

2) Arithmetic (20 minutes)

(2a) Signed/Unsigned Arithmetic

Let's explore a bit with arithmetic in assembly.

The add, subl and imul instructions perform addition, subtraction and multiplication, respectively. Here is the format for these instructions:

    add src, dst       # dst += src
    sub src, dst       # dst -= src
    imul src, dst      # dst *= src

The following two functions perform the same arithmetic operation on their arguments, but those arguments differ in type (signedness).

int signed_arithmetic(int a, int b) {
    return (a - b) * a;
}

unsigned int unsigned_arithmetic(unsigned int a, unsigned int b) {
    return (a - b) * a;
}

Open the pre-made environment for the code above.

Both functions generate the same sequence of assembly instructions! The choice of two's complement representation allows the add/sub/imul instruction to work for both unsigned and signed types. Neat!
Edit both functions to add the following line at the start: a >>= b;. This performs a right shift on one of its arguments. Q: When doing a right-shift, does gcc emit an arithmetic (sar) or logical (shr) shift? Does it matter whether the argument is signed or unsigned?
The arithmetic shift comes up with the signed types, and the logical shift is used for the unsigned types.
For the shift instructions, the shift amount can be either an immediate value, or the byte register %cl (and only that register!). Q: How does the shift instruction interpret the contents of the %cl register? Check out slides 36-37 (specifically, slide 37) of lecture 11 here, and note how the a >>= b lines. Notice how it uses the %cl register for shifting. What is the largest shift amount that could be specified using the %cl register for an unsigned int? Why might this limit make sense?
The shift will only look at the low-order log2(w) bits of %cl to know how much to shift
Take a look back at the assembly code you have been working with so far, particularly the a >>= b lines you inserted. Notice how it uses the %cl register for shifting. Q: What is the largest shift amount that could be specified using the %cl register for an unsigned int? Why might this limit make sense?
For an unsigned int, this means %cl could specify a shift amount of at most 31 (the max number that fits in log2(32) = 5 bits). This makes sense as a limit because it is undefined to shift an unsigned int more than 31 places.

(2b) Load Effective Address

The lea instruction is meant to help do calculations with memory addresses, but it also has helpful uses for general arithmetic. Take a few minutes to read over the description of mov vs. lea on the x86-64 Guide page. It packs a lot of computation in one instruction: two adds and a multiply by constant 1, 2, 4 or 8, and is often used by compiler to do an efficient add/multiply combo.

int combine(int x, int y) {
    return x + y;
}

Open the pre-made environment for the code above.

In the generated assembly, you'll see a lea instead of the expected add. Interesting! Take a minute to work through what that lea instruction is doing to understand why lea can be used for add in this case. What else can it do? Let's find out!
Edit the combine function to return x + 2 * y, and then return x + 8 * y - 17, and observe how a single lea instruction can also compute these more complex expressions, such as with multiplication!
Edit the function to now be return x + 47 * y and the result will no longer fit the pattern for an lea because of the multiplication factor. Q: What sequence of assembly instructions is generated instead?
imull and addl instructions are used, because 47 does not fit as a valid scale multiplier for lea.

(2c) Multiplication

Multiply is one of the more expensive instructions and the compiler prefers cheaper alternatives where possible. Open this pre-made environment for the code below:

int scale(int x) {
    return x * 4;
}

The scale function multiplies its argument by 4. Look at its generated assembly-- there is no multiply instruction. Q: What has the compiler used instead? (Note: the add of zero does nothing, but is outputted for some reason by the compiler)
It uses leal, but without the "base" value. This is because all we are interested in is the scaling of x by 4. We can accomplish a multiply by 16 by doing a left shift by 4 (x << 4).
Edit the scale function to return x * 16. What assembly instruction is used now?
It is perhaps unsurprising that the compiler treats multiplication by powers of 2 as a special case given an underlying representation in binary, but it's got a few more tricks up its sleeve than just that! Edit the scale function to instead multiply its argument by a small constant that is not a power of 2, e.g. x * 3, x * 7, x * 12, x * 17, .... For each, look at the assembly and see what instructions are used. GCC sure goes out of its way to avoid multiply!
For 3, we leverage the fact that x + x * 2 == 3 * x within a call to lea.
For 17, we leverage the left shift and addition of itself to the left shifted result by 4 (i.e. x << 4 + x = x * 17).
For 25, we use the fact that temp = x + x * 4; temp + temp * 4 == x * 25 (pretty awesome!)
Note: you may encounter another rarer form of imul with 3 arguments while playing around with multiplication. These arguments are source1, source2, destination, meaning that this form multiplies the sources together and puts the result in the destination. The destination must be a register, the first source a register or memory location, and the second source an immediate value.

3) Division (15 minutes)

(3a) Unsigned Division

Division is something that is also tedious for modern CPUs; for this reason, GCC avoids division when possible. Let's find out what wizardry it is willing to employ.

Open this pre-made environment containing the code below:

unsigned int unsigned_division(unsigned int x) {
    return x / 2;
}

In the generated assembly for unsigned_division, there is no div instruction in sight, which is the instruction to perform division. Q: How then did it implement unsigned division by 2? What is the interpretation of a one operand shr instruction?
It uses shr (right shift), and omits the first operand because it is 1.
Change the function to divide by a different power of 2 instead (e.g. 4 or 8 or 64). Q: What changes in the generated assembly?
The shift amount operand is added since we are shifting by more than 1.

(3b) Signed Division

Restore the function back to its original return x / 2 and paste in this additional function that does a signed division by 2:

int signed_division(int x) {
        return x / 2;
}

Whereas addition, subtraction, and multiplication operate equivalently on signed and unsigned integers, it's not quite so with division. If the divisor does not evenly divide the dividend, the quotient must be rounded (the terminology is DIVIDEND / DIVISOR = QUOTIENT). In the case of dividing an odd dividend by 2, the remainder is 1. Discarding the remainder (which is what unsigned divide does by shifting away the lsb) has the effect of rounding down to a lesser value. However, the rule for integer division is that quotient must be rounded toward zero, e.g. dividing 3/2 = 1 and dividing -3/2 = -1. A positive quotient is rounded down, but a negative quotient is rounded up.

Q1: What is the bit pattern for 3 and then 3 >> 1? Q2: What is the bit pattern for -3 and then -3 >> 1? (Assume arithmetic right shift) Do you see why the right shift (discard lsb) will round a positive value toward zero and rounds a negative value away from zero? This difference in rounding is the crux of why right-shift alone is insufficient if the result is negative.
Q1: 3 is 0b11, 3 >> 1 is 0b1, which is 1. Q2: -3 is 0b111...1101, -3 >> 1 is 0b111...110, which is -2!
Compare the assembly for unsigned_division to signed_division. The signed version has a pair of instructions (shr, add) inserted and uses sar in place of shr. First, consider that last substitution. If the number being shifted is positive, there is no difference, but arithmetic right shift versus logical on a negative number has a profound impact. Q: What is different? Why?
We need to do an arithmetic shift because the sign bit needs to be preserved.
Now let's dig into the shr and add instructions that were inserted in the signed version.
- Trace through their operation when the dividend is positive and confirm these instructions made no change in the quotient.
- However, these instructions do have an effect when the dividend is negative. They adjust the negative dividend by a "fixup" amount before the divide (shift). This pushes the dividend to the next larger multiple of the divisor so as to get the proper rounding for a negative quotient.
The necessary fixup amount when dividing by 2 is 1, the fixup for dividing by 4 is 3, for 8 it is 7 and so on. This calculation should be reminiscent of the roundup function we studied way back in lab1!
- Change the signed_division function to divide by 4 instead. The fixup amount is now 3.
- The assembly instructions handle the fixup slightly differently than before. It computes the dividend plus fixup and uses a "conditional mov" instruction to select between using the fixed or unmodified dividend based on whether the dividend is negative.
- This is a new type of mov instruction in addition to the ones we've already seen - we'll discuss this more in a future lecture, but the way to read the test and cmov instructions is that if the number in %edi is not negative, it moves %edi into %eax. This undoes the fixup if the value is non-negative since fixup is not needed in that case.
- It's cool to see that assembly conditionally adds this fixup of 3 here to ensure the shift produces the correct result!
  cmovns S, R is like mov S, R but it only copies S to R if the signed flag is 0. Therefore, in this case, the mov will only happen if %edi is non-negative because of test %edi %edi. We'll cover exactly how x86-64 detects signed-ness soon, so stay tuned.

Optional: further explorations: Division by powers of two get special treatment, sure, but what about constants that are not powers of two?

Open this pre-made environment containing the code below:

unsigned int unsigned_div10(unsigned int x) {
    return x / 10;
}

Look at the C source and generated assembly for this function. The assembly still doesn't use a div instruction, by there is a multiply by a bizarre number: -858993459 (in hex 0xcccccccd). What sorcery is this? This mysterious transformation is effectively multiplying by 1/10 as a substitute for divide by 10. The 1/10 value is being represented as a "fixed point fraction" which is constructed similarly to the way floats are. Enter 0.1 into the float visualizer tool and look to see where that same funny 0xcccccccd number shows up in the significand bits of the float. This technique is known as reciprocal multiplication and gcc will generally convert division by a constant in this way. The math behind this is somewhat complex, but if you'd like to know more, check out Doug Jones' tutorial on reciprocal multiplication.

Recap

Nice work on the lab! It's okay if you don't completely finish all of the exercises during lab; your sincere participation for the full lab period is sufficient for credit. However, we highly encourage you to finish on your own whatever is need to solidify your knowledge. Also take a chance to reflect on what you got what from this lab and whether you feel ready for what comes next! The takeaways from lab5 should be understanding the relationship between C and assembly, along with tools for disassembling object files, and using the debugger at the assembly level. The hands-on practice relating C code to its compiled assembly and reverse-engineering from assembly to C is great preparation for your next assignment. Here are some questions to verify your understanding and get you thinking further about these concepts:

Give an example of a C sequence/expression that could be compiled into a variety of different, but equivalent, assembly instruction(s).
x /= 2 (shift, division, subtraction, etc.)
What is the difference between sar and shr?
sar is arithmetic shift, shr is logical shift.
How is the return value of a function communicated from the callee to the caller?
It is put in the %rax register.
What kind of clues might suggest that an lea instruction is doing an address calculation versus when it is merely arithmetic?
Treating the registers used in the computation as addresses, e.g. going to that memory location, adding hops, etc.

Pre-Lab Exercises

Learning Goals

Pre-Lab Exercise: Assembly Tools

(a, read) Deadlisting with objdump

(b, read) GDB Commands for Live Assembly Debugging

(c, activity) Reading and Tracing Assembly in GDB

Lab Exercises

1) Addressing Modes (15 minutes)

(1a) From Pointer Dereferencing to mov

(1b) Struct Dereferencing in Assembly

2) Arithmetic (20 minutes)

(2a) Signed/Unsigned Arithmetic

(2b) Load Effective Address

(2c) Multiplication

3) Division (15 minutes)

(3a) Unsigned Division

(3b) Signed Division

Recap

(a, read) Deadlisting with `objdump`

(1a) From Pointer Dereferencing to `mov`