Lab 5 Solutions

Written by Nick Troccoli and Lisa Yan, based on walkthroughs by John Louie and Nate Hardison

Lecture Review

Here are key concepts relevant to this week's lab:

assembly is a more primitive way of building a program. We are getting more practice thinking in terms of "registers" (variable boxes used as temporary storage) and moving data into and out of registers to perform operations on it with the processor.
assembly can be read top to bottom, just like C, and consists of 1 command per line. Each command has optional "arguments" (operands).
there are different ways to specify memory locations you would like to use as arguments. The x86-64 cheat sheet on the course website shows all the different forms. Here is a quick overview of the different x86 addressing modes:

Direct: Fixed/static address (e.g. string literal address in r/o memory)

Indirect: Register holds value of address

Indirect w/ displacement: Same as above, but there is a number prefix that determines the number of bytes to offset the address (i.e. think of pointer arithmetic)

Indirect w/ displacement and scaled-index: This is basically a more complicated version of the displacement; the idea is the same except now we can specify a value to scale by (e.g. 1, 2, 4, 8)

The mov instruction is responsible for moving data around. You can move data from an immediate (constant)/register/memory location to a register/memory location. The exception is that you cannot move data from a memory location to a memory location - you must move it into a register first.
some instructions, like mov, specify the number of bytes they are moving, shifting, etc. They use b for 1 byte, w for "word" (2 bytes), l for "double word" (4 bytes) and q for "quad word" (8 bytes).
some registers have special responsibilities: the first three arguments are passed in %rdi, %rsi, and %rdx, respectively and the return value is written to %rax. Complete details are on the reference sheet on the website.
registers can be partially or fully referred to. For instance, you can access just the lower 32 bits of a register instead of the full 64. Specifically, a reference to the e-named register accesses the lower 32-bit sub-register, e.g. %edi is the subregister of %rdi. Operations on pointers and longs (64-bit types) will use the full r-named registers, and operations on ints will use the e-named subregisters.
lea is "load effective address"; it uses similar syntax to mov, except it is meant to perform simple arithmetic operations instead of copying data around. Instead of going to the address calculated by the source, it just moves the resulting source value into the destination. In this way, it doesn't "dereference" the address like mov does. Check out the lecture slides for several constrasting examples of mov vs. lea.
There are several different instructions for different types of arithmetic and logical operations such as bit shifting, adding, subtracting, multiplying, etc.
Multiplication is a special instruction because of cases where multiplying two 64-bit values can produce a 128-bit result. For this reason, you can use multiplication where you specify one argument, it multiplies it by the number stored in %rax, and it splits the 128-bit result across %rdx (higher 64 bits) and %rax (lower 64 bits). The instruction is hardcoded to use these registers.
Division is a special instruction because of cases where you divide a 128-bit value. For this reason, you use division where you specify one argument, it divides the value split across %rdx (higher 64 bits) and %rax (lower 64 bits) by the argument, and it stores the quotient (result) in %rax, and the remainder in %rdx. The instruction is hardcoded to use these registers. You must specify what you are dividing (dividend) across both %rdx and %rax even if the value is only 64 bits, so the cqto instruction is commonly used to sign-extend what's in %rax into %rdx.

Solutions

1) Assembly and GDB

Here is the code for code.c's myfn and an annotated version (comments added along the right-hand side) of its assembly:

int myfn(long param) {
    int arr[] = {1, 2, 3, param, 0x4567, INT_MIN, strlen("Stanford University")};
    int count = sizeof(arr) / sizeof(arr[0]);

    int total = arr[param];
    int squared = total * total;
    return squared + count;
}

<+0>:     movl   $0x1,-0x20(%rsp)         # put arr[0] on stack
<+8>:     movl   $0x2,-0x1c(%rsp)         # put arr[1] on stack
<+16>:    movl   $0x3,-0x18(%rsp)         # put arr[2] on stack
<+24>:    mov    %edi,-0x14(%rsp)         # put arr[3] on stack
<+28>:    movl   $0x4567,-0x10(%rsp)      # put arr[4] on stack
<+36>:    movl   $0x80000000,-0xc(%rsp)   # put arr[5] on stack
<+44>:    movl   $0x13,-0x8(%rsp)         # put arr[6] on stack
<+52>:    mov    -0x20(%rsp,%rdi,4),%eax  # initialize total = arr[param]
<+56>:    imul   %eax,%eax                # multiply total by itself
<+59>:    add    $0x7,%eax                # add 7 (count) to return
<+62>:    retq                            # return

Q1: What expression (e.g. something in a format like %rdx + 0x5), represents the location of the first element in arr? The last?

A1: %rsp - 0x20 (note: in hex!) is the address of the first element - we can gather from the first mov instruction. %rsp - 0x8 is the address of the last element.

Q2: What register stores the value of total, and also squared? (registers can be reused for multiple variables!)

A2: %eax. We can see this because all the operations of initializing total, multiplying by itself, and adding 7 (count) all happen in %eax. Note that eax is the return value register. Rather than storing it somewhere else and putting it in %eax when returning, it just uses %eax the whole time.

2) C to Assembly Translation

Dereferencing Modes

Q3: There are subtle differences between the assembly generated for each function. What are they, and why are they different?

A3: For the first lines in each function, the mov instruction in the first function is movb but in the second is movl because the number of bytes to store changes from 1 to 4 (b means byte and l means 4 bytes). For the second lines in each function, the addressing mode in the first function is indexed but in the second is scaled indexed. This is because the second version must also scale by a multiplier of 4 for the size of an int (the first example scales by a multiplier of 1, so can omit this).

Q4: Edit both functions' second lines to replace ptr[index] with ptr[7]. How does this change the addressing mode using for the destination operand of the mov instruction? Do both deref functions change in the same way?

A4: The addressing mode changes to indirect w/ displacement to account for indexing further. The displacement is in terms of bytes, depending on the type of the pointer being used. deref_one has a char * so it's 1 x 7, while deref_two has an int * so it's 4 x 7. Both change identically otherwise, just the displacement is different.

Signed/Unsigned Arithmetic

Q5: When doing a right-shift, does gcc emit an arithmetic (sar) or logical (shr) shift? Does it matter whether the argument is signed or unsigned?

A5: The arithmetic shift comes up with the signed types, and the logical shift is used for the unsigned types.

Q6: What is the largest shift amount that could be specified using the %cl register for an unsigned int? Why might this limit make sense?

A6: The shift will only look at the low-order log2(w) bits of %cl to know how much to shift. For an unsigned int, this means %cl could specify a shift amount of at most 31 (the max number that fits in log2(32) = 5 bits). This makes sense as a limit because it is undefined to shift an unsigned int more than 31 places.

`lea` and Multiplication

Q7: In combine1 and combine2, the compiler doesn't use add or multiply. How does it use lea instead?

A7: The compiler gets creative with lea, using its form with addition and a scale factor to calculate the expression in one instruction!

Q8: combine3 no longer fits the limited form of lea, but the compiler still avoids multiply. What does it use instead?

A8: A left shift by 4, followed by an add (aka multiply by 16, then add itself again).

Division

Q9: what does the assembly do to account for a fixup of 1 if the dividend is negative?

A9: It uses the sign bit (clever!) as the amount to add as a fixup. If the value is nonnegative, the sign bit will be 0 (meaning we add no fixup before shifting). If the value is negative, the sign bit will be 1 (meaning we add a fixup of 1 before shifting). We isolate the sign bit by doing a logical shift to ensure we fill with 0s, so shifting to the right by 31 will either result in 0 (if nonnegative) or 1 (if negative).

End-Lab Thought Questions

Give an example of a C sequence/expression that could be compiled into a variety of different, but equivalent, assembly instruction(s). x /= 2 (shift, division, subtraction, etc.)
What is the difference between sar and shr? sar is arithmetic shift, shr is logical shift.
How is the return value of a function communicated from the callee to the caller? It is put in the %rax register.
What kind of clues might suggest that an lea instruction is doing an address calculation versus when it is merely arithmetic? Treating the registers used in the computation as addresses, e.g. going to that memory location, adding hops, etc.

Checkoff Answers

Assembly. What is the difference between the gdb step and stepi commands? step executes one line of C, stepi executes one assembly instruction.
Two's Complement. What benefits does two's complement have at the assembly level? We can perform addition while disregarding the sign, because the bit representation works in both cases.
Division. What tricks does GCC use to handle division for positive and negative numbers? Why are these necessary? GCC has to special-case division by positive and negative numbers, and adds a "fixup" to ensure that negative values are rounded towards zero. This is because shifting a negative number will round away from zero, unlike for positive numbers where it will round towards zero.