Page 1
Friday, February 11, 2022
Computer Systems
Winter 2022
Stanford University
Computer Science Department
Reading: Course Reader: x86-64 Assembly
Language, Textbook: Chapter 3.1
-
3.4
Lecturer: Chris Gregg
CS 107
Lecture 11:
Assembly Part II
Page 2
2
Today's Topics
•
Reading: Course Reader: x86-64 Assembly Language, Textbook: Chapter
3.5
-
3.6
•
Logistics
•
See note from Keith Schwartz re: Girl Code on Piazza
(
https://piazza.com/class/jc2gfatago37n3?cid=667
)
•
Midterm Comments
•
Programs from class: /afs/ir/class/cs107/samples/lect11
•
More x86 Assembly Language
•
Review of what we know so far
•
The lea instruction
•
pushing and popping from the stack
•
Unary operations, Binary operations, Shift operations
•
Special multiplication and division
•
Control
•
Condition codes
•
Conditional branches
Page 3
3
What did we cover last Monday?
•
Registers:
•
16 regular integer registers, %rax, %rbx, ...
•
naming is historical, and a register has four nested parts:
•
Operand forms: lots of ways we can refer to immediate values, register
values, or memory:
%rax
%eax
%ax
%al
return value
Operand
Value Comment
%rax
0x100
Register
0x104
0xAB
Absolute address
$0x108
(%rax)
0x108
Immediate
(%rax)
0xFF
Address
0x100
4(%rax)
0xAB
Address
0x104
9(%rax,%rdx)
0x11
Address
0x10C
260(%rcx,%rdx)
0x13
Address
0x108
0xFC(,%rcx,4)
0xFF
Address
0x100
Address
Value
Register
Value
0x100
0xFF
%rax
0x100
0x104
0xAB
%rcx
0x1
0x108
0x13
%rdx
0x3
0x10C
0x11
Page 4
4
What did we cover last Monday? (continued)
•
Data movement instructions:
•
Examples:
Instruction
Effect
Descripti
on
MOV
S
,
D
D
⟵
S
Move
movb
Move byte
movw
Move word
movl
Move double word
movq
Move quad word
movabsq
I
,
R
R
⟵
I
Move absolute quad word
1
movl $0x4050,%eax
Immediate
—
Register, 4 bytes
2
movw %cx,%dx
Register
—
Register,
2 bytes
3
movb (%rdi,%rcx),%al
Memory
—
Register,
1 byte
4
movb $
-
17,(%rsp)
Immediate
—
Memory,
1 byte
5
movq %rax,
-
12(%rbp)
Register
—
Memory,
8 bytes
Page 5
5
What did we cover last Monday? (continued)
movl $0x4050, %eax
movq %rsp, %rax
movw %ax, %dx
movl $1,%ecx
movb (%rdi, %rcx), %al
movb $0x61, (%rsp)
movb $0x61,(%rdi, %rcx)
movq %rax, 12(%rsp)
./mov_examples_main
Before function call: hello
After function call: hallo
Page 6
6
The
lea
instruction
•
The lea instruction is related to the mov instruction. It has the form of an instruction
that reads from memory to a register, but it
does not reference memory at all
.
•
It's first operand appears to be a memory reference, but instead of reading from the
designated location, the instruction copies the effective address to the destination.
•
You can think of it as the "&" operator in C — it retrieves the address of a memory
location:
Instruction
Effect
Description
leaq S,D
D
⟵ &
S
Load effective address
Examples: if
%rax
holds value
x
and
%rcx
holds value
y
:
leaq 6(%rax), %rdx
:
%rdx
now holds x + 6
leaq (%rax,%rcx), %rdx
:
%rdx
now holds x + y
leaq (%rax,%rcx,4), %rdx
:
%rdx
now holds x + 4*y
leaq 7(%rax,%rax,8), %rdx
:
%rdx
now holds 7 + 9x
leaq 0xA(,%rcx,4), %rdx
:
%rdx
now holds 10 + 4y
leaq 9(%rax,%rcx,2), %rdx
:
%rdx
now holds 9 + x + 2y
Page 7
7
The
lea
instruction
•
Take a look at the following C code (left) and assembly generated by gcc (right):
#include<stdlib.h>
#include<stdio.h>
int main() {
int a = 42;
int *aptr = &a;
printf("a: %d, aptr: %p", a, aptr);
}
.LC0:
.string "a: %d, aptr: %p"
main:
subq $24, %rsp
movl $42, 12(%rsp)
leaq 12(%rsp), %rdx
movl $42, %esi
movl $.LC0, %edi
movl $0, %eax
call printf
movl $0, %eax
addq $24, %rsp
ret
•
Notice that the value 42 is moved into
memory via a
mov
instruction, while the
address of that value is moved into
%rdx
using the
lea
instruction. In both cases,
12(%rsp)
is the location in memory that
is referred to, but mov moves data to that
address, and lea moves the address into
the register.
•
Note: we often use the lea instruction for
calculations that aren't memory addresses! This
is because we get a nifty linear equation from
lea, and it really doesn't matter if it isn't an
address.
Page 8
8
Pushing and Popping from the Stack
•
As we have seen from stack-based memory allocation in C, the stack is an
important part of our program, and assembly language has two built-in
operations to use the stack.
•
Just like the stack ADT, they have a first-in, first-out discipline.
•
By convention, we draw stacks upside down, and the stack "grows" downward.
Stack "bottom"
Stack "top"
Increasing
address
.
.
.
0x108
Page 9
9
Pushing and Popping from the Stack
•
The push and pop operations write and read from the stack, and they also
modify the stack pointer,
%rsp
:
Instructi
on
Effect
Description
pushq
S
R[
%rsp
] ⟵
R[
%rsp
]-8;
Push quad word
M[R[%rsp]]
⟵ S
popq
D
D
⟵
M[R[%rsp]];
Push quad word
R[
%rsp
] ⟵
R[
%rsp
]+8
Stack "bottom"
Stack "top"
Increasing
address
.
.
.
0x108
Page 10
10
Pushing and Popping from the Stack
•
Example:
Initially
%rax
0x123
%rdx
0
%rsp
0x108
Stack "bottom"
Stack "top"
Increasing
address
.
.
.
0x108
Page 11
11
Pushing and Popping from the Stack
•
Example:
Initially
%rax
0x123
%rdx
0
%rsp
0x108
pushq %rax
%rax
0x123
%rdx
0
%rsp
0x100
Stack "bottom"
Stack "top"
Increasing
address
.
.
.
0x108
Stack "bottom"
Stack "top"
Increasing
address
.
.
.
0x108
0x123
0x100
Page 12
12
Pushing and Popping from the Stack
•
Example:
Initially
%rax
0x123
%rdx
0
%rsp
0x108
pushq %rax
%rax
0x123
%rdx
0
%rsp
0x100
Stack "bottom"
Stack "top"
Increasing
address
.
.
.
0x108
Stack "bottom"
Stack "top"
Increasing
address
.
.
.
0x108
0x123
0x100
popq %rdx
%rax
0x123
%rdx
0x123
%rsp
0x108
Stack "bottom"
Stack "top"
Increasing
address
.
.
.
0x108
0x123
0x100
Page 13
13
Pushing and Popping from the Stack
•
As you can tell, pushing a quad word onto the stack involves first decrementing
the stack pointer by 8, and then writing the value at the new top-
of
-stack
address.
•
Therefore, the behavior of the instruction pushq %rbp is equivalent to the pair of
instructions:
subq $8, %rsp
(subq is a subtraction, and this decrements the stack pointer)
movq $rbp,(%rsp)
(Store %rbp on the stack)
•
The behavior of the instruction popq %rax is equivalent to the pair of
instructions:
movq (%rsp), %rax
(Read %rax from the stack)
addq $8,%rsp
(Increment the stack pointer)
Page 14
14
Unary Instructions
Instruction
Effect Description
inc D
D
⟵ D + 1
Increment
dec D
D
⟵ D - 1
Decrement
neg D
D
⟵ -D
Negate
not D
D
⟵ ~D
Complement
The following instructions act on a single operand (register or memory):
inc D
is reminiscent of C's
++
operator, and
neg D
is reminiscent of C's
--
operator.
Examples:
incq 16(%rax)
decq %rdx
not %rcx
Page 15
15
Binary Instructions
Instruction
Effect Description
add S, D
D
⟵ D + S
Add
sub S, D
D
⟵ D - S
Subtract
imul S, D
D
⟵ D * S
Multiply
xor S, D
D
⟵ D ^ S
Exclusive-
or
or S, D
D
⟵ D | S
Or
and S, D
D
⟵ D & S
And
The following instructions act on two operands (register or memory, but not both):
Reading the syntax is a bit tricky —
e.g.,
subq %rax,%rdx
decrements
%rdx
by
%rax
, and can be read as "Subtract
%rax
from
%rdx
"
Examples:
addq %rcx,(%rax)
xorq $16,(%rax,%rdx,8)
subq %rdx,8(%rax)
Page 16
16
Shift Instructions
Instruction
Effect Description
sal k, D
D
⟵ D << k
Left shift
shl k, D
D
⟵ D << k
Left shift (same as
sal
)
sar k, D
D
⟵ D >>
A
k
Arithmetic right shift
shr k, D
D
⟵ D >>
L
k
Logical right shift
The following instructions perform shifts. The first operand can be either an
immediate value or the byte
%cl
(and only that register!)
Technically, you could shift up to 255 with
%cl
, but the data width operation
determines how many bits are shifted, and the high order bits are ignored. For example,
if
%cl
has a value of 0xFF, then
shlb
shifts by 7 (ignoring the upper bits),
shlw
shifts
by 15,
shll would shift by 31, and shlq would shift by 63.
Examples:
shll $3,(%rax)
shr %cl,(%rax,%rdx,8)
sar $4,8(%rax)
Page 17
17
Special Multiplication Instructions
Instruction
Effect Description
imulq S
R[
%rdx
]:R[%rax] ⟵ S ×
R[
%rax
]
Signed full multiply
mulq S
R[
%rdx
]:R[%rax] ⟵ S ×
R[
%rax
]
Unsigned full multiply
Recall that multiplying two 64-bit numbers can produce a 128-bit result. The
x86
-64 instruction set supports 128-bit numbers with the "oct" (16-byte) size.
The
imulq instruction has two forms. One, shown on slide 15, takes two operands
and leaves the result in a single 64-bit register, truncating if necessary (and acts the
same on signed and unsigned numbers). Example:
imulq %rbx, %rcx
The second form (shown above) multiplies the source by
%rax
, and puts the
product into the 128-bit
%rdx
(upper 64 bits) and
%rax
(lower 64 bits).
Example:
mul %rdx
This multiples
%rdx
by
%rax
and puts the result into the combined
%rdx:%rax
registers.
Page 18
18
Multiplication Example
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
typedef unsigned __int128 uint128_t;
void store_uprod(uint128_t *dest,
uint64_t x, uint64_t y)
{
*dest = x * (uint128_t) y;
}
int main()
{
uint64_t x = 2000000000000; // 2 trillion
uint64_t y = 3000000000000; // 3 trillion
uint128_t z;
store_uprod(&z,x,y);
print_uint128(z); // see lect12 code
// for function definition
return 0;
}
mov %rsi,%rax
mul %rdx
mov %rax,(%rdi)
mov %rdx,0x8(%rdi)
retq
Note:
%rdi
is 1st argument
%rsi
is 2nd argument
%rdx
is 3rd argument
Page 19
19
Special Division Operations
Instruction
Effect Description
cqto
R[
%rdx
]:R[%rax] ⟵ SignExtend(R[%rax])
Convert to oct word
idivq S
R[
%rdx
] ⟵
R[
%rdx
]:R[%rax] mod S;
Signed divide
R[
%rax
] ⟵
R[
%rdx
]:R[%rax] ÷ S
divq S
R[
%rdx
] ⟵
R[
%rdx
]:R[%rax] mod S;
Unsigned divide
R[
%rax
] ⟵
R[
%rdx
]:R[%rax] ÷ S
Slide 11 did not list a division instruction or modulus instruction. There are
single-operand divide instructions (shown below):
The dividend for the
idivq
and divq instructions is the 128-bit quantity in registers
%rdx
(high-order 64-bits) and
%rax
(low-order 64-bits). The divisor is the operand source.
The quotient from the division is stored in
%rax
, and the remainder is stored in %rdx.
For most division, the dividend is just in
%rax
, and %rdx is either all zeros (for unsigned,
or the sign bit of %rax (for signed arithmetic). The ctqo instruction can be used to
accomplish this.
Page 20
20
Division Example
void remdiv(long x, long y,
long *qp, long *rp)
{
long q = x / y;
long r = x % y;
*qp = q;
*rp = r;
}
mov %rdx,%r8
# copy qp
mov %rdi,%rax
# Move x to lower 8 bytes of dividend
cqto
# Sign-extend to upper 8 bytes of dividend
idiv %rsi
# Divide by y
mov %rax,(%r8)
# Store quotient at qp
mov %rdx,(%rcx)
# Store remainder at rp
retq
Note:
%rdi
is 1st argument
%rsi
is 2nd argument
%rdx
is 3rd argument
%rcx
is 4th argument
gcc
is clever enough to see that only one division is needed!
Page 21
21
Control
•
So far, we have only been discussing "straight-line" code, where one
instruction happens directly after the previous instruction.
•
However, it is often necessary to perform one instruction or another
instruction based on the logic in our programs, and assembly code gives us
tools to do this.
•
We can alter the flow of code using a "jump" instruction, which indicates that
the next instruction will be somewhere else in the program (this is called a
branch
)
•
We will start by discussing "condition codes" that are set when we do
arithmetic (and other operations), and then we will talk about jump
instructions to change control flow.
Page 22
22
Condition Codes
•
Besides the registers we have already discussed, the CPU has a separate
set of single-bit
condition code
registers describing attributes of recent
operations.
•
We can use these registers (by testing them) to perform branches in the
code.
•
These are the most useful condition code registers:
•
CF: Carry flag. The most recent operation generated a carry out of the most
significant bit. Used to detect overflow for unsigned operations.
•
ZF: Zero flag. The most recent operation yielded zero.
•
SF: Sign flag. The most operation yielded a negative value.
•
OF: Overflow flag. The most recent operation caused a two's-complement overflow—
either negative or positive.
Page 23
23
Condition Codes: Examples
•
CF: Carry flag. The most recent operation generated a carry out of the most
significant b/t. Used to detect overflow for unsigned operations.
•
ZF: Zero flag. The most recent operation yielded zero.
•
SF: Sign flag. The most operation yielded a negative value.
•
OF: Overflow flag. The most recent operation caused a two's-complement overflow—
either negative or positive.
int a = 5;
int b = -5;
int t = a + b;
Which flag above would be set?
The
ZF
flag.
Page 24
24
Condition Codes: Examples
•
CF: Carry flag. The most recent operation generated a carry out of the most
significant b/t. Used to detect overflow for unsigned operations.
•
ZF: Zero flag. The most recent operation yielded zero.
•
SF: Sign flag. The most operation yielded a negative value.
•
OF: Overflow flag. The most recent operation caused a two's-complement overflow—
either negative or positive.
int a = 5;
int b = -5;
int t = a + b;
Which flag above would be set?
The
ZF
flag.
int a = 5;
int b = -20;
int t = a + b;
Which flag above would be set?
The
SF
flag.
Page 25
25
Condition Codes
•
The leaq instruction does not set any condition codes (because it is intended
for address computations), but the other arithmetic instructions we talked about
do set them (
inc
, dec, neg, not, add, sub, imul, xor, or, and, shl, sar,
shr
, etc.)
•
For logical operations (e.g., xor), the carry and overflow flags are set to 0.
•
For shift operations, the carry flag is set to the last bit shifted out, while the
overflow flag is set to zero.
•
inc and dec set the overflow and zero flags, but leave the carry flag
unchanged.
Page 26
26
cmp
and
test
•
There are two types of instructions we can use that set the condition codes
without altering any other registers
, the cmp and test instructions:
Instruction
Based on
Description
CMP
S
1
, S
2
S
2
- S
1
Compare
cmpb
Compare byte
cmpw
Compare word
cmpl
Compare double word
cmpq
Compare quad word
TEST
S
1
, S
2
S
2
& S
1
Test
testb
Test byte
testw
Test word
testl
Test double word
testq
Test quad word
•
Be careful! The
operands for
cmp
are
listed in reverse order!
•
Often, we use
testq %rax, %rax
to see whether
%rax
is negative, zero, or
positive.
Page 27
27
Accessing the Condition Codes
•
There are three common ways to use the condition codes:
1.
We can set a single byte to 0 or 1 depending on some combination of the
condition codes.
2.
We can conditionally jump to some other part of the program.
3.
We can conditionally transfer data.
Page 28
28
•
There are three common ways to use the condition codes:
1.
We can set a single byte to 0 or 1 depending on some combination of the
condition codes.
2.
We can conditionally jump to some other part of the program.
3.
We can conditionally transfer data.
Accessing the Condition Codes
Instruction
Synonym
Set Condition
sete D
setz
Equal / zero
setne D
setnz
Not equal / not zero
sets D
Negative
setns D
Nonnegative
setg D
setnle
Greater (signed >)
setge D
setnl
Greater or equal (signed >=)
setl D
setnge
Less (signed <)
setle D
setng
Less or equal (signed <=)
seta D
setnbe
Above (unsigned >)
setae D
setnb
Above or equal (unsigned >=)
setb D
setnae
Below (unsigned <)
setbe D
setna
Below or equal (unsigned <=)
int comp(data_t a, data_t b)
a in
%rdi,
b in %rsi
comp:
cmpq %rsi, %rdi # Compare a:b
setl
%al # Set low-order byte of
# %eax to 0 or 1
movzbl %al, %eax # Clear rest of %eax
# (and rest of %rax)
ret
Example:
a < b
Page 29
29
•
There are three common ways to use the condition codes:
1.
We can set a single byte to 0 or 1 depending on some combination of the
condition codes.
2.
We can conditionally jump to some other part of the program.
3.
We can conditionally transfer data.
Accessing the Condition Codes
Instruction
Synonym
Set Condition
jmp
Label
Direct jump
jmp
*Operand
Indirect jump
je
Label
jz
Equal / zero (ZF=1)
jne
Label
jnz
Not equal / not zero (ZF=0)
js
Label
Negative (SF=1)
jns
Label
Nonnegative (SF=0)
jg
Label
jnle
Greater (signed >) (SF=0 and SF=OF)
jge
Label
jnl
Greater or equal (signed >=) (SF=OF)
jl
Label
jnge
Less (signed <) (SF != OF)
jle
Label
jng
Less or equal (signed <=) (ZF=1 or SF!=OF)
ja
Label
jnbe
Above (unsigned >) (CF = 0 and ZF = 0)
•
Jump instructions jump to
labels
in assembly code, and
those labels are changed to
addresses (most often relative)
•
jmp is an unconditional jump,
meaning that the jump is always
taken.
•
Unconditional jumps can be
direct
or
indirect
•
Conditional jumps must be
direct.
Page 30
30
Jump instructions example
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
Compile to an object file:
gcc -c -Og while_loop.c
$ gdb while_loop.o
The target architecture is assumed to be i386:x86
-
64
Reading symbols from while_loop.o...done.
(gdb) disas loop
Dump of assembler code for function loop:
0x0000000000000000 <+0>:
mov $0x0,%eax
0x0000000000000005 <+5>:
jmp 0xa <loop+10>
0x0000000000000007 <+7>:
add $0x1,%eax
0x000000000000000a <+10>: cmp $0x63,%eax
0x000000000000000d <+13>: jle 0x7 <loop+7>
0x000000000000000f <+15>: repz retq
End of assembler dump.
Page 31
31
Jump instructions example
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
$ gdb while_loop.o
The target architecture is assumed to be i386:x86
-
64
Reading symbols from while_loop.o...done.
(gdb) disas loop
Dump of assembler code for function loop:
0x0000000000000000 <+0>:
mov $0x0,%eax
0x0000000000000005 <+5>:
jmp 0xa <loop+10>
0x0000000000000007 <+7>:
add $0x1,%eax
0x000000000000000a <+10>: cmp $0x63,%eax
0x000000000000000d <+13>: jle 0x7 <loop+7>
0x000000000000000f <+15>: repz retq
End of assembler dump.
Set
%eax
to
0
Compile to an object file:
gcc -c -Og while_loop.c
Page 32
32
Jump instructions example
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
$ gdb while_loop.o
The target architecture is assumed to be i386:x86
-
64
Reading symbols from while_loop.o...done.
(gdb) disas loop
Dump of assembler code for function loop:
0x0000000000000000 <+0>:
mov $0x0,%eax
0x0000000000000005 <+5>:
jmp 0xa <loop+10>
0x0000000000000007 <+7>:
add $0x1,%eax
0x000000000000000a <+10>: cmp $0x63,%eax
0x000000000000000d <+13>: jle 0x7 <loop+7>
0x000000000000000f <+15>: repz retq
End of assembler dump.
%rax: 0
Compile to an object file:
gcc -c -Og while_loop.c
Page 33
33
Jump instructions example
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
$ gdb while_loop.o
The target architecture is assumed to be i386:x86
-
64
Reading symbols from while_loop.o...done.
(gdb) disas loop
Dump of assembler code for function loop:
0x0000000000000000 <+0>:
mov $0x0,%eax
0x0000000000000005 <+5>:
jmp 0xa <loop+10>
0x0000000000000007 <+7>:
add $0x1,%eax
0x000000000000000a <+10>: cmp $0x63,%eax
0x000000000000000d <+13>: jle 0x7 <loop+7>
0x000000000000000f <+15>: repz retq
End of assembler dump.
compare
%eax
to
0x63 (99d)
by
subtracting
%eax - 0x63
, setting the Sign
Flag (SF) because the result is negative.
%rax: 0
Compile to an object file:
gcc -c -Og while_loop.c
Page 34
34
Jump instructions example
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
$ gdb while_loop.o
The target architecture is assumed to be i386:x86
-
64
Reading symbols from while_loop.o...done.
(gdb) disas loop
Dump of assembler code for function loop:
0x0000000000000000 <+0>:
mov $0x0,%eax
0x0000000000000005 <+5>:
jmp 0xa <loop+10>
0x0000000000000007 <+7>:
add $0x1,%eax
0x000000000000000a <+10>: cmp $0x63,%eax
0x000000000000000d <+13>: jle 0x7 <loop+7>
0x000000000000000f <+15>: repz retq
End of assembler dump.
jle
is "jump less than or equal." The Sign
Flag indicates that the result was negative
(less than), so we jump to
0x7
.
%rax: 0
Compile to an object file:
gcc -c -Og while_loop.c
Page 35
35
Jump instructions example
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
$ gdb while_loop.o
The target architecture is assumed to be i386:x86
-
64
Reading symbols from while_loop.o...done.
(gdb) disas loop
Dump of assembler code for function loop:
0x0000000000000000 <+0>:
mov $0x0,%eax
0x0000000000000005 <+5>:
jmp 0xa <loop+10>
0x0000000000000007 <+7>:
add $0x1,%eax
0x000000000000000a <+10>: cmp $0x63,%eax
0x000000000000000d <+13>: jle 0x7 <loop+7>
0x000000000000000f <+15>: repz retq
End of assembler dump.
Add 1 to
%eax
%rax: 1
Compile to an object file:
gcc -c -Og while_loop.c
Page 36
36
Jump instructions example
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
$ gdb while_loop.o
The target architecture is assumed to be i386:x86
-
64
Reading symbols from while_loop.o...done.
(gdb) disas loop
Dump of assembler code for function loop:
0x0000000000000000 <+0>:
mov $0x0,%eax
0x0000000000000005 <+5>:
jmp 0xa <loop+10>
0x0000000000000007 <+7>:
add $0x1,%eax
0x000000000000000a <+10>: cmp $0x63,%eax
0x000000000000000d <+13>: jle 0x7 <loop+7>
0x000000000000000f <+15>: repz retq
End of assembler dump.
%rax: 1
Compare
%eax
to
0x63 (99d)
by subtracting
%eax - 0x63
. When
%rax
is 0, what flags change based on the the comparison? (We care about
Zero
Flag
,
Sign Flag
,
Carry Flag
, and
Overflow Flag):
Compile to an object file:
gcc -c -Og while_loop.c
0
-
99, so
SF and CF
Page 37
37
Jump instructions example
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
$ gdb while_loop.o
The target architecture is assumed to be i386:x86
-
64
Reading symbols from while_loop.o...done.
(gdb) disas loop
Dump of assembler code for function loop:
0x0000000000000000 <+0>:
mov $0x0,%eax
0x0000000000000005 <+5>:
jmp 0xa <loop+10>
0x0000000000000007 <+7>:
add $0x1,%eax
0x000000000000000a <+10>: cmp $0x63,%eax
0x000000000000000d <+13>: jle 0x7 <loop+7>
0x000000000000000f <+15>: repz retq
End of assembler dump.
%rax: 1
Eventually, this will become positive (when
%eax
is 100), and the loop will end.
Compile to an object file:
gcc -c -Og while_loop.c
Page 38
38
Jump instructions example
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
$ gdb while_loop.o
The target architecture is assumed to be i386:x86
-
64
Reading symbols from while_loop.o...done.
(gdb) disas loop
Dump of assembler code for function loop:
0x0000000000000000 <+0>:
mov $0x0,%eax
0x0000000000000005 <+5>:
jmp 0xa <loop+10>
0x0000000000000007 <+7>:
add $0x1,%eax
0x000000000000000a <+10>: cmp $0x63,%eax
0x000000000000000d <+13>: jle 0x7 <loop+7>
0x000000000000000f <+15>: repz retq
End of assembler dump.
Could the compiler have done better with
this loop?
Compile to an object file:
gcc -c -Og while_loop.c
Page 39
39
Jump instructions example
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
Compile to an object file:
gcc -c -Og while_loop.c
$ gdb while_loop.o
The target architecture is assumed to be i386:x86
-
64
Reading symbols from while_loop.o...done.
(gdb) disas loop
Dump of assembler code for function loop:
0x0000000000000000 <+0>:
mov $0x64,%eax
0x0000000000000005 <+5>:
sub $0x1,%eax
0x0000000000000008 <+8>:
jne 0x5 <loop+5>
0x000000000000000a <+10>: repz retq
End of assembler dump.
Fewer lines, less jumping!
gcc -c -O1 while_loop.c
Page 40
40
Jump instructions example
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
Compile to an object file:
gcc -c -Og while_loop.c
$ gdb while_loop.o
The target architecture is assumed to be i386:x86
-
64
Reading symbols from while_loop.o...done.
(gdb) disas loop
Dump of assembler code for function loop:
0x0000000000000000 <+0>:
mov $0x64,%eax
0x0000000000000005 <+5>:
sub $0x1,%eax
0x0000000000000008 <+8>:
jne 0x5 <loop+5>
0x000000000000000a <+10>: repz retq
End of assembler dump.
Could we do better?
gcc -c -O1 while_loop.c
Page 41
41
Jump instructions example
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
Compile to an object file:
gcc -c -Og while_loop.c
$ gdb while_loop.o
The target architecture is assumed to be i386:x86
-
64
Reading symbols from while_loop.o...done.
(gdb) disas loop
Dump of assembler code for function loop:
0x0000000000000000 <+0>:
repz retq
End of assembler dump.
Sure! As the optimization level goes up,
gcc
gets smarter! The compiler realized that this
loop is not doing anything, so it completely
optimized it out!
gcc -c -O1 while_loop.c
gcc -c -O2 while_loop.c
Page 42
42
3 minute break
Page 43
43
Digging Deeper: Jump Instruction Encodings
•
As we have mentioned before, assembly language is still one step higher than
machine code.
•
It is instructive in this case to look at the machine code for some jump
instructions, just to see how the underlying machine is referencing where to
jump.
•
Remember, %rip is the instruction pointer, which has an address of the
current instruction.
•
Well…kind of. On older x86 machines, when an instruction was executing,
the first thing that happened was that
%rip
is changed to point to the next
instruction. The instruction set has retained this behavior.
•
Jump instructions are often encoded to jump relative to %rip. Let's see what
that means in practice…
Page 44
44
Digging Deeper: Jump Instruction Encodings
•
Let's look at our while loop again:
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
Run the objdump program:
objdump -d while_loop.o
Disassembly of section .text:
0000000000000000 <loop>:
0: b8 00 00 00 00
mov $0x0,%eax
5: eb 03
jmp a <loop+0xa>
7: 83 c0 01
add $0x1,%eax
a: 83 f8 63
cmp $0x63,%eax
d: 7e f8
jle 7 <loop+0x7>
f: f3 c3
repz retq
Compile to an object file:
gcc -c -Og while_loop.c
Page 45
45
Digging Deeper: Jump Instruction Encodings
•
Take the following function:
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
Run the objdump program:
objdump -d while_loop.o
Disassembly of section .text:
0000000000000000 <loop>:
0: b8 00 00 00 00
mov $0x0,%eax
5: eb 03
jmp a <loop+0xa>
7: 83 c0 01
add $0x1,%eax
a: 83 f8 63
cmp $0x63,%eax
d: 7e f8
jle 7 <loop+0x7>
f: f3 c3
repz retq
0-based addresses for each
instruction (will be replaced
with real addresses when a full
program is created)
Compile to an object file:
gcc -c -Og while_loop.c
Page 46
46
Digging Deeper: Jump Instruction Encodings
•
Take the following function:
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
Run the objdump program:
objdump -d while_loop.o
Disassembly of section .text:
0000000000000000 <loop>:
0: b8 00 00 00 00
mov $0x0,%eax
5: eb 03
jmp a <loop+0xa>
7: 83 c0 01
add $0x1,%eax
a: 83 f8 63
cmp $0x63,%eax
d: 7e f8
jle 7 <loop+0x7>
f: f3 c3
repz retq
Machine code for the instructions.
Instructions are "variable length"
—
the
mov
instruction is 5 bytes, the
tmp is 3 bytes, etc.
Compile to an object file:
gcc -c -Og while_loop.c
Page 47
47
Digging Deeper: Jump Instruction Encodings
•
Take the following function:
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
Run the objdump program:
objdump -d while_loop.o
Disassembly of section .text:
0000000000000000 <loop>:
0: b8 00 00 00 00
mov $0x0,%eax
5: eb 03
jmp a <loop+0xa>
7: 83 c0 01
add $0x1,%eax
a: 83 f8 63
cmp $0x63,%eax
d: 7e f8
jle 7 <loop+0x7>
f: f3 c3
repz retq
The
jmp
instruction. "eb" means that this is a jmp,
and
03
is the number of instructions to jump, relative
to
%rip
. When the instruction is executing, %rip is
set to the next instruction (
7
in this case). So…7 + 3
is 0xa, so this instruction jumps to
0xa
.
Compile to an object file:
gcc -c -Og while_loop.c
Page 48
48
Digging Deeper: Jump Instruction Encodings
•
Take the following function:
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
Run the objdump program:
objdump -d while_loop.o
Disassembly of section .text:
0000000000000000 <loop>:
0: b8 00 00 00 00
mov $0x0,%eax
5: eb 03
jmp a <loop+0xa>
7: 83 c0 01
add $0x1,%eax
a: 83 f8 63
cmp $0x63,%eax
d: 7e f8
jle 7 <loop+0x7>
f: f3 c3
repz retq
The
cmp
instruction. Notice that the 0x63 is
embedded into the machine code, because it is an
immediate value.
Compile to an object file:
gcc -c -Og while_loop.c
Page 49
49
Digging Deeper: Jump Instruction Encodings
•
Take the following function:
void loop()
{
int i = 0;
while (i < 100) {
++i;
}
}
Run the objdump program:
objdump -d while_loop.o
Disassembly of section .text:
0000000000000000 <loop>:
0: b8 00 00 00 00
mov $0x0,%eax
5: eb 03
jmp a <loop+0xa>
7: 83 c0 01
add $0x1,%eax
a: 83 f8 63
cmp $0x63,%eax
d: 7e f8
jle 7 <loop+0x7>
f: f3 c3
repz retq
The
jle
instruction. "7e
" means that this is a
jle
(jump
if less than), and
f8
is the number of instructions to jump
(in two's complement! So, it means
-8), relative to %rip,
which is at
0xf
when the instruction is running. So,
0xf
-
8 is 0xa, so this instruction jumps to 0x7
.
Compile to an object file:
gcc -c -Og while_loop.c
Page 50
50
Practice: Reverse
-engineer Assembly to C
•
Take the following function:
long test(long x, long y, long z) {
long val = __________;
if (__________) {
if (_________)
val = __________;
else
val = __________;
} else if (__________)
val = __________;
return val;
}
# x in %rdi, y in %rsi, z in %rdx
test:
leaq (%rdi,%rsi), %rax
addq %rdx, %rax
cmpq $
-
3, %rdi
jge .L2
cmpq %rdx, %rsi
jge .L3
movq %rdi, %rax
imulq %rsi, %rax
ret
.L3:
movq %rsi, %rax
imulq %rdx, %rax
ret
.L2:
cmpq $2, %rdi
jle .L4
movq %rdi, %rax
imulq %rdx, %rax
.L4:
rep; ret
Page 51
51
Practice: Reverse
-engineer Assembly to C
•
Take the following function:
long test(long x, long y, long z) {
long val =
x + y + z
;
if (
x < -3
) {
if (
y < z
)
val =
x * y
;
else
val =
y * z
;
} else if (
x > 2
)
val =
x * z
;
return val;
}
# x in %rdi, y in %rsi, z in %rdx
test:
leaq (%rdi,%rsi), %rax
addq %rdx, %rax
cmpq $
-
3, %rdi
jge .L2
cmpq %rdx, %rsi
jge .L3
movq %rdi, %rax
imulq %rsi, %rax
ret
.L3:
movq %rsi, %rax
imulq %rdx, %rax
ret
.L2:
cmpq $2, %rdi
jle .L4
movq %rdi, %rax
imulq %rdx, %rax
.L4:
rep; ret
Page 52
52
Conditional Moves
•
The x86 processor provides a set of "conditional move" instructions that move
memory based on the result of the condition codes, and that are completely
analogous to the jump instructions:
Instruction
Synonym
Move Condition
cmove S,R
cmovz
Equal / zero (ZF=1)
cmovne S,R
cmovnz
Not equal / not zero (ZF=0)
cmovs S,R
Negative (SF=1)
cmovns S,R
Nonnegative (SF=0)
cmovg S,R
cmovnle
Greater (signed >) (SF=0 and SF=OF)
cmovge S,R
cmovnl
Greater or equal (signed >=) (SF=OF)
cmovl S,R
cmovnge
Less (signed <) (SF != OF)
cmovle S,R
cmovng
Less or equal (signed <=) (ZF=1 or SF!=OF)
cmova S,R
cmovnbe
Above (unsigned >) (CF = 0 and ZF = 0)
cmovae S,R
cmovnb
Above or equal (unsigned >=) (CF = 0)
cmovb S,R
cmovnae
Below (unsigned <) (CF = 1)
cmovbe S,R
cmovna
Below or equal (unsigned <=) (CF = 1 or ZF = 1)
•
With these instructions, we
can sometimes eliminate
branches, which are
particularly inefficient on
modern computer
hardware.
Page 53
53
Jumps -
vs
-
Conditional Move
long absdiff(long x, long y)
{
long result;
if (x < y)
result = y
-
x;
else
result = x
-
y;
return result;
}
long cmovdiff(long x, long y)
{
long rval = y
-
x;
long eval = x
-
y;
long ntest = x >= y;
if (ntest) rval = eval;
return rval;
}
# x in %rdi, y in %rsi
cmovdiff:
movq %rsi, %rax
subq %rdi, %rax
movq %rdi, %rdx
subq %rsi, %rdx
cmpq %rsi, %rdi
cmovge %rdx, %rax
ret
# x in %rdi, y in %rsi
absdiff:
cmpq %rsi, %rdi
jge .L2
movq %rsi, %rax
subq %rdi, %rax
ret
.L2:
movq %rdi, %rax
subq %rsi, %rax
ret
Which is faster?
Let's test!
Page 54
54
References and Advanced Reading
•
References:
•
Stanford guide to x86-64: https://web.stanford.edu/class/cs107/guide/x86-
64.html
•
CS107 one-page of x86-64:
https://web.stanford.edu/class/cs107/resources/onepage_x86-
64.pdf
•
gdbtui: https://beej.us/guide/bggdb/
•
More gdbtui: https://sourceware.org/gdb/onlinedocs/gdb/TUI.html
•
Compiler explorer: https://gcc.godbolt.org
•
Advanced Reading:
•
x86-64 Intel Software Developer manual:
https://software.intel.com/sites/default/files/managed/39/c5/325462-
sdm
-
vol
-
1-
2abcd
-
3abcd.pdf
•
history of x86 instructions:
https://en.wikipedia.org/wiki/X86_instruction_listings
•
x86-64 Wikipedia: https://en.wikipedia.org/wiki/X86-64