Lab sessions Wed Feb 19 to Sun Feb 23
Lab written by Julie Zelenski, with modifications by Nick Troccoli
Learning Goals
This lab is designed to give you a chance to:
- Become more familiar with useful GDB commands and tricks when working with assembly
- observe and understand the correct operation of the runtime stack
- diagnose symptoms of stack mismanagement
Get Started
Clone the lab starter code by using the command below.
git clone /afs/ir/class/cs107/repos/lab6/shared lab6
Note: there will be compiler warnings when you make
this starter project - this is expected for some of the exercises!
Next, pull up the online lab checkoff and have it open in a browser so you can jot things down as you go. Only one checkoff needs to be submitted for both you and your partner.
Keep our x86-64 reference sheet handy while you work!
1) GDB and Stack Mechanics
30 minutes + 10min discussion
Learning Goals: Become familiar with new GDB commands to navigate the assembly representation of a program. Stepping through a program in GDB and examining it at the assembly level is key to the kinds of security explorations you will be doing on assign5.
Starter code file: stack.c
New GDB Commands
GDB is an indispensable tool for reverse engineering, and we can't emphasize enough how familiarity with GDB will make your life easier on assign5! Here are more tips on top of the ones introduced last time.
Printing Registers
print
with /[format]
works for registers, too, to print a value in a certain format - note that in GDB, register names begin with $
, not %
:
(gdb) p $rax
$1 = 4196128
(gdb) p/t $rax
$2 = 10000000000011100100000
You can also print using a C typecast - a register value is treated as void*
, but you can apply a typecast to change the interpretation. For example:
(gdb) p (char *)$rax
$3 = 0x400720 "Hello, world!\n"
You can also do more complex casting and dereferencing, like this:
(gdb) p *(long *)$rax
(gdb) p *(long *)((char *)$rsp + 8)
To print out the condition code values, use p $eflags
.
Navigating Assembly Execution
- disassemble displays the assembly instructions for the current function. Specify a function name (e.g., disassemble myfunc) to display its assembly instructions instead.
- nexti lets you step to the next assembly instruction (rather than line of C code), stepping over call instructions.
- stepi lets you step to the next assembly instruction, including stepping into function calls.
- finish finishes execution of the current function and returns to the calling function.
- You can break on an individual assembly instruction by specifying its address with an asterisk before it, like break *0x401368
Split Layout
If you write layout split
, you will be taken to a (somewhat buggy) split view GDB mode. This can be handy, showing you both the GDB prompt and source/assembly/registers at the same time! (Try layout src
, layout reg
or layout asm
to swap between these different views). However, this view is buggy, and the view can sometimes get corrupted, making it hard to understand where you are in the program. You can type refresh
to refresh the view, but take care to ensure you don't accidentally enter a command you didn't intend due to the corrupted view. You can enter ctrl-x a
to return to the regular GDB view.
Display
display
lets you specify an expression and it will print out what it evaluates to every time you single-step. E.g. display/2gx $rsp
will automatically print out the 16 bytes ("(2) (g)iantwords in he(x)") at the top of the stack each time you step in GDB. Type display
with no arguments to list all of the currently-set expressions to display, and undisplay X
to stop displaying expression X
in the list shown by display
.
Conditional Breakpoints
You can set breakpoints to only trigger when certain conditions in your code are true. For instance, say you have the following loop in your code:
1 for (int i = 0; i < count; i++) {
2 ...
3 }
If you wanted to step through the code inside the loop just the last time the loop executed, you can add a condition (in C code syntax, referencing local variables) for when the breakpoint should be stopped at:
(gdb) break 2 if i == count - 1
The format is [BREAKPOINT] if [CONDITION]
. These conditions can also refer to registers when you are working at the assembly level, such as:
(gdb) break *0x401368 if $rbp == 2
Customize Breakpoint Behavior
You can tell GDB to execute a certain sequence of commands each time a breakpoint is reached. For instance, maybe for one breakpoint you always want to print out a variable value there and then continue
:
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
print myVariable
continue
end
Watchpoints
The gdb watch
command is given an expression or a memory location to watch. gdb sets up a "watchpoint", a special kind of breakpoint that stops your program whenever there is a change in the value of that expression or a write to that memory location. Here are some examples:
(gdb) watch myvar // report when myvar changes
(gdb) watch *0x608502 // report if write to memory location
Watchpoints can be a useful tool for tracking down those bugs that make mysterious changes to memory (and may be a useful tool in the next assignment!)
Hands-on Practice
Let's try out all these new gdb commands! First, read over the program stack.c
. Then compile the program and load it into gdb. Do each of the following steps:
- break kermit to put a breakpoint at the start of the kermit function
- run to run the program. It should pause right as it begins kermit.
- disassemble kermit (or even just disassemble if you just broke at the call to kermit) to view the assembly for the kermit function
- display/3i $rip. From now on, every time you step, it will print out the next 3 assembly instructions to execute.
- x/1gx $rsp to print out the 8 byte value at the top of the stack. ("Examine (1) (g)iantword in he(x) starting at $rsp")
Q1: What is the significance of this 8 byte value at the top of the stack right when kermit
is called? Try disassemble main
to see if this value pops up anywhere. We'll learn very soon that the call
instruction is used to invoke a function and that ret
is used to return from one. Why might the value at the top of the stack be important to remember?
Examine the first few assembly instructions in kermit
. Notice that it pushes the values currently in %rbp
and %rbx
onto the stack. Step over these instructions with nexti
, and as you do, try p $rsp
to see the stack grow.
After the call to binky
, kermit
can access the returned value in %eax
. However, later it calls dinky
, which will also put a return value into %eax
.
Q2: To where does kermit
copy binky
's return value to use it later? Step through to confirm your understanding.
Next, put a breakpoint on the clyde
function and enter continue
to continue execution until you reach it. clyde
is called by bigbird
, and is passed in an array as a parameter.
- Try printing out arr, and you'll get an address - since arrays are really passed as pointers to their first element!
- From the GDB tip in lab4 we can do print arr[0]@length to print out its contents. Try it and see what is outputted.
- But what if we don't have access to the source code? We can do this in assembly, too; arr is stored in %rdi, and its length in %rsi, and we can cast %rdi to the appropriate type. Then we can use it as we would arr. Specifically, we could do something like print ((CAST)$rdi)[0]@$rsi
Q3: What pointer type should we cast $rdi
to in order to complete this expression? (Hint: what pointer type is arr
really?) Try out your idea to see if it prints the array!
Now let's try out GDB watchpoints. The clyde
function iterates through the array it is given and increments each element's value by 1. We can use a watchpoint to see exactly where a specific element is being changed.
- From your current breakpoint at the start of clyde, set up a watchpoint to monitor changes to arr[1].
- Continue from here and gdb will stop when arr[1] is changed.
- When arr[1] changes from 2 to 3, execute disassemble in GDB to see exactly what assembly instruction is responsible for changing it at that point. As a tip, the => will point to the instruction that is about to be executed, so the one that changed arr[1] comes before this.
Q4: How does this instruction that is modifying arr[1]
connect back to the C code?
Feel free to step through execution and experiment with this code as you'd like. x86 has a mere handful of registers and they are in high demand; the compiler works hard to maximize their use. Parameters and return values are passed/returned using registers, and local variables will be kept in registers whenever possible. The compiler will prefer use of "scratch registers" (i.e., so called "callee-owned" registers) where possible so as to avoid having to save/restore the caller's data (as is required if using the "caller-owned" registers).
2) Stack Misuses
30 minutes + 10min discussion
Learning Goals: Experiment with how we can expose bugs with stack management by investigating assembly. This kind of investigating will be a portion of what you'll be doing on assign5.
Starter code file: smash.c
GCC always outputs proper assembly. But sometimes our C code can contain errors or vulnerabilities that cause unintended behavior, and that behavior can be most precisely understood at the assembly level.
The provided program smash.c
is one such example. It uses the standard C library function gets
, which has an inherently awful design. It is intended to read a single line of text from STDIN, stopping at the first newline character, and writing the read characters into the client's buffer. The fatal flaw of gets
is that its only argument is the starting address of the buffer, with no indication of that buffer's length. Without the length, gets
cannot tell when it should stop writing characters to avoid overflowing the buffer. Any input longer than its size will therefore write past the end. There is absolutely no way to use gets
safely. Its use has long been deprecated in favor of the properly-constrained fgets
function, and the C11 standard removed it altogether, but for reasons of backward compatibility, gets
lives on in many C libraries.
First read the BUGS
section of man gets
for harsh words against the function and hints of the security problems therein. When you compile the smash
program, you'll get a variety of warnings from the compiler and linker that try to further dissuade you from using gets
. Let's observe the consequences if we proceed past the warnings and use the function anyway.
The smash.c
program calls the greet
function in a loop. greet
prompts the user for input using gets
, stores it in a fixed-size buffer (uh oh!), and returns an integer. At first glance, it seems like greet
can never return anything other than 2....or can it? Your hacking task in this part: come up with an input that makes greet
return 107!
Let's start by playing around with the program. Run ./smash
.
- When prompted for your name, enter Grace Hopper. This name fits in 16, no problem.
- Now enter Edsger W Dijkstra. Just a little long is ok?
- Now enter Jonathan David Levin. Definite overrun, but still getting away with it?
- Push it to John Jacob Jingleheimer Schmidt. What? The return value changed? But how??
In order to understand stack vulnerabilities, it is essential to draw out a diagram of the stack. A stack diagram is the layout of memory for a function's stack space; where exactly local variables live. This allows us to see where they are in relation to one another, and exactly what happens if we write past space allocated for a variable.
Part 1: Stack Diagram (20 minutes)
Let's try to diagram out what the stack looks like for the greet
function.
In a stack diagram, we want to label where local variables live. Not all local variables live on the stack (some may be just in registers); but for any that do, we want to diagram exactly where they are stored. We can use the assembly of the function to help. Let's look at the first few lines for greet
:
sub $0x28,%rsp
movl $0x2,0x1c(%rsp)
mov $0x402008,%edi
mov $0x0,%eax
callq 0x401050 <printf@plt>
mov %rsp,%rax
mov %rax,%rdi
callq 0x401060 <gets@plt>
(P.S. variable argument functions (e.g., printf
and scanf
variants) require a little extra setup relative to normal calls. The x86-64 calling conventions for variable argument functions must indicate the presence of any float/double arguments by setting %rax
to the count of vector registers used. If none are used (i.e., no parameters of float/double type), it sets %rax
to zero.)
Q1: In the first instruction, how much stack space is initially reserved for greet
? This is the initial size of our stack frame. %rsp points to the "top" of the stack (likely the bottom in your diagram).
Q2: In the second instruction, we copy 2, presumably into num
. Where is num
stored, relative to %rsp
? Add that to your diagram.
Q3: In the last 3 instructions, we set up the call to gets
. In the C code, we pass in buf
(a 16-byte buffer) as the parameter to gets
. Looking at the assembly, what does that tell us about where buf
is stored relative to %rsp
? Add that to your diagram.
Another way to construct your diagram is by running the program in GDB and stepping through the greet
function to print out the values of different addresses. For instance, after the first assembly instruction is executed, print out $rsp
to see what the specific address is for the top of the stack. Then print &num
and buf
in GDB as well to see their starting addresses.
Try using GDB to verify your diagram is accurate!
Part 2: Stack Smash (10 minutes)
Now let's use this stack diagram to control the program behavior. Central to this is that, for any input longer than 15 (remember the null terminator!), the characters read by gets
will overflow the buffer into the memory that follows it. The key question: how can we design an input that will overflow buf
and overwrite num
to a value that we specify?
Q4: What is the shortest input (remember the null terminator!) that we can specify that will overwrite part of num
? What is it overwritten to?
Q5: The ASCII character with numeric value 107 is k
. How can we craft an input that overwrites the first byte of num
with the letter k
, and the second byte with the null terminator?
Feel free to continue playing around with this and try crafting inputs that result in other alternate return values from greet
. This method of diagnosing exactly what these stack overruns do is very powerful, and an essential part of the work you will do on the next assignment.
Fun followup reading: Peter van der Linden's book Expert C Programming: Deep C Secrets summarizes the most famous gets
exploit in "the early bird gets() the Internet Worm". The entire Expert C book is available online with your Stanford login and is chock full of fascinating information -- we highly recommend it for its illuminating and comprehensive coverage of all things C! Read it here (requires authentication; accesses Stanford's subscription to Safari Books Online in the left sidebar)