Lab 6: Runtime Stack

Lab sessions Tue May 11 to Thu May 13

Lab written by Julie Zelenski, with modifications by Nick Troccoli

Learning Goals

This lab is designed to give you a chance to:

Become more familiar with useful GDB commands and tricks when working with assembly
observe and understand the correct operation of the runtime stack
diagnose symptoms of stack mismanagement

Get Started

Clone the lab starter code by using the command below.

git clone /afs/ir/class/cs107/repos/lab6/shared lab6

Note: there will be compiler warnings when you make this starter project - this is expected for some of the exercises!

Keep our x86-64 reference sheet handy while you work!

Open x86-64 Reference Sheet

Introduction: New GDB Commands

GDB is an indispensible tool for reverse engineering, and we can't emphasize enough how familiarity with GDB will make your life easier on assign5! Here are more tips on top of the ones introduced last time.

Revisit GDB Commands in lab5

Printing Registers

print with /[format] works for registers, too, to print a value in a certain format, or you can use a C typecast:

(gdb) p $rax
$1 = 4196128
(gdb) p/t $rax
$2 = 10000000000011100100000
(gdb) p (char *)$rax
$3 = 0x400720 "Hello, world!\n"

Display

display lets you specify an expression and it will print out what it evaluates to every time you single-step. E.g. display/2gx $rsp will automatically print out the 16 bytes at the top of the stack each time you step in GDB. Type display with no arguments to list all of the currently-set expressions to display, and undisplay X to stop displaying expression X in the list shown by display.

Conditional Breakpoints

You can set breakpoints to only trigger when certain conditions in your code are true. For instance, say you have the following loop in your code:

1    for (int i = 0; i < count; i++) {
2        ...
3    }

If you wanted to step through the code inside the loop just the last time the loop executed, you can add a condition (in C code syntax, referencing local variables) for when the breakpoint should be stopped at:

(gdb) break 2 if i == count - 1

The format is [BREAKPOINT] if [CONDITION].

Customize Breakpoint Behavior

You can tell GDB to execute a certain sequence of commands each time a breakpoint is reached. For instance, maybe for one breakpoint you always want to print out a variable value there and then continue:

(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
print myVariable
continue
end

1) Stack Mechanics (10 minutes + 10min discussion)

Learning Goals: Become familiar with new GDB commands to navigate the assembly representation of a program. Stepping through a program in GDB and examining it at the assembly level is key to the kinds of security explorations you will be doing on assign5.

Starter code file: stack.c

Let's try out all these new gdb commands! First, read over the program stack.c. Then compile the program and load it into gdb. Do each of the following steps:

break kermit to put a breakpoint at the start of the kermit function
run to run the program. It should pause right as it begins kermit.
disassemble kermit to view the assembly for the kermit function
display/3i $rip. From now on, every time you step, it will print out the next 3 assembly instructions to execute.
x/1gx $rsp to print out the 8 byte value at the top of the stack.

Q1: What is the significance of this 8 byte value at the top of the stack right when kermit is called? (Hint: what does the call instruction put there?). Try disassemble main to see if this value pops up anywhere.

Examine the first few assembly instructions in kermit. Notice that it pushes the values currently in %rbp and %rbx onto the stack. Step over these instructions with nexti, and as you do, try p $rsp to see the stack grow.

Q2: Why must kermit push these registers onto the stack before it uses them?

After the call to binky, kermit can access the returned value in %eax. However, later it calls dinky, which will also put a return value into %eax.

Q3: To where does kermit copy binky's return value to use it later? Step through to confirm your understanding.

Feel free to step through execution and experiment with this code as you'd like. x86 has a mere handful of registers and they are in high demand; the compiler works hard to maximize their use. Parameters and return values are passed/returned using registers, and local variables will be kept in registers whenever possible. The compiler will prefer use of "scratch registers" (e.g. callee-owned registers) where possible so as to avoid having to save/restore the caller's data (as is required if using the caller-owned registers).

2) Stack Misuses (20 minutes + 10min discussion)

Learning Goals: Experiment with how we can expose bugs with stack management by investigating assembly. This kind of investigating will be a portion of what you'll be doing on assign5.

Starter code file: smash.c

GCC always outputs proper assembly. But sometimes our C code can contain errors or vulnerabilities that cause unintended behavior, and that behavior can be most precisely understood at the assembly level.

The provided program smash.c is one such example. It uses the standard C library function gets, which has an inherently awful design. It is intended to read a single line of text from STDIN, stopping at the first newline character, and writing the read characters into the client's buffer. The fatal flaw of gets is that its only argument is the starting address of the buffer, with no indication of that buffer's length. Without the length, gets cannot tell when it should stop writing characters to avoid overflowing the buffer. Any input longer than its size will therefore write past the end. There is absolutely no way to use gets safely. Its use has long been deprecated in favor of the properly-constrained fgets function, but for reasons of backward compatibility, gets lives on in the standard library.

First read the BUGS section of man gets for harsh words against the function and hints of the security problems therein. When you compile the smash program, you'll get a variety of warnings from the compiler and linker that try to further dissuade you from using gets. Let's observe the consequences if we proceed past the warnings and use the function anyway.

The smash.c program calls the greet function in a loop. greet prompts the user for input using gets, stores it in a fixed-size buffer (uh oh!), and returns an integer. At first glance, it seems like greet can never return anything other than 2....or can it? Your hacking task in this part: come up with an input that makes greet return 107!

Let's start by playing around with the program. Run ./smash.

When prompted for your name, enter Grace Hopper. This name fits in 16, no problem.
Now enter Edsger W Dijkstra. Just a little long is ok?
Now enter Marc Tessier-Lavigne. Definite overrun, but still getting away with it?
Push it to John Jacob Jingleheimer Schmidt. What? The return value changed? But how??

In order to understand stack vulnerabilities, it is essential to draw out a diagram of the stack. A stack diagram is the layout of memory for a function's stack space; where exactly local variables live. This allows us to see where they are in relation to one another, and exactly what happens if we write past space allocated for a variable.

Part 1: Stack Diagram

Let's try to diagram out what the stack looks like for the greet function.

In a stack diagram, we want to label where local variables live. Not all local variables live on the stack (some may be just in registers); but for any that do, we want to diagram exactly where they are stored. We can use the assembly of the function to help. Let's look at the first few lines for greet:

sub    $0x28,%rsp
movl   $0x2,0x1c(%rsp)
lea    0xe8c(%rip),%rdi
mov    $0x0,%eax
callq  1050 <printf@plt>
mov    %rsp,%rax
mov    %rax,%rdi
callq  1060 <gets@plt>

Q4: In the first instruction, how much stack space is initially reserved for greet? This is the initial size of our stack frame. %rsp points to the "top" of the stack (likely the bottom in your diagram).

Q5: In the second instruction, we copy 2, presumably into num. Where is num stored, relative to %rsp? Add that to your diagram.

Q6: In the last 3 instructions, we set up the call to gets. In the C code, we pass in buf (a 16-byte buffer) as the parameter to gets. Looking at the assembly, what does that tell us about where buf is stored relative to %rsp? Add that to your diagram.

Part 2: Stack Smash

Now let's use this stack diagram to control the program behavior. Central to this is that, for any input longer then 15 (remember the null terminator!), the characters read by gets will overflow the buffer into the memory that follows it. The key question: how can we design an input that will overflow buf and overwrite num to a value that we specify?

Q7: What is the shortest input (remember the null terminator!) that we can specify that will overwrite part of num? What is it overwritten to?

Q8: The ASCII character with numeric value 107 is k. How can we craft an input that overwrites the first byte of num with the letter k, and the second byte with the null terminator?

Feel free to continue playing around with this and try crafting inputs that result in other alternate return values from greet. This method of diagnosing exactly what these stack overruns do is very powerful, and an essential part of the work you will do on the next assignment.

Fun followup reading: Peter van der Linden's book Expert C Programming: Deep C Secrets summarizes the most famous gets exploit in "the early bird gets() the Internet Worm". The entire Expert C book is available online with your Stanford login and chock full of fascinating information -- we highly recommend it for its illuminating and comprehensive coverage of all things C! read it here (requires authentication, accesses Stanford's subscription to Safari Books Online in the left sidebar)

[Optional] Extra Problems

Finished with lab and itching to further exercise your assembly and runtime stack skills? Check out our extra problems!

Recap

Nice work on the lab! It's okay if you don't completely finish all of the exercises during lab; your sincere participation for the full lab period is sufficient for credit. However, we highly encourage you to finish on your own whatever is need to solidify your knowledge. Also take a chance to reflect on what you got what from this lab and whether you feel ready for what comes next! The takeaways from lab6 should be understanding the gdb commands to dig around in the stack frames and being able to relate the contents of the stack to the runtime program state - both valuable when debugging. Here are some questions to verify your understanding and get you thinking further about these concepts:

Why are the registers used for function parameters completely fixed, yet no conventions apply to where locals are stored?
Explain the rules for caller-owned versus callee-owned register use. What is the advantage of this arrangement relative to a simpler scheme such treating all registers as caller-owned or callee-owned?
Explain how you might go about sketching a stack diagram for a function given its assembly. <!--+ Explain how insertion/deletion of a seemingly unrelated printf statement could make a difference in a program's behavior.
What are the signs in gdb that help you diagnose a stack overrun has trashed the return address? -->