Lab sessions Tue May 11 to Thu May 13
Lab written by Julie Zelenski, with modifications by Nick Troccoli
Learning Goals
This lab is designed to give you a chance to:
- Become more familiar with useful GDB commands and tricks when working with assembly
- observe and understand the correct operation of the runtime stack
- diagnose symptoms of stack mismanagement
Get Started
Clone the lab starter code by using the command below.
git clone /afs/ir/class/cs107/repos/lab6/shared lab6
Note: there will be compiler warnings when you make
this starter project - this is expected for some of the exercises!
Keep our x86-64 reference sheet handy while you work!
Introduction: New GDB Commands
GDB is an indispensible tool for reverse engineering, and we can't emphasize enough how familiarity with GDB will make your life easier on assign5! Here are more tips on top of the ones introduced last time.
Printing Registers
print
with /[format]
works for registers, too, to print a value in a certain format, or you can use a C typecast:
(gdb) p $rax
$1 = 4196128
(gdb) p/t $rax
$2 = 10000000000011100100000
(gdb) p (char *)$rax
$3 = 0x400720 "Hello, world!\n"
Display
display
lets you specify an expression and it will print out what it evaluates to every time you single-step. E.g. display/2gx $rsp
will automatically print out the 16 bytes at the top of the stack each time you step in GDB. Type display
with no arguments to list all of the currently-set expressions to display, and undisplay X
to stop displaying expression X
in the list shown by display
.
Conditional Breakpoints
You can set breakpoints to only trigger when certain conditions in your code are true. For instance, say you have the following loop in your code:
1 for (int i = 0; i < count; i++) {
2 ...
3 }
If you wanted to step through the code inside the loop just the last time the loop executed, you can add a condition (in C code syntax, referencing local variables) for when the breakpoint should be stopped at:
(gdb) break 2 if i == count - 1
The format is [BREAKPOINT] if [CONDITION]
.
Customize Breakpoint Behavior
You can tell GDB to execute a certain sequence of commands each time a breakpoint is reached. For instance, maybe for one breakpoint you always want to print out a variable value there and then continue
:
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
print myVariable
continue
end
1) Stack Mechanics (10 minutes + 10min discussion)
Learning Goals: Become familiar with new GDB commands to navigate the assembly representation of a program. Stepping through a program in GDB and examining it at the assembly level is key to the kinds of security explorations you will be doing on assign5.
Starter code file: stack.c
Let's try out all these new gdb commands! First, read over the program stack.c
. Then compile the program and load it into gdb. Do each of the following steps:
break kermit
to put a breakpoint at the start of thekermit
functionrun
to run the program. It should pause right as it beginskermit
.disassemble kermit
to view the assembly for thekermit
functiondisplay/3i $rip
. From now on, every time you step, it will print out the next 3 assembly instructions to execute.x/1gx $rsp
to print out the 8 byte value at the top of the stack.
Q1: What is the significance of this 8 byte value at the top of the stack right when kermit
is called? (Hint: what does the call
instruction put there?). Try disassemble main
to see if this value pops up anywhere.
Examine the first few assembly instructions in kermit
. Notice that it pushes the values currently in %rbp
and %rbx
onto the stack. Step over these instructions with nexti
, and as you do, try p $rsp
to see the stack grow.
Q2: Why must kermit
push these registers onto the stack before it uses them?
After the call to binky
, kermit
can access the returned value in %eax
. However, later it calls dinky
, which will also put a return value into %eax
.
Q3: To where does kermit
copy binky
's return value to use it later? Step through to confirm your understanding.
Feel free to step through execution and experiment with this code as you'd like. x86 has a mere handful of registers and they are in high demand; the compiler works hard to maximize their use. Parameters and return values are passed/returned using registers, and local variables will be kept in registers whenever possible. The compiler will prefer use of "scratch registers" (e.g. callee-owned registers) where possible so as to avoid having to save/restore the caller's data (as is required if using the caller-owned registers).
2) Stack Misuses (20 minutes + 10min discussion)
Learning Goals: Experiment with how we can expose bugs with stack management by investigating assembly. This kind of investigating will be a portion of what you'll be doing on assign5.
Starter code file: smash.c
GCC always outputs proper assembly. But sometimes our C code can contain errors or vulnerabilities that cause unintended behavior, and that behavior can be most precisely understood at the assembly level.
The provided program smash.c
is one such example. It uses the standard C library function gets
, which has an inherently awful design. It is intended to read a single line of text from STDIN, stopping at the first newline character, and writing the read characters into the client's buffer. The fatal flaw of gets
is that its only argument is the starting address of the buffer, with no indication of that buffer's length. Without the length, gets
cannot tell when it should stop writing characters to avoid overflowing the buffer. Any input longer than its size will therefore write past the end. There is absolutely no way to use gets
safely. Its use has long been deprecated in favor of the properly-constrained fgets
function, but for reasons of backward compatibility, gets
lives on in the standard library.
First read the BUGS
section of man gets
for harsh words against the function and hints of the security problems therein. When you compile the smash
program, you'll get a variety of warnings from the compiler and linker that try to further dissuade you from using gets
. Let's observe the consequences if we proceed past the warnings and use the function anyway.
The smash.c
program calls the greet
function in a loop. greet
prompts the user for input using gets
, stores it in a fixed-size buffer (uh oh!), and returns an integer. At first glance, it seems like greet
can never return anything other than 2....or can it? Your hacking task in this part: come up with an input that makes greet
return 107!
Let's start by playing around with the program. Run ./smash
.
- When prompted for your name, enter
Grace Hopper
. This name fits in 16, no problem. - Now enter
Edsger W Dijkstra
. Just a little long is ok? - Now enter
Marc Tessier-Lavigne
. Definite overrun, but still getting away with it? - Push it to
John Jacob Jingleheimer Schmidt
. What? The return value changed? But how??
In order to understand stack vulnerabilities, it is essential to draw out a diagram of the stack. A stack diagram is the layout of memory for a function's stack space; where exactly local variables live. This allows us to see where they are in relation to one another, and exactly what happens if we write past space allocated for a variable.
Part 1: Stack Diagram
Let's try to diagram out what the stack looks like for the greet
function.
In a stack diagram, we want to label where local variables live. Not all local variables live on the stack (some may be just in registers); but for any that do, we want to diagram exactly where they are stored. We can use the assembly of the function to help. Let's look at the first few lines for greet
:
sub $0x28,%rsp
movl $0x2,0x1c(%rsp)
lea 0xe8c(%rip),%rdi
mov $0x0,%eax
callq 1050 <printf@plt>
mov %rsp,%rax
mov %rax,%rdi
callq 1060 <gets@plt>
Q4: In the first instruction, how much stack space is initially reserved for greet
? This is the initial size of our stack frame. %rsp points to the "top" of the stack (likely the bottom in your diagram).
Q5: In the second instruction, we copy 2, presumably into num
. Where is num
stored, relative to %rsp
? Add that to your diagram.
Q6: In the last 3 instructions, we set up the call to gets
. In the C code, we pass in buf
(a 16-byte buffer) as the parameter to gets
. Looking at the assembly, what does that tell us about where buf
is stored relative to %rsp
? Add that to your diagram.
Part 2: Stack Smash
Now let's use this stack diagram to control the program behavior. Central to this is that, for any input longer then 15 (remember the null terminator!), the characters read by gets
will overflow the buffer into the memory that follows it. The key question: how can we design an input that will overflow buf
and overwrite num
to a value that we specify?
Q7: What is the shortest input (remember the null terminator!) that we can specify that will overwrite part of num
? What is it overwritten to?
Q8: The ASCII character with numeric value 107 is k
. How can we craft an input that overwrites the first byte of num
with the letter k
, and the second byte with the null terminator?
Feel free to continue playing around with this and try crafting inputs that result in other alternate return values from greet
. This method of diagnosing exactly what these stack overruns do is very powerful, and an essential part of the work you will do on the next assignment.
Fun followup reading: Peter van der Linden's book Expert C Programming: Deep C Secrets summarizes the most famous gets
exploit in "the early bird gets() the Internet Worm". The entire Expert C book is available online with your Stanford login and chock full of fascinating information -- we highly recommend it for its illuminating and comprehensive coverage of all things C! read it here (requires authentication, accesses Stanford's subscription to Safari Books Online in the left sidebar)
[Optional] Extra Problems
Finished with lab and itching to further exercise your assembly and runtime stack skills? Check out our extra problems!
Recap
Nice work on the lab! It's okay if you don't completely finish all of the exercises during lab; your sincere participation for the full lab period is sufficient for credit. However, we highly encourage you to finish on your own whatever is need to solidify your knowledge. Also take a chance to reflect on what you got what from this lab and whether you feel ready for what comes next! The takeaways from lab6 should be understanding the gdb commands to dig around in the stack frames and being able to relate the contents of the stack to the runtime program state - both valuable when debugging. Here are some questions to verify your understanding and get you thinking further about these concepts:
- Why are the registers used for function parameters completely fixed, yet no conventions apply to where locals are stored?
- Explain the rules for caller-owned versus callee-owned register use. What is the advantage of this arrangement relative to a simpler scheme such treating all registers as caller-owned or callee-owned?
- Explain how you might go about sketching a stack diagram for a function given its assembly. <!--+ Explain how insertion/deletion of a seemingly unrelated printf statement could make a difference in a program's behavior.
- What are the signs in gdb that help you diagnose a stack overrun has trashed the return address? -->