Assignment 6: Binary Bomb

Due: Mon Mar 4 11:59 pm
On-time bonus 5%. Grace period for late submissions until Wed Mar 6 11:59 pm

Assignment by Michael Chang & Julie Zelenski
idea originated by Randal Bryant & David O'Hallaron (CMU). Modifications by Nick Troccoli.

Learning Goals

This assignment focuses on understanding assembly code representations of programs. You will be building your skills with:

  • reading and tracing assembly code
  • understanding how data access, control structures, and function calls translate between C and assembly
  • reverse-engineering
  • understanding the challenges of writing secure and robust systems
  • mastering the gdb debugger!

Overview

There are two parts to this assignment. The first part is about an ATM withdrawal program containing some vulnerabilities - you'll need to use your C and assembly skills to find and demonstrate how to exploit these vulnerabilities. The second part is the binary bomb program, where you're given an executable "bomb" program (no C code provided!) to "defuse" using your assembly and reverse-engineering skills. These problems are like C/assembly "puzzles" to solve, and we hope you enjoy solving them and exploring this material as much as we enjoyed creating them!

To get started on this assignment, clone the starter project using the command

    git clone /afs/ir/class/cs107/repos/assign6/$USER assign6

The starter project contains the following:

  • bomb: your binary bomb executable program, custom-generated for each student
  • custom_tests: the file where you will add custom tests to exploit vulnerabilities in the provided ATM withdrawal program
  • input.txt: a blank text file where you should add the passwords for each binary bomb level, one per line. You can run bomb with this file as a command-line argument and it will first read from this file before prompting you for further input, allowing you to avoid re-typing passwords for defused levels each time.
  • readme.txt: a file where you should add answers to some short written questions about the ATM and binary bomb programs.
  • .gdbinit: a gdb configuration file you can optionally use to run certain gdb commands each time gdb launches. See the section on using GDB in binary bomb for more information.
  • samples: a symbolic link to the shared directory for this assignment. It contains:
    • atm: the executable ATM program, which you will explore for vulnerabilities.
    • atm.c: the C source code for the ATM program, which you will explore for vulnerabilities. Note that you're not able to edit or recompile this code/executable.
    • bank: a folder containing customers.db, a file with the list of all users and balances for the ATM program
    • SANITY.INI and sanity.py: files to configure and run sanity check. You can ignore these files.
    • wordlist: a list of dictionary words used for bombs. You can ignore this file.
  • tools: contains symbolic links to the sanitycheck and submit programs for testing and submitting your work.

Do not start by running the bomb to "see what it will do". You will quickly learn that what it does is explode :-) When started, it immediately goes into waiting for input and when you enter the wrong response, it will explode and deduct points. Thoroughly read the binary bomb information in this spec before attempting to defuse it!

You will be using gdb frequently on this assignment. Make sure you have downloaded the CS107 GDB config file. You can find how to do this at the top of the CS107 GDB Guide.


1. Code Study: Security and Robustness

The samples/atm program simulates the operation of a simplified automated teller machine. The ATM program is invoked with an amount and the credentials for a particular account. If the credential is authorized and the account has sufficient funds, the amount is withdrawn and dispensed in cash. The ATM is supposed to maintain bank security by rejecting unauthorized access and denying excessive withdrawals.

Run samples/atm 40 myname (replacing myname with your myth login name) to make a $40 withdrawal from the account associated with your login name. Success! Every time you run the program, it will print out information to the terminal about the transaction that took place, or the error that occurred, if any. For example, if you ask to withdraw $100 from your account, it should be denied with an error message because that would bring your current $107 balance below the required minimum. If you try to sneak cash from another account instead of yours (e.g. samples/atm 40 troccoli) or use a fake name (e.g. samples/atm 40 not_a_user), your credential should get rejected as unauthorized. So far, so good; the ATM seems to be doing its job. (Note: Each time you run the program anew, all balances return to their original starting levels. No money actually changes hands in this ATM, which is a blessing given its security flaws.)

The bank recently updated the ATM software to a version with some additional features. The IT team reviewed the new code and thought it all looked good, but having now installed it in production, they are observing some suspicious activity. The bank has called you because your superior C and assembly skills are just what's needed to investigate and resolve these problems!

Your first task is to review the source code for the program in samples/atm.c. The program is roughly 150 lines of C code of similar complexity to what you have been writing this quarter, and is decomposed and fairly readable, though sorely lacking in comments. You should find that the program's approach seems reasonable and the code is sincere in its attempt to operate correctly. Once you're done reading, take a minute to reflect on how far your awesome C skills have come to let you read through this provided program!

By following program output and balances, the bank has noticed three operational anomalies that they need your help investigating.

Deliverables

For each of the vulnerabilities below, construct a test case to showcase how it can be exploited and add it to your custom_tests file. Note that there may be more than one way to trigger a vulnerability. In your readme.txt, you should also provide for each a concise description of the underlying defect in the code, an explanation of exactly how you constructed your test case to exploit it, and your recommendation for fixing it. The bank is not looking for a major rewrite/redesign, so in your proposed changes you should directly address the vulnerability with minimal other disruption. Note that there may be more than one possible remedy for fixing each issue. Here is a list of the attacks you must provide:

  • case a: make a withdrawal as yourself that withdraws more money than is present in your account
  • case b: withdraw $40 from one of the CS107 staff members' accounts
  • case c: withdraw $300 from the bank vault despite its disabled passcode

a) Negative Balances

The old version of the ATM program restricted a withdrawal to be at most the full account balance, allowing the customer to drain their account to $0, but no further. The new program has changed the withdraw function to require a non-zero minimum balance. The expected behavior should be that all account balances stay above this minimum. However, the bank saw an otherwise ordinary withdrawal transaction that not only caused an account to go below the minimum, it overdrew so far as to end up with a negative balance. Oops, that's definitely not supposed to happen! Review the C code for the withdraw function, specifically the changes from the old version. It seems to work in many cases, but apparently not all. Read carefully through this function to try and discover the flaw - your understanding of signed and unsigned integers will be useful here! Once you have found the vulnerability, determine a command to make a withdrawal as yourself that withdraws more money than is present in your account.

b) Unauthorized Account Access

The bank has also received a customer complaint about an unauthorized withdrawal from their account. It seems that another user with different credentials was able to successfully withdraw money from the aggrieved customer's account. Moreover, the credential used appears to be entirely fake - no such user exists in the database! A user should not be able to access a different customer's account and especially not by supplying a bogus credential! Review the C code for the find_account function that is responsible for matching the provided username to their account number. It seems to work properly for valid accounts, but not for invalid usernames. Can you spot what this function does when the username is invalid? Once you do, it may seem that the function will behave unpredictably in that situation. Your next task is to examine the generated assembly to determine precisely how the function will behave. Think about registers with special responsibilities and where it's assumed certain values will live. Once you have found the vulnerability, determine a command with a designed bogus name credential to withdraw $40 from one of the CS107 staff members' accounts. (The samples/bank/customers.db file contains information about all valid users and their balances).

c) Accessing The Master Vault

The most worrisome issue is repeated illicit withdrawals from the master vault account, account number 0. The name on the master account is not an actual user, so this account cannot be accessed using the simple username-based credential. Instead, the user must specify two arguments, the account's number and its secret passcode, as a form of heightened security. At first the bank thought the vault passcode had been leaked, but changing the passcode did nothing to thwart the attack. In a fit of desperation, the bank removed the vault passcode file altogether, figuring this would disable all access to the vault, yet the rogue user continues to make withdrawals from it! It seems that the high-security passcode authentication may have its own security flaw! The code that handles this authentication is in the lookup_by_number and read_secret_passcode functions. These functions work correctly in many situations, but fail in certain edge cases. Remember that, in certain cases, the supplied credentials appear to be accepted even though there is no saved passcode file. The vulnerability is subtle in the C code, so you should also use GDB to examine the code at the assembly level and diagram out the memory on the stack for these functions, as it is the arrangement of the various data and lack of care in accessing the stack-based variables that leads to the security vulnerability in this case. Once you have found the vulnerability, determine a command to withdraw $300 from the bank vault despite its disabled passcode.

Optional Further Exploration

During the course of your investigation, you may find additional problems beyond the ones listed above. You are only required to address these three issues, but you are welcome to explore further to find additional problems!

NOTE: If you liked this exercise, you'll love CS155, a class on computer security that challenges you to exploit various vulnerabilities in programs. See the CS155 website for more information!


2. Binary Bomb

Those nefarious Cal students have broken into our myth machines and planted some mysterious executables we are calling "binary bombs." These programs are believed to be armed and dangerous. Without the original source, we don't have much to go on, but we have observed that the programs seem to operate in a sequence of levels. There are 4 levels in total. Each level challenges the user to enter a string. If the user enters the correct string, it defuses the level and the program proceeds on. But given the wrong input, the bomb explodes by printing an earth-shattering KABOOM! and terminating. To deactivate the entire bomb, one needs to successfully defuse each of its levels.

The Cal students have littered our systems with these landmines and we need your help. Each of you is given a bomb to disable. Your mission is to apply your best assembly detective skills to work out the input required to pass each level and render the entire bomb harmless.

Your bomb is given to you as an executable, i.e. as compiled object code. From the assembly, you will work backwards to construct a picture of the original C source in a process known as reverse-engineering. Note that you don't necessarily need to recreate the entire C source; your goal is to work out a correct input to pass the level, which requires a fairly complete exploration of the code path you follow to defuse, but any code outside that path can be investigated on a need-to-know basis. Once you understand what makes your bomb "tick", you can supply each level with the input it requires and defuse it. The levels get progressively more complex, but the expertise you gain as you move up from each level increases as well. One confounding factor is that the bomb explodes whenever it is given invalid input. Each time your bomb explodes, it notifies the staff, which deducts from your score. Thus, there are consequences to detonating the bomb, so you must tread carefully!

Reverse-engineering requires a mix of different approaches and techniques and will give you an opportunity to practice with a variety of tools, most importantly GDB. Building a well-developed gdb repertoire can pay big dividends the rest of your career!

Logistics

Our counter-intelligence efforts have been able to confirm a few things about how the bombs operate:

  • If you start the bomb with no command-line argument, it reads input typed at the console.
  • If you give an argument to the bomb, such as input.txt:

    ./bomb input.txt
    

    the bomb will read lines from that file until it reaches EOF (end of file), and then switch over to reading from the console. This feature allows you to store inputs for solved levels in input.txt and avoid retyping them each time.

  • Explosions can be triggered when executing at the shell or within gdb. However, gdb offers you tools you can use to intercept explosions, so your safest choice is to work under gdb and employ protective measures.

  • The bomb in your repository was lovingly created just for you and is unique to your id. It is said that the bomb can detect if an impostor attempts to execute your bomb and won't play along.
  • The bombs are designed for the myth computers (running on the console or logged in remotely). There is a rumor that the bomb will refuse to run anywhere else.
  • The bombs were compiled from C code using gcc. Apparently Cal students don't know how to edit a Makefile to change the flags to achieve much obfuscation of the object code.
  • The Cal students also weren't aware the function names would be visible in the object code, so they didn't take pains to disguise them. Thus, a function name of initialize_bomb or read_five_numbers can be a clue. Similarly, they played it straight with use of the standard C library functions, so if you encounter a call to qsort or sscanf, it is the real deal.
  • Direct modification of the binary bomb executable can change its behavior, but be forewarned that we will test your submission against your original unmodified binary, so while hacking the executable is great fun, it won't be of much use as a strategy for solving the levels.
  • There is one important restriction: Do not use brute force! You could write a program to try every possible input to find a solution. But this is trouble for several reasons:

    • You lose points on every incorrect guess which explodes the bomb.
    • A notification is sent on each bomb explosion. Wild guessing will saturate the network, creating ill will among other users and attracting the ire of the system administrators who have the authority to revoke your privileges because you are abusing shared resources.
    • We haven't told you how long the strings are, nor have we told you what characters they can contain. Even if you made the (wrong) assumptions that they all are less than 80 characters long and only contain lowercase letters, you will have 26^80 guesses for each level. Trying them all will take an eternity, and you will not have an answer before you graduate.
    • Part of your submission requires answering questions that show your understanding of the assembly code, which guessing will not provide. :-)

Getting Started

Here are some steps you should take to get started on this part of the assignment.

  1. Use the nm utility on the executable (nm bomb) to print what's called the "symbol table" of the executable. The symbol table contains the names of functions and global variables and their addresses. The names may give you a sense of the structure of the bomb.
  2. Use the strings utility on the executable (strings bomb) to print all the printable strings contained in the executable, including string constants. See if any of these strings seem relevant in defusing the bomb.
  3. gdb and objdump will be most helpful after this. objdump -d bomb outputs the assembly for the bomb executable. Reading and tracing the disassembled code is where the bulk of your information will come from. Scrutinizing the lifeless object code without executing is a technique known as deadlisting. Once you sort out what the object code does, you can, in effect, translate it back to C and then see what input is expected. This works reasonably well on simple passages of code, but can become unwieldy when the code is more complex. That is where gdb comes in.
  4. gdb lets you single-step by assembly instruction, examine (and change!) memory and registers, view the runtime stack, disassemble the object code, set breakpoints, and more. Live experimentation on the executing bomb is the most direct way to become familiar with what's happening at the assembly level.
  5. Pull up tools like the Compiler Explorer interactive website from lab6, or gcc on myth, to compile and explore the assembly translation of any code you'd like. For example, if you're unsure how a particular C construct translates to assembly, how to access a certain kind of data, how break works in assembly, or how a function pointer is invoked by qsort, write a C program with the code in question and trace through its disassembly. Since you yourself wrote the test program, you also don't have to fear its explosive nature :-) You can compile directly on myth using a copy of a Makefile from any CS107 assignment/lab as a starting point, and then use gdb or objdump to poke around.

Before attempting to defuse the bomb, you should use the above tools, and gdb tricks below, to figure out how to reliably prevent explosions. There are simple manual blocks that give some measure of protection, but it is best to go further to develop an invincible guard. Feel free to use any technique at your disposal, such as leveraging gdb features, tweaking the global program state, modifying your setup, tricking the bomb into running in a safe manner, or hacking the bomb executable. Avoiding the entire explosion is one straightforward approach to ensure that we won't hear about it, but there are ways to selectively disable just the transmission portion to the course staff. Once you figure out how to set up appropriate protection against explosions, you will then be free to experiment with the levels without worry. Note that the bomb can only explode when it is "live", i.e., executing in shell or running with gdb. Using tools such as nm, strings, and objdump to examine the executable cannot explode the bomb.

Using gdb

The debugger is absolutely invaluable on this assignment. Here are some suggestions on how to maximize your use of gdb. You'll also get more practice with gdb tricks in labs 6 and 7.

  • Expand your gdb repertoire. The labs have introduced you to handy commands such as break, x, print, info, disassemble, and stepi/nexti. Here are some additional commands that you might find similarly useful: display, set variable, watch, jump, kill, and return. Within gdb, you can use help name-of-command to get more details about any gdb command. See the quick gdb reference card for a summary of many other neat gdb features.
  • Get fancy with your breakpoints. You can set breakpoints by function name, source line, or address of a specific instruction. Use commands to specify a list of commands to be automatically executed whenever a given breakpoint is hit. These commands might print a variable, dump the stack, jump to a different instruction, change values in memory, return early from a function, and so on. Breakpoint commands are particularly useful for installing actions you intend to be automatically and infallibly completed when arriving at a certain place in the code. (hint!)

    gdb kill workaround: gdb 7.7 (current version on myth as of 11/2017) has a bug when attempting to use kill in the commands sequence for a breakpoint. The bug creates a cascade of problems and can cause gdb itself to crash or hang. The gdb command signal SIGKILL can be used as an alternate means to kill a program from a commands sequence that doesn't trip this bug.

  • Use a .gdbinit file. The provided file named .gdbinit in the assignment folder can be used to set a startup sequence for gdb. In this text file, you enter a sequence of commands exactly as you would type them to the gdb command prompt. Upon starting, gdb will automatically execute the commands from it. This will be a convenient place to put gdb commands to execute every time you start the debugger. Hint: wouldn't this be useful for creating breakpoints with commands that you want to be sure are always in place when running the bomb? The .gdbinit file we give you in the starter repo has only one command to echo Successfully executing commands from .gdbinit in current directory. If you see this message when you start gdb, it confirms the .gdbinit file has been loaded.

  • Custom gdb commands. Use define to add your own gdb "macros" for often-repeated command sequences. You can add defines to your .gdbinit file so you have access to them in subsequent gdb sessions as well.
  • Fire up tui mode (maybe...). The command layout asm followed by layout reg will give you a split window showing disassembly and register values. This layout will display current values for all registers in the upper pane, the sequence of assembly instructions in the middle pane, and your gdb command line at the bottom. As you single-step with si, the register values will update automatically (those values that changed are highlighted) and the middle pane will follow instruction control flow. This is a super-convenient view of what is happening at the machine level, but sadly, you have to endure a number of quirks and bugs to use it. The tui mode can occasionally crash gdb itself, killing off gdb and possibly the bomb while it's at it. Even when tui is seemingly working, the display has a habit of turning wonky, often fixable by the refresh command (use this early and often!) but not always. A garbled display could cause you to misunderstand the program state, misidentify where your bomb is currently executing, or accidentally execute a gdb command you didn't intend. Any explosion suppression mechanism that requires you, the fallible human, to take the right action at a critical time could easily be waylaid by interference, so don't attempt tui before you have invincible automatic protection against explosions. Selective use of auto-display expressions (introduced in lab7) is a great alternative with less disruption. You can exit tui using ctrl-x a and re-enter it again (this doesn't require leaving gdb and losing all your state).

Bomb Deliverables

You should add the passwords to defuse each level in your input.txt file. We will test by running ./bomb input.txt on your submission. The input.txt file in your submission should contain one line for each level you have solved, starting from level 1. Malformed entries in your input.txt or wrong line-endings (see FAQ below) will cause grading failures. To avoid surprises, be sure that you have verified your input.txt in the same way we will in grading (i.e. ./bomb input.txt). We also have a few follow-up questions that you should answer in your readme.txt file:

  1. What tactics did you use to suppress/avoid/disable explosions?
  2. level_1 contains an instruction of the form mov $<hex>,%edi. Explain how this instruction fits into the operation of level_1. What is this hex value and for what purpose is it being moved? Why can this instruction reference %edi instead of the full %rdi register?
  3. level_2 contains a jle that is not immediately preceded by a cmp or test instruction. Explain how a branch instruction operates in such a context. Under what conditions is this particular jle branch taken?
  4. Explain how the loop in the winky function of level_3 is exited.
  5. The read_array function used in level_4 declares a local variable that is stored on the stack at 0x8(%rsp). What is the type/size of this variable? Explain how you can discern its type from following along in the assembly, even though there is no explicit type information in the assembly instructions. Within read_array there is no instruction that writes to this variable. Explain how the variable is initialized (what value it is set to and when/where does that happen?).
  6. Explain how the mycmp function is used in level_4. What type of data is being compared and what ordering does it apply?

Sanity Check

The default sanitycheck test cases are ATM inputs and one test case that reports the line count of your input.txt file. This sanitycheck is configured to only allow test cases for ATM in your custom_tests file. The bomb executable is not run by sanitycheck.

Submitting

Once you are finished working and have saved all your changes, check out the guide to working on assignments for how to submit your work. We recommend you do a trial submit in advance of the deadline to allow time to work through any snags. You may submit as many times as you would like; we will grade the latest submission. Submitting a stable but unpolished/unfinished version is like an insurance policy. If the unexpected happens and you miss the deadline to submit your final version, this previous submit will earn points. Without a submission, we cannot grade your work.

Grading

For this assignment, here is a tentative point breakdown (out of 85):

  • custom_tests (15 points) Each successful attack test case earns 5 points. We will test by running tools/sanitycheck custom_tests on your submission. Your custom_tests should contain 3 test cases, one for each ATM attack.
  • readme.txt (35 points) The ATM and bomb questions will be graded on the understanding of the issues demonstrated by your answers and the thoroughness and correctness of your conclusions.
  • input.txt (32 points) Each bomb level you have solved earns 8 points. We will test by running ./bomb input.txt on your submission. The input.txt file in your submission should contain one line for each level you have solved, starting from level 1. Malformed entries in your input.txt or wrong line-endings (see FAQ below) will cause grading failures. To avoid surprises, be sure that you have verified your input.txt in the same way we will in grading (i.e. ./bomb input.txt).
  • Bomb explosions (up to 6 points deducted) Each bomb explosion notification that reaches the staff results in a 1 point deduction, capped at 6 points total.

On-time Bonus (+5%) The on-time bonus for this assignment is 5%. Submissions received by the due date earn the on-time bonus. If you miss the due date, late work may be submitted during the grace period without penalty. No submissions will be accepted after the grace period ends, so please plan accordingly! The bonus is calculated as a percentage of the point score earned by the submission. See the General Information handout for more information about the late policy.

Post-Assignment Check-in

How did the assignment go for you? We encourage you to take a moment to reflect on how far you've come and what new knowledge and skills you have to take forward. Once you finish this assignment, your assembly skills will be unstoppable! You successfully found vulnerabilities in a program using its source and assembly, and reverse engineered a complex program without having access to its source at all. Rock on!

To help you gauge your progress, for each assignment/lab, we identify some of its takeaways and offer a few thought questions you can use as a self-check on your post-task understanding. If you find the responses don't come easily, it may be a sign a little extra review is warranted. These questions are not to be handed in or graded. You're encouraged to freely discuss these with your peers and course staff to solidify any gaps in your understanding before moving on from a task. They could also be useful as review before the exams.

  • What are some of the gdb commands that allow re-routing control in an executing program?
  • What is the main indication that an assembly passage contains a loop?
  • Explain the difference between a function's return value and its return address.
  • Consider the mechanics of how function pointers work at the assembly level. How is a call through a function pointer the same/different when compared to an ordinary function call?
  • For performance reasons, the compiler prefers storing local variables in registers whenever possible. What are some reasons that force the compiler to store a local variable on the stack instead?
  • For the instruction sequence below, what must be true about values of op1 and op2 for the branch to be taken? What changes if ja is substituted for jg?
    cmp op1,op2 
    jg target
    

We would also appreciate it if you filled out this homework survey to tell us what you think. Thank you for your feedback!


Frequently Asked Questions

I get an error message about auto-loading .gdbinit being declined when starting gdb. What does this mean?

There is a provided .gdbinit file in the assignment starter code that is helpful for auto-executing gdb commands on launch when defusing your binary bomb. GDB loads the file automatically if it's in that directory. If you are seeing an error message, this means that you haven't installed the CS107 GDB configuration file to permit GDB to load this assignment file - you can find instructions for how to do so on the CS107 GDB Guide page.

My input defuses the level when typed manually, but when I added the same input to input.txt, it explodes. What gives?

When testing on input.txt, we advise you do so with your explosion defense in place against possible editing glitches. The contents of input.txt should consist of the input for each level on its own line and each line should end with a standard Unix newline. Stop in gdb and examine the line read from your file to spot the discrepancy between what you need and what you have. Look carefully for extraneous leading/trailing spaces or mismatched line endings. Emacs uses the correct line endings (\n) by default. Editors on other platforms that are using the line-ending conventions for Mac (\r) or Windows (\r\n) will cause you grief. The easiest approach to avoid problems is to edit the input.txt file using Emacs on myth.

I found some other assembly reference material that seems syntactically/logically inconsistent with the assembly from our textbook/lecture/tools. What's up?

The gnu tool chain defaults to the att (AT&T) syntax and all of our materials (text, lecture, lab) are consistent with this syntax. If you hunt down other resources in the wild, you may encounter Intel syntax where the order of operands is reversed, register names are not prefixed with %, immediate values are not prefixed with $, indirection is expressed with brackets instead of parentheses, and so on. For example, the att instruction push %rbp is written as push RBP in Intel and att movl $1, (%rsp) becomes mov DWORD PTR [RSP], 1 (Intel syntax states the operand size with a keyword such as DWORD PTR rather than an instruction suffix). Translating between them can be confusing, so it's recommended that you stick to resources that use the same syntax as our tools/text.

How do I print register values in gdb?

The gdb command info reg will show the current value for all registers. You can also access individual register values for use in gdb commands such as print, examine, or display. The register names are prefixed by dollar sign in gdb. A register value is treated as void*; you can apply a typecast to change the interpretation. Some examples:

(gdb) p/t $rax           # print %rax, binary
(gdb) p (char *)$rax     # print %rax, interpret as char*
(gdb) x/2wd $rax         # examine memory (deref %rax), show 2 ints
(gdb) display/2gx $rsp   # auto-print 2 quadwords from stack top in hex

To use the register value in a larger expression, be sure to use C syntax, not assembly. For example, if you need to dereference a register, apply *, not wrap in parentheses. If you ask gdb to evaluate an expression in assembly syntax, it handles it fairly oddly:

(gdb) p ($rax)               # parens ignored, ($rax) same as $rax
(gdb) p 0x8($rsp)            # gdb will segfault on this

Instead use C syntax, including typecast where necessary.

(gdb) p *(long *)$rax  
(gdb) p *(long *)((char *)$rsp + 8)

The disassembly shows %eax being set to 0 before certain function calls. What's with that?

Variable argument functions (e.g. printf and scanf variants) require a little extra setup relative to normal calls. Under the x86-64 calling conventions, the caller of a variable argument function must indicate the presence of any float/double arguments by setting %al (the low byte of %rax) to an upper bound on the number of vector registers used. If none are used (i.e. no parameters of float/double type), the caller sets it to zero, which the compiler typically does by zeroing all of %eax.