Assignment 5: Banking on Security

Due: Fri Aug 13 11:59 pm
Late submissions accepted until Sun Aug 15 11:59 pm

Assignment by Michael Chang & Julie Zelenski
Idea originated by Randal Bryant & David O'Hallaron (CMU). Modifications by Nick Troccoli, Brynne Hurst, and Kathleen Creel.

Learning Goals

This assignment focuses on understanding assembly code representations of programs. You will be building your skills with:

reading and tracing assembly code
understanding how data access, control structures, and function calls translate between C and assembly
reverse-engineering
understanding the challenges of writing secure and robust systems
understanding privacy, trust, and the role of the ethical penetration tester
mastering the gdb debugger!

Overview

You have been hired as a security expert for Stanford Bank (a fictional on-campus bank). They need you to investigate reports of infiltration and security issues and replicate the issues so that they can fix them.

There are three parts to this assignment, each of which can be completed independently:

An ATM withdrawal program containing some vulnerabilities - you'll need to use your C and assembly skills to find and demonstrate how to exploit these vulnerabilities.
A dataset that you will use to deanonymize bank users.
The SecureVault program, a new product designed by the bank to provide increased security to the master vault. You'll be given an executable of the SecureVault program (no C code provided!) to show that it is possible to reverse engineer this program and break into the master vault without being told the passwords.

These problems are like C/assembly "puzzles" to solve, and we hope you enjoy solving them and exploring this material as much as we enjoyed creating them!

To get started on this assignment, clone the starter project using the command

    git clone /afs/ir/class/cs107/repos/assign5/$USER assign5

The starter project contains the following:

vault: Your SecureVault executable program, custom-generated for each student.
custom_tests: The file where you will add custom tests to reproduce vulnerabilities in the provided ATM withdrawal program.
input.txt: A blank text file where you should add the passwords for each SecureVault level, one per line. See the section on SecureVault for more information.
readme.txt: A file where you should add answers to short written questions for all three parts of the assignment.
.gdbinit: A gdb configuration file you can optionally use to run certain gdb commands each time gdb launches. See the section on using GDB in SecureVault for more information.
samples: A symbolic link to the shared directory for this assignment. It contains:
- atm: The executable ATM program, which you will explore for vulnerabilities.
- atm.c: The C source code for the ATM program, which you will explore for vulnerabilities. Note that you're not able to edit or recompile this code/executable.
- checkins.csv: A file containing public social media location check-in data for various locations on Stanford campus over the past three months.
- search_checkins: An executable program to search the check-in data.
- bank: a folder containing the following:
  - customers.db: A file with the list of all users and balances for the ATM program.
  - transactions.csv: A file with ATM transaction information from the past three months at the Stanford campus ATM.
- minivault and minivault.c: A sample "practice" executable you can work on with others if you want practice material similar to what SecureVault is like. You can check your answers with the .c file.
- SANITY.INI and sanity.py: Files to configure and run sanity check. You can ignore these files.
- wordlist: A list of dictionary words used for SecureVault.
tools: Contains symbolic links to the sanitycheck and submit programs for testing and submitting your work. (codecheck is not needed on this assignment)

Do not start by running SecureVault and entering passwords to "see what will happen". You will quickly learn that what happens is the alarm goes off and it deducts points :-) When started, SecureVault waits for input and when you enter the wrong password, it will raise the alarm and notify the central system, deducting points. Thoroughly read the SecureVault information below before attempting to enter any passwords!

You will be using gdb frequently on this assignment. Here are essential resources as you work - note that you should make sure you have downloaded the CS107 GDB configuration file mentioned in the Getting Started Guide if you didn't previously do so.

Open Getting Started Guide
Open GDB Guide
Open Lab5 GDB Tips
Open Lab6 GDB Tips

Please make sure to adhere to the honor code and collaboration policy for this assignment. Even without any code being submitted, you should not be doing any joint debugging/development, sharing or copying written answers, sharing specific details about SecureVault behavior, etc.

1. ATM Security

The samples/atm program simulates the operation of a simplified automated teller machine (ATM). The ATM program is invoked with an amount and the credentials for a particular account. If the credential is authorized and the account has sufficient funds, the amount is withdrawn and dispensed in cash. The ATM is supposed to maintain bank security by rejecting unauthorized access and denying excessive withdrawals.

Run samples/atm 40 myname (replacing myname with your myth login name) to make a $40 withdrawal from the account associated with your login name. Success! Every time you run the program, it will print out information to the terminal about the transaction that took place, or the error that occurred, if any. For example, if you ask to withdraw $100 from your account, it should be denied with an error message because that would bring your current $107 balance below the required minimum of $50. If you try to sneak cash from another account instead of yours (e.g., samples/atm 40 adbenson) or use a fake name (e.g., samples/atm 40 not_a_user), your credential should get rejected as unauthorized. So far, so good: the ATM seems to be doing its job. (Note: Each time you run the program anew, all balances return to their original starting levels. No money actually changes hands in this ATM, which is a blessing given its security flaws.)
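The arithmetic behind the two example withdrawals can be sketched in a few lines of C. This is only an illustration of the intended policy as described in the prose above; the function name and parameters are hypothetical and are not taken from atm.c.

```c
#include <stdbool.h>

/* Hypothetical helper (not from atm.c): returns true if withdrawing
 * `amount` dollars from an account holding `balance` dollars would leave
 * at least `minimum` dollars behind. All quantities are signed ints. */
bool withdrawal_allowed(int balance, int amount, int minimum) {
    if (amount <= 0 || amount > balance) {
        return false;   /* reject nonsensical or oversized requests */
    }
    return balance - amount >= minimum;
}
```

With the numbers from the text, withdrawal_allowed(107, 40, 50) is true because $67 would remain, while withdrawal_allowed(107, 100, 50) is false because only $7 would remain.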

Stanford Bank recently updated the ATM software to a version with some additional features. The IT team reviewed the new code and thought it all looked good, but having now installed it in production, they are observing some suspicious activity. The bank has called you because your superior C and assembly skills are just what's needed to investigate and resolve these problems!

Your first task is to review the (read-only) source code for the program in samples/atm.c. The program is roughly 175 lines of C code of similar complexity to what you have been writing this quarter, and is decomposed and fairly readable, though sorely lacking in comments. You should find that the program's approach seems reasonable and the code is sincere in its attempt to operate correctly. Once you're done reading, take a minute to reflect on how far your awesome C skills have come to let you read through this provided program!

By examining program output and balances, the bank has noticed three operational anomalies that they need your help investigating.

Note on Deliverables

For each of the anomalies (a), (b), and (c) below, you will be asked to construct a test case to showcase how to reproduce the vulnerability. Note that there may be more than one way to trigger a vulnerability.

For each vulnerability, we also ask you to provide the following in the readme.txt:

A concise description of the underlying defect in the code.
An explanation of exactly how you constructed your test case to exploit it.
Your recommendation for fixing it.

The bank is not looking for a major rewrite/redesign, so in your proposed changes you should directly address the vulnerability with minimal other disruption. Note that there may be more than one possible remedy for fixing each issue. Also make sure you do not remove intended functionality of the bank program, and account for any potential additional security issues introduced by your proposed fix.

Make sure your custom_tests file is formatted correctly. In particular, comments must be on their own line, and comment lines must start with #. Each test case line should start with samples/atm. Make sure to run your custom tests before submitting to confirm they execute properly.

Running ATM in GDB: to run the ATM program in GDB, make sure you go into the samples folder first and then run the atm program from there.

a) Negative Balances

The old version of the ATM program restricted a withdrawal to be at most the full account balance, allowing the customer to drain their account to $0, but no further. The new program has changed the withdraw function to require a non-zero minimum balance. The expected behavior should be that all account balances stay above this minimum. However, the bank saw an (otherwise ordinary) withdrawal transaction that not only caused an account to go below the minimum, but also overdrew so far as to end up with a negative balance. Oops, that's definitely not supposed to happen! Review the C code for the withdraw function, specifically the changes from the old version. It seems to work in many cases, but apparently not all. Read carefully through this function to try and discover the flaw - your understanding of signed and unsigned integers will be useful here! Once you have found the vulnerability, determine a command to make a withdrawal as yourself that withdraws more money than is present in your account. Put this command in custom_tests, and answer the specified readme questions.
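As a refresher on signed and unsigned integers, here is a minimal, generic sketch of how a comparison changes meaning once a signed value is converted to unsigned. The function names are hypothetical and the code is unrelated to atm.c; it only illustrates the conversion rule.

```c
#include <stdbool.h>

/* The cast is written out here, but C performs the same conversion
 * implicitly when an int is compared against an unsigned int: a negative
 * `value` turns into a very large unsigned number. */
bool below_limit_mixed(int value, unsigned int limit) {
    return (unsigned int)value < limit;
}

/* Sign-aware version: handle negative values before converting. */
bool below_limit_checked(int value, unsigned int limit) {
    if (value < 0) {
        return true;    /* any negative number is below an unsigned limit */
    }
    return (unsigned int)value < limit;
}
```

For value -1 and limit 10, the mixed version reports that -1 is not below the limit, while the checked version reports that it is.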

b) Unauthorized Account Access

The bank has also received a customer complaint about an unauthorized withdrawal from their account. It seems that another user with different credentials was able to successfully withdraw money from the aggrieved customer's account. Moreover, the credential used appears to be entirely fake - no such user exists in the database! A user should not be able to access a different customer's account and especially not by supplying a bogus credential! Review the C code for the find_account function that is responsible for matching the provided username to their account number. It seems to work properly for valid accounts, but not for invalid usernames. Can you spot what this function does in this case? Once you do, it may seem that this function will behave unpredictably in this case. Your next task is to examine the generated assembly to determine precisely how the function will behave. Think about registers with special responsibilities and where it's assumed certain values will live. Once you have found the vulnerability, determine a command with a designed bogus name credential to withdraw $40 from one of the CS107 staff members' accounts. Put this command in custom_tests, and answer the specified readme questions. (The samples/bank/customers.db file contains information about all valid users and their balances, and the first 11 users in the database are staff accounts.)
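One example of a register with a special responsibility: under the x86-64 System V calling convention used on myth, an int return value is handed back to the caller in %eax. The small function below is a hypothetical illustration, not code from atm.c, and the assembly in the comment is only an approximation of what gcc might produce.

```c
/* Returns the larger of two ints. The first argument arrives in %edi and
 * the second in %esi; the result travels back to the caller in %eax.
 * At a low optimization level gcc may emit something along these lines
 * (exact output varies with compiler version and flags):
 *
 *     cmp    %esi,%edi
 *     mov    %esi,%eax
 *     cmovge %edi,%eax
 *     ret
 */
int larger(int a, int b) {
    if (a > b) {
        return a;
    }
    return b;
}
```

When tracing a call in gdb, printing $eax immediately after the function returns shows the value the caller will see.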

c) Accessing The Master Vault

The most worrisome issue is repeated illicit withdrawals from the master vault account, account number 0. The name on the master account is not an actual user, so this account cannot be accessed using the simple username-based credential. Instead, the user must specify two arguments, the account's number and its secret passcode, as a form of heightened security, like this: samples/atm 40 [ACCOUNT NUM] [PASSWORD]. At first the bank thought the vault passcode had been leaked, but changing the passcode did nothing to thwart the attack. In a fit of desperation, the bank removed the vault passcode file altogether, figuring this would disable all access to the vault, yet the rogue user continues to make withdrawals from it! It seems that the high-security passcode authentication may have its own security flaw! The code that handles this authentication is in the lookup_by_number and read_secret_passcode functions. These functions work correctly in many situations, but fail in certain edge cases. Remember that it seems that in certain cases supplied credentials are accepted despite the lack of a saved passcode file. The vulnerability is subtle in the C code, so you should also use GDB to examine the code at the assembly level and diagram out the memory on the stack for these functions, as it is the arrangement of the various data and lack of care in accessing the stack-based variables that leads to the security vulnerability in this case. This problem is similar to the stack diagramming problem from lab6 - revisit that problem if you need a refresher! Your exploit should not involve reading from any file. Once you have found the vulnerability, determine a command to withdraw $300 from the bank vault despite its disabled passcode. Put this command in custom_tests, and answer the specified readme questions.

2. Dataset Aggregation

NOTE: Before beginning this part, you should watch the 2 short posted videos on Canvas in the Assignment 5 folder covering discussions of trust and privacy. These videos are needed to answer some of the questions in this part of the assignment.

Separate from the faulty ATM software, Stanford Bank believes that someone was able to gain access to their account logs and get a list of ATM transaction information for their Stanford campus ATM. The company believes that this poses little threat because the transaction logs have limited recorded data. However, you are concerned that this data can be combined with other available data in dangerous ways, such as to learn private information. For instance, knowing someone's history of large (or small) transactions might tell you about their financial situation; knowing memberships in clubs or organizations might tell you about social relationships and webs of networks. Your task is to combine this data with another dataset you have found of public location check-ins to show the harms of a potential data breach. To aid in investigating your concerns, the bank has made the ATM transaction data available to you in the samples/bank/transactions.csv file. This file has one account transaction per line, and each transaction occurred at the Stanford campus ATM. Each line has the following format:

ACCOUNT IDENTIFIER, TRANSACTION TIMESTAMP, TRANSACTION TYPE, TRANSACTION AMOUNT

For example, here is one line from the file that represents a withdrawal of $15 on 2/15/21 at 4:54PM:

d67c6a0e6cc5fdede02a7932d4e3401d3b4649d25465f89b205bf556a07ae721,2021-02-15 04:54:50 PM,Withdrawal,15

Transactions with the same account identifier are guaranteed to be for the same bank account, but the identifier doesn't give any information about whose account it is (intentionally done by the bank to obfuscate the data).
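To make the line format concrete, here is a minimal sketch of how one line could be split into its four fields in C. You do not need to write any code for this part; the struct, the field widths, and the assumption that amounts are whole dollars are all illustrative guesses based on the single sample line above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Field sizes are assumptions chosen to fit the sample line in the text. */
typedef struct {
    char account_id[65];   /* 64 hex characters plus the terminator */
    char timestamp[32];    /* e.g. "2021-02-15 04:54:50 PM" */
    char type[16];         /* e.g. "Withdrawal" */
    int amount;            /* assumed to be whole dollars */
} Transaction;

/* Parses "ID,TIMESTAMP,TYPE,AMOUNT" into *out.
 * Returns true only if all four fields were read. */
bool parse_transaction(const char *line, Transaction *out) {
    int matched = sscanf(line, "%64[^,],%31[^,],%15[^,],%d",
                         out->account_id, out->timestamp, out->type,
                         &out->amount);
    return matched == 4;
}
```

Applied to the sample line, this yields the type Withdrawal and the amount 15, with the full 64-character identifier kept as an opaque string.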

You have already downloaded a publicly-available location checkins dataset from an online social network, in the file checkins.csv. It is too large to read through manually, so you also already created a program search_checkins that displays that checkin data and lets you search through it more easily. Run the program (samples/search_checkins) for instructions on how to use it.

Show the risks of dataset aggregation and express your concerns to the bank managers by answering the following questions in your readme.txt. Note that you are not expected to create any additional programs to parse or otherwise process these datasets with code - the intent is for you to skim the transactions.csv file by hand and use it along with the search_checkins program to answer the following questions.

What are the names of the following users?
- a) The likely user who made multiple large transactions?
- b) Two (there may be more, but you must identify only two) likely members of the Stanford SecurityStars Club, which has a club meeting on the 15th of each month where people must bring $15 to pay their membership dues? (Assume they are procrastinators in withdrawing the money)
How were you able to de-anonymize the transactions data?
What recommendations would you give to Stanford Bank to further anonymize the account data or otherwise protect it in the case of accidental data breaches?
Use one or more of the four models of privacy discussed in the assignment videos to explain why disclosure of the information that can be gathered here is (or is not) a violation of privacy.

3. SecureVault

Stanford Bank is rolling out a new tool, SecureVault, to provide increased security at the master vaults at each of their branches. Employees must enter four secret passwords into this program to gain access to the master vault. For extra security, the bank creates a different SecureVault program for each branch with different expected passwords; the bank headquarters does not give the source code to any of the branches; and the program triggers an alarm that notifies the central system each time an incorrect password is entered. They are confident that this means only someone who is told the password can get access, and any potential intruders will be detected by the alarm system. They have hired you to test this. Your task is to show that you can reverse engineer the program to gain access to the bank vault without being told the password, and without alerting central security.

Without the original source, all you know is that SecureVault has four "levels" of security, each with a different password. If the user enters the correct string, it deactivates the level and the program proceeds on. But given the wrong input, SecureVault raises an alarm by printing a message, alerting central security and terminating. To reach the master vault, one needs to successfully disarm each of its levels.

This is where the bank needs your help. Each of you is assigned a different generated SecureVault program unique to you, generated just as they would be for each bank branch. Your mission is to apply your best assembly detective skills to work out the input required to pass each level and reach the master vault, thus proving the insecurity of the bank's approach.

Your SecureVault is given to you as an executable, i.e., as compiled object code. From the assembly, you will work backwards to construct a picture of the original C source in a process known as reverse-engineering. Note that you don't necessarily need to recreate the entire C source; your goal is to work out a correct input to pass the level. This requires a fairly complete exploration of the code path you follow to deactivate the level, but any code outside that path can be investigated on a need-to-know basis. Once you understand what makes your SecureVault program "tick", you can supply each level with the password it requires to disarm it. The levels get progressively more complex, but the expertise you gain as you move up from each level increases as well. One confounding factor is that SecureVault raises an alarm whenever it is given invalid input. Each time the alarm goes off, it notifies central security (the CS107 staff) and points are deducted from your score. Thus, there are consequences to setting off the alarm -- you must be careful!

Reverse-engineering requires a mix of different approaches and techniques and will give you an opportunity to practice with a variety of tools, most importantly GDB. Building a well-developed gdb repertoire can pay big dividends the rest of your career!

Practice: Minivault

Want to get a feel for what SecureVault is like, but in a more practice-friendly setting? The minivault program in the samples/ folder is a practice executable we created that is similar in spirit to your SecureVault (it doesn't share any code with your SecureVault, it's just made with a similar reverse-engineering goal in mind). You can practice working on that, work together with other students, and check your answers with the included source code for the minivault. For the minivault, you must get past two stages, stage 1 and stage 2. Stage 1 is a function called stage1 that is passed 1 parameter, which is the first command line argument. Stage 2 is a function called stage2 that is passed 1 parameter, which is the second command line argument. E.g., you run samples/minivault [stage1password] [stage2password]. Your goal is to get both functions to return 1, and not 0. The minivault is completely optional, but we encourage you to use it as a practice tool if you'd like!
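The calling structure described above could look roughly like the sketch below. The two stage bodies are placeholders invented purely for illustration and bear no relation to the real minivault.c; only the flow (first argument to stage1, second argument to stage2, both must return 1) follows the description.

```c
#include <string.h>

/* Placeholder checks invented for this sketch; the real stages differ. */
int stage1(const char *input) {
    return strcmp(input, "example") == 0;
}

int stage2(const char *input) {
    return strlen(input) == 4;
}

/* Passes argv[1] to stage1 and argv[2] to stage2.
 * Returns 1 only if both stages return 1, and 0 otherwise. */
int run_stages(int argc, char *argv[]) {
    if (argc < 3) {
        return 0;   /* both passwords are required */
    }
    return stage1(argv[1]) == 1 && stage2(argv[2]) == 1;
}
```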

Logistics

The bank has confirmed to you a few things about how the SecureVault programs operate:

If you start SecureVault with no command-line argument, it reads input typed at the console.
If you give an argument to SecureVault, such as input.txt:
```
./vault input.txt
```
SecureVault will read lines from that file until it reaches EOF (end of file), and then switch over to reading from the console. This feature allows you to store inputs for solved levels in input.txt and avoid retyping them each time.
Alarms can be triggered when executing at the shell or within gdb. However, gdb offers you tools you can use to intercept the alarms, so your safest choice is to work under gdb and employ preventive measures.
It is not possible to know for sure whether the central system (course staff) is notified about an alarm. You must use your investigative skills and best defensive measures!
The SecureVault program in your repository was lovingly created just for you and is unique to your id. It is said that it can detect if an impostor attempts to run it and won't play along.
The SecureVault program is designed for the myth computers (running on the console or logged in remotely). There is a rumor that it will refuse to run anywhere else.
The SecureVault program was compiled from C code using gcc. It seems it was created without changing the compile flags to achieve much obfuscation of the object code.
It seems as though the function names were left visible in the object code, with no effort to disguise them. Thus, a function name of initialize_vault or read_five_numbers can be a clue. Similarly, it seems to use the standard C library functions, so if you encounter a call to qsort or sscanf, it is the real deal.
Direct modification of the SecureVault executable can change its behavior, but be forewarned that we will test your submission against your original unmodified binary, so while hacking the executable is great fun, it won't be of much use as a strategy for solving the levels.
There is one important restriction: Do not use brute force! You could write a program to try every possible input to find a solution. But this is trouble for several reasons:
- You lose points on every incorrect guess which raises an alarm.
- A notification is sent on each alarm. Wild guessing will saturate the network, creating ill will among other users and attracting the ire of the system administrators who have the authority to revoke your privileges because you are abusing shared resources.
- We haven't told you how long the strings are, nor have we told you what characters they can contain. Even if you made the (wrong) assumptions that they all are less than 80 characters long and only contain lowercase letters, you will have 26⁸⁰ guesses for each level. Trying them all will take an eternity, and you will not have an answer before you graduate.
- Part of your submission requires answering questions that show your understanding of the assembly code, which guessing will not provide. :-)
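The size of that search space can be checked with a short calculation. The sketch below uses double arithmetic, so the results are approximate; the function names and the guessing rate of one billion attempts per second are hypothetical.

```c
/* Approximate number of strings of exactly `length` characters drawn from
 * an alphabet of `alphabet_size` symbols. */
double candidate_count(int alphabet_size, int length) {
    double total = 1.0;
    for (int i = 0; i < length; i++) {
        total *= alphabet_size;
    }
    return total;
}

/* Years needed to try `count` candidates at `per_second` guesses a second. */
double years_to_try(double count, double per_second) {
    const double seconds_per_year = 365.0 * 24.0 * 60.0 * 60.0;
    return count / per_second / seconds_per_year;
}
```

Here candidate_count(26, 80) comes out near 1.6e113, and at a billion guesses per second years_to_try puts the total on the order of 5e96 years.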

Getting Started

Here are some steps you should take to get started on this part of the assignment.

Use the nm utility on the executable (nm vault) to print what's called the "symbol table" of the executable. The symbol table contains the names of functions and global variables and their addresses. The names may give you a sense of the structure of the SecureVault program.
Use the strings utility on the executable (strings vault) to print all the printable strings contained in the executable, including string constants. See if any of these strings seem relevant in determining the passwords.
gdb and objdump will be most helpful after this. objdump -d vault outputs the assembly for the SecureVault executable. Reading and tracing the disassembled code is where the bulk of your information will come from. Scrutinizing the lifeless object code without executing is a technique known as deadlisting. Once you sort out what the object code does, you can, in effect, translate it back to C and then see what input is expected. This works reasonably well on simple passages of code, but can become unwieldy when the code is more complex. That is where gdb comes in.
gdb lets you single-step by assembly instruction, examine (and change!) memory and registers, view the runtime stack, disassemble the object code, set breakpoints, and more. Live experimentation on the executing SecureVault program is the most direct way to become familiar with what's happening at the assembly level.
Pull up tools like the Compiler Explorer interactive website from lab, or gcc on myth, to compile and explore the assembly translation of any code you'd like. For example, if you're unsure how a particular C construct translates to assembly, how to access a certain kind of data, how break works in assembly, or how a function pointer is invoked by qsort, write a C program with the code in question and trace through its disassembly. Since you yourself wrote the test program, you also don't have to fear it setting off any alarms :-) You can compile directly on myth using a copy of a Makefile from any CS107 assignment/lab as a starting point, and then use gdb or objdump to poke around.
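As an example of the last suggestion, a scratch file along these lines exercises both a qsort callback and a loop with a break. Everything here is a hypothetical test program of your own making, unrelated to the SecureVault code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort. In the disassembly of sort_ints, look
 * for its address being passed as the fourth argument (in %rcx under the
 * x86-64 System V calling convention). */
int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts `count` ints in ascending order. */
void sort_ints(int *values, size_t count) {
    qsort(values, count, sizeof(values[0]), compare_ints);
}

/* Loop with a break, to see how the early exit turns into a jump.
 * Returns the index of the first negative element, or -1 if none. */
int index_of_first_negative(const int *values, size_t count) {
    int found = -1;
    for (size_t i = 0; i < count; i++) {
        if (values[i] < 0) {
            found = (int)i;
            break;
        }
    }
    return found;
}
```

Compiling the file to an object file with gcc -c and then running objdump -d on the result is enough to read the assembly for these functions; add a small main that calls them if you want to step through in gdb.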

Before attempting to breach the master vault, you should use the above tools, and gdb tricks below, to figure out how to reliably prevent alarms from triggering. There are simple manual blocks that give some measure of protection, but it is best to go further to develop an invincible guard. Feel free to use any technique at your disposal, such as leveraging gdb features, tweaking the global program state, modifying your setup, tricking the SecureVault program into running in a safe manner, or hacking the vault executable. Avoiding the alarm entirely is one straightforward approach to ensure that we won't hear about it, but there are ways to selectively disable just the transmission portion to the central system (course staff). Once you figure out how to set up appropriate protection against alarms, you will then be free to experiment with the levels without worry. Note that the program can only trigger an alarm when it is "live", i.e., executing in shell or running with gdb. Using tools such as nm, strings, and objdump to examine the executable cannot trigger the alarm.

Using `gdb`

The debugger is absolutely invaluable on this assignment. Here are some suggestions on how to maximize your use of gdb. You'll also get more practice with gdb tricks in the assembly labs.

Expand your gdb repertoire. The labs have introduced you to handy commands such as break, x, print, info, disassemble, display, watch, and stepi/nexti. Here are some additional commands that you might find similarly useful: p $eflags (prints condition codes), jump, kill, and return. Within gdb, you can use help name-of-command to get more details about any gdb command. See the quick gdb reference card for a summary of many other neat gdb features.
Get fancy with your breakpoints. You can set breakpoints by function name, source line, or address of a specific instruction. Use commands to specify a list of commands to be automatically executed whenever a given breakpoint is hit. These commands might print a variable, dump the stack, jump to a different instruction, change values in memory, return early from a function, and so on. Breakpoint commands are particularly useful for installing actions you intend to be automatically and infallibly completed when arriving at a certain place in the code. (hint!)

gdb kill workaround: gdb 9.2 (current version on myth as of 04/2021) has a bug triggered by using kill in the commands sequence for a breakpoint. The bug creates a cascade of problems and can cause gdb itself to crash or hang. The gdb command signal SIGKILL can be used as an alternate means to kill a program from a commands sequence that doesn't trip this bug.
Use a .gdbinit file. The provided file named .gdbinit in the assignment folder can be used to set a startup sequence for gdb. In this text file, you enter a sequence of commands exactly as you would type them to the gdb command prompt. Upon starting, gdb will automatically execute the commands from it. This will be a convenient place to put gdb commands to execute every time you start the debugger. Hint: wouldn't this be useful for creating breakpoints with commands that you want to be sure are always in place when running the SecureVault program? The .gdbinit file we give you in the starter repo has only one command to echo Successfully executing commands from .gdbinit in current directory. If you see this message when you start gdb, it confirms the .gdbinit file has been loaded. If you see an error message about auto-loading .gdbinit being declined when starting gdb, this means you haven't installed the CS107 GDB configuration file - see the top of this page for instructions.
Custom gdb commands. Use define to add your own gdb "macros" for often-repeated command sequences. You can add defines to your .gdbinit file so you have access to them in subsequent gdb sessions as well.
Fire up tui mode (maybe...). The command layout asm followed by layout reg will give you a split window showing disassembly and register values. This layout will display current values for all registers in the upper pane, the sequence of assembly instructions in the middle pane, and your gdb command line at the bottom. As you single-step with si, the register values will update automatically (those values that changed are highlighted) and the middle pane will follow instruction control flow. This is a super-convenient view of what is happening at the machine level, but sadly, you have to endure a number of quirks and bugs to use it. The tui mode can occasionally crash gdb itself, killing off gdb and possibly the SecureVault program while it's at it. Even when tui is seemingly working, the display has a habit of turning wonky, often fixable by the refresh command (use this early and often!) but not always. A garbled display could cause you to misunderstand the program state, misidentify where your SecureVault is currently executing, or accidentally execute a gdb command you didn't intend. Any alarm suppression mechanism that requires you, the fallible human, to take the right action at a critical time could easily be waylaid by interference, so don't attempt tui before you have invincible automatic protection against alarms. Selective use of auto-display expressions (introduced in lab6) is a great alternative with less disruption. You can exit tui using ctrl-x a and re-enter it again (this doesn't require leaving gdb and losing all your state).

SecureVault Deliverables

You should add the passwords to pass each level in your input.txt file. We will test by running ./vault input.txt on your submission. The input.txt file in your submission should contain one line for each level you have solved, starting from level 1. Malformed entries in your input.txt or wrong line-endings (see FAQ below) will cause grading failures. To avoid surprises, be sure that you have verified your input.txt in the same way we will in grading (i.e., ./vault input.txt). We also have a few follow-up questions that you should answer in your readme.txt file:

What tactics did you use to suppress/avoid/disable alarms?
level_1 contains an instruction near the start of the form mov $<multi-digit-hex-value>,%edi. Explain how this instruction fits into the operation of level_1. What is this hex value and for what purpose is it being moved? Why can this instruction reference %edi instead of the full %rdi register?
level_2 contains a jg that is not immediately preceded by a cmp or test instruction. Explain how a branch instruction operates when not immediately preceded by a cmp or test. Under what conditions is this particular jg branch taken?
Explain how the loop in the winky function of level_3 is exited.
Explain how the mycmp function is used in level_4. What type of data is being compared and what ordering does it apply?
(NOTE: Before answering this question, you should watch the 2 short posted videos on Canvas in the Assignment 5 folder covering discussions of trust and privacy.) How would you describe Stanford Bank’s trust model? (In other words: who among the bank headquarters, the bank branches, and you was trusted?)

Sanity Check

The default sanitycheck test cases are ATM inputs and one test case that reports the line count of your input.txt file. This sanitycheck is configured to only allow test cases for ATM in your custom_tests file. The SecureVault executable is not run by sanitycheck.

NOTE: when running your own custom tests, make sure to inspect the output to ensure your tests are causing the behavior you expect! The sanitycheck tool itself does not verify that the tests cause the specified exploits.

Submitting

Once you are finished working and have saved all your changes, check out the guide to working on assignments for how to submit your work. We recommend you do a trial submit in advance of the deadline to allow time to work through any snags. You may submit as many times as you would like; we will grade the latest submission. Submitting a stable but unpolished/unfinished version is like an insurance policy. If the unexpected happens and you miss the deadline to submit your final version, this previous submit will earn points. Without a submission, we cannot grade your work.

We would also appreciate it if you filled out this homework survey to tell us what you think once you submit. We appreciate your feedback!

Grading

For this assignment, here is a tentative point breakdown (out of 102):

custom_tests (15 points) Each successful attack test case earns 5 points. We will test by running tools/sanitycheck custom_tests on your submission. Your custom_tests should contain 3 test cases, one for each ATM attack.
readme.txt (55 points) The written questions will be graded on the understanding of the issues demonstrated by your answers and the thoroughness and correctness of your conclusions.
input.txt (32 points) Each SecureVault level you have solved earns 8 points. We will test by running ./vault input.txt on your submission. The input.txt file in your submission should contain one line for each level you have solved, starting from level 1. Malformed entries in your input.txt or wrong line-endings (see FAQ below) will cause grading failures. To avoid surprises, be sure that you have verified your input.txt in the same way we will in grading (i.e., ./vault input.txt).
SecureVault alarms triggered (up to 6 points deducted) Each alarm notification that reaches the staff results in a 1 point deduction, capped at 6 points total.

Post-Assignment Check-in

How did the assignment go for you? We encourage you to take a moment to reflect on how far you've come and what new knowledge and skills you have to take forward. Once you finish this assignment, your assembly skills will be unstoppable, and you will have a better understanding of trust, privacy and security! You successfully found vulnerabilities in a program using its source and assembly, and reverse engineered a complex program without having access to its source at all. Rock on!

To help you gauge your progress, for each assignment/lab, we identify some of its takeaways and offer a few thought questions you can use as a self-check on your post-task understanding. If you find the responses don't come easily, it may be a sign a little extra review is warranted. These questions are not to be handed in or graded. You're encouraged to freely discuss these with your peers and course staff to fill any gaps in your understanding before moving on from a task.

What are some of the gdb commands that allow re-routing control in an executing program?
What is the main indication that an assembly passage contains a loop?
What makes someone a trustworthy fiduciary or guardian of personal data? How and why should an institution like a bank protect the privacy of its customers?
Explain the difference between a function's return value and its return address.
Consider how function pointers work at the assembly level. How is a call through a function pointer the same/different when compared to an ordinary function call?
For performance reasons, the compiler prefers storing local variables in registers whenever possible. What are some reasons that force the compiler to store a local variable on the stack instead?
For the instruction sequence below, what must be true about values of op1 and op2 for the branch to be taken? What changes if ja is substituted for jg?
```
cmp op1,op2
jg target
```

Frequently Asked Questions

My input passes the level when typed manually, but when I added the same input to `input.txt`, it triggers an alarm. What gives?

When testing on input.txt, we advise you do so with your alarm defense in place against possible editing glitches. The contents of input.txt should consist of the input for each level on its own line and each line should end with a standard Unix newline. Stop in gdb and examine the line read from your file to spot the discrepancy between what you need and what you have. Look carefully for extraneous leading/trailing spaces or mismatched line endings. Emacs uses the correct line endings (\n) by default. Editors on other platforms that are using the line-ending conventions for Mac (\r) or Windows (\r\n) will cause you grief. The easiest approach to avoid problems is to edit the input.txt file using Emacs on myth.

What do function pointers look like in assembly?

A jmp or call instruction takes an operand that is either the target label, which is hardcoded in the instruction itself, or a location (a register or memory) that holds the target address. For instance, if the target address is stored in %rax, then we could say jmp *%rax or call *%rax to jump to or call the function at the address stored in %rax. This is useful, because it turns out a function pointer is really just storing the address of where a function's first instruction lives. In other words, if you print out the value of a function pointer, it should be the same as what %rip stores when it is starting to execute that function. See this Compiler Explorer playground for an example using the print_array program from lecture 9. In particular, check out how main's call to print_array is translated to assembly, and what the call to print_fn inside of print_array looks like in assembly.

How do I print register or condition code values in gdb?

The gdb command info reg will show the current value for all registers. To print out the condition code values, use p $eflags. You can also access individual register values for use in gdb commands such as print, examine, or display. The register names are prefixed by dollar sign in gdb. A register value is treated as void*; you can apply a typecast to change the interpretation. Some examples:

(gdb) p/t $rax           # print %rax, binary
(gdb) p (char *)$rax     # print %rax, interpret as char*
(gdb) x/2wd $rax         # examine memory (deref %rax), show 2 ints
(gdb) display/2gx $rsp   # auto-print 2 quadwords from stack top in hex

To use a register value in a larger expression, be sure to use C syntax, not assembly. For example, if you need to dereference a register, apply *, not wrap in parentheses. If you ask gdb to evaluate an expression in assembly syntax, it handles it fairly oddly:

(gdb) p ($rax)               # parens ignored, ($rax) same as $rax
(gdb) p 0x8($rsp)            # gdb will segfault on this

Instead use C syntax, including typecast where necessary.

(gdb) p *(long *)$rax
(gdb) p *(long *)((char *)$rsp + 8)

The disassembly shows %eax being set to 0 before certain function calls. What's with that?

Variable argument functions (e.g., printf and scanf variants) require a little extra setup relative to normal calls. Under the x86-64 calling conventions, the caller of a variable argument function must indicate the presence of any float/double arguments by setting %al (the low byte of %rax) to the count of vector registers used. If none are used (i.e., no arguments of float/double type), the caller sets it to zero, which is why the disassembly shows %eax being zeroed before the call.