CS107 Assignment 5 advice

Written by Julie Zelenski

Advice from the writeup that bears repeating: Do not start by running the bomb to "see what it will do..." You will quickly learn that "what it does" is explode :-) When started, it immediately waits for input, and when you enter the wrong response, it will explode and deduct points. Your first task should be to put your kid gloves on and carefully poke around. Once you figure out how to set up appropriate protection against explosions, you will then be free to experiment with the levels without any nail-biting anxiety about setting off the bomb.

Suggested tools and strategies

Here are some possible points of attack for your great reverse-engineering adventure!

The nm utility dumps the symbol table from an executable. The symbols include the names of functions and global variables and their addresses. The symbol table by itself is not a lot to go on, but just reading the names might give you a little sense of the lay of the land.
The strings utility will display all the printable strings in an executable, including all string constants. What strings do you find in your bomb? Do any of them seem of relevance to the task at hand?
The objdump disassembler can dump the object code as its disassembled equivalent. Reading and tracing the disassembled code is where the bulk of your information will come from. Scrutinizing the lifeless object code without executing it is a technique known as deadlisting. Once you sort out what the object code does, you can, in effect, translate it back to C and then see what input is expected. This works reasonably well on simple passages of code, but can become unwieldy when the code is more complex.
The gdb debugger is absolutely invaluable here. You can use gdb to single-step by assembly instruction, examine (and change!) memory and registers, view the runtime stack, disassemble the object code, set breakpoints and watchpoints, re-route control flow, write your own custom commands, and more. Live experimentation on the executing bomb is the most direct way to become familiar with what's happening at the assembly level. Here are some suggestions on how to maximize your use of gdb on the bomb:
- Expand your gdb repertoire. The labs have introduced you to handy commands such as break, x, print, display, info, disassemble, and stepi/nexti. Here are some additional commands that you might find similarly useful: set variable, watch, jump, kill, and return. Within gdb, you can use help name-of-command to get more details about any gdb command.
- Get fancy with your breakpoints. You can set breakpoints by function name, source line, or address of a specific instruction. You can make breakpoints conditional using cond. The ignore command will pass over a breakpoint until a later iteration. Use commands to specify a list of commands to be automatically executed whenever a given breakpoint is hit. These commands might print a variable, dump the stack, jump to a different instruction, change values in memory, return early from a function, and so on. Breakpoint commands are particularly useful for installing actions you intend to be automatically and infallibly completed when arriving at a certain place in the code. (hint!)
  
  Late-breaking news: gdb 7.7 (current version on myth as of 11/2014) has a bug when attempting to use kill in the commands sequence for a breakpoint that creates a cascade of problems -- can even cause gdb itself to crash or hang. The gdb command signal SIGKILL can be used as an alternate means to kill a program from a commands sequence that doesn't trip this bug.
- Using a .gdbinit file. The file named .gdbinit in the current directory can be used to set a startup sequence for gdb. In this text file, you enter a sequence of commands exactly as you would type them to the gdb command prompt. If your personal gdb configuration allows loading an init file, upon starting, gdb will automatically execute the commands from it. This will be a convenient place to put gdb commands to execute every time you start the debugger. Hint: wouldn't this be useful for creating breakpoints with commands that you want to be sure are always in place when running the bomb?
  
  Important note: To enable use of .gdbinit, you must create/edit your personal gdb configuration to allow loading it. See discussion of auto-loading in our gdb FAQ. If your auto-loading is declined, copy the command below and execute it in your shell to update your configuration file. You will need to make this configuration change only once.
```
bash -c 'echo set auto-load safe-path / >> ~/.gdbinit'
```
  The .gdbinit file we give you in the starter repo has only one command to echo Successfully executing commands from .gdbinit in current directory. If you see this message when you start gdb, it confirms the .gdbinit file has been loaded.
- Custom gdb commands. Use define to add your own gdb "macros" for often-repeated command sequences. You can add defines to your .gdbinit file so you have access to them in subsequent gdb sessions as well.
- Fire up tui mode (maybe...). The command layout asm followed by layout reg will give you a split window showing disassembly and register values. This layout will display current values for all registers in the upper pane, the sequence of assembly instructions in the middle pane, and your gdb command line at the bottom. As you single-step with si, the register values will update automatically (those values that changed are highlighted) and the middle pane will follow instruction control flow. This is a super-convenient view of what is happening at the machine level, but sadly, you have to endure a number of quirks and bugs to use it. The tui mode can occasionally crash gdb itself, killing off gdb and possibly the bomb while it's at it. Even when tui is seemingly working, the display has a habit of turning wonky, often fixable by refresh but not always. A garbled display could cause you to misunderstand the program state, misidentify where your bomb is currently executing, or accidentally execute a gdb command you didn't intend. Any explosion suppression mechanism that requires you, the fallible human, to take the right action at a critical time could easily be waylaid by interference, so don't attempt tui before you have invincible automatic protection against explosions.
- The online gdb manual is a great resource where you can read up about gdb features and see specific examples. https://sourceware.org/gdb/current/onlinedocs/gdb/.
The gcc compiler. If you're unsure how a particular C construct translates to assembly or how to access a certain kind of data, another technique is to try starting from the other side. Write a little C program with the code in question, compile it, and then trace its disassembly, either deadlisted or in gdb. For example, if you're not sure how a break statement works or how a function pointer is invoked by qsort, this would be a good way to find out. Since you yourself wrote the test program, you also don't have to fear its explosive nature :-) You can compile by directly invoking gcc or adapting a simple makefile (the Makefile from any CS107 assignment/lab is a good starting point).
The GCC Explorer interactive compiler website. Less heavyweight than firing up gcc yourself is the handy gcc-in-a-web-site we previewed in lab5. Type in a code snippet and get its immediate translation to assembly -- easy-peasy! Here is a link https://godbolt.org/g/fHoZ7S to use the myth's version of GCC (4.8.x) and the compiler flags from the CS107 makefiles. The tool is doing the same translation you could do via gcc, but in a convenient way that encourages interactive exploration.
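
As a sketch of the "start from the other side" technique, here is a tiny test file you might compile and disassemble (for instance with gcc -c followed by objdump -d, or by pasting it into the web compiler). The function names cmp_int, sort_ints, and first_negative are made up for illustration and have nothing to do with the bomb. In sort_ints, look for the address of cmp_int being passed as the fourth argument to qsort; in first_negative, look for the jump that the break turns into.

```c
#include <stdlib.h>

/* Comparator for qsort: orders ints ascending. qsort is handed this
 * function's address and calls it indirectly through that pointer. */
int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts the array in place, passing cmp_int as a function pointer. */
void sort_ints(int *arr, size_t n)
{
    qsort(arr, n, sizeof(arr[0]), cmp_int);
}

/* Returns the index of the first negative element, or -1 if there is none.
 * The break statement leaves the loop early. */
int first_negative(const int *arr, size_t n)
{
    int found = -1;
    for (size_t i = 0; i < n; i++) {
        if (arr[i] < 0) {
            found = (int)i;
            break;
        }
    }
    return found;
}
```

The indirect call itself happens inside the library's qsort, so to watch it you would need to step into qsort in gdb rather than read only your own functions.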

Frequently asked questions about assign5

How do I know if the bomb has exploded?

When an unadulterated bomb explodes, it prints "KABOOM", notifies the authorities, and terminates. The bomb can only explode when it is "live", i.e., executing in shell or running with gdb. Using tools such as nm, strings, and objdump to examine the executable will not explode the bomb.

How can I tell if the staff heard my explosion?

The bomb has no secrets -- all the code is right there. If you dig into the code that processes explosions you can determine for yourself how/when/whether the word gets out. Avoiding the entire explosion is one straightforward approach to assure that we won't hear about it, but there are ways to selectively disable just the transmission portion.

I have an idea about stopping explosions by <insert-cool-idea-here>. Is this allowed?

For suppressing explosions anything goes! There are simple manual blocks that give some measure of protection, but it is best to go further to develop an invincible guard. Whether you leverage gdb features, tweak the global program state, modify your setup, trick the bomb into running in a safe manner, or hack the bomb executable, we're good with any technique that keeps the explosions quiet.

I exploded my bomb, but I assure you it was an accident/misunderstanding/not my fault! I hadn't even read the assignment writeup or advice page before I started. Can I undo that explosion?

We count all explosions that reach us. Consider it a fun challenge to develop a protection so secure and/or to tread so carefully that you never detonate the bomb, but should an explosion slip through, be assured that no cute baby animals have lost their lives and that an uncaught explosion costs a mere single point.

One of the readme questions asks us to reverse a function and provide an approximation of the C source. Does this mean actual C code or just a pseudocode description?

We want C code, not pseudocode/description. We use "approximation" to allow that there is not a one-to-one mapping from assembly to original C source.

Do we need to reverse every single line of C source within the level to solve it?

For those functions we ask you to reverse, you must work your way through the entirety of the code in that specific function in order to provide the complete C translation. For all other code, your goal is to work out a correct input to pass the level. This will require a fairly complete exploration of the code path you follow to defuse, but any code outside that path can be investigated on a need-to-know basis.

What if the asm instructions in a function we are asked to reverse don't have a unique mapping to C source?

Some assembly cannot be definitively matched to a unique C sequence (for example, for and while loops are largely indistinguishable, and a cascading if/else can look a lot like a switch) and likewise, the same C sequence can compile to different assembly. It's fine to provide one correct interpretation and indicate your confidence in how exact the match is, or acknowledge where there is room for ambiguity/alternatives.
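
To see the ambiguity for yourself, here is a small sketch with two made-up functions (sum_for and sum_while are illustrative names, not anything from the bomb) that compute the same result with different loop constructs. Compile both and compare the disassembly; you will likely find little or nothing to tell them apart, which is why either form is an acceptable reversal.

```c
/* Sums the integers 0..n-1 using a for loop. */
int sum_for(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

/* The same computation written as a while loop. */
int sum_while(int n)
{
    int total = 0;
    int i = 0;
    while (i < n) {
        total += i;
        i++;
    }
    return total;
}
```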

My input defuses the level when typed manually, but when I added the same input to input.txt, it explodes. What gives?

When testing on input.txt, we advise you do so with your explosion defense in place against possible editing glitches. The contents of input.txt should consist of the input for each level on its own line, and each line should end with a standard Unix newline (\n). Stop in gdb and examine the line read from your file to spot the discrepancy between what you need and what you have. Look carefully for extraneous leading/trailing spaces or mismatched line endings. The Unix editors available on myth (emacs, vim, gedit, etc.) use the correct line endings (\n) by default. Editors on other platforms that use the line-ending conventions for classic Mac OS (\r) or Windows (\r\n) will cause you grief. The easiest approach to avoid problems is to edit the input.txt file using a Unix editor on a Unix system.
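
If you would rather check a line programmatically than eyeball it, a rough sketch of the checks described above might look like the following. The helper check_line and its enum are hypothetical names invented here, and the sample strings in any test are placeholders, not real level inputs. It only flags the glitches named above; it knows nothing about what the bomb actually expects.

```c
#include <string.h>

/* Hypothetical classification of a single line read from input.txt. */
enum line_issue { LINE_OK, LINE_HAS_CR, LINE_NO_NEWLINE, LINE_STRAY_SPACE };

/* Inspects one line (as returned by fgets, newline included) and reports
 * the first problem found: a carriage return anywhere, a missing final
 * newline, or a leading/trailing space. */
enum line_issue check_line(const char *line)
{
    size_t len = strlen(line);

    if (strchr(line, '\r') != NULL) {
        return LINE_HAS_CR;          /* Windows or classic Mac line ending */
    }
    if (len == 0 || line[len - 1] != '\n') {
        return LINE_NO_NEWLINE;      /* last line lacks its terminator */
    }
    if (line[0] == ' ' || (len >= 2 && line[len - 2] == ' ')) {
        return LINE_STRAY_SPACE;     /* space before or after the content */
    }
    return LINE_OK;
}
```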

I found some other assembly reference material that seems syntactically/logically inconsistent with the assembly from our textbook/lecture/lab/objdump/gdb. What's up?

The GNU tool chain defaults to the att (AT&T) syntax and all of our materials (text, lecture, lab) are consistent with this syntax. If you hunt down other resources in the wild, you may encounter Intel syntax, where the order of operands is reversed, register names are not prefixed with %, immediate values are not prefixed with $, indirection is expressed with brackets instead of parentheses, operand size is given by a keyword such as DWORD PTR rather than an instruction suffix, and so on. For example, the att instruction push %rbp is written as push RBP in Intel, and att movl $1, (%rsp) becomes mov DWORD PTR [RSP], 1. Translating between them can be confusing, so it's recommended that you stick to resources that use the same syntax as our tools/text.
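
If you are curious to see the two syntaxes side by side, one option is to compile a one-line function both ways. The sketch below uses a made-up function store_one; running gcc -S on it produces att syntax, and adding the -masm=intel flag produces Intel syntax. With optimization enabled you should see something close to movl $1, (%rdi) in the first case and mov DWORD PTR [rdi], 1 in the second, though the exact output depends on your compiler version and flags.

```c
/* Stores the constant 1 through the pointer. With optimization on, this
 * should compile to a single move of an immediate into memory. */
void store_one(int *p)
{
    *p = 1;
}
```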

I hate tui! I just managed to `si` through an explosion because I was confused about the next instruction to be executed. How can I make tui behave?

Grr, I have a love-hate relationship with tui myself. Whoever is responsible for it was obviously not a CS107 alum who has learned to thoroughly test their code, no? Remedies to try in order of increasing desperation:

- refresh early and often
- exit tui using ctrl-x a and re-enter (this doesn't require leaving gdb and losing all your state)
- quit gdb and start all over (this can be made less annoying by use of .gdbinit to re-set the state for you)

Keep a list of the actions that seem to trigger problems for you and avoid doing those things (for example, on my Mac, resizing my terminal window while in active tui mode creates unresolvable havoc, so I don't do that). The split reg/asm window is such a great way to follow along while single-stepping that it's worth a little pain to baby tui along. However, if your anti-explosion strategy relies on you choosing the appropriate next action, you are susceptible to being misled by tui at a critical time, so don't even attempt tui until you have a rock-solid automated defense.

How do I print register values in gdb?

The gdb command info reg will show the current value for all registers. You can also access individual register values for use in gdb commands such as print, examine, or display. The register names are prefixed by dollar sign in gdb. A register value is treated as void*; you can apply a typecast to change the interpretation. Some examples:

(gdb) p/t $rax                      # value in %rax printed in binary
(gdb) p (char *)$rax                # value in %rax interpreted as char*, print string
(gdb) x/2wd $rax                    # deref %rax, examine memory, print 2 decimal ints from that location
(gdb) display/2gx $rsp              # auto-print top 2 quadwords on stack in hex

To use the register value in a larger expression, write the expression in C syntax, not assembly; for example, p *(long *)($rsp + 8) prints the quadword stored 8 bytes past the stack pointer. If you accidentally use assembly syntax, gdb handles it fairly oddly:

(gdb) p ($rax)                      # parens do not dereference in C, this is same as p $rax without parens
(gdb) p 0x8($rsp)                   # gdb will segfault on this
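
If the C form of such an expression feels unfamiliar, it may help to see the same address arithmetic in an ordinary C function. In the sketch below, load_quad is a hypothetical helper written for illustration: it reads the 8-byte value located offset bytes past base, which is what an assembly operand written as offset(base) refers to, and what you would spell in gdb as *(long *)($rsp + 8) when base is the stack pointer and offset is 8.

```c
#include <string.h>

/* Reads the long stored `offset` bytes past `base`. The cast to char *
 * makes the addition count in bytes, matching how a displacement is
 * applied to a base register in assembly. */
long load_quad(const void *base, long offset)
{
    long value;
    memcpy(&value, (const char *)base + offset, sizeof value);
    return value;
}
```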

The disassembly shows %eax being set to 0 before certain function calls. What's with that?

Variable argument functions (e.g., printf and scanf variants) require a little extra setup relative to normal calls. The x86-64 calling conventions require the caller of a variable argument function to indicate the presence of any float/double arguments by setting %al (the low byte of %rax) to an upper bound on the number of vector registers used. If none are used (i.e., no arguments of float/double type), the compiler sets %eax to zero.
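
You can confirm this behavior with a small experiment of your own. The sketch below defines two made-up wrappers, format_int and format_double, around the variadic snprintf. When you disassemble them, you would expect to see %eax zeroed before the call in the first function and set to 1 before the call in the second, since the double travels in one vector register; the exact instructions may vary with compiler version and flags.

```c
#include <stdio.h>

/* Variadic call with only integer arguments: no vector registers are
 * used, so the caller is expected to zero %eax before the call. */
int format_int(char *buf, size_t size, int value)
{
    return snprintf(buf, size, "%d", value);
}

/* Variadic call with one double argument, passed in a vector register:
 * the caller is expected to set %eax to 1 before the call. */
int format_double(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%.1f", value);
}
```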

My bomb says "Seriously, we're on to you." Huh?

This means you've tried to disable your bomb in a way that your bomb does not approve of. While we admire your crafty and curious nature in trying to find creative ways to disable your bomb, we suggest just setting one or more strategically placed breakpoint(s) as a way of preventing explosion notifications from being sent.
