Written by Julie Zelenski
Advice from the writeup that bears repeating: Do not start by running the bomb to "see what it will do..." You will quickly learn that "what it does" is explode :-) When started, it immediately waits for input, and when you enter the wrong response, it will explode and deduct points. Your first task should be to put your kid gloves on and carefully poke around. Once you figure out how to set up appropriate protection against explosions, you will then be free to experiment with the levels without any nail-biting anxiety about setting off the bomb.
Here are some possible points of attack for your great reverse-engineering adventure!
The nm utility dumps the symbol table from an executable. The symbols include the names of functions and global variables and their addresses. The symbol table by itself is not a lot to go on, but just reading the names might give you a little sense of the lay of the land.
The strings utility will display all the printable strings in an executable, including all string constants. What strings do you find in your bomb? Do any of them seem relevant to the task at hand?
The objdump disassembler can dump the object code into its disassembled equivalent. Reading and tracing the disassembled code is where the bulk of your information will come from. Scrutinizing the lifeless object code without executing it is a technique known as deadlisting. Once you sort out what the object code does, you can, in effect, translate it back to C and then see what input is expected. This works reasonably well on simple passages of code, but can become unwieldy when the code is more complex.
The gdb debugger is absolutely invaluable here. You can use gdb to single-step by assembly instruction, examine (and change!) memory and registers, view the runtime stack, disassemble the object code, set breakpoints and watchpoints, re-route control flow, write your own custom commands, and more. Live experimentation on the executing bomb is the most direct way to become familiar with what's happening at the assembly level. Here are some suggestions on how to maximize your use of gdb on the bomb:
break, x, print, display, info, disassemble, and stepi/nexti. Here are some additional commands that you might find similarly useful: set variable, watch, jump, kill, and return. Within gdb, you can use help name-of-command to get more details about any gdb command.
Get fancy with your breakpoints. You can set breakpoints by function name, source line, or address of a specific instruction. You can make breakpoints conditional using cond. The ignore command will pass over a breakpoint until a later iteration. Use commands to specify a list of commands to be automatically executed whenever a given breakpoint is hit. These commands might print a variable, dump the stack, jump to a different instruction, change values in memory, return early from a function, and so on. Breakpoint commands are particularly useful for installing actions you intend to be automatically and infallibly completed when arriving at a certain place in the code. (hint!)
Late-breaking news: gdb 7.7 (current version on myth as of 11/2014) has a bug when attempting to use kill in the commands sequence for a breakpoint that creates a cascade of problems -- can even cause gdb itself to crash or hang. The gdb command signal SIGKILL can be used as an alternate means to kill a program from a commands sequence that doesn't trip this bug.
Using a .gdbinit file. The file named .gdbinit in the current directory can be used to set a startup sequence for gdb. In this text file, you enter a sequence of commands exactly as you would type them to the gdb command prompt. If your personal gdb configuration allows loading an init file, upon starting, gdb will automatically execute the commands from it. This will be a convenient place to put gdb commands to execute every time you start the debugger. Hint: wouldn't this be useful for creating breakpoints with commands that you want to be sure are always in place when running the bomb?
Important note: To enable use of .gdbinit, you must create/edit your personal gdb configuration to allow loading it. See discussion of auto-loading in our gdb FAQ. If your auto-loading is declined, copy the command below and execute it in your shell to update your configuration file. You will need to make this configuration change only once.
bash -c 'echo set auto-load safe-path / >> ~/.gdbinit'
The .gdbinit file we give you in the starter repo has only one command to echo Successfully executing commands from .gdbinit in current directory. If you see this message when you start gdb, it confirms the .gdbinit file has been loaded.
Custom gdb commands. Use define to add your own gdb "macros" for often-repeated command sequences. You can add defines to your .gdbinit file so you have access to them in subsequent gdb sessions as well.
layout asm followed by layout reg will give you a split window showing disassembly and register values. This layout will display current values for all registers in the upper pane, the sequence of assembly instructions in the middle pane, and your gdb command line at the bottom. As you single-step with si, the register values will update automatically (those values that changed are highlighted) and the middle pane will follow instruction control flow. This is a super-convenient view of what is happening at the machine level, but sadly, you have to endure a number of quirks and bugs to use it. The tui mode can occasionally crash gdb itself, killing off gdb and possibly the bomb while it's at it. Even when tui is seemingly working, the display has a habit of turning wonky, often fixable by refresh but not always. A garbled display could cause you to misunderstand the program state, misidentify where your bomb is currently executing, or accidentally execute a gdb command you didn't intend. Any explosion suppression mechanism that requires you, the fallible human, to take the right action at a critical time could easily be waylaid by interference, so don't attempt tui before you have invincible automatic protection against explosions.
The gcc compiler. If you're unsure how a particular C construct translates to assembly or how to access a certain kind of data, another technique is to try starting from the other side. Write a little C program with the code in question, compile it, and then trace its disassembly, either deadlisted or in gdb. For example, if you're not sure how a break statement works or how a function pointer is invoked by qsort, this would be a good way to find out. Since you yourself wrote the test program, you also don't have to fear its explosive nature :-) You can compile by directly invoking gcc or adapting a simple makefile (the Makefile from any CS107 assignment/lab is a good starting point).
When an unadulterated bomb explodes, it prints "KABOOM", notifies the authorities, and terminates. The bomb can only explode when it is "live", i.e., executing in shell or running with gdb. Using tools such as nm, strings, and objdump to examine the executable will not explode the bomb.
The bomb has no secrets -- all the code is right there. If you dig into the code that processes explosions you can determine for yourself how/when/whether the word gets out. Avoiding the entire explosion is one straightforward approach to assure that we won't hear about it, but there are ways to selectively disable just the transmission portion.
For suppressing explosions anything goes! There are simple manual blocks that give some measure of protection, but it is best to go further to develop an invincible guard. Whether you leverage gdb features, tweak the global program state, modify your setup, trick the bomb into running in a safe manner, or hack the bomb executable, we're good with any technique that keeps the explosions quiet.
We count all explosions that reach us. Consider it a fun challenge to develop a protection so secure and/or to tread so carefully that you never detonate the bomb, but should an explosion slip through, be assured that no cute baby animals have lost their lives and that an uncaught explosion costs a mere single point.
We want C code, not pseudocode/description. We use "approximation" to allow that there is not a one-to-one mapping from assembly to original C source.
For those functions we ask you to reverse, you must work your way through the entirety of the code in that specific function in order to provide the complete C translation. For all other code, your goal is to work out a correct input to pass the level. This will require a fairly complete exploration of the code path you follow to defuse, but any code outside that path can be investigated on a need-to-know basis.
Some assembly cannot be definitively matched to a unique C sequence (for example, for and while loops are largely indistinguishable, and a cascading if/else can look a lot like a switch) and likewise, the same C sequence can compile to different assembly. It's fine to provide one correct interpretation and indicate your confidence in how exact the match is, or acknowledge where there is room for ambiguity/alternatives.
When testing on input.txt, we advise you do so with your explosion defense in place against possible editing glitches. The contents of input.txt should consist of the input for each level on its own line and each line should end with a standard Unix newline (\n). Stop in gdb and examine the line read from your file to spot the discrepancy between what you need and what you have. Look carefully for extraneous leading/trailing spaces or mismatched line endings. The unix editors available on myth (emacs, vim, gedit, etc.) use the correct line endings (\n) by default. Editors on other platforms that are using the line-ending conventions for Mac (\r) or Windows (\r\n) will cause you grief. The easiest approach to avoid problems is to edit the input.txt file using a unix editor on a unix system.
The gnu tool chain defaults to the att (AT&T) syntax and all of our materials (text, lecture, lab) are consistent with this syntax. If you hunt down other resources in the wild, you may encounter Intel syntax where the order of operands is reversed, register names are not prefixed with %, immediate values are not prefixed with $, indirection is expressed with brackets instead of parentheses, operand size is given by a keyword instead of an instruction suffix, and so on. For example, the att instruction push %rbp is written as push RBP in Intel and att movl $1, (%rsp) becomes mov DWORD PTR [RSP], 1. Translating between them can be confusing, so it's recommended that you stick to resources that use the same syntax as our tools/text.
si through an explosion because I was confused about the next instruction to be executed. How can I make tui behave?
Grr, I have a love-hate relationship with tui myself. Whoever is responsible for it was obviously not a CS107 alum who has learned to thoroughly test their code, no? Remedies to try in order of increasing desperation:
Use refresh early and often
Exit tui with ctrl-x a and re-enter (this doesn't require leaving gdb and losing all your state)
Quit gdb and start again (use your .gdbinit to re-set the state for you)
Keep a list of what actions seem to trigger problems for you and avoid doing those things (for example, on my Mac, resizing my terminal window while in active tui mode creates unresolvable havoc, so I don't do that). The split reg/asm window is such a great way to follow along while single-stepping that it's worth a little pain to baby tui along. However, if your anti-explosion strategy relies on you choosing the appropriate next action, you are susceptible to being misled by tui at a critical time, so don't even attempt tui until you have a rock-solid automated defense.
The gdb command info reg will show the current value for all registers. You can also access individual register values for use in gdb commands such as print, examine, or display. The register names are prefixed by a dollar sign in gdb. gdb gives most registers an integer type, while $rsp, $rbp, and $rip are typed as pointers; you can apply a typecast to change the interpretation. Some examples:
(gdb) p/t $rax # value in %rax printed in binary
(gdb) p (char *)$rax # value in %rax interpreted as char*, print string
(gdb) x/2wd $rax # deref %rax, examine memory, print 2 decimal ints from that location
(gdb) display/2gx $rsp # auto-print top 2 quadwords on stack in hex
To use the register value in a larger expression, write the expression in C syntax, not assembly. For example, p *(long *)($rsp + 8) prints the quadword stored 8 bytes above the stack pointer. If you accidentally use assembly syntax, gdb handles it fairly oddly:
(gdb) p ($rax) # parens do not dereference in C, this is same as p $rax without parens
(gdb) p 0x8($rsp) # gdb will segfault on this
Variable argument functions (e.g., printf and scanf variants) require a little extra setup relative to normal calls. The x86-64 calling convention requires the caller of a variable argument function to indicate the presence of any float/double arguments by setting %rax to the count of vector registers used (the callee consults the low byte, %al). If none are used (i.e., no arguments of float/double type), the caller sets %rax to zero.
Sorry, this was our bad. We had to disable a batch of bombs due to a buggy level_3. Look for our email to your sunet@stanford.edu address with instructions on how to fix. The instructions are also in this piazza post.