Written by Julie Zelenski
As you vanquish each CS107 assignment and lab, a little celebration is definitely in order, but once the excitement dies down, take a moment to reflect on how far you've come and what new knowledge and skills you have to take forward. To help you gauge your progress, for each assignment/lab, we identify some of its takeaways and offer a few thought questions you can use as a self-check on your post-task understanding. If you find the responses don't come easily, it may be a sign a little extra review is warranted. These questions are not to be handed in or graded. You're encouraged to freely discuss these with your peers and course staff to solidify any gaps in you understanding before moving on from a task. They could also be useful as review before the exams.
Assign0: Unix
You now have your environment configured and should be starting to feel comfortable with the command-line interface, can navigate the filesystem, edit files, and generally get around in unix. Off to a great start!
- Identify a few different techniques to avoid painstakingly re-typing a long unix command to execute.
- How do you copy and paste in your chosen editor?
- Explain the purpose and use of the CS107 tools
sanitycheck
andsubmit
. How do you customize the tests used by sanity check?
Lab1: Bits and integers
The takeaways from lab1 should be proficiency with bitwise operations, constructing and using bitmasks, and a solid grasp on the representation of unsigned values as a binary polynomial and signed values in two's complement.
- Consider rounding the magnitude of an integer up to power of two (e.g. 3 rounds to 4, 4 to 4, 5 to 8, for negative: -3 rounds to -4, -4 to -4, and so on). How does the bit pattern of a positive int differ from the bit pattern of the value after rounding to power of two? What about for a negative int?
- Give a bitwise expression to zero the upper N bits of an unsigned int V. What does this expression compute numerically?
- When do you supply the command-line arguments to a program you wish to run under gdb: when starting gdb or from within gdb when running the program?
- Chapter 2 of B&O has many excellent practice problems presented with solutions - check them out!
Assign1: Bits and integers
Congrats on your first complete C programs! At the end of the quarter, many students recall this first assignment as one of the tougher ones not so much due to code/algorithm complexity, but because of the effort needed to become facile using new tools in an unfamiliar environment and acclimating to through the sparse, unforgiving world of C. Celebrate this important milestone and remember the investment you make now in gaining proficiency will pay off throughout the quarter.
- Consider all negative int’s whose absolute value is an exact power of two (-8, -32 and so on). What do their bit patterns have in common?
- How can you determine if an integer bit pattern has a pair of consecutive 'on' bits? (Straightforward to do using a loop, but there is also a clever single expression...)
- A treasure trove of bit wizardry curated by Sean Anderson, Stanford CS Phd
Lab2: C-strings
You should be continuing to build up your gdb and Valgrind repertoire and learning your way around the string.h functions and manipulating C-strings as raw arrays/pointers.
- In which situations does a call to
strncpy
null-terminate the destination string and in which does it not? What are some of the consequences that result from using a string that is not properly null-terminated? - A program appears to run just fine, but when run under Valgrind, it reports accessing
Address 0x420d50 is 0 bytes after a block of size 24 alloc'd
. Your friend dismisses the Valgrind report as off-base and paranoid. He argues that the program runs ok as proof, and what's the big deal about 0 bytes anyway? Explain to him why accessing any memory outside the allocated block (whether 0, 1, 2, or 1000 bytes after) is a problem and further describe how a memory bug can lurk asymptomatically, despite being in error. If Valgrind reports an issue, best to pay attention! - Give a C-expression that reports whether a given
char*
has a matching prefix/suffix N chars long. Be sure to take advantage of the string.h library functions. What would happen if your expression were evaluated with a value for N that was longer the string's length? - The
atoi/strtol
functions convert a string of digits to a decimal, but no standard function that converts in reverse. Describe a technique you could use to convert a number to a string equivalent.
Assign2: C-strings
You just wrote your own implementation of a standard Unix utility program and improved version of a standard library function. That's a pretty darn impressive accomplishment, especially so given only two weeks of exposure to C and Unix -- wow!
- The string library contains several functions to perform a form of string comparison, e.g.
strncmp
,strstr
,strchr
,strspn
, ... Explain the differences between the functions and identify a situation in which each is appropriate. - Write a C expression that converts a hexadecimal digit char to its numerical value, i.e. '1' => 1, 'f' => 15. Be sure to consider
string.h
functions that can help with the job. - The first parameter to the function
scan_token
is of typeconst char **
. Explain the purpose of the extra level of indirection on that argument. - It is controversial whether to add
.
(the current directory) to your PATH. Why might it be convenient? Why does it introduce a security risk?
Lab3: Pointers and arrays
You should be continuing to build up your gdb and Valgrind repertoire and becoming skilled at reading and writing code with heavy use of with pointers. Arrays and pointers are ubiquitous in C and a good understanding of them is essential.
- If
ptr
is declared as anchar *
, what is the effect ofptr++
. What ifptr
is declared as aint *
? - Although
&
applied to the name of a stack-allocated array (e.g.&buffer
) is a legal expression and has a defined meaning, it isn't really sensible. Explain why such use is a sure sign of an error/misunderstanding on the part of the programmer. - The argument to malloc is
size_t
(unsigned). Consider the erroneous callmalloc(-1)
, which seems to make no sense at all. The call compiles with nary a complaint -- why is it "ok"? What happens when it executes? - What is the purpose of the
realloc
function? What happens if you attempt to realloc a non-malloced pointer, such as string constant?
Assign3: Pointers and arrays
- The
scan_token
function you wrote for assignment 2 required that the client supply the memory to write the scanned token. Theread_line
function of assignment 3 instead allocates memory on the client's behalf. What are the advantages and disadvantages between the two approaches? - It is stated that a correct program should have a one-to-one correspondence between
malloc
calls andfree
calls. How does adding arealloc
call change the needed number ofmalloc
andfree
calls? Briefly explain your reasoning. - Which is the bigger problem: memory errors or memory leaks? Why? How can you tell which is which in a Valgrind report?
- How can you gauge whether your program has acceptable performance in terms of runtime or memory efficiency?
- When it is appropriate to use
calloc
instead ofmalloc
?
Lab4: Void*
You are getting your bearings in the world of raw memory. You can make sense of the cryptic syntax uses for function pointers. You know how to use the type system to your advantage wherever you can, but also how to work without it where you must. You know the care required to make a proper call to memcpy/memmove, exactly where and why you need a typecast, and have increased vigilant about using the correct level of indirection.
- Why must you typecast a
void*
in a pointer-arithmetic expression? - What is the behavior of
memcpy
when the source and destination overlap? How doesmemmove
properly account for overlapping regions? - An asymmetric comparison function is one that casts its two
void*
arguments to different pointee types. Why is passing an asymmetric comparison function tobsearch
almost certainly an indication of programmer error?
Assign4: Void*
Your command of memory management is now unstoppable and you have a rock-solid understanding of the client use of a void*
interface and the generic implementation. The critical mastery is correctly passing/receiving the void*
s flying around: what level of indirection to use, what typecasts are needed, using *
and &
in exactly and only the necessary places. You should know without question how to determine the necessary level of indirection and correctly apply it. It's also valuable to understand and be able to reason through the consequences of getting it wrong.
- To read the input, the
mysort
program reads each line into a maximum-sized piece of stack memory used temporarily and the copies to right-sized dynamic memory for persistent storage. This stack-heap dance is a common technique in situations like this. Explain why this approach is necessary/appropriate given the characteristics of the stack and heap. - The
strcmp
function almost matches the prototype for the comparison function argument ofbsearch
andqsort
. A sketchy typecast (such as the one exhibited byscandir
) could be used to hush the compiler's complaint about the mismatch, but this would be a huge mistake.strcmp
is pretty much never the right function to use as a callback comparison function. Explain why not. - Read the man page for
lfind
and describe how to uselfind
with an appropriate callback to re-implement the functionality ofstrlen
.
Lab5: Floats
You have been introduced to floating point and understand the representation as a form of "binary scientific notation". You should have a feel for the inherent compromises in this system: which values are representable, which are not, the gaps in the floating point number line, and a general sense of the rounding/approximation to expect from a floating point calculation.
- Explain the reasons why a floating point constant may have to be rounded on assignment to a float variable. Are any integer constants rounded on assignment to a float? Which?
- Roughly how large is the gap between 1.0 and the next larger float value? between 1e6 (1 million) and the next larger float value? between FLT_MAX and the next smaller float value?
- If the float bits were re-apportioned to designate 7 bits for the exponent and 24 for the significand, how would this change what float values could be represented (e.g. dynamic range, precision, size of gaps)?
Assign5: Floats, assembly
At this point, you should be feeling less bewildered by reading assembly and building up your skills with the tools to disassemble, trace, and examine programs at an assembly level. You are ready for the famous binary bomb!
- Is floating point addition guaranteed to be commutative? associative? Explain why or why not.
- If your bank stored your balance as a 32-bit float, roughly how large would your balance need to be such that trying to add a single penny would not register?
- How can you spot read/write of a global variable in a sequence of assembly instructions?
- How does the caller pass arguments to a function?
Lab6: Assembly
You now have tools for disassembling object files and have been introduced to the debugger commands used at the assembly level. The hands-on practice relating C code to its compiled assembly and reverse-engineering from assembly to C is great preparation for your next assignment.
- Give an example of a C sequence/expression that could be compiled into a variety of different, but equivalent, assembly instruction(s).
- What is the difference between
sar
andshr
? - How is the return value of a function communicated from the callee to the caller?
- What kind of clues might suggest that an
lea
instruction is doing an address calculation versus when it is merely arithmetic? - How can you get the value of a variable during execution that gdb reports has been
<optimized out>
?
Assign6: Assembly
After reverse-engineering the bomb, the assembly skills of CS107 students are unstoppable!
- What are some of the gdb commands that allow re-routing control in an executing program?
- What is the main indication that an assembly passage contains a loop?
- Explain the difference between a function's return value and its return address.
- Consider the mechanics of function pointer work at the assembly level. How is a call through a function pointer the same/different when compared to an ordinary function call?
- For performance reasons, the compiler prefers storing local variables in registers whenever possible. What are some reasons that force the compiler to store a local variable on the stack instead?
- For the instruction sequence below, what must be true about values of
op1
andop2
for the branch to be taken? What changes ifja
is substituted forjg
?cmp op1,op2 jg target
Lab7: Stack
This is one of my favorites among the labs -- lots of neat exercises to explore! Having learned the gdb commands to dig around in the stack frames and being able to relate the contents of the stack to the runtime program state is valuable insight when debugging.
- Why are the registers used for function parameters completely fixed, yet no conventions apply to where locals are stored?
- Explain the rules for caller-owned versus callee-owned register use. What is the advantage of this arrangement relative to a simpler scheme such treating all registers as caller-owned or callee-owned?
- Explain how insertion/deletion of seemingly unrelated printf statement could make a difference in a program's behavior.
- What are the signs in gdb that help you diagnose a stack overrun has trashed the resume address?
Lab8: Optimization
Learning how to use Valgrind callgrind is another key tool to add to your developer arsenal. Your killer reversing skills are just what is needed for examining the generated assembly and identifying gcc's cool transformations to make code run faster.
- How can you use the callgrind profiler to measure instruction counts?
- What is meant by a "hot spot"? How do you identify a hot spot in the annotated source?
- Loop unrolling is the process of reworking a loop to accomplish more per iteration and iterate fewer times. In what situations might this provide a performance win?
- Avoiding expense of function call/return is often cited for the motivation for inlining a function. What are the estimated savings in terms of number of instructions? Inlining also enables additional optimization opportunities, given that gcc applies most transformations within a function, but not inter-procedurally. Do you expect these gains to be more or less significant than the call overhead? Explain.
Assign7: Heap
A former student likened heap allocator to a "victory lap" -- a chance to put your hard-won expertise to work and enjoy the triumph of seeing what you are now capable of! Success requires heroic skills with void* pointers, raw memory manipulation, assembly, optimization of both code/memory, and deep proficiency with tools. You have much to be proud of!
- What are the main challenges in making malloc fast? What makes for a fast free?
- Explain the difference between internal and external fragmentation. Which is the greater threat to utilization?
- True or False: If you have built a super-fast malloc and free, implementing realloc in terms of those operations will also be super-fast.
- Throughput and utilization are often viewed in opposition. How might improved throughput come at the expense of lowered utilization (or vice versa). Are there choices that lead to wins in both?