Lab sessions Mon Apr 18 to Thu Apr 21
Lab written by Julie Zelenski
During this lab you will:
void*
, memcpy/memmove, and function pointersFind an open computer and somebody new to sit with. Introduce yourself and regale each other with war stories of vanquishing your void*
fears and bugs last week.
Get started. Get the starter using the command
hg clone /afs/ir/class/cs107/repos/lab3/shared lab3
This creates the lab3 directory which contains source files and a Makefile.
Pull up the online lab checkoff and have it open in a browser so you'll be able to jot down things as you go. At the end of the lab period, have the TA check off your work.
Writing generic operations. The generic.c
file is missing the implementation of the substitute
and has_duplicates
functions. The first is a find-and-replace operation and the second searches an array for repeated elements. Both operate on generic arrays and use a client callback function. Read the function comments in the file for details and implement the functions to match their specifications. The program contains client code you can use for testing. Writing these is a great warmup for Assignment 3 which focuses on the implementation of a void*
interface!
Data representation. Compile the data
program and run it under gdb. Break in main
and step through initializing and printing the array. Now you're ready to start nosing around. The gdb command for the day is x
, which examines the raw contents of memory at an address. It accepts format modifiers to control the range of bytes to examine and how to show the data. Within gdb, ask for help x
to see the available options. Below are some expressions to evaluate. Try to predict what will be printed, then type into gdb and confirm your understanding.
(gdb) x/1wd &arr[0] examine memory for first elem, format /1wx will read 1 word and print value in decimal format
(gdb) x/1wt arr format /1wt 1 word in binary (note arr is synonym for &arr[0])
(gdb) x/1wx arr /1wx 1 word in hex
(gdb) x/4bx arr /4bx 4 bytes in hex (what is difference between this and previous?)
x
is also handy for dumping a sequence of array elements:
(gdb) x/9wd arr + 2 /9wd 9 words, each decimal (prints arr[2] and upward, past end of array even)
(gdb) x/12bc argv[0] /12bc 12 bytes, each char
(gdb) x/2qa argv /2qa 2 quadwords, each address
You can use x
to re-interpret memory into a specified representation. No conversion is done, raw bits are read and shown through the lens of a different interpretation. This is akin to what happens at runtime when applying a typecast.
(gdb) x/1wf arr interpret int bits as if they were float bits
(gdb) x/1wd argv[0] interpret first chars of string as if int
Answer the following questions by experimenting with x
within gdb:
An important take-away is realizing that bits are just bits, and only with the proper interpretation do bits have meaning. When working within the type system, there are no surprises-- the compiler uses the appropriate interpretation for the data to match its declared type. But when using void*
or introducing typecasts, you're renouncing the compiler's help and you must be vigilant about tracking the types yourself. If using a void*
interface (such as CVector or qsort) and intending to pass char*
but mistakenly passing an int*
or char**
instead, there is no type-checking (neither compile-time or run-time) that will spot this error, the bits are silently mis-interpreted to produce strange and perhaps catastrophic results.
C bitwise operators and use of bitmasks. The bitops.c
program demonstrates use of the bitwise operators (& | ^ ~ << >>
). Start the program under gdb and set a breakpoint on bitwise
and run to the breakpoint. Now try out the display
command, which sets up an auto-print expression to be evaluated each time you single-step. In gdb, use display/t val
to show binary for val. Step from here using the gdb step
command. Before each line is executed, work out for yourself what you expect the new value to be and then step to verify your understanding is correct.
Now let's try your hand at creating masks and using them in bitwise expressions. The bitmasks
function contains comments marked FIXME, which are placeholders for you to fill in:
Play around with constructing bitmasks (either on paper or in gdb) until you feel comfortable with bit patterns and bitwise manipulation. For example, start with an int N, set to a number between 0 and 7, and build an expression around N to create a mask with all 0 bits and 1 on bit at position N? a mask with all 1s except for a zero in position N? Least significant N bits on? Most significant N bits on?
Simple behavior of integer types. Compile and run the ints
program and use its results to answer the following questions. When asked "what happens", consider: is there a compiler warning? compiler error? runtime error? unpredictable result?
Integer bit patterns. The integer-family types are stored as a binary polynomial using two's complement notation. Here are some questions to work through to understand their bit patterns. You can write code to test, experiment in gdb, or try out this nifty online visualizer (select from dropdown menu to view as signed 32-bit integers to match our system).
Once you understand the bitwise operations and how the int bits are laid out, you can put them together to do some neat tricks. Try implementing the function sum_overflows
function in ints.c
that is given two ints and determines if their sum overflows. First, consider what must be true about the signs of the operands/sum if overflow occurred. Add code that use bitwise ops to extract the sign bit from the two operands and their sum and examines those sign bits to determine whether the sum overflowed. (Overflow may seem unthinkable, but it happens! In the news: Boeing 787 bug, the overflow bug in nearly every bsearch and mergesort, and Korean polo pony overflow breaks Youtube).
Testing jam! CS107 has been giving you great practice with building up your testing prowess, but this next assignment will push those skills to the limit. You'll be implementing your version of the venerable CVector and CMap and one of the challenges is getting adequate test coverage into the nooks and crannies of all operations. The assign3 starter contains some simple client test programs and you have your client code from assignment 2 as well. You should be proud of your implementation if it fares well on these tests, but they alone are not enough. Figuring out what's not explored and poking your testing nose into those dark corners is the only sure means to uncover lurking issues, fix them, and submit a finished implementation you are confident can weather anything!
Let's invest the last bit of lab time in a group brainstorming session on a testing plan for CVector. Start by enumerating features from the cvector.h
interface and skim the existing client programs to identify what operations are exercised and where. You're looking to see which operations are called, which variations, and in which combinations/contexts. Are there any operations entirely missing? Are the full range of varied invocations tested for a given operation? While there is only a single way to call cvec_append
, cvec_insert
could be used to insert at beginning, end, or middle, as well as attempt to insert out of bounds, all of which can be tested. What are some of the interacting features (e.g. cleanup function's use in replace) that ought to be specifically targeted? Together create an action list of the missing tests to provide additional coverage and rigor. Be sure to take this list home with you and put it to good use when working on the assignment!
Mixed-up types (optional). This is a previous lab exercise that we shuffled out to make space for the testing jam which we felt was a better use of group time. It's here as material you can use on your own to explore the issues around the co-mingling of signed and unsigned types. The code in the mixed.c
program shows some common pitfalls to check out.
count_chars
. Can you detect where in the code it is making an implicit assumption that chars are signed? If you execute the program, it appears to work correctly, so this assumption seems to hold in our environment. Consider the consequence of porting this code to an environment where chars were unsigned. You can simulate this by changing CFLAGS in the Makefile to include -funsigned-char
and recompiling. What happens? How can you improve the code so that it will run correctly and portably, without making an unnecessary assumptions about implementation-dependent behavior? (Note there is more than one possible correct fix).The size_t
type is unsigned. Many stdlib functions force size_t
into your code (e.g. it is the return type of sizeof/strlen, type of third parameter to strncmp and memcpy), so you'll need to be vigilant about avoiding the problems that come from mixing signed/unsigned. For example: consider the code below:
if (sizeof(a) > sizeof(b)) ... // works fine
if ((sizeof(a) - sizeof(b)) > 0) ... // BUGGY! why?
for (size_t pos = strlen(a)-1; pos >= 0; pos--) // BUGGY! why?
The first two lines appear to be the same expression just rearranged, yet the first one works as intended, and the second doesn't. What's the difference? The for loop is supposed to loop backwards over a string--- what goes wrong with this code and how could you fix it?
Malware often exploits code vulnerabilities related to integer overflow and signed/unsigned conversion. The buggy function copy
shows the skeleton of a function that can be exploited in this way. The function is supposed to copy data of a particular length into a stack buffer and despite its precautions to not overflow that buffer, it is possible for an evil client to make a call that does exactly that. How? (Pathetic or hilarious? You decide: bogus attempt to avoid integer overflow)
contains.c
program has a contains
function which returns a boolean result of whether a pattern is contained in a document. As written, contains
will fail to report a match iff the pattern happens to occur in a very particular place within the document. The client code in the program is set up to trigger the bug. Run under gdb and study what's happening. What's the problem and what's the factor that makes it work or not? What simple fix is needed? This is a simplified version of an actual bug that had me convinced I was going crazy. The "Find" operation in the editor I was developing worked almost all the time, but super-rarely it would fail to find a particular occurrence (yet finding all others without problems). Note that the function gets a compiler warning, which is a helpful clue (yet another reason to never ignore warnings...) To show I'm not the only one who fell into this pitfall--- a recent security vulnerability in MySql due to same type of underlying bug.Before you leave, be sure to submit your checkoff sheet (in the browser) and have lab TA come by and approve it so you will be properly credited for lab. If you don't finish all the exercises in the lab period, we encourage you to work through any remainder on your own until your knowledge of ints is completely rock solid. Double-check your progress with self check.