Lab 3: Bits, bytes, and ints

Lab sessions Mon Jan 30 to Thu Feb 02

Lab written by Julie Zelenski

Learning goals

During this lab you will:

  1. write code that manipulates raw memory using void*, memcpy/memmove, and function pointers
  2. investigate the bits and bytes underlying the primitive data types and experiment with bitwise manipulation
  3. make observations about the behavior of integers and integer operations/arithmetic
  4. relate integer values to their bit patterns in a two's complement representation

Find an open computer and somebody new to sit with. Introduce yourself and regale each other with war stories of vanquishing your void* fears and bugs last week.

Lab exercises

1. Get started

Get the starter using the command

hg clone /afs/ir/class/cs107/repos/lab3/shared lab3

This creates the lab3 directory which contains source files and a Makefile.

Pull up the online lab checkoff and have it open in a browser so you'll be able to jot down things as you go. At the end of the lab period, have the TA check off your work.

2. Writing generic operations.

The generic.c file is missing the implementation of the substitute and has_duplicates functions. The first is a find-and-replace operation and the second searches an array for repeated elements. Both operate on generic arrays and use a client callback function. Read the function comments in the file for details and implement the functions to match their specifications. The program contains client code you can use for testing. Writing these is a great warmup for Assignment 3 which focuses on the implementation of a void* interface!

3. Data representation.

Compile the data program and run it under gdb. Break in main and step through initializing and printing the array. Now you're ready to start nosing around. The gdb command for the day is x, which examines the raw contents of memory at an address. It accepts format modifiers to control the range of bytes to examine and how to show the data. Within gdb, ask for help x to see the available options. Below are some expressions to evaluate. Try to predict what will be printed, then type into gdb and confirm your understanding.

(gdb) x/1wd &arr[0]      examine memory for first elem, format /1wx will read 1 word and print value in decimal format
(gdb) x/1wt arr          format /1wt 1 word in binary (note arr is synonym for &arr[0])
(gdb) x/1wx arr          /1wx 1 word in hex 
(gdb) x/4bx arr          /4bx 4 bytes in hex (what is difference between this and previous?)

x is also handy for dumping a sequence of array elements:

(gdb) x/9wd arr + 2      /9wd 9 words, each decimal (prints arr[2] and upward, past end of array even)
(gdb) x/12bc argv[0]     /12bc 12 bytes, each char
(gdb) x/2qa argv         /2qa 2 quadwords, each address

You can use x to re-interpret memory into a specified representation. No conversion is done, raw bits are read and shown through the lens of a different interpretation. This is akin to what happens at runtime when applying a typecast.

(gdb) x/1wf arr          interpret int bits as if they were float bits
(gdb) x/1wd argv[0]      interpret first chars of string as if int

Answer the following questions by experimenting with x within gdb:

An important take-away is realizing that bits are just bits, and only with the proper interpretation do bits have meaning. When working within the type system, there are no surprises-- the compiler uses the appropriate interpretation for the data to match its declared type. But when using void* or introducing typecasts, you're renouncing the compiler's help and you must be vigilant about tracking the types yourself. If using a void* interface (such as CVector or qsort) and intending to pass char* but mistakenly passing an int* or char** instead, there is no type-checking (neither compile-time or run-time) that will spot this error, the bits are silently mis-interpreted to produce strange and perhaps catastrophic results.

4. Simple behavior of integer types

Compile and run the ints program and use its results to answer the following questions. When asked "what happens", consider: is there a compiler warning? compiler error? runtime error? unpredictable result?

5. Integer bit patterns

The integer-family types are stored as a binary polynomial using two's complement notation. Here are some questions to work through to understand their bit patterns. You can write code to test, experiment in gdb, or try out this nifty online visualizer (select from dropdown menu to view as signed 32-bit integers to match our system).

Once you understand the bitwise operations and how the int bits are laid out, you can put them together to do some neat tricks. Try implementing the function sum_overflows function in ints.c that is given two ints and determines if their sum overflows. First, consider what must be true about the signs of the operands/sum if overflow occurred. Add code that use bitwise ops to extract the sign bit from the two operands and their sum and examines those sign bits to determine whether the sum overflowed. (Overflow may seem unthinkable, but it happens! In the news: Boeing 787 bug, the overflow bug in nearly every bsearch and mergesort, and Korean polo pony overflow breaks Youtube).

6. Testing jam!

CS107 has been giving you great practice with building up your testing prowess, but this next assignment will push those skills to the limit. You'll be implementing your version of the venerable CVector and CMap and one of the challenges is getting adequate test coverage into the nooks and crannies of all operations. The assign3 starter contains some simple client test programs and you have your client code from assignment 2 as well. You should be proud of your implementation if it fares well on these tests, but they alone are not enough. Figuring out what's not explored and poking your testing nose into those dark corners is the only sure means to uncover lurking issues, fix them, and submit a finished implementation you are confident can weather anything!

Let's invest the last bit of lab time in a group brainstorming session on a testing plan for CVector. Start by enumerating features from the cvector.h interface and skim the existing client programs to identify what operations are exercised and where. You're looking to see which operations are called, which variations, and in which combinations/contexts. Are there any operations entirely missing? Are the full range of varied invocations tested for a given operation? While there is only a single way to call cvec_append, cvec_insert could be used to insert at beginning, end, or middle, as well as attempt to insert out of bounds, all of which can be tested. What are some of the interacting features (e.g. cleanup function's use in replace) that ought to be specifically targeted? Together create an action list of the missing tests to provide additional coverage and rigor. Be sure to take this list home with you and put it to good use when working on the assignment!

7. Mixed-up types (optional)

This is a previous lab exercise that we shuffled out to make space for the testing jam which we felt was a better use of group time. It's here as material you can use on your own to explore the issues around the co-mingling of signed and unsigned types. The code in the mixed.c program shows some common pitfalls to check out.

Check off with TA

Before you leave, be sure to submit your checkoff sheet (in the browser) and have lab TA come by and approve it so you will be properly credited for lab. If you don't finish all the exercises in the lab period, we encourage you to work through any remainder on your own until your knowledge of ints is completely rock solid. Double-check your progress with self check.

http://dilbert.com/strip/1992-09-08

Contents