Lab 1: Bits, Bytes, and Integers

Lab sessions Tue Oct 01 to Thu Oct 03

Lab written by Julie Zelenski, with modifications by Nick Troccoli

Lab Overview

Your weekly lab is a chance to experiment and explore, ask and answer questions, and get hands-on practice in a supported environment. We provide a set of lab exercises that revisit topics from recent lectures/readings and prepare you to succeed at the upcoming assignment.

Lab is collaborative! We're all in this together! You will pair up and work as a team on the exercises. The entire room is one learning community working together to advance the knowledge and mastery of everyone. Stuck on an issue? Ask for help. Have an insight? Please share! The TA will circulate to offer advice and answers and keep everyone progressing smoothly.

To track lab participation, we have an online checkoff form for you to fill out as you work. Lab is not a race to find answers to exactly and only the checkoff questions-- the checkoff questions are used only to record attendance and get a read on how far you got. Lab credit is awarded based on your sincere participation for the full lab period. Your other rewards for investing in lab are to further practice your skills, work together to resolve open questions, satisfy your curiosity, and reach a place of understanding and mastery. The combination of active exploration, give and take with your peers, and the guidance of the TA makes lab time awesome. We hope you enjoy it!

For lab each week, we plan to mine our favorite open-source projects (musl libc, BusyBox unix utilities, Apple, Google and more) for interesting systems code to use as an object of study. We use this code to learn how various programming techniques are used in context, give insight into industry best practices, and provide opportunities for reflection and critique.

We will have some noteworthy code picked out for you to explore in each lab. For lab1, the chosen code passages highlight interesting uses of the bitwise and integer operations.

Learning Goals

During this lab you will:

practice with bits, bitwise operators and bitmasks
read and analyze C code that manipulates bits/ints
further practice with the edit-compile-test-debug cycle in the Unix environment

Find an open computer to share with a partner and introduce yourselves. Together the two of you will tackle the exercises below.

Get Started

Clone the lab starter code by using the command below. This command creates a lab1 directory containing the project files.

git clone /afs/ir/class/cs107/repos/lab1/shared lab1

Next, pull up the online lab checkoff and have it open in a browser so you can jot things down as you go. Only one checkoff needs to submitted for both you and your partner.

Exercises

Let's kick things off with a little Unix love! Chat up your labmates about your assign0 experiences. How is everyone doing so far on getting comfortable in the unix environment? Is there a video or resource you want to recommend to others? Do you have open questions or an issue you'd like help with? Did you learn a nifty trick or tip that you'd like share? Let's hear it!

2) Bitwise Practice (35 min)

This section provides practice you can work through to get more familiar with bit operators and bitmasks. But first, one additional set of bit operators we did not have time to cover in class are the shift operators. These let you shift bits to the right or left:

// shifts x to the left by k bits
// lower order bits filled with zeros
// bits shifted off the end are lost
x << k;

// shifts x to the right by k bits
// for unsigned numbers, fill with zeros
// for signed numbers, fill with sign bit
x >> k;

Mathematically, this lets us easily multiply or divide by two. For instance, for a left shift, 0b01110111 << 2 results in 0b11011100. For a right shift, filling with zeros for an unsigned number is called a "logical shift" and filling with the sign bit for a signed number is called an "arithmetic shift". Try out an example (e.g. -1) to see why filling with the sign bit is preferred over always filling with zeros or ones).

A few other miscellaneous notes about bit operations:

operator precedence with bit operators and other operators can be tricky. Always use parentheses where precedence is ambiguous just to make sure operators execute in the order you expect. For instance, 1<<2 + 3<<4 means 1 << (2+3) << 4 due to precedence rules. Writing (1<<2) + (3<<4) ensures the correct order.
put a U after a number literal to make it unsigned. For instance, 1U means the literal 1 as an unsigned number.
put an L after a number literal to make it a long (64 bits) instead of an int, which it is by default. This highlights a common issue! If you want, for instance, a long with the index-32 bit on and everything else off, the following does not work:

long num = 1 << 32;

This is because the 1 is by default a signed int, and you cannot shift a signed int by 32 spaces because it has only 32 bits. Instead, you must specify that the literal be a long:

long num = 1L << 32;

(As a side note, lecture slides 71-81, which we didn't have time to get to this past lecture, also document everything mentioned here).

With this material and the other material from the past lectures, test your understanding with this page of bitwise practice problems.

3) Round Up (3 + 4 together about 40 min)

Open the round.c file to review the code for the functions is_power_of_2 and round_up.

is_power_of_2 is a function that takes advantage of a unique property of powers of two at the bit level. Work with your partner to identify what is unique about the bitwise pattern for those numbers that are a power of 2. Try sketching out a few examples. It may be easier to identify this pattern if you think about the binary representation of a number as telling you which powers of two make up the number. Once you've identified the pattern, then consider the relationship between a power of two and its predecessor (e.g. number - 1) in terms of the bits the two values have in common. How does the code leverage these two facts to efficiently determine that a given value is or is not a power of 2?
The round_up function returns the value of the first argument rounded up to the nearest multiple of the second. First consider the general case, when the multiple is not a power of 2. How are the arithmetic operations used to round up to the next multiple? Now consider the special case when the multiple is a power of 2. How is bitwise manipulation able to take the place of the expensive multiply/divide?

These functions show the advantage of being able to flip between interpretations. A number is just a bit pattern and can be manipulated arithmetically or bitwise at your convenience.

4) Midpoint

For this exercise, start by reading Google researcher Joshua Bloch's newsflash about a ubiquitous bug from integer overflow. Who knew that simply computing the midpoint could be perilous?

We want a function that safely and correctly computes the midpoint of two integers. If the true midpoint is not an exact integral value, we are not fussy about how the result is rounded. As long as the function returns either integer neighbor, we're happy.

The file mid.c contains four different formulations to compute the midpoint in a simple program you can experiment with.

The midpoint_original function mostly works, but exhibits the bug called out in Bloch's article:

int midpoint_original(int x, int y) {
    return (x + y)/2;
}

If the sum x + y overflows, the result is erroneous. For example, the call midpoint_original(INT_MAX-2, INT_MAX) should return INT_MAX-1, but actually returns -2. Oops! In the original context, the two inputs to midpoint were array indexes, which perhaps explains how this bug was able to lurk for so long with no visible symptoms (i.e. arrays of yore rarely had such large dimensions).

Bloch's article proposes some fixes for calculating the midpoint. First he offers midpoint_A and then champions midpoint_B as a faster alternative.

int midpoint_A(int x, int y) {
    return x + ((y - x) / 2);
}

int midpoint_B(int x, int y) {
    return ((unsigned int)x + (unsigned int)y) >> 1;
}

Consider how midpoint_A and midpoint_B have rearranged the original calculation. What has midpoint_A done to avoid overflow? What has midpoint_B done?

Both midpoint_A and midpoint_B work correctly as long as both inputs are non-negative. This constraint is never actually stated, it is merely implicit in the original context that the inputs are array indexes. So if one or both inputs is negative, what then? Let's investigate!

Given a negative input, midpoint_A is susceptible to a different overflow than the original, in this case during subtraction. What must be true about the two operands to subtract in order to cause overflow? What is the result of a subtraction operation that has overflowed? Work out what you think is an input that causes a failure for midpoint_A, then test out your theory by editing the mid.c program to try it. Build and run the program to verify your understanding.
midpoint_B has its own distinct problem with negative inputs. To expose the flaw in midpoint_B it helps to work backward. Consider how the expression within midpoint_B will never evaluate to a negative result -- why not? Given this fact, any inputs for which the midpoint is negative must fail on midpoint_B. What value will be returned instead of the expected in these cases? Edit the mid.c program to add an input that demonstrates the problem and verify that your theory matches the observed behavior. If you cast the sum back to a signed value before shifting right, you fix this particular problem, but at the expense of causing a different case to now fail. Experiment with the code to observe this. It feels like we just can't win!

The final version of midpoint we have for you is midpoint_C. This gem comes from the marvelous fascicle on bits written by the extraordinary Don Knuth. (fascicle??).

int midpoint_C(int x, int y) {
    return (x & y) + ((x ^ y) >> 1);
}

Knuth's midpoint_C is the real deal, computing a correct midpoint for all inputs, with no possibility of overflow! How this approach works is not obvious at first glance, but with some careful study you can work through how it all fits together.

Start by considering the bitwise representation of a number and how its "on" bits correspond to the powers of 2 in the number's binary polynomial. For instance, 0000...01011, which is the base-10 number 11, can be written as 1*2^0 + 1*2^1 + 0*2^2 + 1*2^3 = 11.
Now trace the effect of comparing the powers of 2 contained in x and y. The bitwise & identifies which powers of 2 the two inputs have in common and the bitwise ^ pulls out those powers which differ.
How are those two results combined so as to gather the needed powers of 2 in the midpoint?
The code above joins with +. Why must we use + here? Try substituting | and work out how the resulting expression now operates. The result is nearly equivalent, but not quite -- what has changed?

And what exactly is a fascicle? It took me several hops through my dictionary to find out!

5) Write, Test, Debug, Repeat (20 min)

Now it's your turn to write some bitwise code of your own and practice with the Unix development tools!

The parity program reports the parity of its command-line argument. A value has odd parity if there is an odd number of "on" bits in the value, and even parity otherwise. Confirm your understanding of parity by running the samples/parity_soln program on various arguments.

The code in parity.c was written by your colleague who claimed it is "complete", but on their way out the door they mutter something unintelligible about unfixed bugs. Uh oh... Your task is to test and debug the program into a fully functional state using CS107 sanitycheck and the gdb debugger.

Let's investigate!

A helpful reference as you are using GDB is the CS107 GDB guide, also linked to from the resources page.

Use make to build the program and try running ./parity a few times on various values. Uh oh! It thinks every value has odd parity! Does the program ever report even parity for anything?
Let's run it under the debugger. Start gdb parity. We can use the list command to print out parts of the code GDB is examining. Use list compute_parity to print the compute_parity function and note the line number where it updates the result inside the loop.
Next, let's set a breakpoint on that line so that when we run the program in GDB, GDB will pause before executing that line and await further instructions. You can add a breakpoint by typing break XXX where XXX is either a function name or line number.
Run the program under gdb by entering the run command, followed by a command line argument (for the number to examine). GDB will start running the program and pause when it hits the breakpoint. Note that it pauses before it executes the line the breakpoint is on. When stopped at the breakpoint, print the value of result. We can print by using the p (short for print) command, followed by the name of what we want to print. Huh? The value of result appears to be garbage. D'oh, it was never initialized! Is that even legal? In a safety-conscious language such as Java, the compiler may guard against this.
Do a make clean and make to review the build warnings and you'll see nary a peep about it from gcc. At runtime, the variable will use whatever junk value was leftover in its memory location. Lesson learned -- you will need to up your own vigilance in the laissez-faire world of C.
Add a correct initialization, build, and re-run to test your fix.

Sanitycheck! One simple means to verify correctness is by comparing your results to a known-correct solution. We provide solution executables in the samples directory. For example, run ./parity 45 then run samples/parity_soln 45 and manually eyeball the outputs to confirm they match. Even better would be to capture those outputs and feed them to diff so the tools can do the work. To make testing as painless as possible for you, we've automated simple output-based comparison into the CS107 tool sanitycheck. If you haven't already, read our sanitycheck instructions, and run sanitycheck on lab1 to see that your fixed parity passes all tests.

Now that your corrected program passes sanitycheck, it is good to go, right? Not so fast, keep in mind that sanitycheck is only as thorough as its test cases. Our default sanitycheck gives you a small set of tests to start with; you use the custom option to add tests of your own to more fully exercise the program.

Carefully read through the results from default sanitycheck. How many different test cases does it include? What are those test cases?
Run custom sanitycheck with the additional test cases in custom_tests to get a different side of the story. One of these tests fails due to timeout. It's not that the program is horribly inefficient, it is just stuck in an infinite loop.
The best way to debug an infinite loop is to run the program under GDB, and once it stalls, stop the program using Control-c. GDB will show you where the program was executing when it was stopped, and you can poke around and see what is going on. Let's try this - run the parity program in GDB on a negative argument and let it go unresponsive.
Type Control-C to interrupt it and return control to gdb. Use the gdb backtrace command to see where the program is executing - this will display the current call stack, meaning what functions are currently executing. step through a few statements as it goes around the loop and gather information to diagnosis why the loop is not being properly exited.
Once you know what's gone astray, edit the code to fix, rebuild, and test under both default and custom sanitycheck to verify you have squashed the bug. Way to go!

[Optional] Challenge Problem

Finished with lab and itching to further exercise your bitwise skills? Check out our challenge problem!

Recap and Check Off With TA

At the end of the lab period, submit the checkoff form and ask your lab TA to approve your submission so you are properly credited for your work. It's okay if you don't completely finish all of the exercises during lab; your sincere participation for the full lab period is sufficient for credit. However, we highly encourage you to finish on your own whatever is need to solidify your knowledge. Also take a chance to reflect on what you got what from this lab and whether you feel ready for what comes next! The takeaways from lab1 should be proficiency with bitwise operations, constructing and using bitmasks, and a solid grasp on the representation of unsigned values as a binary polynomial and signed values in two's complement. Here are some questions to verify your understanding and get you thinking further about these concepts:

Consider rounding the magnitude of an integer up to power of two (e.g. 3 rounds to 4, 4 to 4, 5 to 8, for negative: -3 rounds to -4, -4 to -4, and so on). How does the bit pattern of a positive int differ from the bit pattern of the value after rounding to power of two? What about for a negative int?
Give a bitwise expression to zero the upper N bits of an unsigned int V. What does this expression compute numerically?
When do you supply the command-line arguments to a program you wish to run under gdb: when starting gdb or from within gdb when running the program?
Chapter 2 of B&O has many excellent practice problems presented with solutions - check them out!

Just For Fun

a cute parlor trick based on integer representation
a surprisingly addictive binary Tetris game
crazy bit-hacks for the truly brave.
integer overflow wreaking worldwide havoc: grounding the Boeing Dreamliner