Assignment 4: A bit of fun

Due: Mon Feb 20 11:59 pm
Ontime bonus 5%. Grace period for late submissions until Wed Feb 22 11:59 pm

Assignment written by Julie Zelenski
Thanks to John Regehr (Utah) for saturating arithmetic

Learning goals

During this assignment, you will gain practice in

  1. using the C bitwise operators to manipulate binary data
  2. dissecting the bitwise data representation of various types (ints, floats, instructions)
  3. working within/around the limits of integer and floating point representation

Walkthrough videos

Problems 1 and 2 of this assignment have videos available to get you started (must be enrolled to access):

The assignment

Your next assignment consists of a programming "problem set". Our most recent material spans a diverse set of topics that don't easily lend themselves to being aggregated into one combined program, so instead this is a set of small exercises that individually explore topics such as bitwise operators, bit vectors, integer arithmetic, floating point representation, and instruction encoding.

For each exercise below, you are asked to write a function or two to accomplish a specific task or do a lab-like exploration on a particular topic. The code you're working with may be short, but it is mighty-- working through these exercises will really help solidify your understanding of bits, bytes, and numbers.

1. 2 bits, 4 bits, 8 bits, a short...

Your first task is to use bitwise operations to compare the count of "on" bits between two signed integers. The cmp_bits function is given two integer values a and b. It returns a negative result if the bitwise representation of a has fewer "on" bits than b, a positive result if a has more "on" than b, and zero if both have the same number of "on" bits.

int cmp_bits(int a, int b)

2. Sudoku sets

Treating an unsigned type as a bit vector can compactly represent and efficiently manipulate a set over a given range. Bit vectors are used in time and space-constrained situations to provide dense storage and allow fast manipulation of the entire group of bits at once. In a bit vector, each individual bit is designated to represent a particular value and a bit is turned on to indicate its corresponding value is contained in the set. In this problem, we have sets drawn from the range [1-9]. A set is represented using an unsigned short (16 bits). The bit at the nth position (counting from least significant bit at the 0th position) is designated to represent the value n. The bits in the positions 1 to 9 represent the set membership and the other seven bits of the short are ignored. Below is a few example sets and their corresponding bit vectors. The bit positions 1 to 9 are shown in blue to distinguish them from the unused bits around them.

0000000000000000           // {}
0000001010100100           // {2 5 7 9}
0000000000001110           // {1 2 3}

Write the make_set function to create a bit vector set from an array. The arguments to the function are an array of integer values and the array count. The function returns an unsigned short that represents a bit vector set of the values from the array. The bits in positions 1-9 of the returned result mark the set membership, the remaining bits are zeros. You may assume that no value in the array will be outside the range 1-9.

unsigned short make_set(int values[], int nvalues)

Now write the is_single function to manipulate these bit vector sets in efficiently computing a result needed when solving a Sudoku puzzle. The goal of Sudoku is to assign every cell in the grid a digit from 1 to 9 subject to the constraint that each digit can be used only once in each row, column, and block. (You can read about Sudoku in wikipedia if you're curious, but for the purposes of this problem, you don't need to know any more details). The is_single function is given three well-formed bit vector sets representing the digits already used in the row, column, and block for a cell. ("well-formed" means that the short will have bits on only in positions 1-9, remaining bits are zeros). The possibilities for a cell consist of only those digits that are unused; any digit that is already used in the row, column, or block is not a possibility. The function is_single returns true if the already used digits force only one option for this cell (number of possibilities is exactly 1) and false otherwise (i.e. number of possibilities is 0 or 2 or more).

bool is_single(unsigned short used_in_row, unsigned short used_in_col, unsigned short used_in_block)

The ideal implementation of is_single builds on direct and efficient use of bitwise operators (~ & ^ | << >>) with very limited use of the integer arithmetic/relational operators and is expressed in straight-line code (i.e., no loops, no if/switch, no chain of essentially same operation repeated 8-9 times and joined with some connective operator). If you can't quite achieve the ideal, a working solution that is unnecessarily awkward/inefficient can still earn most of the credit.

3. Saturating arithmetic

Saturating arithmetic operations are those where results that overflow or underflow "stick" at the maximum or minimum representable result instead of wrapping around like the usual modular arithmetic operations. If you are using saturating addition and subtraction on signed integers, (INT_MAX + 1) == INT_MAX and (INT_MIN - 1) == INT_MIN. Saturating arithmetic is a common feature in 3D graphics and digital signal processing applications. A saturating addition function takes values a and b and returns their sum a+b if it doesn't overflow/underflow the representable range or otherwise returns the appropriate "sticky" value. You are to write two variants of saturating addition: one that operates on unsigned values, another on signed.

stype sat_add_signed(stype a, stype b)
utype sat_add_unsigned(utype a, utype b)

The functions are written in terms of stype/utype which are placeholders that can be filled by any integer family type. The bits.h header file has a #define that controls which type from the integer family (char, short, int, long, or long long) is chosen for stype/utype. Your implementation of the saturating operators must work independently of the bitwidth, using a single unified sequence of operations that works correctly for all. This portability requirement will lead you to think through how promotion/truncation operates between the integer-family types.

You are required to execute the exact same code for all bitwidths, which means you must not use any sort of if/switch that divides into cases for the different bitwidths. This doesn't mean that you can't use conditional logic (such as to separately handle overflow or non-overflow cases), but conditionals that dispatch based on the type/sizeof of operands are totally off-limits.

4. Get to the (floating) point already!

In order to reason about floating point behavior, it's critical to start with an understanding of the underlying representation. This next task is a lab-like exercise to help you get a handle on the mechanics of floats: how the float bits are used for sign/exponent/significand, which values are representable and which aren't, the gaps in the floating point number line, normalized versus denormalized, and so on. Start by adding a couple of terms to your vocabulary: binade and epsilon. A binade is the set of numbers that all have the same binary exponent. All representable float values in the interval [2n, 2n+1) for some value of n form one binade. The epsilon of a given float f is the distance from f to the next larger magnitude representable float (i.e. its neighbor on the floating point number line moving away from zero). All float values within one binade have the same epsilon.

Below is a list of interesting floats we selected for you to consider. From each description, work out what its 32-bit float is, diagram its float bits, compute its epsilon as a power of 2, express its value as a sum/difference of powers-of-2 and then show the value in decimal. As an example, consider the float f = 1.75. Its float bits are 0 01111111 11000000000000000000000, its value expressed as sum of powers-of-two is 20 + 2-1 + 2-2 (or as difference 21 - 2-2), its epsilon is 2-23, and its decimal value is 1.75. When writing a value using powers-of-2, either sum or difference of powers works; choose the one that results in the more compact sequence.

Here are the floats you are to consider: (Note that constant values below are expressed in decimal, i.e. -100.0 is negative one hundred, .01 is one one-hundredth)

The point of this exercise is to strengthen your facility with bits and how they are used for representing floats. We strongly recommend you first work out the bits/powers-of-2 using pencil-and-paper. Then verify your results by using code/gdb and get the decimal value via printf("%.9g"). Edit the template given in floats.txt to provide your answers and submit the completed file with your repo.

5. Some disassembly required

Your final task is to pick apart a binary-encoded x86-64 machine instruction and print its human-readable assembly equivalent. This is the job of a disassembler (the inverse of the assembler), such as the objdump tool or the gdb disassemble command. The instructions you must decode are the five variants of the pushq instruction shown in the table below.

Disassembled instruction Variant (what is pushed?) Binary-encoded machine instruction (num bytes in parens) In hex
pushq $0x3f10 immediate constant 01101000 00010000 00111111 00000000 00000000 (5) 68 10 3f 00 00
pushq %rbx register 01010011 (1) 53
pushq (%rdx) indirect 11111111 00110010 (2) ff 32
pushq 0x8(%rax) indirect with displacement 11111111 01110000 00001000 (3) ff 70 08
pushq 0xff(%rbp,%rcx,4) indirect with displacement and scaled-index 11111111 01110100 10001101 11111111 (4) ff 74 8d ff

x86-64 uses a variable-length encoding for machine instructions; some instructions are encoded as a single byte, and others require more than a dozen bytes. The pushq instructions to be decoded vary from 1 to 5 bytes in length. The first byte of a machine instruction contains the opcode which indicates the type of instruction. The opcode is followed by 0 or more additional bytes that encode additional details about the instruction such as the operands, register selector, displacement and so on. This additional bytes vary based on the particular instruction variant.

The table above shows various pushq assembly instructions each with its binary-encoded machine equivalent. To interpret the binary encoding:

void disassemble(const unsigned char *raw_instr)

The disassemble function takes a pointer to a sequence of raw bytes that represent a single machine instruction. How many bytes are used by the instruction is figured out during disassembling. The idea is to read the first byte(s) and use the opcode and variant to determine how many additional bytes need to be examined. Call our provided helper function print_hex_bytes to print the raw bytes, then use bitwise manipulation to extract and convert the operands and print the assembly language instruction. For constants (immediate or displacement), use the printf format %#x to show hex digits prefixed with 0x and no leading zeros.

As an example, if the disassemble function were passed a pointer to the bytes for the final instruction in the table above, it would print this line of output:

ff 74 8d ff    pushq 0xff(%rbp,%rcx,4)

We will only test your function on well-formed instructions of the five pushq variants listed above. There is no requirement that you gracefully handle any other inputs, such as other variants of push or other instruction types or malformed inputs. For slightly obscure reasons, there is a slight anomaly in the encoding used when %rbp or %rsp is the base register in indirect/indirect-with-displacement or %rsp as the index register for scaled-index. You do not need to allow for this nor make a special case for it, just disassemble as though the standard encoding is used for all registers without exceptions.

Requirements

Grading

Read our page on how assignments are graded for background information on our grading process and policies. Here's a plan for a rough point breakdown.

Functionality (60 points)

Code quality (buckets weighted to contribute ~15 points)

We will prefer solutions that are direct, efficient, and readable to those that sacrifice one of these qualities for the others.

Submissions received by the assignment due date will receive an on-time bonus equal to 5% of the functionality points earned by the submission.

Getting started

Check out the starter project from your cs107 repo using the command

hg clone /afs/ir/class/cs107/repos/assign4/$USER assign4

Note that these exercises are similar in spirit to the partnered explorations you do in lab, but the collaboration rules for assignments are much more restrictive than for lab. It is allowable to discuss high-level concepts with your peers, but these discussions must not descend into algorithmic details and code-level specifics. When working on an assignment, you should be doing your own independent thinking, design, coding, and debugging.

Finishing

The sanity check makes some simple calls to each function and verifies that the output printed by your disassemble operations is in the proper format for automated testing. Be sure that all other print statements in the required functions are removed/disabled. Any extraneous output can interfere with the autotester and cause test failures.

When finished, use submit to turn in your code for grading.

Keep tabs on how your knowledge is progressing with our post-task self check.



Contents