Assignment 4 advice

Written by Julie Zelenski

Warmups

Before you dive in, it might help to work through some thought exercises to bring out your inner bitwise ninja. Answering these questions is not required, and nothing is to be submitted. You are encouraged to freely discuss these warmup questions and answers with your peers (in person or on the forum) to help everyone reach a solid foundation on the basic concepts before starting.

You also may find it helpful to review the bits/int/float exercises from labs 3 and 4, especially if you left tasks unfinished or feel you still have gaps in your understanding.

General advice on working with bits

Advice on testing

Program facts

Frequently asked questions about assign4

What is allowed/disallowed when writing the cmp_bits function?

A straightforward approach is to write a helper that counts the "on" bits in an integer, call the helper to get the two counts, and compare them. The helper that counts bits might use a loop to individually mask out each bit and increment a counter. This strategy is perfectly acceptable. There are alternative approaches that don't rely on counting, but instead take advantage of bitwise properties. For example, it is possible to quickly eliminate the bits in common, since they don't change the comparison result. You can also streamline by processing both integers in a single loop and stopping as soon as you can determine the result. We encourage you to play around with how you can streamline, as this exploration will strengthen your understanding of bitwise manipulation, but we will accept any simple, reasonably efficient implementation (such as the dual bit-count and compare). One entirely off-limits function is the gcc __builtin_popcount. You might find it a useful testing tool to compare against your own counter, but you will receive no credit for using it to implement bit counting.

Optional extracurricular diversion: Algorithms for fast bit-counting are a perennial interview question and there are many fancy solutions. A Google search will turn up many discussions explaining the various approaches and arguing their merits. These algorithms are also O(N), but reduce the constant on the linear term. These clever ideas make interesting reading and we encourage your exploration, but only after you've finished your own implementation; otherwise, it is easy to be overly led by the work of others and compromise the opportunity to do your own thinking and coding. The CS107 policy is that code you submit (for this or any assignment) should be something you wrote yourself, not copied or adapted from the code/pseudocode of others.

Can you further explain what is meant by the "ideal" approach for Sudoku?

The ideal solution has no loops, no if, no switch, nor any long chain of connectives (no sequence computing expr1 op expr2 op expr3 ... that joins 8 or 9 near-identical expressions). Instead it gathers the sets together in such a way as to compute the boolean result via one combined test. For most approaches, the crux of is_single will come down to efficiently determining whether a single bit is on. A simple tactic is to count the "on" bits and compare the count to 1. (You could perhaps even unify this code with cmp_bits.) This is one of the inefficient/looping approaches that can work but isn't the most direct. Instead think about which bit patterns have a single "on" bit. What do they represent numerically? Is there anything interesting about those numbers and their relationship to other nearby numbers that could help you identify whether you have one of those numbers or not?

What is allowed/disallowed when writing the saturating functions?

You may use any of the integer arithmetic, relational, and bitwise operators in your solution. There must be no conditional that branches based on the bitwidth of the type (e.g. no switch/if that divides into different code paths based on size/type). You may use sizeof on stype/utype in an expression that is not used as a conditional. Consider it a challenge to find the slick solution that doesn't use sizeof at all!

Are the saturating utype and stype guaranteed to be the same bitwidth?

Yes. The typedefs are done as a pair, using the same bitwidth and only differing in signedness. utype is always an unsigned version of stype.

What are the min/max sticky values for stype/utype?

The bitwise representation for utype is the binary polynomial and stype uses two's complement representation. The bit patterns for the minimum and maximum values follow from the bitwise representation. If you need help understanding the range for a given type, the <limits.h> header file #defines INT_MIN/INT_MAX/UINT_MAX and so on for char/short/etc. These constants can't be directly used in your saturating operations, as they are bitwidth-specific, but you can examine those values to discern their bitwise patterns. The min/max values can differ across systems, so be sure you are looking at the values from our system.

Are there any differences in the bitwise operations when applied to signed versus unsigned values?

The bitwise operations & | ^ ~ << operate identically on signed versus unsigned types. The right shift operator >> comes in two flavors: logical and arithmetic. The logical shift is used for unsigned types, arithmetic for signed. A logical right shift zero-fills the bits in the leftmost positions that were vacated by the shift to the right. The arithmetic right shift copies the previously leftmost bit to fill the vacated bits (thus replicating the sign bit). (Side note: The C standard leaves unspecified whether a signed right shift is arithmetic or logical; our gcc compiler uses arithmetic, but you could run into a different behavior on another compiler.)

What bitwidth and signedness does a numeric integer constant have? What about a floating point constant?

An integer literal is a signed int by default. Suffixes can be added to the constant to dictate its type. A suffix of L indicates the type is long, LL is for long long, and U (either by itself or before L or LL) indicates it is unsigned. Floating point constants default to double. The suffix f indicates the type is float.

What are the rules for integer promotion in C?

When assigning from a narrower bitwidth (e.g. char, short) to a wider one, the source bytes are copied to the low-order bytes of the destination, and the upper bytes are assigned based on the promotion rules. If the source type was unsigned, the upper bytes of the destination are zero-filled. If the source type was signed, its sign bit (the msb) is replicated across the upper bytes of the destination (this is called sign extension). It may help to trace through this and confirm for yourself how/why this is value-preserving. If an operation is performed on two operands of mixed type, integer promotion is implicitly applied to the narrower operand first. It is also the case (and a bit surprising, too) that integer promotion is applied to types narrower than int when an operation is performed on them, even if no other wider type is involved. For example, the bitshift and invert operations promote the operand to an int, so bit-shifting a char will promote to an int before shifting, bit-inverting a char similarly promotes, and the default result type of these expressions will be int. If the bitwidth of the operand was larger than int, the bitwidth will be unchanged by these promotion rules.

How can I test different bitwidths for the saturating functions?

The bitwidth of stype/utype is determined by the #define SAT_CHOICE at the top of bits.h. In the starter code, it is configured for char bitwidth. Edit bits.h to change SAT_CHOICE (read the comments in the header file for available bitwidths), re-compile, and you can now test on that bitwidth. Even more convenient than editing the #define, the Makefile has additional targets (char, short, int, long and longlong) that configure the #define and rebuild bitstest. For example, make char will temporarily configure the SAT_CHOICE definition to char bitwidth and rebuild bitstest with that setting, no edits to the header file needed.

How can I verify the output of disassemble?

Create a raw instruction by assigning the desired bytes in an array of unsigned char and passing that array to disassemble:

    unsigned char bytes[] = {0xff, 0x32};
    disassemble(bytes);

You can use gdb to disassemble the same bytes as a comparison for testing. The x/i bytes command in gdb will show the bytes interpreted as an instruction.

Can you explain more about the anomaly in the push instruction encoding?

If the second opcode byte is 01110100, this bit pattern matches both the opcode for the scaled-index subtype and indirect-with-displacement on register %rsp. Which is it? The explanation is that %rsp is disallowed as the base in a push indirect-with-displacement because its register code is overloaded for use as the marker for the push scaled-index subtype. The ISA designers had to sacrifice one register to do this and chose the register they deemed least likely to be used. This means %rsp as base for indirect-with-displacement has to be encoded using the longer scaled-index form. There is a somewhat similar overlay on the push indirect with %rsp or %rbp as base (but the interference is with some other push variants we aren't asking you to disassemble). This choice by the designers avoids consuming another bit of encoding for most cases, while making one special case longer. (oh, the joys of CISC architectures...) For the purposes of the assignment, just ignore the possibility of the base being %rsp or %rbp for indirect or indirect-with-displacement. This avoids the ambiguity and you won't need to make a special case out of anything.

How do I display values in binary?

In gdb, use the t format with print, examine, or display to see the binary representation. printf does not have a binary format, but it does print in hex, which is the next best thing.

In gdb, when I call a function that returns a float/double the result is garbled. Huh?

You can invoke function calls at the gdb prompt and it will make a valiant effort to invoke the function and print its result. This is pretty handy for working more interactively/experimentally. But be warned that the mechanism can be a little squirrely. For example, if the called function takes or returns variables of float/double type, you may get garbled results. The issue has to do with gdb not having access to the complete type information and doing a bit of guessing. You can add a typecast on the function being called (to its proper type, of course) to force the right interpretation. For example:

    (gdb) p fabs(-3.5)
    $1 = 1                                       // huh ??? why is result from fabs so wrong?
    (gdb) p ((double (*)(double))fabs)(-3.5)     // tell gdb fabs takes double, returns double
    $2 = 3.5

Consider this the one exception to the rule that you should never ever cast a function pointer. :-)

Can I edit the code in bitstest.c or average.c for my own testing? Do I need to restore the original before submit?

Yes, you can edit, and no, you don't need to restore. When we grade, we will be using our client test program.

Rabbit

                      A rabbit
                           bit
                  A little bit
                   An itty-bitty
                    Little bit of beet.
                      Then bit
                        By bit
                        He bit
                                Because he liked the taste of it.
               But when he bit
                     A wee bit more,
               It was more bitter than before.
             "This beet is bitter!"
                        Rabbit cried.
                 "I feel a bit unwell inside!"
               But when he bit
        Another bite, that bit of beet
Seemed quite all right.
                                Besides
When all is said and done,
                    Better bitter beet
                Than none.

                                              -- Mary Ann Hoberman