Assignment 3 advice

Written by Julie Zelenski

Here's our best advice on how to hit a home run with your CVector and CMap implementation!

Good design and wise use of memory/pointers

Incremental development & testing

Attempting to implement and test all of the operations simultaneously is sheer lunacy. One operation at a time! If your cmap_get isn't finding an entry, is it because cmap_get searches incorrectly, cmap_put didn't store the entry, or something else entirely? If you build one operation at a time and throughly test/debug before moving on, you have a reliable foundation to build on and won't be forced into trying to debug all the code all the time. Here is a suggested order of attack:

Testing is HUGE !
(You have read our page on effective software testing, right?) This assignment will require a more significant investment in testing than any previous assignment. The number of variables in play (settings for CVector/CMap when created, which operations are called in which order, etc.) create many diverse code paths to verify. If your testing passes through only a limited subset of those paths, then you can't be confident it will stand up to our rigorous scrutiny in grading. Be vigilant and thorough!

Program facts

Frequently asked questions about assign3

What is an incomplete type? How do I complete it?

Both CVector and CMap are declared as incomplete types in their respective header files:

typedef struct CVectorImplementation CVector;     // this is from cvector.h

The above typedef establishes the nickname CVector as a a synonym for the type struct CVectorImplementation. But nowhere in cvector.h is this struct defined! This makes CVector an incomplete type. It mentions a struct by name, but no further details are revealed here. An incomplete type limits the client to uses that don't depend on its internal structure. The client can declare/pass/return variables of CVector *, never CVector itself. The client cannot dereference a CVector*, access CVector fields, nor compute sizeof(CVector).

The issue is resolved in the implementation. At the top of cvector.c, complete the type with your definition of struct CVectorImplementation:

    struct CVectorImplementation {   // this type definition is at top of cvector.c
        int a, b;                    // include your desired fields in the struct
        char c;
    };                               // don't forget to end with semi-colon!

With this type definition, the true nature of CVector is revealed. The code in cvector.c can now use CVector as a regular struct type with no restrictions.

What fields should I declare in my CVector and CMap structs?

There is a mandate for the general layout (see diagrams in writeup), but the specific struct fields are yours to determine. One strong recommendation is that you choose good names. You use these fields repeatedly throughout your implementation and names that clearly communicate their purpose will be a great help. A vague name such as size_avail forces you to repeatedly ponder whether it tracks the total allocated or unused portion and whether size is expressed in number of bytes or number of elements.

How do I store a function pointer as a field in a struct?

Ordinary assignment operation works for function pointers, the only sticky part is the syntax to declare a field of the proper function pointer type. The function pointer typedefs in the .h file make this easier, just use the typedef name :

    struct CVectorImplementation {
        ...
        CleanupElemFn clean;        // field to store function pointer to client callback
    };

What data structure should I use for the CMap buckets?

The buckets are an indexed homogeneous collection. Although you could use a CVector, it provides little advantage since the number of buckets does not change, each bucket has a known type, and you don't need any of the fancy CVector operations.. You should instead use an ordinary C array; its use is simpler and provides more type-safety. Each bucket is a pointer to a linked list cell, that pointer is typed as a void*, and a dynamic array of such pointers would be a void**. Note that a void ** is not the same type as a void*. For example, consider these two declarations:

void *a, **b;
...
a[1] = *a;      // doesn't compile, need cast before you can access pointee
b[1] = *b;      // totally fine, copies pointer to second slot to first

a has an unknown pointee type (void) and requires a typecast to access the pointee. b has a known pointee type (void*) which can be accessed normally.

Can I define a struct for the CMap linked list cells?

No. Although the linked list cell is "struct-like" (i.e. a region in memory with three adjacent fields), two of its fields are not statically-sized, which makes it impossible to define a compile-time C-struct for it. Instead, you must manually create a blob of raw bytes that models the struct. You will allocate the correct-size memory region and manually dissect it using a combination of pointer arithmetic and typecasts. Defining your own helper functions to read/write each field will help immensely in keeping the complexity down.

CVector is all about the pointer arithmetic within an array, address_of_nth is its handy helper. For CMap, dissecting the cell/blob to access the fields is handled by helpers get_next, get_key, and get_value. Although needed less frequently, the setter functions set_next, set_key, set_value would also be welcome. Using the three setters in create_new_blob is the icing on the cake. Once you have perfected these helpers, the rest of your work becomes a whole lot easier.

How do I resolve the compiler warning pointer of type void * used in arithmetic?

The pointer arithmetic expression pointer + n implicitly scales n by the size of the pointee before adding it to pointer. If pointer is a void*, its pointee size is unknown, and the compiler will squawk because it cannot apply the proper scaling. To perform pointer arithmetic on a void* you must first cast it to indicate the desired scaling. The typical approach is to cast to char* to get no scaling and then manually multiply yourself.

Should I copy the function comments from cvector.h/cmap.h into the cvector.c/cmap.c?

No. The comments in cmap.h/cvector.h explain to a client how to use the container. Your comments in the .c file should instead explain to another developer how the the internals/code work.

How does iteration work for a CVector? In what order does it iterate?

Elements are iterated in order of increasing index. cvec_first returns the element at index 0 and each call to cvec_next returns the element at the next higher index. The implementation of cvec_next has to orient itself from the previous to identify the next in sequence. At first, it may seem a bit mystical how you can do this, but remember you are the omniscient implementer and can take advantage of your knowledge of how the data is laid out internally to root around in memory to locate where you are in the iteration and access what you need. It may be helpful to know you can subtract two addresses to get the distance between them.

How does iteration work for a CMap? In what order does it iterate?

Keys are iterated in arbitrary order. cmap_first can return any key in the CMap (e.g. no requirement that this key is alphabetically first or was entered first) and each call to cmap_next returns another key in the CMap, in no particular order. The easiest approach is to just iterate over the keys in the order they appear in the buckets. It can take a little thinking to see how you can orient from the previous key and access the key that follows, but you are allowed to take advantage of your privileged knowledge of the CMap internals as necessary.

The sample CMap adapts its capacity to load when created with capacity_hint 0. Must mine do that too?

No. In your version, the number of buckets is permanently configured at creation of the CMap and does not change. Performance will degrade when overloaded. The sample version operates as fixed-size when created with a non-zero capacity hint, so use that as a model for the expected decline in performance. The sample version rehashes when created with zero capacity hint, you are not expected to implement this particular behavior and should not use it as a model for your implementation.

What is the difference between memcpy and memmove?

Both are optimized routines for copying a region of raw bytes. memcpy can be more efficient that memmove but requires that the source and destination regions are distinct, whereas memmove correctly handles if the regions overlap. Either can be used to write an entire sequential block of a memory in one call, which can be much more efficient that repeated calls to work on the memory in smaller pieces.

What is assert? How do I use it?

The assert macro from <assert.h> is a mechanism for fatal errors. You assert a condition that must be true to continue:

    assert(num < 1000);

If the condition fails, assert prints the error and halts the program. Liberal use of asserts is a defensive programming practice used by a programmer to communicate with themselves (or other programmers) about fatal situations that should "never happen" and from which there is no possibility of recovery. For example, you can verify an allocation request was successful by asserting the result was non-NULL. A NULL result can mean that heap is exhausted or the internal malloc data structures have been damaged, in either case, there is no point in continuing. The alternative of blundering on with a NULL pointer will eventually lead to a crash, but far removed from the original error. Better to assert right away!

Can I raise asserts for additional error cases beyond those listed in the interface?

We want you to assert only the listed conditions. You might aspire to further bullet-proof, for example, rejecting a NULL CMap* or complaining if the client passes the address of an element that is the wrong type/size/level of indirection/already freed. Although we salute your intentions, it is impossible to make good on them. Determining whether a pointer is valid is not solvable in general. Testing ptr == NULL will catch exactly that one case, but doesn't detect all other invalid addresses. Any measure to detect bad pointers will be half-hearted at best. As the implementer, you have little choice but to assume the client will supply valid addresses and should write your code accordingly. Assert those conditions specifically listed and assume the rest. (That said, our solution CVector/CMap goes out of its way to include special-case defenses against certain detectable client misuses as a help to novice clients. You may have tripped these when working on assign2 or will observe when testing your CVector/CMap against the solution. These protections are not documented in the public interface and we do not want you to implement them.)

How do I add a private function to the implementation?

Declare the function in the .c file and qualify as static, do not add its prototype to the .h file. A static function is private and will not be visible/accessible outside the module in which it is defined.

How can I add new client test programs to the project?

The starter repo is configured for the given three client programs. To add a new client program of your own:

  1. Create a myprogram.c file with the desired client program.
  2. Use hg add myprogram.c to add the file to your repo.
  3. Edit the Makefile to add myprogram to the names listed for PROGRAMS.

Now myprogram is a part of your project and make will build it. If your assign2 was solid, you could copy your client programs into your assign3 directory and follow the above steps above to use them as additional test programs.

How can I compare my CVector/CMap to the sample implementation?

The Makefile includes a target to build solution versions of all test programs. The binky_soln executable is built from binky.c but substitutes the sample library implementation of CVector/CMap for yours. For example, you can build vectest_soln (which uses our CVector/CMap) and compare its behavior to vectest (which uses your CVector/CMap). You can make the solution version of a single program by name, e.g. make vectest_soln, or use make soln to build solution versions of all programs named in the Makefile.

Custom sanity check can be used to compare the output of a custom program with its solution version. Let's assume you have written your own test program mytest and followed the steps above to add mytest to repo/Makefile. Build the program and its solution using make mytest mytest_soln. You can then use mytest as the executable in a custom sanity test and it will compare the output from mytest to mytest_soln. Pretty cool!

How can I set gdb breakpoints in a different source file that the one containing main?

The command break 114 sets a breakpoint at line 114 in the most recently accessed source file. To choose a different file, prefix the line with the filename, e.g. break cmap.c:114. You can also set breakpoints on functions by name, e.g. b cvec_append which finds the named function in whichever file it is defined.

Are we permitted to share test cases with each other?

Our collaboration policy allows you to exchange testing plans, strategies, and ideas for test cases (no exchange of actual code though). Sharing these insights on the forum allows everyone to benefit, too! As always, collaboration with others should be properly cited.