Fill-in-the-blank void* warmup

Written by Julie Zelenski

For assignment 2, you will be a client of CVector and CMap. (If you haven't yet looked at the cvector.h/cmap.h files from the assign2 starter, now would be a good time!) Using void* pointers always requires careful coding and being a client of our generic collections is no exception. To prepare you to be a successful client, we strongly recommend you that carefully work through this fill-in-the-blank warmup. First work on paper, perhaps drawing diagrams, to build up an understanding of how you correctly send/receive data as void*, then put into it code and test it. Our starter repo includes a simple warmup.c that you can use as your test program.

If you run into issues getting this code to work or have lingering doubts about whether you've truly grasped it, do not go onto assign2, instead reach out now (forum, email, office hours) to resolve the confusion. You may freely talk with others about this exercise and your solution to settle your understanding. Do not stop short by accepting/offering shallow fixes (e.g. "you need a & or * here" or "change the size to some big number"). You want to deeply and authentically understand how to appropriately fill the blanks below (especially with regard to level of indirection!) as well as know what happens for the the wrong choices.

Let's say you want to use the CVector to store integers. Consider this code:

int entries[] = {34, 15, 17};
int n = sizeof(entries)/sizeof(entries[0]);

CVector *cv = cvec_create( _________, n, _________ );
for (int i = 0; i < n; i++)     // add values from entries array
    cvec_append(cv, _________ );

// print first elem
printf("First = %d\n", _________ cvec_first(cv)); 
cvec_dispose(cv);

The tricky parts are left blank for you to fill in. There are three critical things to understand about CVector:

  1. How do you properly create a CVector that will store the desired type?
  2. When you are inserting or appending a new value, how do you pass the element as a parameter?
  3. When you are retrieving an element via cvec_nth or manipulating it in a callback function during a search/sort operation, what do you need to cast the void* to? Do you need to dereference it? How many times?

Getting this correct for the basic types such as integers, floats, or structs never seems to be too much trouble for students. But now consider when the type you want to store in the CVector is itself a pointer, for example, considering storing strings:

char *entries[] = {"watermelon", "kiwi", "honeydew"};
int n = sizeof(entries)/sizeof(entries[0]);

CVector *cv = cvec_create( sizeof(char*), n, NULL );      // filled in for you this time
for (int i = 0; i < n; i++)     // add values from entries array
    cvec_append(cv, _________);

// print first elem
printf("First = %d\n", _________ cvec_first(cv)); 
cvec_dispose(cv);

How do you fill the blanks for this one? What changes when the element is pointer instead of integer? The fact that char* is "already a pointer" seems to lead students into faulty conclusions but is there really anything different about storing/retrieving an element of pointer type? Be careful! What is an easy mistake to make in this case that didn't come up in the integer example? What will happen if you make that mistake? Does it compile? Will it run? What effect will you see? What would valgrind say?

The call to cvec_create might seem simple enough that it shouldn't be a sticking point, but every detail of using a void* interface requires vigilance. The first argument to cvec_create is the element size, i.e. the number of bytes required to store a single element. Passing an incorrect size will wrongly configure the CVector from the start and initiate a trail of pernicious downstream effects. We filled in the cvec_create call in the second example to ensure the initialization is correct. When you initialize your own CVectors, be absolutely sure that you yourself are supplying the correct size to cvec_create.

To continue further on the path to becoming wise in the way of void* use, rework the strings example to store heap-allocated strings; this is a more typical use than storing string constants. Make a heap copy of each string (strdup) you intend to store in the CVector. When you append one of these dynamically-allocated strings to the CVector, it is as if you are transferring ownership of that memory to the CVector. Supplying the proper cleanup callback to cvec_create assures that those strings will be properly deallocated when the CVector itself is.

Once you think you have the right understanding, fill in the blanks in the warmup.c program and compile/test/debug. Be sure to run all versions under Valgrind (especially this last one) and verify there are no error nor leaks. Repeat until you have mastery of void*!

You can extend the warmup program to explore additional CMap/CVector functions (e.g. iterator, searching and sorting callbacks, and so on) to further reinforce how to properly communicate through a void* interface. It is essential that you thoroughly understand this critical interaction, so please take the time now to get it completely straight and save yourself much heartache when writing the assignment!

Frequently asked questions

Do we need to submit the warmup?

No, it is just for your practice.  

Where is the answer key?

Please, please, please work through it for yourself. Edit, run, valgrind the code to verify it is correct rather than compare to an answer key. You don't want the answers that fill in these blanks, you want the knowledge of how to determine the right answer to fill in any such blanks!

The warmup uses NULL for the cleanup function and yet doesn't leak. Why does this work? When will I need a non-NULL cleanup?

cvec_dispose frees the memory that the CVector allocated and used to store the element values. That alone is sufficient if the element type is self-contained, e.g. ints or pointer to constant strings. If instead the elements refers to something that requires further cleanup, e.g. an opened FILE*, a dynamically-allocated char*, or pointer to linked list, the client must supply a cleanup function that makes the necessary fclose/free calls. You can see this in action by completing the dynamically-allocated variant of the strings example suggested above.

In which ways are client use of CVector and CMap the same? How are they different?

All the tricky void* parts are the same. CVector supports storing elements of any type through the use of void* just as CMap supports storing values of any type. All CMap values are passed/returned by their location in memory, same as CVector elements are. CMap keys, however, are not generic, they are always C-strings. This means the CMap can handle keys in a type-safe manner and manage their memory, making the client's interaction with keys very straightforward.

What makes working with void* so difficult? What should I do to avoid the pitfalls?

The void* type is dangerously lax -- any kind of pointer is completely acceptable. Consider client code that uses generic operations to add an element, search for an element, retrieve an element, compare two elements, etc. Each of those interactions will involve passing or receiving a void*, which means every one of those is susceptible to supplying the wrong pointer or applying the wrong level of indirection. It is incredibly easy to make these mistakes and you'll get zero help from the compiler and runtime in terms of detecting or reporting the trouble. Stay vigilant! In many cases, a bad use of void* silently does the wrong thing and any observable symptoms may surface far removed from the actual error. Even if every one of your void* uses is correct except one, the one that is wrong can be so damaging that your entire program seems hopelessly broken and cause you to doubt everything you've coded. For example, if you pass the wrong void* when adding an element, the container will store completely bogus data, and there will be nothing you can do right from that point forward. It is fruitless to try to find/fix your use of cmap_get that produces wild results when the actual error was made in a bad use of cmap_put. An even more pernicious error is creating the container for the wrong element size. If the wrong amount of data is being stored/accessed, the subsequent operations are subtly (or not so subtly) compromised. Trying to fix the problem by changing your calls to cvec_append or cvec_nth is futile -- you've got to attend to the actual error. Stay calm and debug systematically -- you can do it!

Be sure you understand the fundamental premise of how a void* container works. When creating a void* container, you specify the size of the data type. That size dictates how many bytes to allocate and copy per-element. The container treats each value as merely a "bag of bytes" of that size. In all operations, access to elements is by address (this is where the void* comes in). Given a value's location in memory and its size, the CVector can copy the correct number of bytes from that location. Assume you have created a CVector to store integer elements. To append a new value, you don't pass the value of the integer itself, instead you tell the CVector where in memory the value is located. The CVector will copy an int-size bag of bytes from that location into its internal storage. If you retrieve the nth element, the CVector does not return the value of the integer, instead it returns location in memory at which it has stored the value. In all exchanges between client/container, access to elements is always and only by address. The fact that every element goes in/out of the container by address does not indicate it is storing those pointers -- instead that address is used to locate the value in memory. The level of indirection is needed to smooth over the difference between values of different size/type.

There are a few guidelines to help keep you on the path to void* glory: