Lab 4: void * and Function Pointers

Lab sessions Tue May 05 to Thu May 07

Lab written by Julie Zelenski, with modifications by Nick Troccoli

Pre-Lab Exercises

For this week's pre-lab, we ask that you take 15-20 minutes to individually fill out the CS107 mid-quarter evaluation (MQE). This evaluation is important in letting the instructor and TAs know what they're doing well and what they can improve. The evaluation is anonymous, and we appreciate any feedback that you have. Submitting the evaluation is required to get lab credit for the week.

You can find the MQE here. Please fill it out by the end of the day (11:59PM PST) on Sat. 5/9. Thank you in advance for any feedback you have!

Learning Goals

During this lab, you will:

explore how C void*/function pointers support generic operations
study the implementation of generic operations and client callback functions
debug void* pitfalls

Get Started

Introduce yourself to your group and tell them your best trick for using the terminal.

Clone the lab starter code by using the command below.

git clone /afs/ir/class/cs107/repos/lab4/shared lab4

Next, pull up the online lab checkoff and have it open in a browser so you can jot things down as you go.

Exercises

1) Code Study: Callbacks (15 minutes)

The generic sort/search functions in the C library (qsort, bsearch, lfind ...) are functions that can sort or search any type of data. In order to do that, however, they need the caller to provide a comparison function in order for them to know how to compare two values. All these functions use the same form of standard comparison function:

int comparison_fn_t (const void *, const void *)

Any version of the comparison function that you may implement for your data receives pointers to the two values to compare and returns an integer that indicates their order, using the same +/-/0 return values as strcmp.

The most critical issue to understand is that all generic operations work with data via pointers to values, never the values directly. Referring to data by address is fundamental to how C supports generic functions. Sending or receiving an actual value is not possible because the values vary in type/size. Instead what is exchanged are pointers to values. All pointers, regardless of pointee, are 8-byte addresses that are type-compatible with void*.

Implementing a comparison function follows a similar pattern:

Cast the void* argument and set a pointer of known pointee type equal to it.
Dereference the typed pointer to access the value. (Steps 1 and 2 are often combined to cast and dereference in one expression.)
Compare values to determine the result to return.

The actual comparison logic (Step 3) is usually straightforward; it is mishandling the void* in Steps 1 and 2 that you have to watch out for.

Review the example comparison functions in the callbacks.c file to see how each fits the above pattern. Work through the following questions with your partner:

These callbacks immediately cast the void* arguments and store into a properly typed local variable. What is the advantage of doing this up front rather than instead casting/dereferencing the argument each time it is used?
Supplying a different comparison function to qsort allows you to sort the array using a different criteria. For example, the two callbacks that operate on the custom struct city each sort by a different field. How would we write a comparison function to sort in reverse order? How would we write a comparison function that compares by one field as a primary sort order, breaking ties by comparing a secondary field?
The comparison function that orders cities by zip code returns the difference between the values as a quick way to compute a positive, negative, or zero comparison result. Do you see how it works? This is a common shortcut for comparing integer values. (Note: the subtraction overflows if the difference exceeds INT_MAX, so if you need to support such extreme values, stick with the longer if/else with different cases.)
All comparison functions fit the same prototype listed above. That means any comparison function can be applied to any type of array - there is no type-matching. What would be the consequence of sorting an int array using a character comparison function? What about sorting an array of cities using a string comparison function?
Look carefully to identify the subtle differences between the compare_letters and compare_strings callbacks. Could you use the compare_letters to sort an array of strings by the first letter? Why or why not?
The GNU libc documentation gives an example compare_doubles comparison function. Its logic is a bit inscrutable at first, so let's figure out what it is up to. What is the result of evaluating an inequality such as 5 > 2? (Try it in gdb to confirm!) How does the subtraction of the two inequalities compute a result of the correct sign in this case?

2) Code Study: `gfind_max` (35 minutes)

gfind_max is a generic function we've written to find the largest array element that according to the client's comparison function:

  void *gfind_max(void *arr, int n, size_t elemsz, 
                  int (*compare_function)(const void *, const void *)) {
1     void *pmax = arr;
2     for (int i = 1; i < n; i++) {
3         void *ith = (char *)arr + i*elemsz;
4         if (compare_function(ith, pmax) > 0) {
5             pmax = ith;
6         }
7     }
8     return pmax;
  }

Look over this code to see how a generic function is implemented. This code is also included in the generic.c file, along with some functions that use it. Here are some questions to talk over with your partner:

Line 3 shows the idiomatic access to the ith position in a generic array. Be sure you understand the expression's purpose/operation. What is the intention of the typecast to (char *)? What would be the consequence of removing that cast?
The website cdecl.org can convert a declaration from "C gibberish into English". This is handy when trying to unravel an inscrutable declaration. Copy the parameter declaration for compare_function above and paste into cdecl to gets its explanation in English.
Note that invoking the client callback via a function pointer looks pretty much the same as making an ordinary function call. What happens if you attempt to call the function pointer with the wrong number or wrong type of arguments? Try editing generic.c to remove a parameter from when gfind_max calls compare_function and see what happens.
gfind_max returns a void *. What does that pointer represent? Why does the function return a pointer to a value rather than the value itself?
How could a client use gfind_max to find the smallest element instead of the largest?

The necessarily permissive nature of a void* interface makes for a treacherous client experience. There are many ways to misuse a generic function, and the compiler often does not warn you about these transgressions. Let's explore this situation further.

The test_max function in generic.c makes four calls to gfind_max. The first call is completely correct and prints the expected result. Each of the subsequent three calls is incorrect in some way. For each of these below, try to work out what you believe will be printed, and then verify that your understanding is correct by running the program. Drawing memory diagrams and/or tracing in gdb may be very helpful in understanding the behavior.

Incorrect call #1 passes a char comparison function to be used on an array of int elements. Yikes! Why is there no compile or runtime error from our mistake? Does running Valgrind report anything helpful? How does gfind_max behave in this case?
What is the error in call #2? Where did the reported "max" value even come from? (Note: some bytes may be in a different order than you expect. This is because of something called "endian-ness", and it turns out the myth machines are "little endian". You can read more endianness here, but you don't have to worry about it for CS107. The important thing to know here is just generally why the reported max value is what it is).
What is the error in call #3? Why will this call always return a pointer to the last element?

Now examine the function test_bsearch, also in generic.c. bsearch is a C standard library function to search for an element in an array. As a rule, for bsearch to be able to work properly, the array must be sorted according to the same comparison function that the search is using. The programmer who wrote this function is confounded by why they couldn't get their code to work using the same comparison function. They eventually got it working by resorting to using a different comparison for search than sort. They know this can't be good, but were unable to identify the correct fix.

You and your partner will investigate!

Before starting, remember the core principle that all generic operations work with data via pointers to values, never the values directly. Keep this in mind as you explore this provided code. Draw memory diagrams to keep track of the pointers and levels of indirection you are working with. This code is an example of a common bug on assign4, so getting to the bottom of it now will help later on!

Compile the program as-is and run it to observe that it does seem to work despite the mismatch in comparison functions. Change the code to use compare_first_characters as the comparison function for both sort and search. Run this version and it crashes. On what operation in the code does it crash? Why?
The original author's workaround was to add a different comparison function to be used for search. It is a big red flag that this comparison function typecasts its two void* arguments to different pointee types. The fact that it manages to "work" at all is sketchy and depends on a precise detail of how bsearch is implemented. What detail is that? If you very carefully read the man page for bsearch, you will see that this detail is guaranteed to be true for a conforming implementation (check out the second paragraph of the manual page), but that still doesn't make it a good idea to depend on it in this way. (Hint: what does the first parameter of the comparison function represent? How about the second?)
Identify the proper fix to the code that makes the program work correctly and use the same comparison function compare_first_characters for both sort and search, as they should. (Hint: take a closer look at the values of the parameters passed into bsearch when calling it).

The point of this exercise is to highlight the necessity of maintaining vigilance as a client of a void* interface. It also foreshadows the futility of trying to get the correct code via trial and error. While randomly permuting * & and typecasts might eventually land on a correct combination, this approach does absolutely nothing for your understanding. Instead, if you take the time to work through the operation on paper, draw diagrams, and trace execution in gdb, you can become confident about what level of indirection is appropriate in what context and why. Ask questions about what you don't understand!

3) Code Study: `memmove` (25 minutes)

The C library provides a handful of raw memory routines (e.g. memcpy, memset, ...) that operate on data of an unspecified type. Let's take a look inside memmove (version below based on code from musl) to better understand how these kind of functions are implemented. This code is included in the memmove.c file as well for you to experiment with.

 1    void *musl_memmove(void *dest, const void *src, size_t nbytes) {
 2        char *dest_copy = dest;
 3        const char *src_copy = src;
 4
 5        if (dest_copy == src_copy) {
 6            return dest_copy;
 7        }
 8
 9        if (src_copy + nbytes <= dest_copy || dest_copy + nbytes <= src_copy) {
10            return memcpy(dest_copy, src_copy, nbytes);
11        }
12
13        if (dest_copy < src_copy) {
14            for (int i = 0; i < nbytes; i++) {
15                dest_copy[i] = src_copy[i];
16            }
17        } else {
18            for (int i = nbytes - 1; i >= 0; i--) {
19                dest_copy[i] = src_copy[i];
20            }
21        }
22
23        return dest;
24    }

Go over the code with your partner and discuss these questions:

The function's interface declares its parameters as void* pointers, but internally it manipulates these pointers as char*. Why the inconsistency? What would be the consequence of trying to reconcile the discrepancy by declaring the interface as char* or changing the implementation to use void*?
Note that there is no typecast on lines 2 and 3 when assigning from an untyped pointer to a typed pointer. A void* is the universal donor/recipient and can be freely exchanged with other pointer types, no cast necessary. That being said, there is a forever-ongoing online discussion about the contentious issue of whether or not to cast here; the perspective employed here is to not cast, as it is not required.
What special case is being handled on line 5?
Review the man pages for memcpy and memmove to understand the differences between these two functions. What special case is being handled on line 9?
What two cases are being divided by the if/else on Lines 13/17? Why are both cases necessary?
The man page for memmove states that the operation takes place as though it copies the data twice (src->temp, temp->dst), which implies that call to memmove might take twice as long as memcpy. However, the musl implementation doesn't operate in this literal manner. It does correctly handle overlap, but not by copying twice. What does it do instead? Take a look at lines 14-16 and lines 18-20 in particular - what is going on in each of these loops? In this implementation, what then is the expected added cost of using memmove over memcpy?
Trace the call musl_memmove(NULL, "cs107", 0). Will it result in a segmentation fault from trying to read/write an invalid pointer? Why or why not? What about the call musl_memmove(NULL, "cs107", -1)? Verify your understanding by running the memmove.c program.
Why not always use memmove? The man page seems to imply that some implementations (though not musl) do suffer a performance hit when using memmove as opposed to memcpy. Moreover, appropriately using memmove and memcpy can communicate to code readers when data may or may not overlap. For these reasons, we default to memcpy, and use memmove only when necessary.

The implementation of memmove may remind you of the strncpy function we saw in lecture. The memxxx functions have much in common with their strxxx equivalents, just without the special case to stop at a null byte. In fact, the memxxx functions are declared as part of the <string.h> module and quite possibly written by the same author.

4) GDB Tips (15 minutes)

We'll periodically try to introduce you to new helpful gdb commands or features to aid in your debugging. This week, we introduce the "examine" command, and how to print arrays. For each of these, try setting breakpoints and printing out values in the gdb_practice.c file, where we have declared some variables already. Feel free to edit and play around with this file to help you get familiar with these features.

Examine (x)

The examine command, x (click here for documentation) is a helpful command to examine the contents of memory independent of the type of data at a memory location. It's like print, but for generic memory rather than a specific type of variable. For instance, you can use x to print out a certain number of bytes starting at a given address. If you have a pointer ptr, for instance, you could print out the 8 bytes starting at the address it contains in hex by executing x/8bx ptr. The optional parameters after the slash specify what you would like to print. The first one (e.g. 8 or 2) lets you specify how many you would like to examine, the second (e.g. b or w) specifies whether you would like to print out bytes, words (a word is 4 bytes), etc., and the third (e.g. x) specifies how you would like to print them out (e.g. x for hex, d for decimal). Check out the documentation link for a full summary. Try out the following with gdb_practice when you're ready:

Run gdb on the gdb_practice program. Set a breakpoint on main and step into the function past the variable declaration/initializations, including the array nums.
try x/4bx ptr to print out the 4 bytes beginning at the address in ptr in hex. What do you see? Why is that?
try x/8bx ptr to print out the 8 bytes beginning at the address in ptr in hex. What do you see? Why is that? (hint: where does number2 live vs. number?)
try x/8bx nums to print out the 8 bytes beginning at the start of the array nums in hex. What do you see? Why is that?

(Note: some bytes may be in a different order than you expect. This is because of something called "endian-ness", and it turns out the myth machines are "little endian". You can read more endianness here, but you don't have to worry about it for CS107. The important thing to know here is just what the general bytes are that you are examining).

Printing Arrays

If you print a stack array from within the function in which it is declared, gdb will show the array and its contents. In that context, gdb has access to both the element type and the count of elements, and uses it to print a nice representation of the entire array. (Try this in the main function by printing out nums!) However, it cannot automatically do the same in other contexts, such as for a heap array or for an array/pointer passed into a function (try this in my_function to see what it does).

However, it is possible to print the entire array in those contexts, but you have to provide more information to gdb. Let's see how!

Run gdb on the gdb_practice program. Set a breakpoint on main and step into the function past the variable declaration/initializations, including the array nums.
Try p nums. Here in the context of its declaration, gdb knows that it is a stack array of a certain size and can show the entire stack array. Great!
Now try p argv. All gdb knows about argv is that it is a pointer. Bummer.
Try p argv[0]@argc and gdb will now print the entire contents of the argv array. Hooray!
The syntax to learn is p ELEM@COUNT where ELEM is the first element to print and COUNT is the count of elements to print. ELEM and COUNT are C expressions and can refer to any variables in the current scope. Try p nums[1]@2 to show a 2-element portion in the middle of the array.
You can also add in a typecast if needed. For example, given a parameter ptr of type void* that you know is the base address of an array of nelems elements of type char*, you could print the entire array as p *(char **)ptr@nelems.
Try using this feature while paused in the my_function function to print out the contents of the passed-in array.

Check off with TA

At the end of the lab period, submit the checkoff form and ask your lab TA to approve your submission so you are properly credited for your work. It's okay if you don't completely finish all of the exercises during lab; your sincere participation for the full lab period is sufficient for credit. However, we highly encourage you to finish on your own whatever is need to solidify your knowledge. Also take a chance to reflect on what you got what from this lab and whether you feel ready for what comes next! The takeaway from lab4 should be getting your bearings in the world of raw memory. The goals are for you to be able to write and use functions passed as parameters (including the cryptic syntax), and know how to use the type system to your advantage wherever you can, but also how to work without it where you must. You should know how to make a proper call to memcpy/memmove, exactly where and why you need a typecast, and have increased vigilance about using the correct level of indirection. Here are some questions to verify your understanding and get you thinking further about these concepts:

Why must you typecast a void* in a pointer-arithmetic expression?
What is the behavior of memcpy when the source and destination overlap? How does memmove properly account for overlapping regions?
An asymmetric comparison function is one that casts its two void* arguments to different pointee types. Why is passing an asymmetric comparison function to bsearch almost certainly an indication of programmer error?
C search functions commonly return a pointer to the found element (if any), not the found element itself. Why is this?

screaming into the void