Lab sessions Tue May 01 to Fri May 04

Lab written by Julie Zelenski

Find another world where freedom waits, past the the stars in fields of ancient void.
—Black Sabbath, Into the Void

Learning goals

During this lab, you will:

explore how C void*/function pointers support generic operations
study the implementation of generic operations and client callback functions
debug void* pitfalls

First things first! Find an open computer to share with a partner. Introduce yourself and share your favorite unix pipeline trick.

Get started

Clone the repo by using the command below to create a lab4 directory containing the project files.

git clone /afs/ir/class/cs107/repos/lab4/shared lab4

Open the lab checkoff form.

Lab exercises

1) Code study: callbacks (20 minutes)

The generic sort/search functions in the C library (qsort, bsearch, lfind ...) all use the same form of standard comparison function:

int comparison_fn_t (const void *, const void *)

A comparison function receives pointers to the two values to compare and returns an integer that indicates their order.

The most critical issue to understand is that all generic operations work with data via pointers to values, never the values directly. Referring to data by address is fundamental to how C supports generic functions. Sending or receiving an actual value is not workable because the values vary in type/size. Instead what is exchanged are pointers to values. All pointers, regardless of pointee, are 8-byte addresses that are type-compatible with void*.

A comparison function follows a similar pattern:

Cast the void* argument and assign to a pointer of known pointee type.
Dereference the typed pointer to access the value. (Steps 1 and 2 are often combined to cast and dereference in one expression.)
Compare values to determine the result to return.

The actual comparison logic (Step 3) is usually straightforward; it is mishandling the void* in Steps 1 and 2 that you have to watch out for.

Review the example comparison functions in the callbacks.c file to see how each fits the above pattern. Work through the following issues with your partner:

These callbacks immediately cast the void* arguments and store into a properly typed local variable. What is the advantage of doing this up front rather than instead cast/dereference the argument each time it is used?
Supplying a different comparison function to qsort allows you to sort the array using a different criteria. For example, the two callbacks that operate on value of struct city type each sort by a different field. How can a comparison function sort in reverse order? How can a comparison function compare by one field as a primary sort order, breaking ties by comparing a secondary field?
The comparison function that orders cities by zip code returns the difference the values as a quick way to compute a positive, negative, or zero comparison result. Do you see how it works? This is a common shortcut for comparing integer values. (Note: the subtraction overflows if the difference exceeds INT_MAX, so if you need to support such extreme values, stick with the longer if/else that breaks into different cases.)
All comparison functions fit the same prototype. That means any comparison function can be applied to any type of array, there is no type-matching. What would be the consequence of sorting an array of int using a character comparison function? What about sorting an array of struct cities using a string comparison function?
Look carefully to identify the subtle differences between the letter_cmpfn and string_cmpfn callbacks. Could you use the letter_cmpfn to sort an array of strings by the first letter? Why or why not?
The GNU libc documentation gives an example compare_doubles comparison function. Its logic is a bit inscrutable at first, so let's figure out what it is up to. What is the result of evaluating an inequality such as 5 > 2? (Try it in gdb to confirm!) How does the subtraction of the two inequalities compute a result of the correct sign in this case?

2) Code study: gfind_max (45 minutes)

gfind_max is a generic function to find the array element that is the largest according to the client's comparison function:

  void *gfind_max(void *arr, int n, size_t elemsz, 
                  int (*compar)(const void *, const void *))
  {
1     void *pmax = arr;
2     for (int i = 1; i < n; i++) {
3         void *ith = (char *)arr + i*elemsz;
4         if (compar(ith, pmax) > 0)
5             pmax = ith;
6     }
7     return pmax;
  }

Look over this code to see how a generic function is implemented. Here are some questions to talk over with your partner:

Line 3 shows the idiomatic access to the ith position in a generic array. Be sure you understand the expression's purpose/operation. What is the intention of the typecast to (char *)? What would be the consequence of removing that cast?
The website cdecl.org can convert a declaration from "C gibberish into English". This is handy when trying to unravel an inscrutable declaration. Copy the parameter declaration for compar above and paste into decl to gets its explanation in English.
Note that invoking the client callback via a function pointer looks pretty much the same as making an ordinary function call. What happens if you attempt to call the function pointer passing the wrong number or wrong type of arguments?
gfind_max returns a void *. What does that pointer represent? Why does the function return a pointer to a value rather than the value itself?
How could a client use gfind_max to find the smallest element instead of the largest?

The necessarily permissive nature of a void* interface makes for a treacherous client experience. There are many ways to misuse a generic function, and the compiler often does not warn you about these transgressions. Let's explore this situation further.

The test_max function makes four calls to gfind_max. The first call is completely correct and prints the expected result. Each of the subsequent three calls is incorrect in some way. First, try to work out what you believe will be printed and then verify that your understanding is correct by running the program. Drawing memory diagrams and/or tracing in gdb may be very helpful in understanding the behavior.

Incorrect call #1 passes a char comparison function to be used on an array of int elements. Yikes! Why is there no compile or runtime error from our mistake? Does running Valgrind report anything helpful? How does find_max behave in this case?
What is the error in call #2? Where did the reported "max" value even come from?
What is the error in call #3? Why will this call always return a pointer to the last element?

Now examine the function test_bsearch also in generic.c. As a rule, for bsearch to be able to work properly, the array must be sorted according to the same comparison function that the search is using. The programmer who wrote this function is confounded by why they couldn't get their code to work using the same comparison function. They eventually got it working by resorting to using a different comparison for search than sort. They know this can't be good, but were unable to identify the correct fix.

You and your partner will investigate!

Compile the program as-is and run it to observe that it does seem to work despite the mismatch in comparison functions. Change the code to use cmp_first_char as the comparison function for both sort and search. Run this version and it crashes. On what operation does in the code does it crash?
The original author's workaround was to add a different comparison function to be used for search. It is a big red flag that this comparison function typecasts its two void* arguments to different pointee types. The fact that it manages to "work" at all is sketchy and depends on a precise detail of how bsearch is implemented. What detail is that? If you very carefully read the man page for bsearch, you will see that this detail is guaranteed to be true for a conforming implementation, but that still doesn't make it a good idea to depend on it in this way.
Identify the proper fix to the code that makes the program work correctly and sort and search both use the same comparison function cmp_first_char, as they should.

The point of this exercise is to highlight the necessity of maintaining vigilance as a client of a void* interface. It also foreshadows the futility of trying to get the correct code via trial and error. While randomly permuting * & and typecasts might eventually land on a correct combination, this approach does absolutely nothing for your understanding. Instead if you take the time to work through the operation on paper, draw diagrams, and trace execution in gdb, you can become confident about what level of indirection is appropriate in what context and why. Ask questions about what you don't understand!

3) Code study: memmove (30 minutes)

The C library provides a handful of raw memory routines (e.g. memcpy, memset, ...) that operate on data of unspecified type. Let's take a look inside memmove (version below from musl) to better understand how these kind of functions are implemented.

 1   void *musl_memmove(void *dest, const void *src, size_t n)
 2   {
 3       char *d = dest;
 4       const char *s = src;
 5
 6       if (d==s) return d;
 7       if (s+n <= d || d+n <= s) return memcpy(d, s, n);
 8
 9       if (d<s) {
10          for (; n; n--) 
11               *d++ = *s++;
12       } else {
13           while (n) {
14               n--; 
15               d[n] = s[n];
16          }
17       }
18       return dest;
19   }

Go over the code with your partner and discuss these questions:

The function's interface declares its parameters as void* pointers, but internally it manipulates these pointers as char*. Why the inconsistency? What would be the consequence of trying to reconcile the discrepancy by declaring the interface as char* or changing the implementation to use void*?
Note that there is no typecast on lines 3 and 4 when assigning from an untyped pointer to a typed pointer. A void* is the universal donor/recipient and can be freely exchanged with other pointer types, no cast necessary. I find the author's decision to not cast pleasing, given my preference for casting only where you absolutely must. The internets argue about this endlessly if you want to hear more about why it is contentious.
What special case is being handled on line 6?
Review the man pages for memcpy and memmove to understand the differences between these two functions. What special case is being handled on line 7?
What two cases are being divided by the if/else on Lines 9/12? Why are both cases necessary?
Despite lines 10-11 and lines 13-15 both accomplishing a similar function (i.e. copy bytes from source to destination), they are oddly dissimilar in construction: for instead of while, pointer arithmetic/dereference versus array indexing. Which version do you find easier to follow? Which one is easier for you to verify correctness? My preference would be to have picked one style and used for both. Can you come up with a theory to justify the change in style? (I myself have no clue!)
Trace the call musl_memmove(NULL, "cs107", 0). Will it result in a segmentation fault from trying to read/write an invalid pointer? Why or why not? What about the call musl_memmove(NULL, "cs107", -1)? Verify your understanding by running the code.c program.
The man page for memmove states that the operation takes place as though it copies the data twice (src->temp, temp->dst), which implies that call to memmove might take twice as long as memcpy. However, the musl implementation doesn't operate in this literal manner. It does correctly handle overlap,but not by copying twice. What does it do instead? In this implementation, what then is the expected added cost of using memmove over memcpy?

The implementation of memmove should remind you of the strncpy function we saw in lecture. The memxxx functions have much in common with their strxxx equivalents, just without the special case to stop at a null byte. In fact, the memxxx functions are declared as part of the <string.h> module and quite possibly written by the same author.

4) Protip: gdb print array (10 minutes)

A gdb feature we want to get into your repertoire is how to print arrays. If you print a stack array from within the function it is declared, gdb will show the array and its contents. In that context, gdb has access to both the element type and the count of elements, and uses it to print a nice representation of the entire array. However it cannot automatically do the same in other contexts, such as for a heap array or for an array/pointer passed into a function. It is possible to print the entire array in those contexts, but you have to provide more information to gdb. Let's see how!

Run gdb on the program callbacks. Set a breakpoint on main and step into the function past the variable declaration/initializations.
Try p scores. Here in the context of its declaration, gdb knows that it is a stack array of a certain size and can show the entire stack array. Great!
Now try p argv. All gdb knows about argv is that it is a pointer. Bummer.
Try p argv[0]@argc and gdb will now print the entire contents of the argv array. Hooray!
The syntax to learn is p ELEM@COUNT where ELEM is the 0th element and COUNT is the count of elements to print. ELEM and COUNT are C expressions and can refer to any variables in current scope. Try p scores[1]@2 to show a 2-element portion in the middle of the array.
You can also add in a typecast if needed. For example, given a parameter ptr of type void* that you know is the base address of an array of nelems elements of type char*, you could print the entire array as p *(char **)ptr@nelems.

Check off with TA

Before you leave, complete your checkoff form and ask your lab TA to approve it so you are properly credited. If you don't complete all the exercises during the lab period, we encourage you to followup and finish the remainder on your own. Try our self-check to reflect on what you've done and how it's going.

screaming into the void