Lab 4: void * and Function Pointers

Lab sessions Tue Oct 13 to Thu Oct 15

Lab written by Julie Zelenski, with modifications by Nick Troccoli

Pre-Lab Exercise

Before attending your lab, please work through the "Pre-lab Exercise" section below. We plan for the exercise to take 20-30min maximum, and it will be essential to the further problems you'll work through during your lab session. You'll discuss your findings and questions at the start of your lab sessions this week. Feel free to post on the discussion forum if you have questions while working!

Learning Goals

During this lab, you will:

  1. explore how C void*/function pointers support generic operations
  2. study the implementation of generic operations and client callback functions
  3. debug void* pitfalls

Get Started

Clone the lab starter code by using the command below.

git clone /afs/ir/class/cs107/repos/lab4/shared lab4

Pre-Lab Exercise: Callbacks

The generic sort/search functions in the C library (qsort, bsearch, lfind ...) are functions that can sort or search any type of data. In order to do that, however, they need the caller to provide a comparison function in order for them to know how to compare two values. All these functions use the same form of standard comparison function:

int comparison_fn_t (const void *, const void *)

Any version of the comparison function that you may implement for your data receives pointers to the two values to compare and returns an integer that indicates their order, using the same +/-/0 return values as strcmp.

The most critical issue to understand is that all generic operations work with data via pointers to values, never the values directly. Referring to data by address is fundamental to how C supports generic functions. Sending or receiving an actual value is not possible because the values vary in type/size. Instead what is exchanged are pointers to values. All pointers, regardless of pointee, are 8-byte addresses that are type-compatible with void*.

Implementing a comparison function follows a similar pattern:

  1. Cast the void* argument and set a pointer of known pointee type equal to it.
  2. Dereference the typed pointer to access the value. (Steps 1 and 2 are often combined to cast and dereference in one expression.)
  3. Compare values to determine the result to return.

The actual comparison logic (Step 3) is usually straightforward; it is mishandling the void* in Steps 1 and 2 that you have to watch out for.

Review the example comparison functions in the callbacks.c file to see how each fits the above pattern. Work through the following questions:

  • These callbacks immediately cast the void* arguments and store into a properly typed local variable. What is the advantage of doing this up front rather than instead casting/dereferencing the argument each time it is used?
  • Supplying a different comparison function to qsort allows you to sort the array using a different criteria. For example, the two callbacks that operate on the custom struct city each sort by a different field. How would we write a comparison function to sort in reverse order? How would we write a comparison function that compares by one field as a primary sort order, breaking ties by comparing a secondary field?
  • The comparison function that orders cities by zip code returns the difference between the values as a quick way to compute a positive, negative, or zero comparison result. Do you see how it works? This is a common shortcut for comparing integer values. (Note: the subtraction overflows if the difference exceeds INT_MAX, so if you need to support such extreme values, stick with the longer if/else with different cases.)
  • All comparison functions fit the same prototype listed above. That means any comparison function can be applied to any type of array - there is no type-matching. What would be the consequence of sorting an int array using a character comparison function? What about sorting an array of cities using a string comparison function?
  • Look carefully to identify the subtle differences between the compare_letters and compare_strings callbacks. Could you use the compare_letters to sort an array of strings by the first letter? Why or why not?
  • What would a callback comparison function that can be passed to qsort look like if we wanted to arrange an array of ints in order of increasing absolute value? (hint - the builtin C abs() function can calculate absolute value).

Lab Exercises

1) Code Study: gfind_max (20 minutes)

gfind_max is a generic function we've written to find the largest array element that according to the client's comparison function:

  void *gfind_max(void *arr, int n, size_t elemsz, 
                  int (*compare_function)(const void *, const void *)) {
1     void *pmax = arr;
2     for (int i = 1; i < n; i++) {
3         void *ith = (char *)arr + i*elemsz;
4         if (compare_function(ith, pmax) > 0) {
5             pmax = ith;
6         }
7     }
8     return pmax;
  }

Look over this code to see how a generic function is implemented. This code is also included in the generic.c file, along with some functions that use it. Here are some questions to talk over with your group:

  • Line 3 shows the idiomatic access to the ith position in a generic array. Be sure you understand the expression's purpose/operation. What is the intention of the typecast to (char *)? What would be the consequence of removing that cast?
  • The website cdecl.org can convert a declaration from "C gibberish into English". This is handy when trying to unravel an inscrutable declaration. Copy the parameter declaration for compare_function above and paste into cdecl to gets its explanation in English.
  • Note that invoking the client callback via a function pointer looks pretty much the same as making an ordinary function call. What happens if you attempt to call the function pointer with the wrong number or wrong type of arguments? Try editing generic.c to remove a parameter from when gfind_max calls compare_function and see what happens.
  • gfind_max returns a void *. What does that pointer represent? Why does the function return a pointer to a value rather than the value itself?
  • How could a client use gfind_max to find the smallest element instead of the largest?

The necessarily permissive nature of a void* interface makes for a treacherous client experience. There are many ways to misuse a generic function, and the compiler often does not warn you about these transgressions. Let's explore this situation further.

The main function in generic.c makes four calls to gfind_max. The first call is completely correct and prints the expected result. Each of the subsequent three calls is incorrect in some way. For each of these below, try to work out what you believe will be printed, and then verify that your understanding is correct by running the program. Drawing memory diagrams and/or tracing in gdb may be very helpful in understanding the behavior.

  • Incorrect call #1 passes a char comparison function to be used on an array of int elements. Yikes! Why is there no compile or runtime error from our mistake? Does running Valgrind report anything helpful? How does gfind_max behave in this case?
  • What is the error in call #2? Where did the reported "max" value even come from? (Note: some bytes may be in a different order than you expect. This is because of something called "endian-ness", and it turns out the myth machines are "little endian". You can read more endianness here, but you don't have to worry about it for CS107. The important thing to know here is just generally why the reported max value is what it is).
  • What is the error in call #3? Why will this call always return a pointer to the last element?

(Note: for the next two problems, some bytes may be in a different order than you expect. This is because of something called "endian-ness", and it turns out the myth machines are "little endian". You can read more endianness here, but you don't have to worry about it for CS107. The important thing to know here is just what the general bytes are that you are examining).

2) bsearch Bug (15 minutes)

bsearch is a C standard library function to search for an element in an array. As a rule, for bsearch to be able to work properly, the array must be sorted according to the same comparison function that the search is using. Take a look at bsearch_bug.c - the programmer who wrote the main function is confounded by why they couldn't get their code to work using the same comparison function. They eventually got it working by resorting to using a different comparison for search than sort. They know this can't be good, but were unable to identify the correct fix.

Your group will investigate!

Before starting, remember the core principle that all generic operations work with data via pointers to values, never the values directly. Keep this in mind as you explore this provided code. Draw memory diagrams to keep track of the pointers and levels of indirection you are working with. This code is an example of a very common bug on assign4, so getting to the bottom of it now will help later on!

  • Compile the program as-is and run it to observe that it does seem to work despite the mismatch in comparison functions. Change the code to use compare_first_characters as the comparison function for both sort and search. Run this version and it crashes. On what operation in the code does it crash? Why?
  • The original author's workaround was to add a different comparison function to be used for search. It is a big red flag that this comparison function typecasts its two void* arguments to different pointee types. The fact that it manages to "work" at all is sketchy and depends on a precise detail of how bsearch is implemented. What detail is that? If you very carefully read the man page for bsearch, you will see that this detail is guaranteed to be true for a conforming implementation (Hint: what does the first parameter of the comparison function represent? How about the second? Check out the second paragraph of the manual page), but that still doesn't make it a good idea to depend on it in this way.
  • Identify the proper fix to the code that makes the program work correctly and use the same comparison function compare_first_characters for both sort and search, as they should. (Hint: take a closer look at the values of the parameters passed into bsearch when calling it).

The point of this exercise is to highlight the necessity of maintaining vigilance as a client of a void* interface. It also foreshadows the futility of trying to get the correct code via trial and error. While randomly permuting * & and typecasts might eventually land on a correct combination, this approach does absolutely nothing for your understanding. Instead, if you take the time to work through the operation on paper, draw diagrams, and trace execution in gdb, you can become confident about what level of indirection is appropriate in what context and why. Ask questions about what you don't understand!

3) GDB Tips (15 minutes)

We'll periodically try to introduce you to new helpful gdb commands or features to aid in your debugging. This week, we introduce the "examine" command, and how to print arrays. For each of these, try setting breakpoints and printing out values in the gdb_practice.c file, where we have declared some variables already. Feel free to edit and play around with this file to help you get familiar with these features.

Examine (x)

The examine command, x (click here for documentation) is a helpful command to examine the contents of memory independent of the type of data at a memory location. It's like print, but for generic memory rather than a specific type of variable. For instance, you can use x to print out a certain number of bytes starting at a given address. If you have a pointer ptr, for instance, you could print out in hex the 8 bytes starting at the address it contains by executing x/8bx ptr. The optional parameters after the slash specify what you would like to print. The first one (e.g. 8 or 2) lets you specify how many you would like to examine, the second (e.g. b or w) specifies whether you would like to print out bytes, words (a word is 4 bytes), etc., and the third (e.g. x) specifies how you would like to print them out (e.g. x for hex, d for decimal). Check out the documentation link for a full summary. Try out the following with gdb_practice when you're ready:

  1. Run gdb on the gdb_practice program. Set a breakpoint on main and step into the function past the variable declaration/initializations, including the array nums.
  2. try x/4bx ptr to print out the 4 bytes beginning at the address in ptr in hex. What do you see? Why is that?
  3. try x/8bx ptr to print out the 8 bytes beginning at the address in ptr in hex. What do you see? Why is that? (hint: where does number2 live vs. number?)
  4. try x/8bx nums to print out the 8 bytes beginning at the start of the array nums in hex. What do you see? Why is that?

Printing Arrays

If you print a stack array from within the function in which it is declared, gdb will show the array and its contents. In that context, gdb has access to both the element type and the count of elements, and uses it to print a nice representation of the entire array. (Try this in the main function by printing out nums!) However, it cannot automatically do the same in other contexts, such as for a heap array or for an array/pointer passed into a function (try this in my_function to see what it does).

However, it is possible to print the entire array in those contexts, but you have to provide more information to gdb. Let's see how!

  1. Run gdb on the gdb_practice program. Set a breakpoint on main and step through the function past the variable declaration/initializations, including the array nums.
  2. Try p nums. Here in the context of its declaration, gdb knows that it is a stack array of a certain size and can show the entire stack array. Great!
  3. Now try p argv. All gdb knows about argv is that it is a pointer. Bummer.
  4. Try p argv[0]@argc and gdb will now print the entire contents of the argv array. Hooray!
  5. The syntax to learn is p ELEM@COUNT where ELEM is the first element to print and COUNT is the count of elements to print. ELEM and COUNT are C expressions and can refer to any variables in the current scope. Try p nums[1]@2 to show a 2-element portion in the middle of the array.
  6. You can also add in a typecast if needed. For example, (as a hypothetical example, not related to the provided code) if you are given a parameter ptr of type void* that you know is the base address of an array of nelems elements of type char*, you could print the entire array as p *(char **)ptr@nelems.
  7. Try using this feature while paused in the my_function function to print out the contents of the passed-in array.

[Optional] Extra Problems

Finished with lab and itching to further exercise your generics and function pointer skills? Check out our extra problems!

Recap

Nice work on the lab! It's okay if you don't completely finish all of the exercises during lab; your sincere participation for the full lab period is sufficient for credit. However, we highly encourage you to finish on your own whatever is need to solidify your knowledge. Also take a chance to reflect on what you got what from this lab and whether you feel ready for what comes next! The takeaway from lab4 should be getting your bearings in the world of raw memory. The goals are for you to be able to write and use functions passed as parameters (including the cryptic syntax), and know how to use the type system to your advantage wherever you can, but also how to work without it where you must. You should know how to make a proper call to memcpy/memmove, exactly where and why you need a typecast, and have increased vigilance about using the correct level of indirection. Here are some questions to verify your understanding and get you thinking further about these concepts:

  • Why must you typecast a void* in a pointer-arithmetic expression?
  • An asymmetric comparison function is one that casts its two void* arguments to different pointee types. Why is passing an asymmetric comparison function to bsearch almost certainly an indication of programmer error?
  • C search functions commonly return a pointer to the found element (if any), not the found element itself. Why is this?

screaming into the void