Lab sessions Tue Apr 27 to Thu Apr 29
Lab written by Julie Zelenski, with modifications by Nick Troccoli
Learning Goals
During this lab, you will:
- explore how C
void*
/function pointers support generic operations - study the implementation of generic operations and client callback functions
- debug
void*
pitfalls
Get Started
Clone the lab starter code by using the command below.
git clone /afs/ir/class/cs107/repos/lab4/shared lab4
Review - Comparison Functions
Comparison functions are a common use case of generics + function pointers. For example, the generic sort/search functions in the C library (qsort
, bsearch
, lfind
...) are functions that can sort or search any type of data. In order to do that, they need the caller to provide a comparison function that specifies how to order two of their values. All these functions use the same form of standard comparison function:
int comparison_fn_t (const void *, const void *)
Any version of the comparison function that you may implement for your data receives pointers to the two values to compare and returns an integer that indicates their order, using the same +/-/0 return values as strcmp
.
The most critical issue to understand is that all generic operations work with data via pointers to values, never the values directly. Referring to data by address is fundamental to how C supports generic functions. Sending or receiving an actual value is not possible because the values vary in type/size. Instead what is exchanged are pointers to values. All pointers, regardless of pointee, are 8-byte addresses that are type-compatible with void*
.
Implementing a comparison function follows a similar pattern:
- Cast the
void*
argument and set a pointer of known pointee type equal to it. - Dereference the typed pointer to access the value. (Steps 1 and 2 are often combined to cast and dereference in one expression.)
- Compare values to determine the result to return.
The actual comparison logic (Step 3) is usually straightforward; it is mishandling the void*
in Steps 1 and 2 that you have to watch out for. You will get lots of practice with that in this lab!
Lab Exercises
(Note: for these exercises, some bytes may be in a different order than you expect. This is because of something called "endian-ness", and it turns out the myth machines are "little endian". You can read more endianness here, but you don't have to worry about it for CS107. The important thing to know for these exercises is just what the general bytes are that you are examining).
1) void *
and Pitfalls (15 minutes + 10min discussion)
Learning Goals: Better understand the usage of comparison functions and the levels of indirection with generic comparison functions. See the various ways void *
and comparison functions are prone to error. You will be writing custom comparison functions and writing code that uses comparison functions on assign4 - ask questions about anything you don't understand!
Starter code file: generic.c
The provided generic.c
file shows examples of using qsort
to sort various types of data. First, turn your attention to sort_cities
. Notice how supplying a different comparison function to qsort
allows you to specify a different element ordering. For example, the two callbacks that operate on the list of city names each sort by a different criteria. Review the example comparison functions to see how each fits the standard comparison function pattern.
compare_strings_by_descending_length
returns the difference between the values as a quick way to compute a positive, negative, or zero comparison result. This is a common shortcut for comparing integer values. (Note: the subtraction overflows if the difference exceeds INT_MAX, so if you need to support such extreme values, stick with the longer if/else with different cases.)
Q1: Look carefully to identify the subtle differences between the compare_strings_alphabetically
and compare_letters_alphabetically
callbacks. Could you use compare_letters_alphabetically
to sort an array of strings by the first letter? Why or why not?
The necessarily permissive nature of a void*
interface makes for a treacherous client experience. There are many ways to misuse a generic function, and the compiler often does not warn you about these transgressions. Let's explore this situation further in the sort_integers
function. There, the first call to qsort
is completely correct and prints the expected result. The second call, however, is incorrect; it passes a char
comparison function to be used on an array of int
elements. Yikes!
Q2: Why is there no compile or runtime error from our mistake? How does qsort
behave in this case? Drawing memory diagrams and/or tracing in gdb may be very helpful in understanding the behavior.
2) Debugging: bsearch
Bug (15 minutes + 10min discussion)
Learning Goals: Better understand the usage of pointers and levels of indirection with generics functions like bsearch
. You will use bsearch
on assign4
, and the bug you investigate below is an extremely common assign4 bug, so getting to the bottom of it now will save you time later on!
Starter code file: bsearch_bug.c
bsearch
is a C standard library function to search for an element in a sorted array using binary search. As a rule, for bsearch
to be able to work properly, the array must be sorted according to the same comparison function that the search is using. Take a look at bsearch_bug.c
- the programmer who wrote the main
function is confounded by why they couldn't get their code to work using the same comparison function. They eventually got it working by resorting to using a different comparison for search than sort. They know this can't be good, but were unable to identify the correct fix.
Your group will investigate!
Before starting: remember the core principle that all generic operations work with data via pointers to values, never the values directly. Keep this in mind as you explore this provided code. Draw memory diagrams to keep track of the pointers and levels of indirection you are working with.
Compile the program as-is and run it to observe that it does seem to work despite the mismatch in comparison functions. Change the code to use compare_first_characters
as the comparison function for both sort and search. Run this version and it crashes.
Q1: On what operation in the code does it crash? Why? (Hint: try printing out parts of the expression it crashes on in GDB and see what errors out)
The original author's workaround was to add a different comparison function to be used for search. It is a big red flag that this comparison function typecasts its two void*
arguments to different pointee types. The fact that it manages to "work" at all is sketchy and depends on a precise detail of how bsearch
is implemented. If you very carefully read the man page for bsearch
, you will see that this detail is guaranteed to be true for a conforming implementation, but that still doesn't make it a good idea to depend on it in this way.
Q2: What bsearch
detail does this code depend on to work? (Hint: what does the first parameter of the comparison function represent? How about the second? Check out the second paragraph of the manual page).
Identify the proper fix to the code that makes the program work correctly and use the same comparison function compare_first_characters
for both sort and search, as they should.
Q3: How can we fix this bug? (Hint: take a closer look at the values of the parameters passed into bsearch
when calling it).
The point of this exercise is to highlight the necessity of maintaining vigilance as a client of a void*
interface. It also foreshadows the futility of trying to get the correct code via trial and error. While randomly permuting *
&
and typecasts might eventually land on a correct combination, this approach does absolutely nothing for your understanding. Instead, if you take the time to work through the operation on paper, draw diagrams, and trace execution in gdb, you can become confident about what level of indirection is appropriate in what context and why. Ask questions about what you don't understand!
[Optional] Extra Problems
Finished with lab and itching to further exercise your generics and function pointer skills? Check out our extra problems!
Recap
Nice work on the lab! It's okay if you don't completely finish all of the exercises during lab; your sincere participation for the full lab period is sufficient for credit. However, we highly encourage you to finish on your own whatever is need to solidify your knowledge. Also take a chance to reflect on what you got what from this lab and whether you feel ready for what comes next! The takeaway from lab4 should be getting your bearings in the world of raw memory. The goals are for you to be able to write and use functions passed as parameters (including the cryptic syntax), and know how to use the type system to your advantage wherever you can, but also how to work without it where you must. You should know how to make a proper call to memcpy
/memmove
, exactly where and why you need a typecast, and have increased vigilance about using the correct level of indirection. Here are some questions to verify your understanding and get you thinking further about these concepts:
- Why must you typecast a
void*
in a pointer-arithmetic expression? - An asymmetric comparison function is one that casts its two
void*
arguments to different pointee types. Why is passing an asymmetric comparison function tobsearch
almost certainly an indication of programmer error? - C search functions commonly return a pointer to the found element (if any), not the found element itself. Why is this?