Lab sessions Tue Apr 24 to Fri Apr 27
Lab written by Julie Zelenski
Learning goals
This goals for this lab are to:
- investigate how arrays and pointers work in C
- get further practice with gdb and valgrind
Find an open computer and somebody new to sit with. Introduce yourself and share your thoughts on the most productive places on campus to get work done.
Get started
Clone the repo by using the command below to create a lab3 directory containing the project files.
git clone /afs/ir/class/cs107/repos/lab3/shared lab3
Open the lab checkoff form.
Lab exercises
1) Gdb: pointers and arrays (25 minutes)
The file code.c
contains a nonsense C program for you to use to observe the mechanics of arrays and pointers.
- Build the program, start it under gdb, and set a breakpoint at
main
. When you hit the breakpoint, useinfo locals
to see the state of the uninitialized stack variables. Step through the initialization statements and useinfo locals
again. Remember that when gdb reports that execution is at line N, this is before Line N has executed. -
The expressions below all refer to the local variable
arr
. First try to figure out what the result of the expression should be, then evaluate in gdb to confirm that your understanding is correct.(gdb) p *arr (gdb) p arr[1] (gdb) p &arr[1] (gdb) p *(arr + 2) (gdb) p &arr[3] - &arr[1] (gdb) p sizeof(arr) (gdb) p arr = arr + 1
-
The main function initializes
ptr
toarr
. The name of a stack array and a pointer to that array are almost interchangeable, but not entirely. Try repeating the above gdb expressions withptr
substituted forarr
. The first five evaluate identically, but the last two produce different results forptr
thanarr
. What is reported for the size of an array? the size of a pointer? The last one is the trickiest to understand. Why is it allowable to assign toptr
but notarr
? - Use the gdb
step
command to advance into the call to thebinky(arr, ptr)
. Once insidebinky
, useinfo args
to see values of the two parameters. Print any expression you can think of ona
andb
and they will evaluate to the same result. This includes the last two expressions from above:sizeof
reports the same size fora
andb
and assignment is permissible for either. What happens in parameter passing to make this so? Try drawing a picture of the state of memory to shed light on the matter. - Set a breakpoint on
change_char
and continue until this breakpoint is hit. When stopped in gdb, useinfo args
. The arguments shown are from the frame forchange_char
. The default frame of reference is the currently executing function. - You can select a different frame of reference with the gdb
frame
command. Usebacktrace
to show the sequence of function calls that led to where the code is currently executing. Frames are numbered starting from 0 for the innermost frame. Try the commandframe 1
to select the frame outsidechange_char
and and then useinfo locals
to see the state fromwinky
. The gdb commandup
is shorthand for selecting the frame that is one higher from current. - Step through
change_char
and examine the state before and after each line. Useinfo args
to show inner frame andup
andinfo locals
to show what's happening in outer frame. Careful observe the effect of each assignment statement. - Step through the call to
change_ptr
and make the same observations. Which of the assignment statements had a persistent effect inwinky
and which did not? Can you explain why?
If you don't understand or can't explain the results you observe, stop here and sort it out with your labmates and lab leader. Having a solid model of what is going on under the hood is an important step toward understanding the commonalities and subtle differences between arrays and pointers.
2) Code study: heap use (30 minutes)
Next up are the functions subtract
and join
. Neither of these is a standard function, just some code we cooked up as sample use of dynamic memory and pointer/array manipulation. The code for these functions is provided in heap.c
so you can run the program or trace in gdb.
The function subtract(one, two)
returns a string which is a copy of one
with the first occurrence of two
removed. The returned string is heap-allocated and becomes the client's responsibility to free when done.
1 char *subtract(const char *outer, const char *inner)
2 {
3 char *found = strstr(outer, inner);
4
5 if (!found) return strdup(outer);
6 int nbefore = found - outer;
7 int nremoved = strlen(inner);
8 char *result = malloc(strlen(outer) + 1 - nremoved);
9 strncpy(result, outer, nbefore);
10 strcpy(result + nbefore, found + nremoved);
11 return result;
12 }
Talk through the following questions with your partner to understand this code. Ask questions about anything that doesn't make sense to you!
- What is the purpose of the library function
strstr
? What does it return? - Line 5 handles the case where there is no occurrence of
inner
inouter
, i.e. nothing to subtract. Why does the function returnstrdup(outer)
rather than just returningouter
? - Line 6 subtracts two pointers. Woah, that's legal?! What is the result of subtracting two pointers? Here's a hint: if q = p + 3, then q - p => 3. How does the sizeof(pointee) affect the result of a subtraction?
- After Line 9 executes, is the result string null-terminated? Why or why not?
- Your colleague proposes changing Line 10 to instead:
strcat(result, found + nremoved)
claiming it will operate equivalently. You try it and it even seems to work. Is it equivalent? No, not quite... The code is just "getting lucky". In what way does it (wrongly) depend on the initial contents of newly allocated heap memory? How will the function break when that wrong assumption is not true?
The function join(arr, n)
returns a string which is the concatenation of all n
strings in arr
. The returned string is heap-allocated and becomes the client's responsibility to free when done.
char *join(char *arr[], int n)
{
char *result = strdup(arr[0]);
for (int i = 1; i < n; i++) {
result = realloc(result, strlen(result) + strlen(arr[i]) + 1);
strcat(result, arr[i]);
}
return result;
}
The point of this code is to introduce the realloc function.
- At the outset of the loop, the total size needed is not known, so it cannot do the allocation upfront. Instead it makes an initial allocation and enlarges for each additional string. This resize-as-you-go approach is an idiomatic use case for
realloc
. - Is the size argument to
realloc
the number of additional bytes to add to the memory block or the total number of bytes including any previously allocated part? - The variable
result
is initialized tostrdup(arr[0])
. Why does it make a copy instead of just assigningresult = arr[0]
? - The
realloc
man page says the return value fromrealloc(ptr, newsize)
may be the same asptr
, but also may be different. In the case where it is different,realloc
has copied the contents from the old location to the new. The old location is also freed. Change the code to callrealloc
to not use its return value (i.e. don't re-assignptr
, just drop the result). Re-compile and you should get a warning from the compiler. What error is that warning trying to caution you against? Arealloc
call that doesn't catch its return value is never the right call.
3) Code study: calloc (30 minutes)
Although we're not yet ready to dig into the internals of how a heap allocator operates, we can drop down to look into the implementation of a standard library function that builds on malloc
. Introduce yourself to malloc
's cousin calloc
by reading its man page (man calloc
) and then review the implementation for musl_calloc
shown below (taken from musl):
1 void *musl_calloc(size_t nelems, size_t elemsz)
2 {
3 if (nelems && elemsz > SIZE_MAX/nelems) {
4 return NULL;
5 }
6 size_t total = nelems * elemsz;
7 void *p = malloc(total);
8 if (p != NULL) {
9 long *wp;
10 size_t nw = (total + sizeof(*wp) - 1)/sizeof(*wp);
11 for (wp = p; nw != 0; nw--,wp++)
12 *wp = 0;
13 }
14 return p;
15 }
The calloc
interface is presented as allocating an array of nelems
elements each of size elemsz
bytes. It initializes the memory contents to zero, which is the feature that distinguishes calloc
from ordinary malloc
.
Here are some issues to go through with your partner to understand the operation of this function.
- The calloc interface as described in terms of arrays is somewhat misleading because there is nothing array-specific to its operation. It simply allocates a region of total size
nelems * elemsz
bytes. The callcalloc(10, 2)
has exactly the same effect ascalloc(4, 5)
orcalloc(1, 20)
and memory allocated by any of those calls could be used to store a length-20 string, an array of 5 integers, or a 20-byte struct. - What are Lines 3 and 4 of the function doing? What would be the effect on the function's operation if these were removed?
- A colleague objects to the use of division on Line 3 because division is more expensive than multiplication and it requires an extra check for a zero divisor. He proposes changing Line 3 to
if (nelems * elemsz > SIZE_MAX)
claiming it will operate equivalently and more efficiently. Explain to him why his plan won't work. - Line 7 declares
p
to be of typevoid*
, which matches the return type ofmalloc
. Avoid*
pointer is a generic pointer. It stores an address, but makes no claim about the type of the pointee. It is illegal to dereference or do pointer arithmetic on avoid*
, since both of these operations depend on the type of the pointer, which is not known in this case. - What is the purpose of Line 8? What would be the effect on the function's operation if that test were removed and it always proceeded into the loop?
- Line 10 contains the expression
sizeof(*wp)
. Given thatwp
was declared and not assigned, this looks to dereference an uninitialized pointer. Not to fear,sizeof
is evaluated at compile-time by simply working out the size for an expression from its declared type. Sincewp
was declared as along *
, the expression*wp
is of typelong
which on our system has size 8 bytes. The compiler will have transformed it into(total + 8 - 1)/8
. There will be no nasty runtime surprise. - Let's further dig into Line 10. It should remind you of code we've seen previously (roundup in lab1). What is the value of
nw
for the callcalloc(20, 4)
, forcalloc(2, 7)
, forcalloc(3, 1)
? Aha, this is a rounding operation, as suspected. This calculation is making an assumption about how many bytesmalloc
will actually reserve for a given requested size. What is that assumption? (This behavior is not a documented public feature and no client that was a true outsider should assume this, butcalloc
andmalloc
were authored by the same programmer, thuscalloc
is being written by someone with firsthand knowledge of themalloc
internals and acting as a privileged insider.) - The initialization of the
for
loop on Line 11 assignswp = p
, yetwp
andp
are not of the same declared type. Why is no typecast required here? Would the function behave any differently if you added the typecastwp = (long *)p
? - The increment of the
for
loop on Line 11 offers a rare glimpse of C's comma operator. You can read more about the comma operator in Wikipedia; the only use you are likely to see in practice is to pack multiple expressions into a loop init or increment. In this case, the increment will both advance the pointer and decrement the count of iterations remaining. - Imagine if Line 9 instead declared
char *wp
with no other changes to the code. How would this change the function's behavior? What if the declaration wasint *wp
? What if the declaration werevoid *wp
? For each, consider how how the pointee type affects the operationswp++
and*wp = 0
. What do you think is the motivation for the author's choice oflong *
?
4) Valgrind: heap errors (25 minutes)
Last week's lab introduced you to Valgrind and this tool will be increasingly essential as we advance to writing C code with heavy use of pointers and dynamic memory. The buggy.c
program has some planted errors that misuse heap memory to provide fodder for further practice with Valgrind.
- Review the program in
buggy.c
and consider error #1. When the program is invoked as./buggy 1 argument
, it will copyargument
into a heap-allocated space of 8 characters. If the argument is too long, this code will write past the end of the allocated space. This kind of error is called a buffer overflow (because the write overflows past the end of a buffer). What are the consequences of this error? Let's observe. - Run
./buggy 1 leland
to see that the program runs correctly when the name fits. Now try longer names./buggy 1 stanford
and./buggy 1 lelandstanford
. Surprisingly, these also seem to "work", apparently getting lucky when the overrun is still relatively small. Pushing further to./buggy 1 lelandstanfordjunioruniversity
, you'll eventually get the crash you deserve. - Many crashes have nothing more to say that just "segmentation fault", but this particular crash gives a dump of the program state. This error-reporting is coming from inside the
free
function of the standard library. The dump is not presented in a particularly user-friendly form, and rather than get mired in trying to decipher it, just take it as an indication of a memory problem, for which we will turn to Valgrind for further help. - Try
valgrind ./buggy 1 stanford
and review Valgrind's report to see what help it can offer. Whether the write goes too far by 1 byte or 1000 bytes, Valgrind will helpfully report the error at the moment of transgression. Hooray! - Errors 2, 3, and 4 explore the consequences of other misuses of heap memory. Consider these cases one at a time. Review the code to see what the error is. Run it at the shell (e.g.
./buggy 2
). Is there an observable result of the error? Now run under valgrind. Take note of the terminology that the Valgrind report uses and how it relates back to the root cause.
Be forewarned that memory errors can be very elusive. A program might crash immediately when the error is made, but the more insidious errors silently destroy something that only shows up much later making it hard to connect the observed problem with the original cause. The most frustrating errors are those that "get lucky" and cause no observable trouble at all, but lie in wait to surprise you at the most inopportune time. (e.g. when we are grading your work:-) Cultivate the habit of using Valgrind early and often in your development to detect and eradicate memory errors before they strike.
Check off with TA
At the end of the lab period, submit the checkoff form and ask your lab TA to approve your submission. Take stock of your understanding with our selfcheck.