Lab 3: Pointers, Arrays and the Heap

Lab sessions Tue Jul 13 to Thu Jul 15

Lab written by Julie Zelenski, with modifications by Nick Troccoli

Learning Goals

During this lab, you will:

investigate how arrays and pointers work in C
get further practice with gdb and valgrind
experiment with code that dynamically allocates memory on the heap

Group Work Tips

As a reminder, here are recommended tips for maximizing group work time:

sharing your video if you can
choosing a driver for each problem - the driver is the person sharing their screen and typing in commands, while others view their screen. Rotate for each problem to give everyone a turn.
choosing a driver order at the start of each lab; for example, one week you could decide it's in order of birthdays, and next week could be a different order.
reviewing key takeaways in the last few minutes of group time: before returning to the main room, brainstorm as a group about key takeaways and questions you have for the full discussion. It's ok if you don't have time to get through everything!

Get Started

Clone the lab starter code by using the command below. This command creates a lab3 directory containing the project files.

git clone /afs/ir/class/cs107/repos/lab3/shared lab3

Tip: the GDB and Valgrind guides are great debugging references. In particular, the GDB guide has a section on good debugging strategies, including ways to use GDB to fix any issues you encounter. We'd highly recommend reviewing them if you haven't already!

Lab Exercises

1) GDB: Pointers and Arrays (10-15 minutes + 10 min all-lab discussion)

Learning Goal: get more practice using GDB to investigate program state, and learn more about pointers, pointer operators like & and *, and arrays. GDB becomes even more essential once pointers are involved, because it lets you print out program values as the code runs, and pointers will play an important role on assignments going forward!
Starter code file: code.c

Tip: have the team member who is driving open 2 terminal windows and log into myth in each one - use one to run GDB, and another to open code.c to view it while debugging!

First, we'll play around with pointer and array syntax, and how the two are similar and different. To do this, we'll use the provided code.c program, which is a not-terribly-useful program that exhibits various behaviors of arrays and pointers.

1a) Arrays vs. Pointers (5 minutes)

Build the program, start it under gdb, and set a breakpoint at main. When you hit the breakpoint, use info locals to list the values of all local variables in the current function (which should currently be uninitialized). Step through the initialization statements and use info locals again. Remember that when gdb reports that execution is at line N, this is before line N has executed.
Q: For each expression below with local variable arr, first try to figure out what the result of the expression should be, and then evaluate it in gdb to confirm that your understanding is correct.

(gdb) p *arr
(gdb) p arr[1]
(gdb) p &arr[1]
(gdb) p *(arr + 2)
(gdb) p &arr[3] - &arr[1]

(gdb) p sizeof(arr)
(gdb) p arr = arr + 1

The main function initializes ptr to arr. The name of a stack array and a pointer to that array are almost interchangeable, but not entirely. Try re-evaluating the above expressions with ptr substituted for arr. The first five evaluate identically, but the last two produce different results for ptr than for arr. Q: What is reported for the size of an array? For the size of a pointer? The last expression is the trickiest to understand. Why is it allowable to assign to ptr but not to arr?
Execute p ptr = ptr - 1 to reset ptr to its original value.
&arr[0] stores the same address as &ptr[0], but &arr isn't the same address as &ptr. Q: Why not? NOTE: this discrepancy is key - it will almost certainly come up on future assignments!
Use the gdb step command to advance into the call binky(arr, ptr). step is like next, but instead of executing the entire line and moving to the next line, it steps into the execution of the line. Once inside binky, use info args to see the values of the two parameters. They are identical! Q: What happens in parameter passing to make this so?

1b) Pointers, Double Pointers, and gdb Stack Frames in winky() (5 minutes)

Set a breakpoint on change_char and continue until this breakpoint is hit by executing the GDB c command (for "continue"). When stopped there in gdb, use info args. The arguments shown are from the function call ("frame") for change_char. The default frame of reference is the currently executing function.
Use backtrace to show the sequence of function calls that led to where the code is currently executing. You can select a different frame of reference with the gdb frame command. Frames are numbered starting from 0 for the innermost frame, and the numbers are displayed in the output of the backtrace command. Try the command frame 1 to select the frame outside change_char and then use info locals to see the state from winky. The gdb command up is shorthand for selecting the frame that is one higher than the current one, and down is shorthand for selecting the frame that is one lower than the current one. Type frame 0 or down to return to the line where the debugger is paused.
Step through change_char and examine the state before and after each line. Use info args to show the inner frame and up and info locals to show what's happening in the outer frame. Carefully observe the effect of each assignment statement. Q: Can you explain the behavior of each line?
Step through the call to change_ptr and make the same observations. Q: Which of the assignment statements had a persistent effect in winky, and which did not? Why?

If you don't understand or can't explain the results you observe, stop here and discuss them with your labmates and lab leader. Having a solid model of what is going on under the hood is an important step toward understanding the commonalities and subtle differences between arrays and pointers.

2) Valgrind: Heap Errors and Memory Leaks (15-20 minutes + 10 min all-lab discussion)

Learning Goal: learn how to use Valgrind output to identify the code location and root cause of heap memory errors and memory leaks in programs. Valgrind is an essential tool on assignments going forward!
Starter code files: buggy.c, leaks.c

Last week's lab introduced you to Valgrind. This tool will be increasingly essential as we write more C code with heavy use of pointers and dynamic memory. In particular, Valgrind can help detect misuse of heap memory (e.g. writing past what you've allocated, freeing twice, etc.) and can detect when you forget to free heap memory that you've allocated. Let's take a look at how Valgrind detects each of these kinds of issues.

2a) Heap Errors (10 minutes)

First, Valgrind is great at detecting memory issues relating to dynamic memory, such as writing past what you've allocated, accessing freed memory, freeing memory twice, etc. Let's examine the buggy.c program, which has some planted errors that misuse heap memory, to see how Valgrind handles and reports these errors.

Error 1: Valgrind tutorial

Review the program in buggy.c and consider error #1. When the program is invoked as ./buggy 1 argument, it will copy argument into a heap-allocated space of 8 characters. If the argument is too long, this code will write past the end of the allocated space - a "buffer overflow" (because the write overflows past the end of a buffer). This is similar to errors we've seen in the past about writing past stack memory used to store a string, but now the overflow is happening on the heap. What are the consequences of this error? Let's observe.

Run ./buggy 1 leland to see that the program runs correctly when the name fits. Now try longer names ./buggy 1 stanford and ./buggy 1 lelandstanford. Surprisingly, these also seem to "work", apparently getting lucky when the overrun is still relatively small. Pushing further to ./buggy 1 lelandstanfordjunioruniversity, you'll eventually get the crash you expect.
Many crashes have nothing more to say than just "segmentation fault", but others may give a dump of the program state which is sometimes hard to decipher. Let's turn to Valgrind for further help.
Try valgrind ./buggy 1 stanford and review Valgrind's report to see what help it can offer. Whether the write goes too far by 1 byte or 1000 bytes, Valgrind will helpfully report the error at the moment of transgression. Hooray!

Error 2: Independent investigation

Now it's time for you to work on investigating error 2 as a group, answering the following questions:

Review the code to see what the error is. Q: What is the error?
Run it (e.g. ./buggy 2). Q: Is there an observable result of the error?
Run it under Valgrind. Q: What is the terminology that the Valgrind report uses and how does it relate back to the root cause?

Error 3 is included as an extra, for additional practice if you'd like.

Key Takeaways: Memory errors can be very elusive. A program might crash immediately when the error is made, but the more insidious errors silently destroy something that only shows up much later, making it hard to connect the observed problem with the original cause. The most frustrating errors are those that "get lucky" and cause no observable trouble at all, but lie in wait to surprise you at the most inopportune time. Make a habit of using Valgrind early and often in your development to detect and eradicate memory errors!

2b) Memory Leaks (10 minutes)

Second, Valgrind is great at detecting when you forget to free heap memory that you've allocated. These are called memory leaks (not errors), and rarely (if ever) cause crashes. For this reason, we recommend that you do not worry about freeing memory until your program is completely written. Then, you can go back and deallocate your memory as appropriate, ensuring correctness at each step. Memory leaks are still an overall issue, however, because your program should be responsible for cleaning up any memory it allocates but no longer needs. In particular, for larger programs, if you never free any memory and allocate an extremely large amount, you may run out of memory in the heap! Let's examine the leaks.c program, which has some planted memory leaks, to see how Valgrind handles and reports them.

Leak 1: Valgrind tutorial

Review the program in leaks.c and consider leak #1. When the program is invoked as ./leaks 1, it will allocate 8 bytes on the heap, but then immediately return, causing the program to lose the address of this heap memory. A memory leak! The program terminates fine, however. Let's see how Valgrind can help us detect it.

Try valgrind --leak-check=full --show-leak-kinds=all ./leaks 1 and review Valgrind's report to see what help it can offer. You'll notice that Valgrind reports a "LEAK SUMMARY" at the bottom, as well as a total heap usage summary, which shows what memory was still in use at exit (that should have been freed). You can also see the number of reported allocations and frees do not line up, indicating a leak. Finally, Valgrind even shows where the leaked memory was originally allocated. Helpful! (Note: make sure to put these flags before the command to run your actual program. Specifically, the following is not equivalent: valgrind ./leaks 1 --leak-check=full --show-leak-kinds=all, as this passes these flags to your program, instead of valgrind!)
Valgrind categorizes the leaks into different types. For instance, "definitely lost" means heap-allocated memory that was never freed and to which the program no longer has a pointer. "indirectly lost" means heap-allocated memory that was never freed and whose only remaining pointers are stored inside blocks that are themselves lost. For example, if you orphan a linked list, the first node would be definitely lost, and the subsequent nodes would be indirectly lost. (Read more in our Valgrind guide!)
The total heap usage includes more than just your explicit allocations. For example, other functions, like strdup, may allocate/free memory internally as well!

Leak 2: Independent investigation

Now it's time for you to work on investigating leak 2 as a group, answering the following questions:

Review the code to see what the leak is, and then run it (e.g. ./leaks 2). Notice how the program always runs fine. Q: What is the issue with the code?
Run it under Valgrind. Q: What is the terminology that the Valgrind report uses and how does it relate back to the root cause?

Leak 3 is included as an extra, for additional practice if you'd like.

Key Takeaways: A memory leak is when you allocate memory on the heap, but do not free it. We recommend not worrying about freeing memory until your program is functionally complete. Leaks could happen if you forget to free memory within a function, return allocated memory and the caller does not free it, etc. Note that leaks do not only have to come from memory you allocate using malloc. For example, functions like strdup allocate memory that it is the caller's responsibility to free!

As a final note, often the backtrace for a Valgrind-reported error or leak will show a trace to somewhere within a library function such as strcpy or strdup. It may be tempting to conclude that the problem is in the library, and is not ours to fix, but this reasoning is almost certainly not true - instead, it's likely an issue with how we are using that library function in our code (the parameters we pass in, not freeing values it allocated, etc.).

[Optional] Extra Problems

Finished with lab and itching to further exercise your pointer and heap skills? Check out our extra problems!

Recap

Nice work on the lab! It's okay if you don't completely finish all of the exercises during lab; your sincere participation for the full lab period is sufficient for credit. However, we highly encourage you to finish on your own whatever is needed to solidify your knowledge. Also take a chance to reflect on what you got from this lab and whether you feel ready for what comes next! The takeaways from lab3 should be continued development of your gdb and Valgrind skills, as well as your skills reading and writing code with heavy use of pointers. Arrays and pointers are ubiquitous in C and a good understanding of them is essential. Here are some questions to verify your understanding and get you thinking further about these concepts:

If ptr is declared as a char *, what is the effect of ptr++? What if ptr is declared as an int *?
Although & applied to the name of a stack-allocated array (e.g. &buffer) is a legal expression and has a defined meaning, it isn't really sensible. Explain why such use may indicate an error/misunderstanding on the part of the programmer.
The argument to malloc is size_t (unsigned). Consider the erroneous call malloc(-1), which seems to make no sense at all. The call compiles with nary a complaint -- why is it "ok"? What happens when it executes?
What is the purpose of the realloc function? What happens if you attempt to realloc a non-malloc-ed pointer, such as a string constant?
What is the difference between a memory error and a memory leak?
Your coworker suggests you use the function below as a "safer" free function that prevents accidentally using a freed pointer.

void free_and_null(void *ptr) {
    free(ptr);
    ptr = NULL;
}

Will this function correctly free the client's pointer? Will it correctly set it to NULL? Explain.
