Lab 1: C programming under Unix

Lab sessions Mon Apr 04 to Thu Apr 07

Lab written by Julie Zelenski

Learning goals

During this lab you will:

- work through editing, compiling, testing, and debugging C programs using editor/make/gdb/Mercurial/Valgrind
- write C code to use the standard library facilities for C-strings and file I/O

Find an open computer to share with a partner and introduce yourselves. Together you are to tackle the exercises outlined below. Along the way, you are encouraged to discuss issues and share useful tips with your labmates. The TA will circulate to offer advice and answers and keep everyone progressing smoothly.

Lab is a time to experiment and explore. The guided exercises revisit topics from recent lectures/readings, provide hands-on practice in a supported environment, and prepare you to succeed at the upcoming assignment. To record lab participation, we use an online checkoff form that you fill out as you work. Lab is not a race to answer exactly and only the checkoff sheet-- these questions are deliberately trivial and we use them merely to record attendance and get a read on how far you got. To get the most out of lab, be prepared to dive into the nooks and crannies. The combination of hands-on experimentation, give and take with your peers, and the guidance of the TA makes lab time truly special. Your sincere participation can really accelerate your learning!

Lab exercises

Clone the starter project. We distribute all labs and assignments as Mercurial repositories. If you haven't yet, read the CS107 Mercurial guide and be sure both partners have set up their Mercurial environment. Clone the lab repo by using the command
```
hg clone /afs/ir/class/cs107/repos/lab1/shared lab1
```
This command creates a lab1 directory containing the source files and Makefile.

Pull up the online lab checkoff right now and have it open in a browser so you can jot things down as you go. Only one checkoff needs to be submitted for both you and your partner.
Basics of make. All CS107 programs are built via a makefile. Although you can compile a single file C program with a direct call to gcc file.c, if your project includes multiple files or needs to configure compiler flags and other settings, a gcc command becomes unwieldy. Creating a makefile with the necessary configuration allows one use of make to control the build. Here are the key things to know about using make:
- make is how you initiate a build. Try make in your lab1 directory and read the output to see the steps in the build process. Use make again and observe that it is clever enough to realize nothing needs to be rebuilt, as no source files have changed. If the build fails due to error, make exits without updating the program. Any version left behind after a failed build will be stale.
- make clean will remove any previous build products. This gives a fresh start. Any subsequent make will rebuild from scratch.
- make narrates the build steps and its output is fairly verbose. Any build warnings will be intermingled in the output. We strongly recommend that you adopt the habit of maintaining a clean build (i.e. resolve warnings as soon as they appear). Less clutter means less chance that you'll overlook something important. When grading, we will expect your submissions to build without warnings.
You will not be asked to construct your own makefiles from scratch and only occasionally will edit ours. If you're curious to know more, you can open any of our makefiles and read along with the comments to get a rough sense of how they are structured or check out the CS107 guide to make.
Mercurial, gdb, and valgrind. The count program searches a file for all words containing a target string and reports the matches. A colleague started the project but handed it over to you unfinished. This exercise will guide you to test and debug this program into a working state.

You can use Mercurial to review the history and see how the code evolved. Each commit is referred to as a "changeset" or a "revision". Changesets are numbered sequentially starting from 0 up to the most recent revision. The command hg diff -r<N> -r<M> shows what changed between revisions numbered <N> and <M>. The command hg update -c -r<N> restores the files to match the revision numbered <N>.

Let's walk the history step by step. Start with hg log to get a list of all commits.
- Changeset 0: Restore to changeset 0 using the command hg update -c -r0. Review the code in count.c and use make to build it. Test it on a few cases such as count states q and count states as. It does seem to work correctly (as long as you don't look too hard for problems). Ship it?!
- Changeset 1: A change in spec now requires the program to count prefix matches instead of containing matches. Use hg diff -r0 -r1 to review what changed between changesets 0 and 1. Take note of the string library functions used for containing match and prefix match. (handy knowledge for assign1!) Update to changeset 1, re-build, and re-test. The program now prints garbage for the number of matches. Repeating the same test a few times doesn't even get consistent results from run to run. Let's use the debugger to better understand what's going on. Use gdb count to enter the debugger and use gdb's break command to set a breakpoint on the line number that increments the count of matches. From within gdb, run the program with arguments like this: run states mi. When the breakpoint is hit, use the gdb command info locals to show the function's local variables and you'll see the counter is garbage before it's even used -- d'oh, it was never initialized! (This same bug was present in the previous changeset yet asymptomatic there; the observed "correctness" was just dumb luck in getting a leftover zero.) Diff changeset 1 against 2 to see the key initialization.
- Changeset 2: Update to changeset 2 and rebuild. This version should now correctly count the prefix matches. Hooray! Let's push a bit harder on testing. Try count states without the target argument. Yikes! This incorrect usage is rewarded with a "segmentation fault", our system's cryptic response to a bad memory access. To diagnose a seg fault, your go-to tool is gdb. Start the debugger on the program and run that incorrect usage. When it crashes, control will return to gdb. Use the gdb command backtrace to print the active sequence of function calls. Knowing where in the program and what was being attempted when it crashed is very helpful. At the time of this particular crash, the innermost call is within a library function. There are two important things about this we want to point out: (1) gdb's complaint about "No such file or directory" looks noteworthy but is actually harmless and expected (read the FAQ of our gdb guide for a longer explanation) and (2) when a bad memory access happens within a library call, don't mistakenly conclude the library code is at fault. It is the bad argument being passed to strlen that is the real culprit. Why is the program calling strlen on a NULL pointer? What should it be doing instead when invoked incorrectly? Diff changeset 2 with 3 to review the fix that was made.
- Changesets 3 & 4: Diff changesets 3 and 4 to see what changed. What concern was this fix intended to address? Create your own test case that will demonstrate that the applied fix is effective. Update to changeset 3, rebuild, and try your test case to see how the failure is reported. Then update to changeset 4 and rebuild/test to confirm the handling has changed as intended.
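
The bugs in this walkthrough follow a pattern worth seeing in isolation. The following is a minimal sketch, not the actual count.c (whose code may well differ): it shows one plausible way to express a containing match with strstr, a prefix match with strncmp, and a counter that starts from a known value. The function names contains_match, prefix_match, and count_prefix_matches are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if target occurs anywhere within word. */
int contains_match(const char *word, const char *target)
{
    assert(word != NULL && target != NULL);
    return strstr(word, target) != NULL;
}

/* Hypothetical helper: nonzero if word begins with target. */
int prefix_match(const char *word, const char *target)
{
    assert(word != NULL && target != NULL);
    /* Compare only the first strlen(target) characters of word. */
    return strncmp(word, target, strlen(target)) == 0;
}

/* Hypothetical helper: counts how many of the n words begin with target. */
int count_prefix_matches(const char *words[], size_t n, const char *target)
{
    int count = 0;  /* explicit start value; a bare "int count;" holds garbage */
    for (size_t i = 0; i < n; i++) {
        if (prefix_match(words[i], target))
            count++;
    }
    return count;
}
```

The asserts guard against the NULL argument seen in the crash. In C, argv[argc] is always NULL, so running a two-argument program with only one argument hands NULL to whatever reads the missing slot; checking argc before touching argv is the usual remedy.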
Now it's your turn to pick up where your colleague left off. Update to the most recent revision and take this program over the finish line by completing these two tasks:
- Another spec change: The requirement is to now count suffix matches instead of prefix. Change the code to compare the suffix of each word to the target. Use pointer arithmetic to access the tail of the word and use that as an argument to strcmp. Be careful to take into account the case when a given word is shorter than the target. Edit the code, re-test to your satisfaction, and commit.
- Clean valgrind run: Running the program under valgrind should report no memory errors. (If you aren't sure of the difference between memory leaks and memory errors, review the valgrind guide now.) However, a minor memory leak remains. The leak we're expecting will be reported as "still reachable" and the valgrind report will suggest re-running with additional flags to provide more information. Re-run valgrind, adding its suggested flags. The additional information should help you identify where the leak is coming from and what fix is needed. Add the necessary code, test to verify it resolves the issue, and make a final commit!
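
If the pointer arithmetic for the suffix match is unclear, here is a minimal sketch of the idea. It is one possible approach under the assumption that word and target are ordinary null-terminated strings; the name suffix_match is hypothetical and does not come from count.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if word ends with target. */
int suffix_match(const char *word, const char *target)
{
    assert(word != NULL && target != NULL);
    size_t wlen = strlen(word);
    size_t tlen = strlen(target);
    if (tlen > wlen)
        return 0;  /* word too short: the tail pointer would land before the word */
    /* word + (wlen - tlen) points at the last tlen characters of word. */
    return strcmp(word + (wlen - tlen), target) == 0;
}
```

The length check has to come first. Because size_t is unsigned, wlen - tlen would wrap around to a huge value when the target is longer than the word, and the resulting pointer would be invalid.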
igpay atinlay. The pig program takes a sequence of words as command-line arguments and translates them into Pig Latin. You convert a word to Pig Latin by removing any sequence of contiguous consonants from the beginning of the word and attaching them to the end followed by ay, e.g. trip -> iptray. If the word starts with a vowel, you just add way to the end, e.g. item -> itemway.

Compile and run the program to see what you've got so far. The starter program merely echoes its arguments, as the translate function is unattempted. Let's get to work on that. A good first task would be to write a helper function to find and report the position of the first vowel. Add and implement this helper. Once this piece is tested and working, make a commit to record your progress.
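
As a starting point, one way to write such a helper is sketched below. It assumes that only a, e, i, o, and u (in either case) count as vowels and that a word with no vowel should report its own length; both are assumptions, and the name first_vowel is hypothetical. The sketch leans on the standard library function strcspn, which returns the length of the leading run of characters that are not in a given set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first vowel in word,
 * or strlen(word) if the word contains no vowel. */
size_t first_vowel(const char *word)
{
    assert(word != NULL);
    /* strcspn counts leading characters that are NOT in the reject set. */
    return strcspn(word, "aeiouAEIOU");
}
```

Writing the same search as an explicit loop with strchr on each character is equally valid and may be better practice for the assignment.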

The next task is to dissect the word around that vowel and rearrange the subparts. A C++ or Java programmer would create substrings for the head and tail and then concatenate them. Instead, in C, you copy characters from the head and tail sections of the original string directly to the appropriate place in the output string (without making temporary strings along the way). The C library functions strcpy/strncpy and strcat/strncat can be used to copy strings, whole or in part, and are just what you need! To learn more about these functions, stop here and check K&R or your C reference, pull up man pages for functions by name, e.g. man strlen, and/or skim our CS107 guide to C library functions.
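
To make the copy-in-place idea concrete, the sketch below rotates a string around a split point without building any temporary substrings. It is an illustration of the library calls only, not the pig program's translate function; the name rotate_at and its parameters are hypothetical, and the caller is assumed to supply an output buffer of at least strlen(in) + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes in[split..] followed by in[0..split) into out.
 * out must not overlap in and must hold at least strlen(in) + 1 bytes. */
void rotate_at(const char *in, size_t split, char *out)
{
    assert(in != NULL && out != NULL);
    assert(split <= strlen(in));
    strcpy(out, in + split);   /* tail: pointer arithmetic skips the head */
    strncat(out, in, split);   /* head: at most split chars, then a terminator */
}
```

For example, rotate_at("trip", 2, out) leaves "iptr" in out. Note that strncat always writes a terminating null, whereas strncpy does not when the source is at least as long as the count, so the two are not interchangeable.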

Put your newfound understanding of C-strings and string library functions to use. Add the code to copy characters from the in buffer to their proper translated place in the out buffer. Compile and test your work. Use commit to take a final snapshot when you are satisfied. This exercise in string manipulation will be great preparation for the first assignment which features a lot of string munging!

Optional: If you have additional time, convert the pig program to use dynamically-allocated memory (rather than stack-allocated) to store the translated results. Create a right-sized array of char* elements (based on the number of arguments given to the program), loop over the arguments to translate each and store into the array, print out the array of results, and then free all memory before exiting. This is good practice in use of malloc/free, another key part of the first assignment.
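
The allocate, fill, and free pattern described above might look like the following sketch. Everything here is hypothetical: make_result is a placeholder that merely appends "ay" to stand in for a real translation, and translate_all and free_all are invented names. A main function would pass argv + 1 and argc - 1 as the words and count.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder for the real translation: heap copy of word plus "ay". */
char *make_result(const char *word)
{
    assert(word != NULL);
    char *s = malloc(strlen(word) + 3);  /* 2 for "ay", 1 for the terminator */
    if (s == NULL)
        return NULL;
    strcpy(s, word);
    strcat(s, "ay");
    return s;
}

/* Frees the first n strings and then the array itself. */
void free_all(char **results, size_t n)
{
    if (results == NULL)
        return;
    for (size_t i = 0; i < n; i++)
        free(results[i]);
    free(results);
}

/* Returns a heap array of n heap strings, or NULL if allocation fails. */
char **translate_all(char *const words[], size_t n)
{
    /* Request at least one slot so a NULL return always means failure. */
    char **results = calloc(n ? n : 1, sizeof *results);
    if (results == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        results[i] = make_result(words[i]);
        if (results[i] == NULL) {
            free_all(results, i);  /* release what was built so far */
            return NULL;
        }
    }
    return results;
}
```

Every malloc or calloc here is paired with exactly one free, which is the property a valgrind run would check.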

Check off with TA

At the end of the lab period, submit the checkoff form and ask your lab TA to approve your submission so you are properly credited for your work. It's okay if you don't completely finish all of the exercises during lab; your sincere participation for the full lab period is sufficient for credit. However, we highly encourage you to work through any unattempted parts or unresolved issues to solidify your knowledge of this material before moving on! Try our self-check to reflect on what you got from this lab and whether you're feeling ready for what comes next!