CS107 Lab 1: C programming under Unix

Lab sessions Mon Jan 16 to Thu Jan 19

Lab written by Julie Zelenski

During lab, you will experiment and explore, ask and answer questions, and get hands-on practice in a supported environment. The lab exercises revisit topics from recent lectures/readings and prepare you to succeed at the upcoming assignment. To record lab participation, we use an online checkoff form that you fill out as you work. Lab is not a race to answer exactly and only the checkoff sheet-- these questions are deliberately trivial and we use them merely to record attendance and get a read on how far you got. Instead of using the checkoff questions to dictate your activity, let satisfying your curiosity and achieving a sense of mastery be your guide. The combination of active exploration, give and take with your peers, and the guidance of the TA makes lab time truly special. Your sincere participation can really accelerate your learning!

Learning goals

During this lab you will:

- get further practice with common unix utilities
- work through editing, compiling, testing, and debugging C programs in the unix environment

Find an open computer to share with a partner and introduce yourselves. Together the two of you will tackle the exercises below. Everyone is encouraged to discuss issues and share insights with all your labmates. The TA will circulate to offer advice and answers and keep everyone progressing smoothly.

Get started

We distribute all labs and assignments as Mercurial repositories. If you haven't yet, read the CS107 mercurial guide and be sure both partners have set up their Mercurial environment. Clone the lab repo by using the command below. This command creates a lab1 directory containing the source files and Makefile.

hg clone /afs/ir/class/cs107/repos/lab1/shared lab1

Pull up the online lab checkoff right now and have it open in a browser so you can jot things down as you go. Only one checkoff needs to be submitted for both you and your partner.

Lab exercises

Part 1: Unix practice

This warmup exercise is to try out the unix commands suggested below while simultaneously chatting up your labmates about your assign0 experiences. How is everyone doing so far on getting comfortable in the unix environment? Do you have open questions or an issue you'd like help with? Did you learn a nifty trick or two that you'd like to share? Let's hear it!

man. The online manual is available via the man command. There are man pages for unix utility programs (try man ls, man rm, ...) as well as C library functions (man fopen, man strncmp, ...). Man is your go-to to learn the usage for a command, review option flags, or find a function prototype. If you aren't sure of the exact command you need, try man -k <term> to get a list of man pages related to a search term (man -k calculator, man -k quota).
printenv and env. Read the man pages for these two commands to see the basic usage. Here is some suggested usage to try out and see how these commands operate.
- Run printenv with no arguments and skim its output. What is the response if you invoke with an argument such as printenv USER? What if you make a bad request printenv BOGUS?
- Run printenv, then run env BINKY=1 WINKY=2 printenv. What changes between the two? Run printenv again to determine whether those changes were transient or permanent.
- Run date then env PATH=/tmp date. Can you explain why the date command failed to execute in the second case?
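
When thinking about the last experiment, it may help to see how a command name gets turned into a file to execute. The sketch below is only an illustration of the general idea of searching the directories listed in PATH; the function name find_in_path and its parameters are hypothetical, and the real lookup done by the shell and by env has more cases than this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of the lab code): look for cmd in each
 * directory of a colon-separated list such as the value of PATH.
 * Returns 1 and leaves the full path in result if found, otherwise 0.
 * A fuller version would also confirm the match is a regular file. */
int find_in_path(const char *pathlist, const char *cmd, char *result, size_t size)
{
    assert(pathlist != NULL && cmd != NULL && result != NULL);

    const char *entry = pathlist;
    for (;;) {
        size_t len = strcspn(entry, ":");           // length of this directory name
        const char *dir = (len > 0) ? entry : ".";  // an empty entry means the current directory
        int dirlen = (len > 0) ? (int)len : 1;
        int n = snprintf(result, size, "%.*s/%s", dirlen, dir, cmd);
        if (n > 0 && (size_t)n < size && access(result, X_OK) == 0)
            return 1;                               // found something we may execute
        if (entry[len] == '\0')
            return 0;                               // ran out of directories to try
        entry += len + 1;                           // step past the ':' separator
    }
}
```

Calling find_in_path from a small main with the value of getenv("PATH") and then with "/tmp" as the list is one way to compare the two situations in the exercise.
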
grep. When searching files for lines that match a given pattern, grep is just the tool for the job. In its simplest usage, grep finds occurrences of a fixed string (no metacharacters), but where grep especially shines is in matching complex patterns expressed as regular expressions (commonly shortened to regex). Standard grep is quite feature-rich, so we highlight a few key features to get you started rather than ask you to wade through its (overwhelming) man page.
- Adding the color flag (grep --color) will highlight the matched portion of the line in red. The red highlight makes it possible to see exactly which chars matched. I find this so helpful that I defined a shell alias to make grep expand into grep --color so I never have to be without it; you may want to do the same. (Note: if grep --color doesn't display in red in your terminal, post on Piazza asking for help getting color configured for your specific terminal program.)
- Here are the four core metacharacters that you will implement in assignment 1:
```
.     matches any character
^     matches beginning of line
$     matches end of line
*     matches zero or more repeats of char to left of *
```
  Practice forming regular expressions using these metacharacters and testing them using grep. The slink/dictionary file is a list of English words. Grep the dictionary file for practice, e.g. grep joy slink/dictionary should report all words that contain joy. Here are some suggested exercises:
  - match all words that end with zy
  - match all words that start with k and end with k
  - match all words that are exactly 7 letters long
- Certain punctuation characters have special meaning to the shell. For example, * and $ in a command-line argument are expanded by the shell before passing them to the program. Enclose a pattern in single-quotes to suppress this special treatment. Run the command grep my* slink/dictionary and compare to grep 'my*' slink/dictionary to see the difference. The shell expands the unquoted my* to all filenames in the current directory that start with my. Run the command echo my* to see the arguments after the shell has expanded the wildcard. Where should you add the quotes to the command grep my* my* to search for the pattern my* in all files starting with the letters my?
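
The wildcard expansion described in the last bullet can also be observed from C. The sketch below uses the POSIX glob function to list the filenames a pattern expands to, which is roughly the work the shell does on an unquoted argument before grep ever sees it. The helper name show_expansion is hypothetical, and shells differ in how they treat a pattern that matches nothing.

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper (not part of the lab code): print the filenames that a
 * shell-style wildcard pattern expands to and return how many there were. */
int show_expansion(const char *pattern)
{
    assert(pattern != NULL);

    glob_t results;
    if (glob(pattern, 0, NULL, &results) != 0) {
        // GLOB_NOMATCH or an error: this sketch treats both as "no expansion"
        printf("%s: no expansion\n", pattern);
        return 0;
    }
    int count = (int)results.gl_pathc;
    for (int i = 0; i < count; i++)
        printf("%s\n", results.gl_pathv[i]);    // one expanded name per line
    globfree(&results);                         // release the list glob allocated
    return count;
}
```

Calling show_expansion("my*") from a main run inside the lab1 directory should list the same names that echo my* prints there.
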
(Aside: Do you dream of having regex superpowers? I highly recommend visiting regexone.com for its great interactive tutorial and lots of regex practice problems -- check it out!)
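
If you want to experiment with the four metacharacters from inside a C program, the POSIX regex library accepts the same basic syntax. The sketch below is a minimal wrapper under that assumption; the name line_matches is hypothetical, and calling a library like this is a way to explore the semantics, not a model for how the assignment is to be implemented.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of the lab code): report whether line
 * contains a match for pattern, using POSIX basic regular expressions. */
bool line_matches(const char *pattern, const char *line)
{
    assert(pattern != NULL && line != NULL);

    regex_t re;
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return false;                                 // pattern did not compile
    bool found = (regexec(&re, line, 0, NULL, 0) == 0);
    regfree(&re);                                     // release the compiled pattern
    return found;
}
```

For example, line_matches("joy", "enjoyment") is true because the pattern may match anywhere in the line, while line_matches("^joy", "enjoyment") is false because ^ anchors the match to the start.
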
make. All CS107 programs are built via a makefile. Although you can compile a single file C program with a direct call to gcc file.c, if your project is larger or requires more complicated settings, a gcc command becomes unwieldy. Creating a makefile automates the build commands into a single request to make. By tracking file timestamps, make can be smart about rebuilding only those components that are out-of-date.

Run make and observe it building the programs, narrating each step in the build process. Run make again and observe that it is clever enough to realize nothing needs to be rebuilt, as no source files have changed. Make a trivial edit to one of the source files and then run make again to see that it will rebuild any component that depends on the changed file. Run make clean to remove all previous build products. This gives a fresh start. Any subsequent make will rebuild everything from scratch.

(Aside: you will not be asked to construct your own makefiles and only occasionally will edit ours, but if you're curious to know more, open our makefile and read along with the comments to get a rough sense of how they are structured or check out the CS107 guide to makefiles.)

Part 2: Implement and test myprintenv

Review, edit, and build the code

The myprintenv.c file is a partial implementation of a printenv-like program. The given code correctly handles printing the list of all environment variables, but is missing the handling for when it is invoked with an argument. Pull up the printenv man page to review how it is supposed to handle that case and then edit myprintenv.c to implement that behavior. Use make to build the program.
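
As background for reading the starter code, here is a minimal sketch of how a C program can walk its environment. It assumes the environment is available as a NULL-terminated array of NAME=value strings (for example the optional third parameter of main, or the POSIX variable environ). The helper name print_all is hypothetical and the given myprintenv.c may be organized differently.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not taken from myprintenv.c): print every NAME=value
 * string in a NULL-terminated array and return how many were printed. */
int print_all(char *const envp[], FILE *out)
{
    assert(envp != NULL && out != NULL);

    int count = 0;
    for (int i = 0; envp[i] != NULL; i++) {   // the array ends with a NULL pointer
        fprintf(out, "%s\n", envp[i]);
        count++;
    }
    return count;
}
```

A main declared as int main(int argc, char *argv[], char *envp[]) could pass its envp straight to print_all(envp, stdout).
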

Test manually

Once your code successfully builds, it's time to test! One simple means to verify correctness is by comparing your results to a known-correct solution. For example, run printenv PATH then run myprintenv PATH and manually eyeball the outputs to confirm they match. Even better would be to capture those outputs and feed them to diff so the tools can do the work. To make testing as painless as possible for you, we've automated simple output-based comparison into a CS107 tool called sanitycheck. You'll use sanitycheck throughout the quarter, so let's get some practice using it right away!
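
To make the idea of output-based comparison concrete, here is a small sketch of a line-by-line comparison of two streams. It is not how diff or sanitycheck is implemented; the helper name first_mismatch and the MAX_LINE limit are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 1024   // assumed buffer size; longer lines are compared in chunks

/* Hypothetical helper (not part of sanitycheck): compare two streams.
 * Returns 0 if they are identical, otherwise the 1-based number of the first
 * line (or chunk, for very long lines) where they differ. */
int first_mismatch(FILE *expected, FILE *actual)
{
    assert(expected != NULL && actual != NULL);

    char e[MAX_LINE], a[MAX_LINE];
    for (int lineno = 1; ; lineno++) {
        char *pe = fgets(e, sizeof(e), expected);
        char *pa = fgets(a, sizeof(a), actual);
        if (pe == NULL && pa == NULL) return 0;        // both ended together
        if (pe == NULL || pa == NULL) return lineno;   // one stream is shorter
        if (strcmp(e, a) != 0) return lineno;          // contents differ here
    }
}
```

The two streams could be files holding the captured output of printenv PATH and myprintenv PATH, opened with fopen.
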

Test with sanitycheck

First read our sanitycheck instructions, then try running sanitycheck in your lab1 directory. Follow along by reading the report sanitycheck produces while running. A test case will run a myprintenv command, compare its output to the solution, and report if mismatched. The default sanitycheck for lab1 runs four test cases on myprintenv. How did your implementation fare on them? If you passed all four, great for you! If not, review the sanitycheck report to learn which one(s) you didn't pass. Follow up with some manual testing until you fully understand the problem, edit your code to resolve it, and build and test again. Repeat until your version passes all the sanity tests for myprintenv.

Part 3: Test and debug mywc

The mywc.c file implements a wc-like program (man wc) that is intended to count the lines, words, and characters from a file and report the longest line. The code was written by your colleague who claims it is "complete", but on his way out the door he mutters something unintelligible about possible unfixed bugs. Uh oh... Your task is to test and debug the program into a fully functional state using CS107 sanitycheck and the gdb debugger. Take a moment to skim the CS107 guide to gdb before going further.

As always, your first task when given a piece of code is to carefully read it over. Do you understand its intended purpose and how it operates? Does the code use any techniques or library functions that are unfamiliar to you? Here are a few issues to discuss with your partner:
- How does the program handle reading either stdin or a named file?
- How does the code strip the trailing newline from the fgets result? When might the fgets result not end in a newline? What does the code do in that case?
- How can you find out whether isspace is merely a test for ch == ' ' or does something fancier?
- What is the purpose of the %6d in the printf format?
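
One way to settle questions like these is to write a few throwaway functions and experiment. The sketch below shows a common idiom for removing a trailing newline, a loop that asks isspace about every ASCII code, and a call that formats a number with %6d. All three helper names are hypothetical, and the code in mywc.c may well take a different approach.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: remove one trailing newline if present and return the
 * new length. A line with no newline (e.g. the last line of a file that does
 * not end in one) is left unchanged. */
size_t strip_newline(char *line)
{
    assert(line != NULL);

    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';
    return len;
}

/* Hypothetical helper: count how many of the 128 ASCII codes isspace accepts. */
int count_space_chars(void)
{
    int n = 0;
    for (int ch = 0; ch < 128; ch++)
        if (isspace(ch))
            n++;
    return n;
}

/* Hypothetical helper: write value into buf using the %6d conversion, which
 * right-aligns the digits in a field at least 6 columns wide. */
int format_count(char *buf, size_t size, int value)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%6d", value);
}
```

Printing the result of count_space_chars, or printing the buffer filled by format_count between two visible delimiter characters, makes the behavior easy to see.
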
Run mywc on a few different files and/or inputs typed to stdin. It seems to be printing fairly reasonable numbers for the count of characters and lines, but that word count is clearly bogus. Not only is it hopelessly wrong, it seems to vary from run to run on the same exact file. That's definitely no good! Let's use the debugger to better understand what's going on. Run gdb mywc to start the debugger and set a breakpoint on the line number within count_words where count is incremented. Run the program under gdb and when stopped at the breakpoint, print count before the increment. This should quickly draw your attention to the fact that count is garbage from the get-go-- d'oh, the variable was not initialized! In a safety-conscious language such as Java, code that uses a variable uninitialized is rejected during compilation. Do a make clean and make to review the build warnings and you'll see nary a peep from the C compiler about it. Up your vigilance, now knowing the C compiler can be pretty lax!
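
For comparison, here is the same kind of counter written with its initialization in place. This is a hypothetical example unrelated to mywc.c, and the function name count_vowels is invented for the illustration. Some compiler versions and flag combinations do warn about an uninitialized local, but as the lab shows you cannot rely on that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example (not from mywc.c): count the lowercase vowels in s. */
int count_vowels(const char *s)
{
    assert(s != NULL);

    int count = 0;   // declaring "int count;" with no initializer would still
                     // compile, but the count would start from leftover garbage
    for (size_t i = 0; s[i] != '\0'; i++) {
        switch (s[i]) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                count++;
                break;
            default:
                break;
        }
    }
    return count;
}
```
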
Initializing count is an easy fix. Edit, build, and re-run. You should observe that word count now has a reasonable value and stays consistent from run to run. You could try to hand-verify if word count is truly correct, but an easier approach is to run sanitycheck now. Uh oh -- it appears there is still work to be done. Get the program back into gdb, run on the input file for the first test case and step through the execution. Once you understand the miscounting issue, fix, build and re-run.
Your program should now pass the first and second mywc sanity test cases. Hitting a milestone is a good time to make a commit, so do that now.
Next up is figuring out what's wrong with the third sanity test case. It matches on character/word/line counts, but reports the wrong longest line. But wasn't longest line already working correctly on the first two cases? If you do some further testing and make careful observations, you'll figure out that the true behavior of the program is that it always prints the last line. It's up to you to find and fix the underlying root cause. Consider: Is it failing to properly determine which line is longest, not properly storing the longest so far, printing the wrong thing at the end, or something else entirely? Get back into gdb, edit to fix and test.
The program should now pass all three default sanity cases -- way to go (and time to make another commit)! Does achieving sanity success mean your work is done? These three test cases are a good start on a testing plan, but aren't fully comprehensive coverage by themselves. Are there other test cases that could shake out problems as yet undiscovered? Quite probably. In fact, depending on the fixes you've made thus far, I strongly suspect you still have a bug remaining in count_words. None of the default sanity test cases expose this particular bug, so you'll need to work up a new test case to flush it out. Creating your own custom tests for sanity is explained in the sanitycheck instructions. The custom feature allows you to broaden your testing coverage and thereby increase your confidence that your program can handle anything we might throw at it in grading. Create a custom sanity test that runs mywc Makefile and you'll likely be rewarded with evidence of a heretofore unknown bug. Exposing bugs is the name of the game, as you can't fix what you don't know about. Long live sanitycheck!

Part 4: Give valgrind a whirl

Although your programs aren't yet making heavy use of memory/pointers, they will soon, and a skill you want to add to your arsenal is using Valgrind to help detect memory errors. If you haven't already, review our guide to valgrind and let's try out Valgrind now.

We will need a buggy program for this so temporarily edit mywc.c to remove the earlier fix you made to initialize count. Build the program and test to see you have the previous bad behavior with the wacky word count.
Now run that buggy program under valgrind. How does valgrind report the problem? At the end of the valgrind report will be a recommendation to re-run with additional flags to provide more information. Re-run valgrind, adding its suggested flags. Does this information help you connect the errors reported to the root cause? Sweet!

Becoming a skilled user of Valgrind is invaluable to a programmer. We recommend that you run Valgrind early and often during your development cycle. It's best to focus on one memory problem at a time. Your strategy should go something like this:

- run all newly-introduced code under Valgrind
- stop at the first error reported
- study the Valgrind report and follow the details to the suspicious part of the code
- ferret out the root cause
- resolve the problem
- build and re-test to see that this error has gone away

Repeat for any remaining errors. Don't move on until you get a clean report from Valgrind. Note that memory leaks don't demand the immediate attention that errors do. Leaks can (and should) be safely ignored until the final phase of polishing a working program.

Check off with TA

At the end of the lab period, submit the checkoff form and ask your lab TA to approve your submission so you are properly credited for your work. It's okay if you don't completely finish all of the exercises during lab; your sincere participation for the full lab period is sufficient for credit. However, we highly encourage you to work through any unattempted parts or unresolved issues to solidify your knowledge of this material before moving on! Try our self-check to reflect on what you got from this lab and whether you're feeling ready for what comes next!
