Assignment 3: A Heap of Fun

Due: Wed Oct 20 11:59 pm
Late submissions accepted until Fri Oct 22 11:59 pm

Assignment by Julie Zelenski and Michael Chang, with modifications by Nick Troccoli

Learning Goals

This assignment delves into pointers, arrays, and dynamic memory. You will be building your skills with:

  • manipulating pointers & arrays
  • managing memory allocation on the stack and heap
  • using the C library I/O functions
  • using tools such as Valgrind to help track down memory errors and leaks

Overview

In this assignment, you will implement simplified versions of the unix commands uniq and tail. Your uniq command will pull out unique lines from the provided input and print out their frequencies. Your tail command will print out the last n lines of the provided input.

To get started on the assignment, clone the starter project using the command

git clone /afs/ir/class/cs107/repos/assign3/$USER assign3

The starter project contains the following:

  • util.c mytail.c, myuniq.c and Makefile: three files that you modify, and their Makefile for compiling
  • custom_tests: the file where you should add custom tests for your programs
  • mycat.c: a program that calls your read_line function in util.c for testing purposes.
  • samples: a symbolic link to the shared directory for this assignment. It contains:
    • SANITY.ini, prototypes.h and sanity.py: files to configure and run Sanity Check. You can ignore these.
    • myuniq_soln, mytail_soln and mycat_soln: executable solutions for the programs you will write.
    • colors, colors_no_end_newline, dictionary and names: sample text files for testing your unix filters
  • tools: contains symbolic links to the sanitycheck and submit programs for testing and submitting your work. It also contains the codecheck tool for checking for style and other common code issues.

Codecheck: Part of your style / code review score will be dependent on having no code issues when run through the codecheck tool. Make sure to run it to ensure your code adheres to all necessary guidelines!

[NEW FOR ASSIGN3] Assignment Support: Starting with assign3, you are required to fill out the QueueStatus questions for any debugging questions in helper hours. Our focus is on supporting you so that you can track down your own bugs, and our goal is to help you to take ownership of your own code and drive your own debugging. Please ask us how to best use tools, what strategies to consider, and how to improve your debugging process!
  • We are looking for information like - what have you observed from running GDB? Have you found a simple, reproducible test case? What part of your code do you believe is causing the issue? Etc. Simply stating what tools you have used is not enough - we are looking for how you have used them.
  • If you are eager to sign up for Helper Hours as soon as they open, we recommend preparing responses ahead of time to the questions: "What question do you have? Please provide as much information as possible." and "What steps have you already taken to try and answer your question? (e.g. I have found X test case that replicates the issue, I have narrowed it down to X lines of code, etc.). This information is required for help with debugging questions."
  • If you don't provide enough information on QueueStatus, we will ask you to please re-sign up in the queue once you can provide more information so that we can better help you.
  • You do not need to fill in the debugging questions for conceptual questions.
  • Style Questions: the CS107 Style Guide is the best resource for code quality and style clarifications. In helper hours, while we are able to answer general questions about this guide, we are not able to evaluate whether specific passages of code have good or bad style.

Check out our debugging guide - it outlines our recommended debugging steps, and how to approach common errors. This is the debugging framework we will be using in helper hours!

View Debugging Guide

Background: Unix Filter Commands

The uniq and tail unix commands are part of a special category of commands called "filters". A filter is a program that reads input (typically line by line) from the user or from a file, transforms the input in some way, and produces some output.

cat

cat is the simplest of all filters; the output just repeats the input. Invoking cat filename will print the contents of the named file. The man page for cat indicates that if no filename argument is given, the command will read from standard input. This means the program will read lines typed on the terminal by the user. Almost every unix filter (e.g. grep wc less head sort and more) conforms to this same interface, either reading from a named file or standard input. Reading from standard input can be handy for running a quick test without having to first create a file.

Try this out for yourself:

  • Invoke cat Makefile to view the contents of the file.
  • Invoke with the -b flag to number the non-blank lines: cat -b Makefile.
  • Now invoke cat -b without a filename argument. The program will wait for you to type input.
  • Type a line of text and hit enter. It echoes the line with its line number. Type a few more lines and see they are also echoed back and numbered.
  • When you have finished typing your input, press control-d on a line by itself. This signals the end of input (EOF for "end of file"). The filter program exits.

Other common filters

Below are a few other Unix filters to explore. For each, skim the manual page to learn about its options. Run the filter on an existing file and then again without a filename to operate on standard input. Remember that control-d is used to signal EOF.

  • head outputs the first lines of the input
  • tail outputs the last lines of the input
  • wc outputs the count of lines, words, and characters in the input
  • grep outputs lines from the input that match the search pattern
  • uniq filters out adjacent duplicate lines from the input
  • sort outputs the input rearranged into sorted order

When running on standard input, you will notice that some filters immediately respond to each line you enter, e.g grep echoes the line if it matches the pattern. Others (e.g. sort) don't output anything until you signal EOF. The depends on whether the filter operation needs to see all of the input before it can determine what to output.

Pipelines

The true power of filters is that they can be combined into "pipelines". A pipeline is a sequence of commands where the output of one command is passed to (or "piped into") the input of the next command. The pipe character | feeds the output of one command as input to the next. Filters are especially useful in pipelines because they can be used at the beginning, in the middle, or at the end. You can read more about piping input in our Unix guide.

Pipelines allow you to build fancier operations out of the basic filters. Some examples:

  • grep include util.c | wc -l finds lines in util.c containing "include", and feeds them to wc, which outputs the number of lines in its received input. This pipeline thus reports the count of matches.
  • cat -b util.c | tail outputs the lines of util.c labeled with line numbers, and feeds them to tail, which outputs only the last 10 lines and numbers.
  • The standard uniq command only filters out duplicate lines if they are adjacent in the input, while your version will filter out duplicate lines appearing anywhere in the input. For this reason, with the standard uniq command, if instead your data has duplicate lines scattered throughout, you need to first rearrange them so duplicates are adjacent for uniq to detect them. We can use a three-stage pipeline to output the count of distinct lines in a file: sort samples/names | uniq | wc -l. First, sort outputs the sorted lines in samples/names and feeds them to uniq, which outputs out all the lines, omitting adjacent duplicates, to wc, which outputs the number of lines in its received input.

Now it's your turn! How could we write a pipeline that prints out the three most common names in samples/names? (Hint: review the manpage for uniq to see what flags it has, and keep in mind that the same command can appear multiple times in a pipeline.)

The Unix philosophy is to move away from the construction of monolithic do-everything programs and instead encourage a proliferation of smaller commands, each focused and streamlined to perform one task well. Those small, discrete tools are combined to accomplish larger tasks. You'll be writing two of these tools in this assignment: uniq and tail.

Review and Comment Starter Code

The mycat, mytail and myuniq programs rely on reading files. File reading works similarly, but has different syntax, across different languages. Your first task is to study the provided program code, along with our guide to the C Standard Library (or your favorite C reference) to get an overview of how file-reading works in C. The test program mycat.c is given to you fully implemented. The starter files for myuniq.c and mytail.c contain code to handle the command-line arguments and file setup, and you will implement the rest. You should read and understand the given code, work out how to change/extend it to suit your needs, and finally add comments to document your strategy.

Some thoughts to consider when reviewing the given code:

  • What are the possible options for the mode argument to fopen? What happens if you attempt to open a file that doesn't exist or you don't have permissions for?
  • What is the consequence of failing to fclose a file?
  • How does a program switch to reading standard input when there is no filename argument?
  • Are the operations that read from a file also used to read standard input or is different code required for each?
  • The default behavior of the mycat program corresponds to invoking which command-line flag on standard cat?
  • mytail supports a command-line argument of -number to control the number of lines printed. How does the given code process that optional argument?
  • What range of values is accepted as an argument for mytail -number?
  • Do you see anything unexpected or erroneous? We intend for our code to be bug-free; if you find otherwise, please let us know!

As per usual, the code we provide has been stripped of its comments and it will be your job to provide the missing documentation in the mytail.c, myuniq.c and util.c files. You are not required to comment the mycat.c file.

1. Write read_line

The standard library function fgets is used to read text from a file.

char *fgets(char *buf, int bufsize, FILE *stream);

A FILE * is a special type that represents a file - you can't dereference it like a normal pointer, it is just used to store information about the file and how far into the file you have read. When you call fgets, it automatically updates the pointer value to reflect that you have read further into the file, so that next time you call fgets it will pick up where you left off just by passing in the same FILE *. Like most library functions, fgets does not do any allocation; instead it depends on the client to provide the buffer to write into. The client must pre-allocate memory based on an expectation of what size will be "large enough" to store the line. If the client's choice of size is too small, a line-reading function might overflow the buffer (if you usegets) or truncate the read operation (if you use fgets).

A more client-friendly design would be for the read function to handle allocation internally and make a line of the appropriate size. You are going to implement a read_line function that operates in this manner:

char *read_line(FILE *file_pointer);

The read_line function reads the next line from a file. The return value is a dynamically-allocated and null-terminated string (or NULL if at EOF, "end of file"). This function performs like a snazzier version of fgets (in fact, internally you should be implementing it using fgets!) in that it reads the next line but this version also allocates the storage, making life easy for the client.

Here are some specific details of the function's operation:

  • The function should read the next line from the file. A line is considered to end after the first newline character or EOF, whichever comes first.
  • The returned string should include the characters read up to the newline character, but should not include the newline character itself. If the next line consists of just a newline character, return an empty string.
  • If, read_line immediately hits EOF, meaning there are no characters to be read at all, the function should return NULL. Note that the EOF condition cannot be checked until after you have called fgets, because it is the process of trying to read past the end that triggers the EOF condition. This case should clean up any initial allocation that turned out to not be needed after all and return NULL.
  • If an error occurs in fgets, the function should return whatever has been read up to that point, or NULL if it has not read anything (hint: your function may end up doing this without you explicitly considering this case). You may assume in this case that if an error occurs during a call to fgets, no characters were read that time.
  • To allocate memory for the returned string, the function should make an initial allocation and successively enlarge it if needed. To be more specific, it should first malloc a buffer of minimum size (32) and read the first part of the line into the buffer, using a single call to fgets. If there is more to read, it should realloc the buffer to double its current size (64), and make another call to fgets to read the rest of the line. It should continue like this, doubling the allocation and reading more, until finally reaching the newline or EOF that signals the end of this line.
  • If it is unable to allocate sufficient memory, read_line should raise a fatal assert. As a habit, you should assert the result from every allocation request. Allocation failures are rare but deadly.
  • read_line should return the address of a dynamically-allocated piece of memory storing the next line. The client is responsible for deallocating this memory with free when no longer needed.
  • You should need to use only the fgets file I/O function, and not any other file I/O functions such as feof, fgetc, etc.

Write your implementation of read_line in the util.c file. You can test it using our provided mycat.c program; you will later use the read_line function to write the mytail and myuniq programs.

Tips/Testing

Our provided mycat.c program is fully implemented for you and will call your read_line implementation.

  • mycat works similar in operation to the standard cat; however, mycat supports at most one file: either the named file (if specified) or standard input (if not). Read Background on supplying input to mycat, as well as how mycat works with zero arguments.
  • For each valid string returned from a call to read_line, mycat explicitly prints out the string (with a line number), prints the newline character, and then frees the string. If read_line returns NULL, mycat will clean up and exit.
  • The mycat program is integrated with sanitycheck.

Note: you can assume that the file being read in is not a binary/executable file (an executable file may have "null bytes", or bytes that are all zeros, which may appear like null-terminators when working with strings and thus make implementations trickier. You do not have to worry about this).

Sanity check is reporting a mismatch about my program outputting an extra blank line at the end of the mycat Makefile test. None of the other tests complain about this. Any ideas what the issue might be?

We configured this specific test to report discrepancies in how your read_line handles empty lines and/or EOF. If read_line is called when the next line consists of just a newline, it should return an empty string. If called where there are no more characters to read, it should return NULL. Returning empty string instead of NULL or vice versa will produce extra or missing blank lines in the output. The usual behavior of sanitycheck is to ignore whitespace when comparing the output. This specific test uses a strict comparison that only accepts an exact match so as to alert you that there is a problem with your read_line. (Hint: make sure you're properly detecting and handling the EOF condition).

2. Implement mytail

A Note For Parts 2 + 3: How do I use gdb/valgrind on my program running in a pipeline?
gdb and Valgrind don't work well with pipelines. A simple alternative approach is to instead write a file with the input you want to test and then gdb/valgrind your program running on that file. To prepare the input file, you can use redirection > to capture the output of the command. Check out our Unix guide for more information.

What does the tail command do?

tail is a unix filter that prints the final N lines of the input in order. The value of N defaults to 10, but can also be set as a command-line flag, e.g. tail -4. Consider the following sample use of tail:

myth$ cat samples/colors
red
green
green
yellow
blue
blue
blue
blue
myth$ tail -3 samples/colors
blue
blue
blue

If the input contains fewer than N total lines, all lines from the input are printed.

How does mytail operate?

The mytail program you are to write is similar in operation to the standard tail with some differences. Specifically:

  • mytail supports only one command-line flag, -N, where N is the number of lines to output (standard tail has a number of other flags)
  • mytail only reads one file: either the named file (if specified) or standard input (if not) (standard tail can read from any number of file arguments)

While mytail is processing its input, it will need to keep track of N lines. For this you must use an array of N entries. At any given point, this array holds the most recent N lines read. The array is a "sliding window" of N lines that is moving its way through the input. When mytail reaches the end of the input, the window will contain the final N lines.

When mytail starts processing the input, it will read the first N lines into the array using your read_line function. Upon reading the N+1st line, it should not enlarge the array, but instead discard the 1st line (which is the oldest) of the input and use that array slot for the N+1st. The N+2nd overwrites the second line and so on. In this way, the array is used as a sort of circular queue which at any time stores at most the N most recent lines read. For example, let's say N = 3, and we have a file as follows:

line1
line2
line3
line4
line5

After we read in 3 lines, our array should look like [line1, line2, line3]. When we read in the 4th line, it would then look like [line4, line2, line3]. And when we read in the 5th line, we would then have [line4, line5, line3]. There is no explicit sliding down of elements in this model - instead, this window "slides" in that it continually stores the N most recent lines. From there, you will need to print out the lines in the correct order as they appeared in the file. Make sure you understand the sliding window approach and implement your mytail program as such! This approach is required for this program.

Since the array size is known in advance and doesn't change over its lifetime, this array is good opportunity to use stack allocation. The challenges in implementing this program are the logic for the circular queue, as well as being careful with the strings being allocated and deallocated.

3. Implement myuniq

What does the uniq command do?

uniq is a unix filter that outputs the provided input with adjacent duplicate lines removed. When uniq is invoked with the -c flag, it counts the number of occurrences for each line. Consider the following sample use of uniq -c:

myth$ cat samples/colors
red
green
green
yellow
blue
blue
blue
blue
myth$ uniq -c samples/colors
    1 red
    2 green
    1 yellow
    4 blue

How does myuniq operate?

The myuniq program you are to write is similar in operation to the standard uniq with a few differences. Here is what it does:

  • myuniq filters out all duplicate lines, no matter where they appear in the input (standard uniq only detects repeated lines when they are immediately adjacent to one another in the input)
  • Only one copy of each unique line is printed. The unique lines are printed in the same order they appeared in the input.
  • myuniq always prefixes each line of output with its count of occurrences in the input (this is the behavior of standard uniq when invoked with the -c flag)
  • myuniq supports no command-line flags (standard uniq has a number of flags)

In order to detect repeated lines anywhere in the input, myuniq must keep track of all of the lines it has seen as it moves through the input. It will also need to count the frequency for each line. An array of structs to store this information seems like just the tool for the job, but we have a conundrum. How many entries do we need for this array? Unlike mytail which knew in advance the exact number of entries to allocate, myuniq is in the dark. It might need an array of 10 entries or 10,0000. Any arbitrary fixed choice will be too large for some inputs and too small for others. Sounds like an opportunity for resize-as-you-go!

The strategy we require for myuniq is to allocate the array for 100 entries as the initial estimate, and if/when that fills up, resize the array to add another 100 entries. Then repeat this as needed, adding an additional 100 entries each time.

To match the output format of the sample solution, when printing out the line count number you should use the printf token %7d instead of just %d. This will print the number with width 7, meaning the output will look something like:

      1 red
   5000 blue
  12413 green
1253473 yellow

Most of the challenge in the myuniq program concerns proper memory handling. By the time you are done with this assignment, you will be a heap master!

Memory Allocation

This assignment provides a lot of targeted practice in managing memory. You'll be using both the stack and heap for various purposes and need to take care to appropriately allocate and properly deallocate each. Here are some general memory guidelines as well as a few recommendations specific to this program:

  • Stack allocation is as simple as declaring a variable. The stack is convenient (auto-allocation/deallocation), type-safe, and very efficient.
  • Dynamic allocation is via malloc/realloc/free. The heap gives the programmer control over lifetime (explicit allocation/deallocation) and flexibility (storage can be re-sized/re-purposed), but comes at a cost in safety and efficiency.
  • Take care to understand proper use of stack versus heap allocation and when each is appropriate. As a general rule of thumb, unless a situation requires dynamic allocation, stack allocation is preferred. Often both techniques are used together in a program. Heap allocation is a necessity when:
    • you have a very large allocation that could blow out the stack

    • you need to control the memory lifetime, or memory must persist outside of a function call

    • you need to resize memory after its initial allocation

It is essential that you are using Valgrind throughout. Check out our Valgrind guide, and/or revisit the Valgrind exercises in labs 2 and 3 to be sure you know how to put Valgrind to work for you. In particular, let Valgrind help you with:

  1. Memory errors. This is by far the most important reason to have Valgrind on the job. If Valgrind is reporting errors, they need your immediate attention.
  2. Memory leaks. If Valgrind reports leaks, you have some heap allocations without their matching deallocation. Given that leaks generally do not cause bugs, and incorrect deallocation can, we recommend that you do not worry about freeing memory until your program is finished. Then, you can go back and deallocate your memory as appropriate, ensuring correctness at each step.

Testing

In addition to the above programs, as part of this assignment you should also add at least 5 additional tests of your own in the custom_tests file that show thoughtful effort to develop comprehensive test coverage.

  • When you add a test, also document your work by including comments in the file that explain why you included each test and how the tests relate to one another.
  • The tests supplied with the default SanityCheck are a start that you should build on, with the goal of finding and fixing any bugs before submitting, just like how a professional developer is responsible for vetting any code through comprehensive testing before adding it to a team repository.
  • We recommend you run SanityCheck early and often (and remember, it even makes a snapshot of your code to guard against editing mishaps!).
  • Quick links from the Assignments dropdown: Information on SanityCheck on the working on assignments page, testing strategies.

Pipelines: If you'd like to include pipelines in your custom_tests, SanityCheck requires that your first command is one of the three executables ./mycat, ./mytail, or ./myuniq with a filename argument. You can then use pipe character with Unix filters (e.g., grep, sort, etc.) or the three assignment executables.

IMPORTANT NOTE ON CUSTOM FILES: by default, only files in the starter code are submitted when running the submit tool. If you create any additional files for your custom tests that you need to include in your submission, you must execute the following command to add it to the list of files that will be submitted: git add MY_FILE_NAME. For instance, if you create a custom text file for testing called myfile.txt that you need to submit, the command would be git add myfile.txt. Then, when you run the submit tool, this file will be submitted as well. We will ignore any files you do not end up referencing in your tests, and this only needs to be done once per file, ever.

Efficiency

We will be looking at the memory and runtime efficiency of your programs. For memory efficiency, Valgrind can be a huge help, as the Valgrind report includes stats on your total heap usage. Look for a line at the end of a Valgrind report in this format:

total heap usage: 10 allocs, 10 frees, 888 bytes allocated

This is the aggregate total of all heap allocations during the program's lifetime. The count of allocs should always equal the count of frees. The count of allocations and total size allocated is a measure of your program's memory "footprint".

Runtime efficiency (how fast your program runs) is also important. Prefix a command with time to execute and report on the time used to complete it. For example, time ./mytail will measure your program's runtime, and time samples/mytail_soln will measure the sample solution's runtime. The time labeled "user" is the one of interest.

    % time ./mytail -1 samples/dictionary
    zygote

    real    0m0.010s
    user    0m0.006s
    sys     0m0.000s
    % time samples/mytail_soln -1 samples/dictionary
    zygote

    real    0m0.009s
    user    0m0.007s
    sys     0m0.000s

The best strategy for ensuring appropriate memory and runtime efficiency is to measure the sample executable on a given input and compare to the measurement of your program running on the same input. If your performance is in the same ballpark as ours (say within a factor of 2 or 3), you're golden, but you should aim for closer to validate your usage of memory and time is correct. When considering tradeoffs of space versus time, the sample executable shows you the balance we are looking for. If your program uses noticeably more memory or time, it indicates you have some unnecessary or redundant work that should be eliminated. Be sure to note that very small inputs are, well, too small for meaningful measurement; instead also measure on heftier input to see the program operating at scale.

Submitting

Once you are finished working and have saved all your changes, check out the guide to working on assignments for how to submit your work. We recommend you do a trial submit in advance of the deadline to allow time to work through any snags. You may submit as many times as you would like; we will grade the latest submission. Submitting a stable but unpolished/unfinished is like an insurance policy. If the unexpected happens and you miss the deadline to submit your final version, this previous submit will earn points. Without a submission, we cannot grade your work.

We would also appreciate if you filled out this homework survey to tell us what you think once you submit. We appreciate your feedback!

Grading

Here is the tentative point breakdown:

Functionality (95 points)

  • Sanity cases (25 points) Correct results on the default sanity check tests.
  • Comprehensive/stress cases (40 points) Correct results for additional test cases with broad, comprehensive coverage and larger, more complex inputs.
  • Clean compile (2 points) Compiles cleanly with no warnings.
  • Clean runs under valgrind (15 points) Clean memory reports for all programs when run under valgrind. Memory errors (invalid read/write, use of freed memory, etc) are significant deductions. Memory leaks are a minor deduction. Every normal execution path is expected to run cleanly with no memory errors nor leaks reported. We will not test exceptional/error cases under Valgrind except for fgets encountering an error, which will be tested.
  • Reasonable efficiency (8 points) We expect programs to be reasonably efficient in use of time and space. Full points are awarded for being on par (2-3x) with the sample program; deductions will apply for immoderate use of memory/time. There is no bonus for outperforming the benchmark (and extreme efforts to do so could detract from your code quality).
  • custom_tests (5 points) Your custom_tests file should include at least 5 tests of your own that show thoughtful effort to develop comprehensive testing coverage. Please include comments that explain your choices. We will run your custom tests against your submission as well as review the cases to assess the strength of your testing efforts.

Code review (buckets together weighted to contribute ~20 points)

The grader's code review is scored into a bucket per assignment part to emphasize the qualitative features of the review over the quantitative. The styleguide is a great overall resource for good program style. Here are some highlights for this assignment:

  • Use of pointers and memory. We expect you to show proficiency in handling pointers/memory, as demonstrated by appropriate use of stack versus heap allocation, no unnecessary levels of indirection, correct use of pointee types and typecasts, const correctness, and so on.
  • Program design. We expect your code to show thoughtful design and appropriate decomposition. Data should be logically structured and accessed. Control flow should be clear and direct. When you need the same code in more than one place, you should unify, not copy and paste. If the C library provides functionality needed for a task, you should leverage these library functions rather than re-implement that functionality.
  • Style and readability. We expect your code to be clean and readable. We will look for descriptive names, defined constants (not magic numbers!), and consistent layout. Be sure to use the most clear and direct C syntax and constructs available to you.
  • Documentation. You should document both the code you wrote and what we provided. We expect program overview and per-function comments that explain the overall design along with sparing use of inline comments to draw attention to noteworthy details or shed light on a dense or obscure passage. The audience for the comments is your C-savvy peer.
  • Codecheck compliance. We will be running your code through codecheck and verifying that there are no issues present.

Post-Assignment Check-in

How did the assignment go for you? We encourage you to take a moment to reflect on how far you've come and what new knowledge and skills you have to take forward. Once you finish this assignment, you will have written your own implementation of two Unix filter commands and an improved version of a C I/O function, using pointers, arrays and dynamic memory. That's a pretty darn impressive accomplishment!

To help you gauge your progress, for each assignment/lab, we identify some of its takeaways and offer a few thought questions you can use as a self-check on your post-task understanding. If you find the responses don't come easily, it may be a sign a little extra review is warranted. These questions are not to be handed in or graded. You're encouraged to freely discuss these with your peers and course staff to solidify any gaps in you understanding before moving on from a task. They could also be useful as review before the exams.

  • The scan_token function you wrote for assignment 2 required that the client supply the memory to write the scanned token. The read_line function of assignment 3 instead allocates memory on the client's behalf. What are the advantages and disadvantages between the two approaches?
  • It is stated that a correct program should have a one-to-one correspondence between malloc calls and free calls. How does adding a realloc call change the needed number of malloc and free calls? Briefly explain your reasoning.
  • Which is the bigger problem: memory errors or memory leaks? Why? How can you tell which is which in a Valgrind report?
  • How can you gauge whether your program has acceptable performance in terms of runtime or memory efficiency?
  • When it is appropriate to use calloc instead of malloc?

Frequently Asked Questions

What is assert? How do I use it?

The assert macro from <assert.h> is a mechanism for fatal errors. You assert a condition that must be true to continue:

    assert(num < 1000);

If the condition fails, assert prints the error and halts the program. Liberal use of asserts is a defensive programming practice used by a programmer to communicate with themselves (or other programmers) about fatal situations that should "never happen" and from which there is no possibility of recovery. For this assignment, you should verify any allocation request was successful by asserting the result was non-NULL. A NULL result can mean that the heap is exhausted or the internal malloc data structures have been damaged. In either case, there is no point in continuing. The alternative of blundering on with a NULL pointer will eventually lead to a crash, but far removed from the original error. Better to assert right away!

One other point to make about assert is that they can be enabled/disabled by a compile-time setting. Asserts are often disabled when finalizing the production version of a program. As such, it's not a good idea to put an expression with side effects or an essential statement inside of an assert. Instead, the assert expression should merely be a test that you can do without. For example, consider these examples:

assert(ptr = malloc(size)); // if assert disabled, no malloc either!

ptr = malloc(size);
assert(ptr);                // if assert disabled, nothing critical skipped