Written by Julie Zelenski
Advice for assignment 3
Valgrind early, Valgrind often! Read our Valgrind guide, watch the Valgrind video, and/or revisit the Valgrind exercises in labs 2 and 3 to be sure you know how to put Valgrind to work for you.
Memory allocation
One of the challenges in this assignment concerns correct memory management. Systems programmers must be conscious and careful in their use of memory. Here are some general memory guidelines as well as a few recommendations specific to this program:
- Stack allocation is as simple as declaring a variable. The stack is convenient (auto-allocation/deallocation), type-safe, and very efficient.
- Dynamic allocation is via malloc/realloc/free. The heap gives the programmer control over lifetime (explicit allocation/deallocation) and flexibility (storage can be re-sized/re-purposed), but comes at a cost in safety and efficiency.
- Take care to understand proper use of stack versus heap allocation and when each is appropriate. As a general rule of thumb, unless a situation requires dynamic allocation, stack allocation is preferred. Often both techniques are used together in a program.
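As a rough illustration of the guidelines above, here is a minimal sketch contrasting the two kinds of allocation. The function names make_greeting and greeting_length and the 64-byte scratch size are hypothetical, chosen only for this example; they are not part of the assignment.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Heap: the result must outlive this call, so a local array would not do.
// The caller takes ownership and must free the returned string.
char *make_greeting(const char *name) {
    const char *prefix = "hello, ";
    char *result = malloc(strlen(prefix) + strlen(name) + 1);  // +1 for '\0'
    assert(result != NULL);
    strcpy(result, prefix);
    strcat(result, name);
    return result;
}

// Stack: the scratch buffer is needed only during the call and is
// reclaimed automatically on return. Names that do not fit are truncated.
size_t greeting_length(const char *name) {
    char scratch[64];
    snprintf(scratch, sizeof(scratch), "hello, %s", name);
    return strlen(scratch);
}
```

The deciding question in this sketch is lifetime: storage that must survive past the function's return goes in the heap, and everything else can stay on the stack.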
Common questions about assign3
How should read_line operate when there are no more lines to read?
Ideally, you would directly check for whether any characters remain to be read, but given the way files work in C/unix, the EOF condition cannot be checked in advance. It is the process of trying to read past the end that triggers the EOF condition. The classic strategy is to assume the read will succeed and forge ahead with that plan. If it fails, you pivot to your EOF handling. The EOF case should clean up any initial allocation that turned out to not be needed after all and return NULL.
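The "forge ahead, then pivot on failure" strategy might look something like the sketch below. This is a deliberately simplified illustration, not the required implementation: the name read_line_sketch and the fixed SKETCH_CAPACITY are assumptions made for the example, and it does no resizing, so a line longer than the buffer would be split across calls.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CAPACITY 128   // hypothetical fixed size; this sketch never resizes

// Returns the next line (minus its trailing newline) in a heap buffer that
// the caller must free, or NULL if no characters remain to be read.
char *read_line_sketch(FILE *fp) {
    char *buf = malloc(SKETCH_CAPACITY);   // optimistic up-front allocation
    assert(buf != NULL);

    if (fgets(buf, SKETCH_CAPACITY, fp) == NULL) {
        free(buf);      // the read attempt hit EOF: undo the allocation
        return NULL;
    }

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    // a line of just "\n" becomes the empty string
    }
    return buf;
}
```

Note that an empty line and end-of-file produce different results here: the former returns a valid empty string, while the latter returns NULL.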
Sanity check is reporting a mismatch about my program outputting an extra blank line at the end of the mycat Makefile test. None of the other tests complain about this. What gives?
We configured this specific test to report discrepancies in how your read_line handles empty lines and/or EOF. If read_line is called when the next line consists of just a newline, it should return an empty string. If called when there are no more characters to read, it should return NULL. Returning an empty string instead of NULL or vice versa will produce extra or missing blank lines in the output. The usual behavior of sanitycheck is to ignore whitespace when comparing the output. This specific test uses a strict comparison that only accepts an exact match so as to alert you that there is a problem with your read_line. (Hint: It may help to read the previous question about correctly handling EOF.)
What is assert? How do I use it?
The assert macro from <assert.h> is a mechanism for fatal errors. You assert a condition that must be true to continue:
assert(num < 1000);
If the condition fails, assert prints the failed expression along with the file and line number, then halts the program. Liberal use of asserts is a defensive programming practice used by a programmer to communicate with themselves (or other programmers) about fatal situations that should "never happen" and from which there is no possibility of recovery. For this assignment, you should verify any allocation request was successful by asserting the result was non-NULL. A NULL result can mean that the heap is exhausted or that the internal malloc data structures have been damaged; in either case, there is no point in continuing. The alternative of blundering on with a NULL pointer will eventually lead to a crash, but far removed from the original error. Better to assert right away!
One other point to make about asserts is that they can be enabled or disabled by a compile-time setting (defining the NDEBUG macro turns them off). Asserts are often disabled when finalizing the production version of a program. As such, it's not a good idea to put an expression with side effects or an essential statement inside of an assert. Instead, the assert expression should merely be a test that you can do without. Compare these two versions:
assert(ptr = malloc(size)); // if assert disabled, no malloc either!
ptr = malloc(size);
assert(ptr); // if assert disabled, nothing critical skipped
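To see the difference in action, the sketch below simulates an asserts-disabled build by defining NDEBUG before including <assert.h>. The helper names alloc_inside_assert and alloc_then_assert are hypothetical, invented for this illustration.

```c
#include <stdlib.h>

#define NDEBUG          // simulate a build with asserts disabled
#include <assert.h>     // assert(...) now expands to nothing

// RISKY: the allocation is buried inside the assert, so with NDEBUG
// defined the malloc call is compiled out and ptr stays NULL.
void *alloc_inside_assert(size_t size) {
    void *ptr = NULL;
    assert(ptr = malloc(size));
    (void)size;         // size is otherwise unused once the assert vanishes
    return ptr;
}

// SAFE: the allocation always happens; only the check is optional.
void *alloc_then_assert(size_t size) {
    void *ptr = malloc(size);
    assert(ptr);
    return ptr;
}

// Re-including <assert.h> with NDEBUG undefined restores live asserts
// for any code that follows.
#undef NDEBUG
#include <assert.h>
```

In this simulated build, alloc_inside_assert(16) returns NULL without ever calling malloc, whereas alloc_then_assert(16) still allocates as expected.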
How can I determine if my program meets the runtime and memory efficiency benchmarks?
Measure the sample executable on a given input and compare to the measurement of your program running on the same input. If your performance is in the same ballpark as ours (say within a factor of 2 or 3), you're golden. When considering tradeoffs of space versus time, the sample executable shows you the balance we are looking for. If your program uses noticeably more memory or time, it indicates you have some unnecessary or redundant work that should be eliminated. Note that very small inputs are, well, too small for meaningful measurement; instead, measure on heftier input to see the program operating at scale.
How do I measure the memory use of a program?
Run your program under Valgrind. Look for the line labeled "total heap usage" to see the count of allocations and the number of bytes allocated. You can also run our sample solution under Valgrind to spy on its memory usage.
% valgrind ./myuniq samples/colors
...
total heap usage: 14 allocs, 14 frees, 6,104 bytes allocated
% valgrind samples/myuniq_soln samples/colors
...
total heap usage: 12 allocs, 12 frees, 7,528 bytes allocated
How do I measure the runtime of a program?
Prefix a command with time to execute it and report on the time used to complete it. For example, time ./mytail will measure your program's runtime, and time samples/mytail_soln will measure the sample solution's runtime. The time labeled "user" is the one of interest.
% time ./mytail -1 samples/dictionary
zygote
real 0m0.010s
user 0m0.006s
sys 0m0.000s
% time samples/mytail_soln -1 samples/dictionary
zygote
real 0m0.009s
user 0m0.007s
sys 0m0.000s
Can you remind me how structs work in C?
Here is a quick refresher. You can find more details in your C reference.
C struct declarations are almost, but not exactly, the same as C++. In C, the following declares a new struct type:
struct coord {
int x, y;
};
The name of the type is struct coord and you cannot drop the struct keyword; declaring a variable of type coord will not compile.
You can declare struct variables on the stack or allocate them in the heap. Note that sizeof works correctly when applied to a struct type.
struct coord origin; // struct on stack
struct coord *p = malloc(sizeof(struct coord)); // struct in heap
The . (dot) operator is used to access the fields within a struct.
origin.x = 100;
origin.y = 200;
If you have a pointer to a struct, your first attempt to access the fields is likely to run afoul of the fact that . has higher precedence than *. You can add parentheses to force the desired precedence, or better, use the -> operator which combines . and * for this common need.
// these next 3 lines attempt to access field via struct pointer
*p.x = 0; // WRONG! precedence applies . first then *
(*p).x = 0; // OK: parens used to override precedence
p->x = 0; // BEST: preferred way to access
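Putting the pieces of this refresher together, here is a small sketch that compiles as written. The helper functions make_coord and translate are hypothetical, added only to show struct pointers in use.

```c
#include <assert.h>
#include <stdlib.h>

struct coord {
    int x, y;
};

// Returns a heap-allocated coord that the caller must free.
struct coord *make_coord(int x, int y) {
    struct coord *p = malloc(sizeof(struct coord));
    assert(p != NULL);
    p->x = x;
    (*p).y = y;     // equivalent to p->y = y, just clumsier
    return p;
}

// Takes a pointer so that the caller's struct is modified in place.
void translate(struct coord *p, int dx, int dy) {
    p->x += dx;
    p->y += dy;
}
```

A stack struct can be passed to translate by taking its address, as in translate(&origin, 1, 2), while a pointer returned by make_coord can be passed directly.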
How do I use gdb/valgrind on my program running in a pipeline?
A simple approach is to instead write a file with the input you want to test and then gdb/valgrind your program running on that file. To prepare the input file, you can use redirection > to capture the output of the command.
Filters and pipelines are cool! Where can I learn more about them?
Chris's unixref has you covered: check out the pipeline topic/video!