Written by Julie Zelenski
Design advice
- Use the standard C library, don't re-implement. The functions in the standard libraries are already written, debugged, and optimized--- what more could you ask for? Study up on what's available and be ready to make good use of them.
- Unify, don't duplicate. Common code should be unified. Sharing code requires effort in upfront design, but pays off by giving you less code to write/test/debug and produces a more readable and malleable implementation. Reusing code via copy and paste does not count as sharing. :-)
- Mix-and-match mysort options. The 4 command-line flags can each be turned on independently, making a total of 24 combinations and if you aren't careful about it, you can end up making a distinct case out of each combination and debugging 16x more code than you needed to. Instead, take care to unify all the tasks that are in common, making only the tiniest of detours where needed to add code to handle their differences. You may have iterate a little to work out the details, but you can arrive at a very pleasing end result if you put your mind to it!
- Pro-tip about reverse sort. The easier approach to handle the reverse option for
mysortis to sort the input as you would normally, and just change the order you print the results. I must admit I did not think of this myself when I first tried to implement it, but a much smarter colleague enlightened me. Who says old dogs can't learn new tricks?
Wise use of memory/pointers
- Stack versus heap. Take care to understand proper use of stack versus heap allocation and when each is appropriate. As a general rule of thumb, unless a situation requires dynamic allocation, stack allocation is preferred. Often both techniques are used together in a program. For example, it is desirable that a dynamically-allocated C-strings be of the proper size, e.g. the string "apple" needs 6 characters worth of storage, no more, no less. A useful idiom to learn is allocating a maximum-sized stack buffer to temporarily store a string of indeterminate length, which is later copied to a dynamically-allocated buffer of the correct size once the length is known. This is how the line-reading code for
mysortshould operate. - Dynamic resizing. The
mysortprogram doesn't know the count of lines in the input in advance, so it allocates based on an estimate and grows as needed. Doubling-in-size is a classic technique you'll see again and again such as in the buffer-handling for last assignment'sread_lineand the entries gathered bymusl_scandir. - Assert allocation requests. It's not likely you will exhaust the entire heap on a modern system, but NULL can also be returned when the heap is corrupted. Either way,there is nothing good that comes after a failed allocation, so best to just call a halt to things before it goes from bad to worse.
- Pointee types should communicate the truth. If you know the type of a pointer, by all means, declare it with its specific type. Use
void*only if the pointee type is not known or can vary. Conversely, don't declare what should be avoid*pointer with a specific pointee type. Declaring avoid*as achar*misleads the reader into thinking this pointer is a C-string and allows it to be mis-used as such with no compiler warning. - Easy on the pointer arithmetic. Having been exposed to pointer arithmetic, some students go a little crazy and start using it for everything. Even though you can access ordinary array elements or struct fields by calculating an offset from the base address, you should use array subscripts and struct field names for readability and safety. Work within the type system whenever possible and only drop down to low-level typecasts and pointer arithmetic when raw memory is all you've got.
-
Prefer assignment to raw memcpy. Similarly, although
memcpycan be used to do any kind of assignment, that doesn't mean it always should be used. Consider the three following ways of assigning an integer:int dst, src; void *pDst = &dst, *pSrc = &src; dst = src; // option A *(int *)pDst = *(int *)pSrc; // option B memcpy(pDst, pSrc, sizeof(int)); // option CAll three options accomplish the same thing (copying a 4-byte integer from src to dst). Option A, the ordinary assignment statement is direct, clean, and safe. This is always your best option as long as you are able to work within the type system. If you are forced to operate through a
void*, you must instead consider options B and C. Option B, assignment using a typecast, is your second best choice. This option is readable, efficient, and relatively safe (given the correct typecast). Option C operates at the lowest-level, with no type safety whatsoever. A mismatch in the pointer types or size is silently accepted, yet causes havoc. When trying to read/write contents of avoid*, go with Option B if possible. Drop down to the low-level memory copying of Option C only when you have no other choice. As a rule of thumb: if the size argument to memcpy is a compile-time constant (and especially so if it uses sizeof), this indicates you shouldn't be using memcpy/memmove. You only need to invoke raw byte-copying when the type/amount of memory being copied is determined at runtime. -
Be wary of typecasts. Typecasts are not to be thrown about indiscriminately. Your typecast defeats the type system and coerces the compiler to go along with it without complaint. With this power comes great responsibility. Before introducing one, consider: why is this cast necessary? what happens if it is not present? is there a safer way to communicate my intention? how might I work within the type system rather than against it? (i.e. correcting a mismatch in types rather than covering up with a cast). You should never cast away const-ness (fix the types instead!) and casting a function pointer is an extremely sketchy move (fix the function prototype instead!). You also never need to cast anything to a
void*, since it is the universal recipient and is compatible with any pointer type (such as cast is redundant and can be removed). You will need a few typecasts, but add them exactly and only where you must. If your code repeatedly uses an expression that contains a necessarily unavoidable typecast, consider placing that expression into a shared helper to be invoked everywhere rather than repeat code.
Common questions about assign4
I don't know how to properly call scandir. Help!
Remember, the code to musl_scandir is given in the assignment writeup if you want to review its implementation to remind yourself of how the function uses its various parameters.
The man page for scandir also contains a short example program. Minor heads up: that example program happens to print the entries in reverse order which may not be what you want/expect. Be wary of dropping in code from the documentation without reviewing it first to know what you are getting and whether you will need to adapt it.
How do I declare a parameter/variable of function pointer type?
The syntax for this is a bit goopy. The name of the variable is buried in the middle of the declaration and you have to read from inside out. The declaration below says the variable c is a pointer to a function that takes two const void* parameters and returns an int.
int (*c)(const void *, const void *) = my_compare;
Making a typedef for the function pointer type is a nice way to clean this up. The typedef below establishes cmpfn_t as a nickname for int (*)(const void *, const void*). This typedef means you can now use cmpfn_t as the type for a variable, parameter, return type, in place of the longhand form.
typedef int (*cmpfn_t)(const void *, const void *);
cmpfn_t c = my_compare;
See your C reference or web search for more info on typedef. A neat online resource for unraveling complex C declarations is cdecl.org.
How do I resolve the compiler warning pointer of type void * used in arithmetic?
The pointer arithmetic expression pointer + n implicitly scales n by the size of the pointee before adding it to pointer. If pointer is a void*, its pointee size is unknown, and the compiler will squawk because it cannot apply the proper scaling. To perform pointer arithmetic on a void* you must first cast it to indicate the desired scaling. The typical approach is to cast to char* to get no scaling and then manually multiply yourself.
What is the difference between memcpy and memmove?
Both are optimized routines for copying a region of raw bytes. memcpy is expected to be somewhat efficient that memmove but requires that the source and destination regions are distinct, whereas memmove correctly handles if the regions overlap. Either can be used to write an entire sequential block of a memory in one call, which can be much more efficient that repeated calls to process the memory in smaller pieces.
My custom sanitycheck tests for mysort report mismatch due to re-ordering of equal lines. Is this a problem?
There is no required tie-breaker for equal lines, the spec allows mysort to print them in any order. But sanity check is looking for an exact match so you should construct your custom tests so as to not trip on spurious discrepancies. In grading, our tests will accept any valid reordering of equal lines.