Assignment 4: Into the void*

Due: Wed May 11 11:59 pm
Late submissions accepted until Fri May 13 11:59 pm

Assignment by Julie Zelenski and Michael Chang, with modifications by Nick Troccoli, Katie Creel, Brynne Hurst and Jonathan Kula

Learning Goals

This assignment focuses on generics with void *, function pointers, generic memory handling and disclosure and partiality. You will be building your skills with:

the purpose and use of function pointers in C
using generic void* interfaces as a client
implementing a generic void* function using raw memory operations
understanding disclosure processes and the role of partiality in security

Overview

In this assignment, you will implement simplified versions of the unix commands ls and sort. There are four parts to the assignment:

implementing the myls program that prints out a list of directory entries in a given location on the filesystem
reading a case study about a core Internet vulnerability to explore disclosure processes and partiality
implementing binsert, a generic function that can find/insert an element into a sorted array of any type.
implementing the mysort program that prints out the lines in a file in sorted order, with options for different sort orders and outputs, using your binsert function

A few reminders:

The working on assignments page contains info about the assignment process.
The collaboration policy page outlines permitted assignment collaboration, emphasizing that you are to do your own independent thinking, design, coding, and debugging. If you are having trouble completing the assignment on your own, please reach out to the course staff; we are here to help!

To get started on the assignment, clone the starter project using the command

git clone /afs/ir/class/cs107/repos/assign4/$USER assign4

The starter project contains the following:

readme.txt: a text file where you will answer questions for the assignment
util.c mysort.c, myls.c and Makefile: three files that you modify, and their Makefile for compiling
custom_tests: the file where you should add custom tests for your programs
test_binsert.c: a program that calls your binsert function in util.c for testing purposes. You do not need to comment or modify this file.
samples: a symbolic link to the shared directory for this assignment. It contains:
- SANITY.ini, sanity.py and prototypes.h: files to configure and run Sanity Check. You can ignore these.
- myls_soln, mysort_soln and test_binsert_soln: executable solutions for the code you will implement.
- colors, numbers and names: sample text files for testing your unix commands
tools: contains symbolic links to the sanitycheck, submit, and codecheck programs for checking, testing and submitting your work.

Codecheck: Part of your style / code review score will be dependent on having no code issues when run through the codecheck tool. Make sure to run it to ensure your code adheres to all necessary guidelines!

Note: an important restriction on code you write for this this assignment is you are not allowed to use the versionsort or alphasort functions that are part of the C library. You should implement your own comparison functions. A solution that violates this restriction will receive a significant deduction, so please verify your approach is in compliance!

Testing and Debugging

This assignment heavily emphasizes testing and debugging. Make sure to *develop incrementally* and use good debugging practices. Incremental development means implementing small pieces in isolation, thoroughly testing them, and then building on them only once you know they are correct. This ensures a working implementation at each step, and if you encounter a later bug, you can be confident it is a result of your most recent changes. If you encounter bugs, follow the steps in our debugging guide checklist!

View Debugging Guide

Good testing can help you uncover bugs that you can then track down and fix. For each of myls, binsert and mysort, you should add at least 3 to 5 additional tests of your own (total of at least 10) in the custom_tests file that show thoughtful effort to develop comprehensive test coverage. When you add a test, also document your work by including comments in the custom_tests file that explain why you included each test and how the tests relate to one another. The tests supplied with the default SanityCheck are a start that you should build on, with the goal of finding and fixing any bugs before submitting, just like how a professional developer is responsible for vetting any code through comprehensive testing before adding it to a team repository. You can find suggested testing strategies on the testing page, linked to from the Assignments dropdown. We strongly encourage you to write some tests before you write any code, and then add additional tests as you write code.

We also encourage you to make your own text files for testing! Having custom file inputs can help you test more thoroughly. When you submit your assignment, you'll be asked whether you'd like to add any custom files to your submission, because by default only the starter files are submitted. At that time, you can specify any additional test files of your own that you created.

Working with Memory and Pointers

This assignment focuses heavily on pointers and manipulating memory. Here are some tips to keep in mind as you work through the assignment:

Stack versus heap. Take care to understand proper use of stack versus heap allocation and when each is appropriate. As a general rule of thumb, unless a situation requires dynamic allocation, stack allocation is preferred. Often both techniques are used together in a program.
Assert allocation requests. Remember to call assert after each allocation request to immediately catch any heap corruption or other errors and more easily fix them.
Pointee types should communicate the truth. If you know the type of a pointer, declare it with its specific type. Use void* only if the pointee type is not known or can vary. Conversely, don't declare what should be a void* pointer with a specific pointee type, which can mislead the reader and allow it to be mis-used with no compiler warning.
Use pointer arithmetic sparingly. Even though you can access ordinary array elements or struct fields by calculating an offset from the base address, you should use array subscripts ([]) and struct field names for readability and safety. Work within the type system whenever possible and only drop down to low-level typecasts and pointer arithmetic when necessary.
Prefer assignment to raw memcpy. Similarly, although memcpy can be used to do any kind of assignment, that doesn't mean it always should be used. Consider the three following ways of assigning an integer:

int dst, src;
void *pDst = &dst;
void *pSrc = &src;

dst = src;                          // option A
*(int *)pDst = *(int *)pSrc;        // option B
memcpy(pDst, pSrc, sizeof(int));    // option C

All three options accomplish the same thing (copying a 4-byte integer from src to dst). Option A, the ordinary assignment statement is direct, clean, and safe. This is always your best option as long as you are able to work within the type system. If you are forced to operate through a void*, you must instead consider options B and C. Option B, assignment using a typecast, is your second best choice. This option is readable, efficient, and relatively safe (given the correct typecast). Option C operates at the lowest-level, with no type safety whatsoever. A mismatch in the pointer types or size is silently accepted, yet causes havoc. When trying to read/write contents of a void*, go with Option B if possible. Drop down to the low-level memory copying of Option C only when you have no other choice. As a rule of thumb: if the size argument to memcpy is a compile-time constant (and especially so if it uses sizeof), this indicates you shouldn't be using memcpy/memmove. You only need to invoke raw byte-copying when the type/amount of memory being copied is determined at runtime.

Be wary of typecasts. Use typecasting sparingly. Your typecast defeats the type system and coerces the compiler to go along with it without complaint. With this power comes great responsibility. Before introducing one, consider: why is this cast necessary? what happens if it is not present? is there a safer way to communicate my intention? (i.e. correcting a mismatch in types rather than covering up with a cast). You should never cast away const-ness (fix the types instead!) and casting a function pointer is an extremely sketchy move (fix the function prototype instead!). You also never need to cast anything to a void*, since it is the universal recipient and is compatible with any pointer type (such a cast is redundant and can be removed). You will need a few typecasts, but add them exactly and only where you must.

1. `myls`

Your first task is to implement the myls program, which is a simplified version of the Unix ls command. It takes in zero or more arguments, which are paths to directories, and lists the directory entries from each path, one per line. If invoked with no arguments, it prints the entries from the current directory. Try out the provided sample solution, e.g. ./samples/myls_soln or ./samples/myls_soln samples:

Expand Sample Run #1

myth$ ./samples/myls_soln
Makefile
custom_tests
myls.c
mysort.c
readme.txt
samples
test_binsert.c
tools/
util.c

Note that myls adds a trailing slash when printing the name of an entry that is itself a directory.

myls supports two flags, -a and -z, which can be used alone or in combination. If you use the -a flag, myls also includes entries whose names start with . (without -a, these are ignored):

Expand Sample Run #2

myth$ ./samples/myls_soln samples -a
./
../
SANITY.ini
colors
myls_soln
mysort_soln
names
numbers
prototypes.h
sanity.py
test_binsert_soln

If you use the -z flag, myls outputs directories followed by non-directories, and within each of those 2 groups the entries are sorted alphabetically (without -z, all entries are just sorted alphabetically):

Expand Sample Run #3

myth$ ./samples/myls_soln -z
tools/
Makefile
custom_tests
myls.c
mysort.c
readme.txt
samples
test_binsert.c
util.c

myls only supports directory paths as arguments. If an argument refers to a file or non-existent path, or if another error occurred when getting the list of directory entries, it should print an error message and skip that argument. You should match the error message exactly to the solution using the error function with the text "cannot access _____", filling in the name of the invalid argument, along with an error code of 0 and a status code of 0 (so the program still continues running).

Review/Comment Starter Code

myls.c comes with provided code that fully implements parsing command line arguments and calling the ls function that you must implement. There are also comments about enum and typedef, two C constructs that come up on this assignment. Study the provided program code to understand its behavior and work out how you will implement the ls function and add any other functions as necessary. In particular, processing command-line arguments is crufty work, and the starter code uses the GNU extension getopt to help. Use man 3 getopt to learn more about how it works - for example, as a thought question, how is an invalid option detected/reported when using getopt? Next, add comments to the provided code to document your strategy, including an appropriate overview comment for the entire program, documenting each of the functions, and adding inline comments where necessary to highlight any particularly tricky details.

Implementation

To implement myls, you'll call the library function scandir to get a list of information about the directory entries from the specified path, and print out that information. Depending on the command line arguments, you may need to omit certain directory entries and sort them in different ways. scandir uses function pointers to provide this functionality. Make sure to not re-implement any functionality that scandir can provide for you! scandir will do most of the heavy lifting for this program - your job is to call it appropriately based on what directory entries you want and in what order, and print them out.

REMEMBER: no using alphasort or versionsort!

When relying on a new library function, it's critical to take the time to fully understand how to use it before adding it to your program. Before working on myls, thoroughly read through the information below to understand what scandir does and how to use it.

`scandir` Information

scandir is used to create a directory listing (an array where each element contains information about a file/folder in a directory) for a given path. It uses two function pointers: a filter function which controls which entries to include in the list, and a comparison function which controls the order of entries in the list. Here are the steps you should take to understand how it works:

Read the man page (man scandir) along with our questions below to guide you through it
Try writing some code in myls.c to play around with scandir and become familiar with how to use it
Examine the sample implementation further below along with our questions to guide you through it

Step 1: open the man page (man scandir) and read the information provided. Scroll to the bottom to see a code sample that uses scandir - note that while this code sample is useful, it doesn't have the best style and uses some functions we don't need, such as perror and exit, and uses alphasort, which you are not allowed to use. Work through the following notes (no need to submit anything), which will be critical to effectively using the function in myls. While or after doing this, as step 2, we strongly encourage you to try writing some test code in myls.c to play around with scandir and verify your understanding!

scandir creates an array of struct dirent *s for the client. Review man readdir to see the definition of struct dirent. An oddity to note is that the last struct field is an array of characters that is described as of "unspecified size". The d_name portion will be custom-sized to fit a particular entry's filename including a null terminator.
Q: How does scandir give the array it creates back to the client? Hint: that triple-star namelist parameter is pretty scary, but think back to the pattern we discussed in class of passing a pointer to something if the caller wants to allow it to be modified in a function).
Q: What does scandir return on success, and on error?
Q: How does a client indicate that they want no filtering applied to the directory listing? No sorting?
Q: Does a non-zero result from the filter callback function filter cause an entry to be kept or discarded from the list?
Q: What heap memory must be freed by the client? What are the steps for the client to properly free that memory?

Step 3: Though usually reading the manual page is enough to understand client usage of a function, for our purposes we will dig deeper and use our growing C skills to understand some of its implementation. This can help us even better understand how to use it and see many core 107 concepts (pointers, heap, function pointers, and more) applied in a library function.

A scandir implementation (based on code from musl) is shown below, with added comments to help navigate. This function bares the inner soul of C (function pointers, dynamic allocation, triple star, oh my!) but your last three weeks of immersion into deep C waters has made you ready for this. We have picked out questions below that are key to your understanding of the function as a whole and how to use it. Go slow, draw pictures, and ask questions about anything you don't understand:

Expand Implementation Code

 1  int musl_scandir(const char *dirp, struct dirent ***namelist,
 2     int (*filter)(const struct dirent *),
 3     int (*compar)(const struct dirent **, const struct dirent **)) {
 4
 5      DIR *directory = opendir(dirp);
 6      if (directory == NULL) {
 7          return -1;
 8      }
 9
10      // Initialize an array of dirent pointers to hold the entries
11      struct dirent **dir_entries = NULL;
12      size_t dir_entries_used = 0;
13      size_t dir_entries_capacity = 0;
14
15      struct dirent *curr_entry_ptr;
16      while ((curr_entry_ptr = readdir(directory))) {
17
18          // If we should filter out this entry, skip it
19          if (filter && !filter(curr_entry_ptr)) {
20              continue;
21          }
22
23          // If we have run out of space, resize our array
24          if (dir_entries_used >= dir_entries_capacity) {
25              dir_entries_capacity = 2 * dir_entries_capacity + 1;
26
27              // If the new bytes needed > SIZE_MAX, it's invalid
28              if (dir_entries_capacity > SIZE_MAX / sizeof(*dir_entries)) {
29                  break;
30              }
31
32              struct dirent **tmp = realloc(dir_entries,
33                  dir_entries_capacity * sizeof(*dir_entries));
34              if (!tmp) {
35                  break;
36              }
37              dir_entries = tmp;
38          }
39
40          // Make heap space for this entry, and add it to the array
41          dir_entries[dir_entries_used] = malloc(curr_entry_ptr->d_reclen);
42          if (!dir_entries[dir_entries_used]) {
43              break;
44          }
45          memcpy(dir_entries[dir_entries_used++], curr_entry_ptr,
46              curr_entry_ptr->d_reclen);
47      }
48
49      closedir(directory);
50
51      // If an error occurred (errno is a global), free memory
52      if (errno) {
53          if (dir_entries) {
54              while (dir_entries_used-- > 0) {
55                  free(dir_entries[dir_entries_used]);
56              }
57          }
58          free(dir_entries);
59          return -1;
60      }
61
62      // If specified, sort the entries based on `compar`
63      if (compar) {
64          qsort(dir_entries, dir_entries_used, sizeof(*dir_entries),
65              (int (*)(const void *, const void *))compar);
66      }
67
68      *namelist = dir_entries;
69      return dir_entries_used;
70  }

Review the code above and consider the issues we highlight below. You do not need to submit answers to these questions:

Q: Revisit your understanding of how scandir gives the array back to the client and check out line 68, where that parameter is used. What is the code doing there? How can that help us better understand that parameter?
The number of directory entries is not known in advance, so scandir allocates the array using the classic "resize-as-you-go" allocation strategy. It varies slightly from what we have seen, but you can see similarities in the code from lines 24 to 38.
Q: Revisit your understanding of what heap memory must be freed by the client and check out lines 32 and 41, which allocate memory on the heap. What memory must the client free? Does the client have to free this memory if scandir encounters an error?
Lines 41-46 make a heap copy of a struct dirent. Because of the variable-sized d_name field (see first issue above), it allocates and copies a custom size.
Q: Line 64/65 is a little hard to parse, but it is applying a typecast to the function pointer being passed to qsort. Casting a function pointer is a sketchy thing to do, and it should generally be avoided, but why might it be necessary here? Pay close attention to what type is being sorted, what qsort will pass to the comparison function, and what will be passed to scandir's comparison function.
Q: The scandir filter function receives its argument as a const struct dirent *; the comparison function receives its arguments as const struct dirent **. Why the inconsistency?

Forging your way through this is a significant accomplishment. There is a lot going on in this code and you should feel crazy-proud of being able to dissect such a dense passage! Armed with that understanding, you're now ready to move onto implementing myls.

Testing

This program can be tested with sanity check. We encourage you to test with locations in the assignment project (e.g. main assignment folder or samples/), on the whole system (e.g. /usr/class/cs107), or make your own directories/files in your assignment folder for testing. Don't forget to add them to your submission!

2. Disclosure and Partiality Case Study

In lecture, we learned about the importance of safe, fair and just disclosure processes and the role that partiality plays in computer security. Please read the following obituary of security researcher Dan Kaminsky (Stanford login required) and answer the questions below in your readme.txt file about his security work and your own beliefs about disclosure and partiality.

View Article

[1-3 sentences] What disclosure process did Dan Kaminsky follow in response to finding the DNS vulnerability? Given the context, what kind of partiality did he demonstrate in doing so?
[1 paragraph] In lecture, we discussed the EternalBlue incident and the Vulnerabilities Equities Process. In this case, the US showed preference towards using vulnerabilities in the interests of its own intelligence gathering capabilities or its own citizens. What degree of partiality do you think is appropriate for a government in deciding whether to disclose or stockpile vulnerabilities? (We will not be grading you based on which degree you chose. Your answer should rely on critical reflection of the four degrees and include clear reasoning of your position.)
[1 paragraph] Which of the four degrees of partiality most guides your conduct and why? (We will not be grading you based on which degree you chose. Your answer should rely on critical reflection of the four degrees and include clear reasoning of your position.)

3. `binsert`

Your next task is to implement the function binsert in util.c, with the following signature:

void *binsert(const void *key, void *base, size_t *p_nelem, 
    size_t width, int (*compar)(const void *, const void *))

binsert is an augmented version of bsearch, a built-in generic library function. bsearch can search for a given element in a sorted array using binary search, and return a pointer to it. If the element isn't in the array, bsearch returns NULL; but binsert will insert the element into the correct position and return a pointer to it. This is similar to the lfind and lsearch library functions, whiwch provide two variants of a generic linear search - lsearch will add the element if not found, while lfind will not. Having the insertion is a handy convenience!

Sometimes, when we wish to build on the behavior of an existing function, we can simply call it in our own function. However, here that would result in poor performance and redundancy; if we call bsearch and it didn't find the element, we would have to then write our own code to insert it, requiring us to do much of the work bsearch just did all over again to find where the element should go. For this reason, we will instead take an implementation of bsearch, understand its behavior, and use its code as a starting point to implement binsert.

`bsearch` Information

Here is Apple's implementation of the bsearch library function. Work through the code below, and draw pictures as you go to understand its behavior. If you did not get to fully work through the lab4 exercises on void*, now is a good time to shore up that knowledge before moving forward, as it will be essential for understanding bsearch! In particular, make sure to refresh your memory on the bsearch bug you worked through.

Expand Implementation Code

 1    void *apple_bsearch(const void *key, const void *base,
 2                        size_t nmemb, size_t width,
 3                        int (*compar)(const void *, const void *)) {
 4        for (size_t nremain = nmemb; nremain != 0; nremain >>= 1) {
 5            void *p = (char *)base + (nremain >> 1) * width;
 6            int sign = compar(key, p);
 7            if (sign == 0) {
 8                return p;
 9            }
10            if (sign > 0) {  /* key > p: move right */
11                base = (char *)p + width;
12                nremain--;
13            }       /* else move left */
14        }
15        return NULL;
16    }

Consider the following questions to verify your understanding (do not submit answers):

This implementation uses binary search to efficiently find the element if it's present. Need a refresher on the binary search algorithm? Check out this link.
The comment /* else move left */ appears to apply to an empty statement. How does the search move left when sign is negative?
Sketch out a diagram as you work through how the function works for the following 2 examples:
- an integer array [1, 4, 5, 6, 7] searching for element 3.
- an integer array [-5, 2, 3] searching for element 3.

Forging your way through this is a big accomplishment. There is a lot going on in this code and you should feel proud that you can understand library code like this to implement a generic binary search! Armed with that understanding, you're ready to move onto implementing binsert.

`binsert` Implementation

binsert has the same arguments and requirements for them as bsearch, with one difference: the number of elements is passed as a pointer, which allows binsert to update the count when inserting a new element. Don't forget to update the count!
If binsert does not find a matching element, it inserts the key into the array at the proper position and increments the number of elements. It is the client's responsibility to ensure the array has sufficient space to accommodate a new element. binsert does no allocation, reallocation, or deallocation of the client's array memory (in other words, no calls to malloc/realloc/free/etc.)
The function returns a pointer to the matching array member or to the newly added member if no existing match was found.
Your solution must use the bsearch code as a starting point to write binsert, modified to fit the requirements for this problem. As a hint, you may not need to make significant modifications to the provided code. Our solution adds/modifies less than 10 lines.
If you need to move elements in the array to make room, instead of a loop to move array items one by one you should use memmove, which can scoot the whole array over in one call!

Testing

This function can be tested in isolation with the provided test_binsert.c program, which you do not need to modify or comment, but whose code you should read over to understand how it calls your binsert function. You can also write sanitycheck custom tests with test_binsert.

test_binsert takes command line arguments as input and prints them out in sorted order using your binsert function. The first argument must be either -i (for integers) or -s (for strings) corresponding to the input you specify; test_binsert will sort integers in ascending order, and strings alphabetically. You then follow that flag with the input you would like sorted. Here are some examples:

myth$ samples/test_binsert_soln -i 5 4 3 2 1
 1 2 3 4 5
myth$ samples/test_binsert_soln -s hi howdy hello
 hello hi howdy

Note that even though test_binsert only tests with strings or integers, your binsert function should be generic and work with an array of any type.

Run valgrind early and often to detect any lurking memory errors!

Before moving on: make sure you have thoroughly tested your binsert function, covering various different cases of possible inputs, and that you have written your custom tests. You will use this function later in the assignment, so it's vital to ensure it's thoroughly tested before moving on!

4. `mysort`

Your final task is to use your binsert function to implement the mysort.c program, which is a simplified version of the Unix sort command. It takes in the name of a file and prints its lines in sorted order. Try out the provided sample solution, e.g. ./samples/mysort_soln samples/colors:

Expand Sample Run #1

myth$ ./samples/mysort_soln samples/colors
blue
blue
blue
green
green
red
red
red

If you don't specify a filename, it will expect input from the terminal, and once you enter ctl-d (to signal end of input), it will print the contents you entered in the same way - printing the lines in sorted order.

By default, mysort prints the lines in alphabetical order. It also supports a few command-line flags to alter what is printed:

If invoked with the -l flag, the sort order is by increasing line length.
If invoked with the -n flag, the sort order is by string numerical value (applies atoi to each line and compares numbers).
There is no tie-breaker. Duplicate lines, i.e. lines that compare as equal according to sort order, can be output in any order.
If invoked with the -r flag, the sort order for mysort is reversed. (Hint: sort the input as you would normally, and just change the order you print the results)
If invoked with the -u flag, mysort discards duplicate lines. The sorted output contains one instance of each unique line from the input. The sort order determines which lines are duplicates (e.g. if sorting by length, two lines of same length are considered duplicates).
The flags to mysort may be used alone or in combination. If both -l and -n are used, whichever flag is last on the command line "wins" as sort order.
mysort supports no command-line flags other than -l -n -r -u.

For example, here's how we would sort the samples/colors file in decreasing order of line length:

Expand Sample Run #2

myth$ ./samples/mysort_soln -l -r samples/colors
green
green
blue
blue
blue
red
red
red

Try out different combinations of the command line flags with the sample solution to ensure you understand the behavior of each!

Review/Comment Starter Code

mysort.c comes with provided code that fully implements parsing command line arguments and calling the sort_lines function that you must implement (you will also implement the specified comparison function as part of this). Study the provided program code to understand its behavior and work out how you will add to it. In particular, processing command-line arguments is crufty work, and the starter code uses the GNU extension getopt to help. Use man 3 getopt to learn more about how it works - for example, as a thought question, how is an invalid option detected/reported when using getopt? Then, add comments to the provided code to document your strategy, including an appropriate overview comment for the entire program, documenting each of the functions, and adding inline comments where necessary to highlight any particularly tricky details.

Implementation

mysort should be implemented by building up an array of lines that you read one at a time using fgets, and then printing them out.

Because we do not know in advance how many lines are in the file and thus how long our array of lines should be, you should use the common "resize-as-you-go" approach: you should allocate the array on the heap with space initially for MIN_NLINES lines (a provided constant), and if/when that fills up, double the size. Repeat this process as needed, doubling the size each time, similar to what you did on assign3.
When you read a single line from the file, use fgets and read into a stack array of size MAX_LINE_LEN (you may assume all lines will fit in this size, including the null terminator). For simplicity, you may also assume that each line will end with a final newline character.
If the user did not use the -u flag, after you read in a line from fgets, you should make a heap-allocated copy of it using strdup and insert it in into your array (do not call binsert). The lines will not be in sorted order, and that's fine - once you have read in all lines, then you should call qsort once to sort the array according to the specified ordering.
If the user did use the -u flag, you should call binsert each time you read in a new line to insert unique lines in sorted order into your array of lines (do not store duplicates, and do not call qsort). However, you should not heap-allocate the line with strdup unless you know it is unique and will be stored in your array of lines. This means you should not heap-allocate the line before calling binsert. Think carefully about how you can know whether the line is unique and how to ultimately store a heap-allocated copy! (If you have significantly higher memory usage than the solution, issues with this are likely the reason why.)
REMEMBER: no using alphasort or versionsort!
Note that this approach for reading lines of unknown size is a different (but equally-valid) approach than assign3: before, we read directly onto the heap (no stack use), but potentially overallocated when we resized. Here, we are reading onto the stack first, but may heap-allocate it later, and we can allocate just the heap space we need.
Do your best to unify your code as much as possible; one of the key goals for mysort is to cleanly handle the mix of sorting options without repeating code or overly complicating things. In particular, the 4 command-line flags can each be turned on independently, making a total of 2⁴ combinations and if you aren't careful about it, you can end up making a distinct case out of each combination and debugging 16x more code than you needed to. Instead, take care to unify all the tasks that are in common, making only the tiniest of detours where needed to add code to handle their differences. You may have to iterate a little to work out the details, but you can arrive at a very pleasing end result if you put your mind to it!

Testing

This program can be tested with sanity check. We encourage you to use the provided test files in the samples/ folder and also to create your own test text files to use for testing and debugging. One note for custom tests is that while there is no required tie-breaker for equal lines (you can print them in any order), custom tests looks for an exact match, which may cause a custom test to fail even though the output is technically correct. Avoid including test cases like this in custom tests for this reason. In grading, our tests will accept any valid reordering of equal lines.

Submitting

Once you are finished working and have saved all your changes, check out the guide to working on assignments for how to submit your work. When you submit, you may optionally indicate that you do not plan to make a submission after the on-time deadline. This allows the staff to start grading some submissions as soon as the on-time deadline passes, rather than waiting until after the late period to start grading. You will also have an opportunity to add any custom files to your submission that you created (e.g. for testing), because by default only the starter files are submitted.

We would also appreciate if you filled out this homework survey to tell us what you think once you submit. We appreciate your feedback!

Grading

Here is the tentative point breakdown:

Readme (10 points)

Functionality (95 points)

Sanity cases (22 points) Correct results on the default sanity check tests.
Comprehensive/stress cases (30 points) Correct results for additional test cases with broad, comprehensive coverage and larger, more complex inputs.
Clean compile (2 points) Compiles cleanly with no warnings.
Clean runs under valgrind (15 points) Clean memory reports for all programs when run under valgrind. Memory errors (invalid read/write, use of freed memory, etc) are significant deductions. Memory leaks are a minor deduction. Every normal execution path is expected to run cleanly with no memory errors nor leaks reported. We will not test exceptional/error cases under Valgrind.
Reasonable efficiency (15 points) We expect your programs and code to be reasonably efficient in use of time and space. Full points are awarded for being on par (2-3x) with the sample program; deductions will apply for immoderate use of memory/time. There is no bonus for outperforming the benchmark (and efforts to do so could detract from your code quality...). Use valgrind and time to measure your programs' efficiency vs. the solutions'.
custom_tests (5 points) Your custom_tests file should include tests of your own that show thoughtful effort to develop comprehensive testing coverage. Please include comments that explain your choices. We will run your custom tests against your submission as well as review the cases to assess the strength of your testing efforts.

Code review (buckets together weighted to contribute ~20 points)

The grader's code review is scored into a bucket per assignment part to emphasize the qualitative features of the review over the quantitative. The style guide is a great overall resource for good program style. Here are some highlights for this assignment:

Use of pointers and memory. We expect you to show proficiency in handling pointers/memory, as demonstrated by appropriate use of stack versus heap allocation, no unnecessary levels of indirection, correct use of pointee types and typecasts, const correctness, and so on.
Program design. We expect your code to show thoughtful design and appropriate decomposition. Data should be logically structured and accessed. Control flow should be clear and direct. When you need the same code in more than one place, you should unify, not copy and paste. If the C library provides functionality needed for a task, you should leverage these library functions rather than re-implement that functionality.
Style and readability. We expect your code to be clean and readable. We will look for descriptive names, defined constants (not magic numbers!), and consistent layout. Be sure to use the most clear and direct C syntax and constructs available to you.
Documentation. You should document both the code you wrote and what we provided. We expect program overview and per-function comments that explain the overall design along with sparing use of inline comments to draw attention to noteworthy details or shed light on a dense or obscure passage. The audience for the comments is your C-savvy peer.
Codecheck compliance. We will be running your code through codecheck and verifying that there are no issues present.

For more information about style guidelines to use for assignments, take a look at the CS107 Style Guide, linked to from the Assignments dropdown.

Post-Assignment Check-in

How did the assignment go for you? We encourage you to take a moment to reflect on how far you've come and what new knowledge and skills you have to take forward. Once you finish this assignment, you will have written your own implementation of two workhorse Unix commands using raw memory, function pointers, and generics, and reflected more on partiality and proper disclosure. Way to go! You should now have great command of memory management and the client use of a void* interface and the generic implementation. The critical mastery is correctly passing/receiving the void*s flying around: what level of indirection to use, what typecasts are needed, using * and & in exactly and only the necessary places. You should know without question how to determine the necessary level of indirection and correctly apply it. It's also valuable to understand and be able to reason through the consequences of getting it wrong.

To help you gauge your progress, for each assignment/lab, we identify some of its takeaways and offer a few thought questions you can use as a self-check on your post-task understanding. If you find the responses don't come easily, it may be a sign a little extra review is warranted. These questions are not to be handed in or graded. You're encouraged to freely discuss these with your peers and course staff to solidify any gaps in you understanding before moving on from a task. They could also be useful as review before exams.

To read the input, the mysort program reads each line into a maximum-sized piece of stack memory used temporarily and the copies to right-sized dynamic memory for persistent storage. This stack-heap dance is a common technique in situations like this. Explain why this approach is necessary/appropriate given the characteristics of the stack and heap.
The strcmp function almost matches the prototype for the comparison function argument of bsearch and qsort. A sketchy typecast (such as the one exhibited by scandir) could be used to hush the compiler's complaint about the mismatch, but this would be a huge mistake. strcmp is pretty much never the right function to use as a callback comparison function. Explain why not.
Read the man page for lfind and describe how to use lfind with an appropriate callback to re-implement the functionality of strlen.

Learning Goals

Overview

Testing and Debugging

Working with Memory and Pointers

1. myls

Review/Comment Starter Code

Implementation

scandir Information

Testing

2. Disclosure and Partiality Case Study

3. binsert

bsearch Information

binsert Implementation

Testing

4. mysort

Review/Comment Starter Code

Implementation

Testing

Submitting

Grading

Post-Assignment Check-in

Further Reading

1. `myls`

`scandir` Information

3. `binsert`

`bsearch` Information

`binsert` Implementation

4. `mysort`