Assignment 2: Void* client fun

Due: Wed Feb 1 11:59 pm
Ontime bonus 5%. Grace period for late submissions until Fri Feb 3 11:59 pm

Assignment by Julie Zelenski

Learning goals

Completing this assignment should sharpen your skills in:

  1. client use of generic void* interfaces in C
  2. writing void* client callback functions
  3. debugging memory errors (because we are sure you will encounter at least one or two :-)

The problem

You are to implement a small guided exercise that features client use of void* generic interfaces. We provide the efficient and full-featured CVector and CMap, the C versions of what you previously have known as ArrayList/Vector and HashMap (in Java/C++). As C has no sophisticated language support for type-safe generics, these containers have no choice but to work via untyped pointers and manipulation of raw memory. The role as client is considerably more challenging given the lack of clear pointer types. By the time you're done, you'll appreciate that C is a versatile language that puts few restrictions in your way, but you'll also wish that it didn't always play it so fast and loose -- there is danger here!

searchdir usage

The searchdir program recursively searches a directory to find and print various information, depending on flag used. The starter code provides a draft version of program that you are to modify/extend to full functionality. Here is what the program must do in the end:

  1. Usage is: ./searchdir [one of: -i or -d or searchstr] [optional: directory name]
  2. The directory name is optional, and, if used, is always the 2nd argument. If a directory name is not given, the default of "." (current directory) should be used.
  3. If an inadequate number of arguments is given, or the specified directory does not exist, a helpful error message is printed.
  4. If the first argument is neither "-i" nor "-d", then the first argument is assumed to be a searchstr, that is, a string to search file and directory names for. The search string is a simple fixed string (no metacharacters, no quotes around it, nor whitespace in it). If a search string is given, the program should recursively search the directory to find and print the names (including path) of all directories and files whose names match the search string.
  5. If the first argument is "-i", the program should recursively search the directory to find and save the inode values (more on this below, it is a unique file/directory ID number) of all directories and files. It should then repeatedly ask the user to enter an inode, and then print for the user the name (including path) of the directory or file with that inode, until the user chooses to quit. If the user types a non-existent inode, it prints nothing.
  6. If the first argument is "-d", the program should recursively search the directory to find and save the modified dates of all directories and files. It should then repeatedly ask the user to enter a date, and then print for the user the names (including paths) of all directories and files with that modified date, until the user chooses to quit. If there are no matches for a given date, nothing is printed.

Below is more detail on each of the three modes.

Name search (searchstr)

We recommend you begin with this feature. The other features will follow a similar template, but have much less starter code to guide you.

  1. The starter code uses a few library functions you may not have seen. Review the man pages for opendir/readdir/closedir, stat, and sprintf and skim our code to see how we are using them to access the filesystem and manipulate file paths.
  2. Study how the starter code stores the filenames into the CVector. Each path is assembled into a stack-allocated buffer. That buffer is sized to length PATH_MAX (a whopping 4K) of which only a fraction will be used. The CVector of matches is created to hold elements of size PATH_MAX and when adding an entry, the full-sized buffer is copied and stored. At 4K per element, a CVector of 300 paths accumulates a memory footprint of over 1MB-- ouch! Change the element type of the CVector to instead store each path as a pointer to a right-sized dynamically-allocated string instead of that wasteful huge buffer. This will necessitate modifying how the CVector is created and disposed and how its elements are added and accessed. Storing 300 right-sized paths into the CVector now requires several orders of magnitude less memory than previous. Win!
  3. The starter code as-is prints all files, but it should only include those with matching names. Add a test to check if a file's name (not the full path, just the filename itself) contains the search string before adding it to the CVector of matches.
  4. The starter code has no protection against deep recursion when a directory being searched contains a link back to itself/parent (this can happen with symbolic links, like the slink symbolically-linked directory you had in your assign1 and again in assign2). Fix this by tracking which directories have already been visited and skip over any previously explored directory. Add code to manage an additional CVector you should name visited, and use it to store the inode numbers of each directory that has been visited. The rationale for tracking by inode instead of name is that the inode is unique and will detect a visited directory even if linked under a different name. This task gives you a chance to experiment with storing a different kind of element type in the CVector and use of cvec_search and writing a comparison callback. (Although it is possible to use CMap for this, we ask that you use CVector.)
  5. Before printing the CVector of matching files, sort them by path length, breaking ties lexicographically for paths of the same length. cvec_sort is just what you need and a little more practice writing callbacks for you.
  6. Once all the functionality is perfect, fire up Valgrind again to look for leaks. The coordination of memory management between client and CVector has some extra challenges, specifically in when and how to use a cleanup callback.
  7. If there is any part of using CVector/CMap that does not fully make sense to you, resolve it before moving on. Bring questions to office hours, post on the forum, send us an email.

When finished, the behavior of your searchdir with the default searchstr option should be as follows:

Here are a few sample runs:

myth> searchdir .h 
./cmap.h
./cvector.h

myth> searchdir ea /afs/ir/class/fakeclass/
/afs/ir/class/fakeclass/sea_frags.txt
/afs/ir/class/fakeclass/dream_frags.txt
/afs/ir/class/fakeclass/preamble_frags.txt
/afs/ir/class/fakeclass/reassemble_soln.txt

Inode search (-i)

After gaining guided practice with two different uses of the CVector, it's time to get to know CMap by implementing the Inode search. Recall from implementing the Name search (with searchstr) that C has a function stat() that will give you the inode of a file or directory, given its name. With this feature, you'll provide the reverse lookup: given an inode, report the file's name. Repeated key-value pair lookup is a good use case for CMap! (Though, admittedly, you likely wouldn't notice a difference between O(logN) and O(N).) At the highest level, CMap is a hash map whose keys are always strings, and whose values are a generic type, but you should read the cmap.h file in detail. Here's how you will use CMap to implement Inode search:

  1. Using the same type of recursive directory search you used in gather_files for the name search, recurse the directory tree. For each file and directory, save its inode in a CMap that uses the inode as a key (keys must be strings, so a string version) and the file or directory name (with path) as the value. All strings in the CMap must be right-sized (see "tips" below).
  2. Once you've exhausted the recursive directory search, enter a loop where you repeatedly ask the user to enter an inode number, retrieve the associated file or directory name from the CMap, and print it. When the user types "q" instead of an inode, that is the sentinal value indicating you should exit the program (but not before cleaning up memory, of course!).
  3. If you encountered a file or directory more than once during your recursive directory search, giving non-unique path value for a given inode key, any valid path may be printed. However, as a reminder, you should be keeping track of visited so as not to get stuck in a loop, and so printing a name that traces a loop in its path would not be acceptable. (you may use the CVector visited again, or the same CMap that you build up for the queries--though recall you must use CVector for visited for the Name search option).
  4. If the user types anything other than q or a valid inode, print nothing and continue asking for another inode search.
  5. NOTE: Unlike with the Name search, the -i search should collect and include the CMap both directory and file inode numbers.

Tips:

Here are a few examples:

myth> ./searchdir -i /afs/ir/class/cs107/samples/unixhunt
Enter inode (or q to quit): 1800012521
/afs/ir/class/cs107/samples/unixhunt/nums
Enter inode (or q to quit): 1800012458
/afs/ir/class/cs107/samples/unixhunt/clue1
Enter inode (or q to quit): 1800012898
/afs/ir/class/cs107/samples/unixhunt/air/rockhopper/gatekeeper.py
Enter inode (or q to quit): blah
Enter inode (or q to quit): 1234
Enter inode (or q to quit): q
myth>


myth> ./searchdir -i 
Enter inode (or q to quit): 1800025021
./Makefile
Enter inode (or q to quit): 1800020940
./searchdir.c
Enter inode (or q to quit): q
myth>

Date search (-d)

Now that you love CVector and CMap, we're going to put a CVector in a CMap. This feature allows users to look up files by modified date. Here's how you will use CMap and CVector together to implement Date search:

  1. Using the same type of recursive directory search you used twice before, recurse the directory tree. For each file and directory, save its modified date as a string "MM/DD" in a CMap that uses the date string as a key and a CVector of file and directory names (with path) that share that date as the value. All strings in the CVectors in the CMap must be right-sized.
  2. Once you've exhausted the recursive directory search, enter a loop where you repeatedly ask the user to enter a date as MM/DD, retrieve the associated CVector of files and directories names from the CMap, and then enumerate over the vector printing each entry. When the user types "q" instead of a date, that is the sentinal value indicating you should exit the program (but not before cleaning up memory, of course!).
  3. If you encountered a file or directory more than once during your recursive directory search, it is ok (but not required) to have duplicate names for it in the results that are printed. However, as a reminder, you should be keeping track of visited so as not to get stuck in a loop, and so printing a name that traces a loop in its path would not be acceptable.
  4. If the user types anything other than q or a MM/DD date that exists in your map, print nothing and continue asking for another date search.
  5. NOTE: Unlike with the Name search (but like the -i Inode search), the -d search should collect and include the CMap both directory and file modify dates.

Tips:

Here are a few examples:

myth> ./searchdir -d 
Enter date MM/DD (or q to quit): 10/11
./lect3/Makefile
./lect3/ptr.c
./demo.mp4
Enter date MM/DD (or q to quit): 10/12
./lect5/sizes.c
Enter date MM/DD (or q to quit): blah
Enter date MM/DD (or q to quit): 10/13
Enter date MM/DD (or q to quit): 55/99
Enter date MM/DD (or q to quit): q
myth>


myth> ./searchdir -d  /afs/ir/user/c/b/cbl/foo
Enter date MM/DD (or q to quit): 03/05
/afs/ir/user/c/b/cbl/foo/bar.txt
/afs/ir/user/c/b/cbl/foo/doc.txt
Enter date MM/DD (or q to quit): q
myth>

Guidelines

Grading

Have you read how assignments are graded? For this assignment, the anticipated point breakdown will be in the neighborhood of:

Functionality (90 points)

Code review (buckets together weighted the usual style percent of score) Here we will read and evaluate your code in areas such as:

Submissions received by the assignment deadline will receive an on-time bonus equal to 5% of the functionality points earned by the submission.

Getting started

Finishing

When your code is finished, don't forget to finish with the all-important submit step to send your code to us for grading. If needed, familiarize yourself with our late policy.

How did this assignment go for you? Review the post-task self-check.



Contents