Assignment 2 advice

Written by Julie Zelenski

Better late than never

You have read our advice on "Best Practices" for 107 assignments, haven't you?? :)

void* client: a cautionary tale

Our provided CVector is a full-featured, memory-managed, re-sizable array and CMap is a super-efficient hash table. Each is a clean package that abstracts away the low-level details, allowing you to put your energy toward solving more interesting problems. Data abstraction is a power tool, and one you have taken advantage of throughout the CS106 courses. What makes it a bit different in C is that these generic collections have to operate using a void* interface that requires the client stay on their toes at all times.

This is a very short program (our solution is only a total 1 page of code) and the handy CVector does a lot of the heavy lifting. However, don't mistakenly assume this will be a breeze. The code can be dense and it's frighteningly easy to make mistakes when handling memory using void* pointers. Go slow, develop in careful steps, do lots of testing along the way, and keep gdb and Valgrind in the loop.

Start by studying the interfaces in cvector.h (and cmap.h, if you want a head start on assign3). Using them as a client requires a solid understanding of how void* is used and steadfast vigilance about details. We strongly recommend doing this fill-in-the-blank warmup exercise before you even think about coding.

Memory efficiency

Code quality

Program facts

The data on student submissions below is from last quarter. The assignment has been re-designed this quarter, but retains similar elements. Your mileage may vary.

Frequently asked questions about assign2

A symbolic link (such as the slink we provide in your repos) is a directory entry that is a synonym for another path. It's like the filesystem version of a pointer. You can create a symbolic link using ln -s srcpath dstname. The directory /afs/ir/class/cs107/samples/assign2/loopy contains a variety of nested and circular links that might be useful for testing.

My searchdir traverses directories in a different order than the sample. Is this okay?

Yes. Your output should always list the same number of files as the sample. However, it might refer to some files by a different full path than the sample when multiple paths lead to that file. The full path chosen is dependent on which path was returned earlier from readdir and readdir makes no guarantee about the order it returns the directory entries. It has stable behavior per-host, so your program and the solution running on the same host should produce the same paths. Running on different hosts could mix things up, but rest assured the permutation in output will not be an issue in grading. A change in order of traversal will never cause a difference in number of files found, so those such mismatches are a sign of programming error.

My searchdir uses a lot of time/memory. Is that expected?

This little program does a surprisingly hefty amount of work, especially when searching a larger filesystem. The searchdir program makes calls to opendir call to traverse the directory entries. Behind the scenes, opendir makes a large allocation (~32K per call, freed by closedir) causing heap usage to balloon when traversing a deeply-recursive directory structure. A directory with 100 subdirectories is a 3MB hit to traverse. The directory /usr/include has over 1400 subdirectories so be prepared for chewing through ~45MB of memory! Don't be alarmed! As always, measure the sample using time and Valgrind to get a benchmark on what is reasonable. It would be unwise to attempt to search an enormous directory such as /afs/ir/class/ or /. We will not test on such cases and don't expect you to either :-)

Can you remind me how struct declarations work in C?

C struct declarations are almost, but not exactly, the same as C++. In C, the following declares a new struct type:

struct coord {
   int x, y;
};  // don't forget to end with semi-colon, without it you'll be mired in compiler errors

The name of the type is struct coord and you cannot drop the struct keyword; declaring a coord will not compile. You can wrap the struct definition in a typedef to set up a nickname:

typedef struct coord {
   int x, y;
} Coord;           // need to end with semi-colon here, too

This second version declares struct coord and gives it the nickname Coord. Either name (struct coord or Coord) can be used for the same type. The . (dot) operator is used to access the fields within a struct. If you have a pointer to a struct, your first attempt to access the fields is likely to run afoul of the fact that dot has higher precedence than * (dereference). You can add parentheses to force the desired precedence, or better, use the -> operator which combines . and * for this common need.

Coord origin;    // allocates struct on stack
Coord *p = malloc(sizeof(Coord));   // allocates struct in heap (note: sizeof works for correctly for structs)

origin.x = 0;    // access field from struct variable

                 // these next 3 lines access field via struct pointer
*p.x = 0;        // this line doesn't compile because default precedence applies . first then *
(*p).x = 0;      // this line works, use parens to override default precedence and apply * first, then .
p->x = 0;        // this line shows the preferred way to access field via struct pointer

origin = *p;     // btw, can copy a struct using ordinary assignment (copies all fields in one go)
*p = origin;

I don't understand the use of function pointer typedefs in the CVector and CMap header files. Please help!

The raw declaration of function types can be a bit goopy so CVector and CMap use typedef nicknames. For example, the cleanup function for CVector takes one void* parameter and returns void. The typedef below establishes the nickname CleanupElemFn as synonymous to the raw declaration of a void (*)(void *) function type.

typedef void (*CleanupElemFn)(void *addr);

This typedef allows parameters and variables to be declared using the typename CleanupElemFn. To write a cleanup function as a client, you merely need to define a function with the correct prototype. The function's name is used to access its function pointer . A function pointer is compatible with any function type matching its prototype.

void byebye(void *ptr)
{
    Binky *b = (Binky *)ptr;
    // do cleanup of b here7f

}

int main()
{
    CVector *all_my_binkys = cvec_create(sizeof(Binky), 10, byebye);

What is the expected prototype/behavior for a comparison callback?

The standard library qsort, bsearch, and lfind/lsearch functions and CVector sort and search all use the same format for the comparison callback. It is a two-argument function that take pointers to elements (const void *) and returns an integer which indicates whether the first element is less than, equal, or greater than the second. Here is a sample callback for comparing double elements:

int cmp_dbl(const void *p1, const void *p2)
{
    double d1 = *(double *)p1, d2 = *(double *)p2;
    if (d1 < d2) return -1;
    if (d1 > d2) return 1;
    return 0;
}

I'm getting compiler warnings about initialization discards ‘const’ qualifier from pointer target type. How do I resolve these?

The const qualifier on a declaration is an indication that type is only to be read, not written. A const char * and a char * are not quite the same type. You can read characters from either, but only the non-const version allows the characters to be changed. While it is perfectly fine to supply a non-const pointer where a const is expected (no cast required or recommended), the inverse will raise a warning from the compiler. Throwing down a typecast to const would quiet the compiler, but that's not the appropriate fix. If write-access is required, a read-only pointer is not going to work and a cast only serves to cover up the mistake. You should instead be fixing the logic to supply the correct pointer. If write-access is not needed, then the simple and correct choice is to add the const qualifier to the declaration.

What is the proper response if asked to find synonyms for an invalid target word (not alphabetic, longer than 50 chars, etc.)?

If the target word is not in the corpus (for whatever reason), print an empty list of synonyms and move on. No need to report or handle as an error.

Should I store structs or pointers to structs in my CVector/CMap?

If you want to put "widgets" in your CVector/CMap, the preference is to directly store the actual widget itself. Storing pointers when the value itself could have instead been stored is adding an unnecessary level of indirection, resulting in extra overhead and added complexity. For what reasons might it be required/appropriate to instead store a pointer to widget stored elsewhere?

  1. Widgets are not of a known fixed size, so actual widget cannot be directly stored as elements. Only by referring to widgets by address do they become homogeneous, so you store a pointer to a widget instead. (this is the case with storing strings, also types such as FILE/CVector/CMap whose size is unknown)
  2. Widgets are shared. There is one copy of the actual widget but multiple uses of it. Storing actual widgets creates duplicate copies and add redundancy to your data structure, so you store a pointer to a shared copy instead.

(following up on previous question) But I thought all elements for CVector/CMap must be pointer type! cvec_append takes a void*!

You will always refer to the element by its address (a void*) when adding to or retrieving it from the container, but that does not imply that the element type is itself must be pointer. The CVector and CMap are capable of storing elements of any type: int, struct, fixed-size array, pointer, and so on. When you pass/retrieve the element, you refer to it at a level of indirection away from its base type, thus an int element will be retrieved as a int *, a struct * element as a struct **, and so on.

How do I print/examine the contents of a CVector or a CMap from within gdb?

The internal structure of these types is opaque to the client (or at least will be until you complete assign3 :-), and not well-suited for you to manually dissect in the debugger. Instead, we recommend that you write your own debugging functions print_map and print_vec that iterate and print the elements. You can then call these functions from within the debugger as needed:

(gdb) p print_map(mymap)

When I compare two similarity scores of type float/double using ==, I get a compiler warning that says this is unsafe. How can I fix?

It will take until next week for us to delve into floating point types and learn how naive equality has issues and why the compiler squawks about it. For now, we're good with naive but hate compiler warnings, so will disguise our intentions from the compiler with this little trick: transform if (a == b) <equal> else <notequal> into the equivalent if ((a < b) || (a > b)) <notequal> else <equal>.

I am using my CVector/CMap to store elements of type struct. When I retrieve elements, only the first field(s) of my struct is correct, the rest is garbage. What gives?

This symptom could indicate that you have specified the wrong size to cvec_create/cmap_create. To create a container to store elements of structname, use sizeof(structname). If you use a size that is too small, the container will only store the first bytes of the struct, not the entire contents.

How can I see what a file's inode is so that I know what to type when testing my Inode search ("-i" part)?

To see a file or directory's inode number, run the unix command ls -i [name]. Or you can type just ls -i by itself to list everything in the current directory as name-inode pairs.