Assignment 3: CVector and CMap
Due date: Tue Apr 26 11:59 pm - Hard deadline: Thu Apr 28 11:59 pm
Assignment by Julie Zelenski
[Quick Links: Implementation details, Advice page, Grading]
Learning goals
Completing this assignment will provide you with:
- a clear understanding of the organization and management of raw memory
- extensive practice using low-level
void*
memory operations
- experience implementing a generic
void*
interface
Your assignment
Last time you were the client of CVector and CMap, this time you will implement CVector and CMap. It's simple enough to understand the goal, but handling the nitty-gritty internals of managing raw memory will stretch your programming skills to the limit. Start early. Draw pictures. Ask questions. Exercise on trivial test cases first. Use gdb and Valgrind early and often. Test more thoroughly than you have ever before in your life.
CVector implementation
- The CVector manages an indexed collection of elements, providing an array made fancy by the inclusion of memory management and convenient operations. It operates through a generic
void*
interface.
- The
cvector.h
header file details the required behavior as seen by the client. Your implementation must adhere to this interface. Change nothing in the header file.
- The CVector is a struct with various fields, one of which is a pointer to the contiguous block of memory for the elements. The element storage is laid out like an ordinary C array. It is not an array of pointers to elements stored elsewhere, the elements themselves are stored contiguously, without the overhead of an extra level of indirection per-element. The internal structure of a CVector storing integer elements has this layout:
- A CVector is created empty and contains no elements. The capacity_hint given to
cvec_create
dictates the starting capacity and initial storage allocation. As elements are appended or inserted, the capacity is consumed. Whenever the current capacity is filled, the CVector doubles the capacity to allow for future growth.
- The CVector elements are of client-chosen type. The CVector does not have knowledge of the internal structure of the client's chosen element type, it handles the elements as raw bytes. Elements are referred to by address. This applies when passing an element to the CVector, retrieving an element from the CVector, invoking a
function on CVector elements, and so on; all of these operations pass/return a
void*
which is the address of an element.
- The CVector operations are layered upon functions from the standard C library. Dynamic allocation is managed with
malloc/realloc/free
. Raw bytes are copied using memcpy/memmove
. Sorting and searching use lfind/bsearch/qsort
. Fatal error conditions are handled via assert
.
CMap implementation
- The CMap manages a collection of key-value pairs, providing a dictionary-like structure with efficient store and lookup operations. It operates through a generic
void*
interface.
- The
cmap.h
header file details the required behavior as seen by the client. Your implementation must adhere to this interface. Change nothing in the header file.
- The internal representation of a CMap is a hashtable. The starter code provides a string hash function for your use. Hash collisions are resolved by chaining, i.e., all keys that hash to the same code are grouped in one bucket.
- The CMap struct contains various fields, one of which is a pointer to a dynamically-allocated array of buckets. The number of buckets will be equal to the capacity_hint given to
cmap_create
. It is the client's responsibility to choose an appropriate value. To obtain the best runtime and memory efficiency, the capacity_hint should approximate the total number of entries. The number of buckets does not change over the lifetime of the CMap; adding more entries than capacity_hint just overloads the buckets. When overloaded, the CMap will still function correctly, but performance degrades.
- All entries that hash to the same code are stored in the same bucket. Each entry is its own cell and the cells are joined together into a singly-linked list.
-
A cell is a blob of memory which contains the link pointer, the characters for the key string, and the client's value laid out contiguously. For example, consider a CMap associating the average GPA (value type double) with a group's name. If both keys "lsjumb" and "swe" have hashcode 1, the CMap layout could be:
The value and the key are not pointers to data stored elsewhere, the key characters and value data are stored directly in the cell, without the overhead of an extra level of indirection for either key or value. The memory used for a cell is exactly the sum of sizeof(pointer) + storage for characters of key + size in bytes of value. You cannot define a struct that describes this cell (as the size of the value per-CMap and length of string per-key vary!) so you will have to manually construct the cell at runtime using your knowledge of how the memory is laid out. This part is tricky! Think through how you will construct and deconstruct a list cell before writing any code.
-
There are important differences in how keys and values are handled by the CMap. The keys are C-strings (char*
). The CMap copies the chars from the client's key into its internal storage. The CMap owns that copy of the key and cleans up that copy when an entry is removed/disposed. The values are of client-chosen type. Values are passed/returned by address (void*
). The CMap does not have knowledge of the internal structure of the client's chosen value type, it merely copies the specified number of bytes from the client's address into the CMap's internal storage. The CMap calls the client's cleanup function on a value when it is removed, replaced, or disposed.
Requirements that apply to both
- Testing. The starter project contains the test programs
vectest
, maptest
, and thesaurus
. These programs are helpful, but are by no means comprehensive. Part of your job is figuring out where additional tests are needed. You can change/extend/cannibalize the given test programs and add new test programs of your own.
- Memory. We expect CVector and CMap to run cleanly under Valgrind with no errors in any situation and no leaks if exiting cleanly. (We do not require cleanup of memory leaks from fatal errors).
- Efficiency. The CVector should have array-based big-Oh performance (e.g. insert/remove is O(n), append/replace is O(1)). The CMap should have hashtable-based big-Oh performance (e.g. O(1) for put/get/remove if created with appropriate capacity). The memory efficiency will be evaluated strictly. Per-element storage should be exactly as diagrammed above; there should be no additional levels of indirection, extraneous padding, nor unnecessary allocations during operations.
- Code unification/reuse. Make a vow to never rewrite or copy/paste code. Instead decompose and unify. For example, CVector insert and append have a lot in common, and thus should share code, not duplicate it. Even a small bit of code, such as the tricky expression used to calculate the nth address for CVector, is worth factoring out to a helper. The use of the helper improves readability everywhere and you no longer have to stare down that intricate pointer arithmetic in multiple places to verify it is correct each time.
- Design/style/readability. When writing this kind of gnarly memory-handling code, it is incredibly important that you keep the code as clean and readable as possible. We'll be watching!
- Don't miss out on the companion advice page:
Go to advice/FAQ page
Grading
Read our page on how assignments are graded for background information on our grading process and policies. A sketch of how the points are going to be allocated:
Functionality (125 points)
- CVector/CMap layout. The data layout for storing CVector and CMap elements is a strict requirement. Submissions in violation of the spec will warrant a very significant point deduction. Compare your memory model to the given diagrams to ensure you are in compliance!
- Sanity cases. (50 points) The standard sanity check tests are used for sanity grading.
- CVector single operations. (20 points) Test behavior of each CVector operation in isolation.
- CVector combined operations. (5 points) Test sequence of all CVector operations intermingled.
- CMap single operations. (12 points) Test behavior of each CMap operation in isolation.
- CMap combined operations. (5 points) Test sequence of all CMap operations intermingled.
- CVector/CMap assertions. (6 points) Verify asserts are raised on required errors specified in the CVector and CMap interfaces.
- CVector and CMap stress cases. (5 points) Test a mixed sequence of all operations from both CVector and CMap on large data sets.
- Clean compile. (2 points) We expect your code to compile cleanly without warnings.
- Clean run under valgrind. (10 points) We expect a clean run under valgrind during all normal operations. Memory errors are significant issues, leaks are more minor.
- Reasonable efficiency. (12 points) We expect your operations to meet the required big-Oh runtime documented into the interface and memory usage should be structured to exactly match the diagrams given above (e.g. CMap cells consist of link pointer + key chars + value). Both runtime and memory efficiency should be on par with the provided sample implementation.
Code quality (buckets together weighted to contribute ~30 points)
Here we will read and evaluate your code in areas such as:
- Style and readability. We expect your code to be clean and readable. We will look for descriptive names, helpful comments, and consistent layout. Be sure to use the most clear and direct C syntax and constructs available to you.
- Clean use of pointers and memory. We expect you to show proficiency in handling pointers/memory, as demonstrated by appropriate use of stack versus heap allocation, no unnecessary levels of indirection, pointee types and typecasts used correctly, and so on. Show you are wise in the ways of pointers!
- Program design. We expect your code to show thoughtful design and appropriate decomposition. Data should be logically structured and accessed. Control flow should be clear and direct. Common code between the operations should be factored out and unified, not duplicated. Add private shared helpers to enable code-sharing where appropriate. When the C library provides needed functionality, you should leverage these library functions rather than re-implement that functionality.
Submissions received by the assignment deadline will receive an on-time bonus equal to 5% of the functionality points earned by the submission.
Getting started
The assign3 project contains the source files, our test programs, and a Makefile. Check out the starter project from your cs107 repo using the command
hg clone /afs/ir/class/cs107/repos/assign3/$USER assign3
Finishing
When finished, use submit to turn in your code for grading. The sole output conformance requirement is to be sure there is no output of any kind from your operations. Double-check that you have removed/disabled all print statements-- any extraneous output can interfere with the autotester and cause loss of points.
How did this assignment go for you? Review the post-task self-check.