Checkpoint due: Thursday, March 11 at 11:59 PM
Rest of assignment due: Wednesday, March 17 at 11:59 PM
According to veteran C++ programmers, there are two projects which bring together all the knowledge that a proficient C++ programmer should have:
In this assignment, we will be putting #1 on your resume by building a STL-compliant HashMap.
Recall that the Map abstract data type stores key → value pairs, and provides efficient lookup of the value of any key.
There are two versions of this in the STL: std::map, traditionally implemented using a balanced binary search tree, and std::unordered_map (introduced in C++11), which uses a clever technique called hashing and supports the most map operations in constant time (!).
You will not necessarily write a lot of code on this assignment (~50 lines), but some of your time will be spent on understanding the starter code given to you, and reasoning about design choices and common C++ idioms. You may optionally work with a partner. We recommend having a partner to discuss the design questions. If you choose to work with a partner, we ask that you and your partner submit a single version of the assignment among both of you. This means that you will ideally be working with your partner on a shared version of the code and short answer questions. Feel free to start a shared Google Doc, a private GitHub repository, or anything else that you'd need to keep track of a shared version of the code. You may also choose to schedule video chat sessions with each other to work on the assignment.
The starter code passes 22 of the 36 required test cases. Your job is to get the remaining 14 test cases (6 in milestone 3, and 8 in milestone 4) to pass, and to reasonably answer the 10 required short answer questions throughout the assignment.
Milestones 2 and 5, as well as anything listed in the extensions, are completely optional. Milestones 2 and 5 form the remaining 7 optional test cases. You also do not need to answer the 1 short answer question in milestone 5.
Only Milestones 1, 3, and 4 are required, but we are happy to support you in completing 2 and 5. You don't have to do them in order, but finishing #1 will help you significantly with #3 and #4. Please try your best to complete Milestones 1 and 3 in advance of the checkpoint deadline.
Template classes, const-correctness, iterators, RAII.
Iterator-based constructors. (We provide you the .h file headers.)
Operator overloading. (We provide you all the headers.)
Implement special member functions (headers not included.)
Implement benchmark tests.
Please download the starter code here. The starter code contains the following files:
hashmap.h:
This file contains the class function prototypes for the minimalistic HashMap, and also contains thorough comments about each function.
The prototypes for milestone 3 are provided here, but you will need to add your function prototypes for any function that you write for milestone 4.
Like in CS 106B, you may add any private helper functions, but do not add or change the prototypes of any public functions.
Imagine what a disaster it would be for the world if the vector’s size function was renamed to length or if the erase function had a few extra parameters...a good chunk of the world’s C++ code would stop compiling!
hashmap.cpp:
This file contains the implementation of the HashMap, and also contains thorough comments about implementation details.
The function skeletons for milestone 3 are provided here which mirror the prototype in the .h file.
You will need to put the implementations for the functions for milestones 3 and 4 in this file.
hashmap_iterator.h
This file contains the complete class declaration and implementation for a HashMapIterator, and also contains thorough comments about each function.
You will be reading the code here for milestone 1, but you will not need to modify this file at all.
student_main.cpp:
When the test harness is turned off (see below), the student_main() function in this file will be run.
You can treat this as if you were writing your own main function to test your program.
You are encouraged to write some code here to test and play around with the HashMap implementation, but it is not necessary to modify this file.
test_settings.cpp: This file contains macros which lets you on the test harness. If you turn on the test harness (#define RUN_TEST_HARNESS 1), test cases will be run. If it is turned off (#define RUN_TEST_HARNESS 0), the student_main function will be run. This is analogous to using NO_TESTS vs. ALL_TESTS in CS 106B's SimpleTest.tests.cpp: This file contains the code for each test case. You are welcome to look at this, add some map.debug() statements, or even add your own test cases. We will test your code on an unaltered version of this file, so be careful about changing this file. short_answer.txt: You will write your solutions to the short answer questions in this file.HashMap-V4.pro: This file contains the Qt settings for the project.
You will notice a flag -O0 in this file, which controls the amount of compiler optimization.
In milestone 5 you will change this flag to -O2 when running benchmark tests. You do not need to modify this file, unless you decide to work on optional milestone 5.
Note: some authors insert a node to the end of a bucket's linked list. We will always insert into the front, since order doesn't matter and inserting to the front is faster and easier.
Read through the starter code's interface (.h) and template implementation (.cpp) files to understand what has already been implemented. This handout you are reading is fairly sparse, and the reason is that there is a lot of documentation provided in the starter code. You should be able to identify the standard map interface (constructor, insert(), at(), contains(), etc.) as well as the hash table implementation (_bucket_arrays, node*'s, etc.).
Go to the student_main function, and try solving your favorite CS 106B HashMap problem there. You can call map.debug() inside the test cases to see a visual representation of the map printed to the console. You can call this function in your student_main function without the test harness, or inside the test cases.
Then, navigate to short_answer.txt and answer these short-answer questions:
rehash() function, the for-each loop contains an auto&. What is the deduced type of that auto, and why the ampersand necessary?HashMap<std::string, int> map;
map.insert({"Avery", 3});
auto iter = map.begin();
iter->first = "Anna";
auto anna_iter = map.find("Anna");
In a previous version of this assignment, we asked students to implement an iterator class. By implementing an iterator class, you have a simple way of traversing the elements in the HashMap. Combined with iterator-supported implementations of find, insert, and erase, you never have to manipulate the linked lists in the later milestones. However, implementing an iterator class is fairly time-consuming, so we decided to provide you our implementation of the iterator class and the HashMap member functions that interface with the iterator class (e.g. begin(), end(), find(), etc.). Instead, you should read and understand how the iterator class is implemented, and answer short answer questions on it.
This milestone is optional. This milestone naturally fits here after you have explored the iterator class, and are ready to practice using the iterator class while also learning some important new C++11 class design syntax.
The starter code already provides multiple constructors. We will add two more iterator-based constructors:
The headers for each constructor is provided to you in the .h file. You will need to write the function itself in the .cpp file. The implementation is not difficult (~10 lines total), but we're also introducing a few new language features and syntax so you may need to read the C++ documentation.
Here are some hints:
You will implement four operators. The headers are provided in the .h file, and the function skeletons are provided in the .cpp file. Make sure you undestand why these skeletons and headers are consistent with the principles taught in the operator overloading and const-correctness lecture.
[])The index operator accepts a key, and returns a reference to its mapped value. The index operator should support auto-insertion—if a key is not found, then a new key/mapped pair is inserted with the given key and a default value of the mapped type.
Hints:
M val;).<<)The stream insertion operator writes the contents of a HashMap into an output stream. The format resembles the format used by the Stanford HashMap, as seen below:
HashMap<string, int> map;
cout << map << endl; // prints "{}"
map.insert({"Anna", 2});
cout << map << endl; // prints "{Anna:2}"
map.insert({"Avery", 3});
cout << map << endl; // prints either "{Avery:3, Anna:2}"
// or "{Anna:2, Avery:3}"
The stream insertion operator will need to support chaining.
Hints:
==) and Inequality (!=)Hint: You should only have to loop through one of the HashMaps once.
Test 2A, 2C, and 2E test the functionality of the indexing, stream insertion, and equality/inequality operators, respectively. Tests 2B, 2D, and 2F test the const-correctness of the respective operators. Since we've given you the correct headers, unless you changed the header, tests 2B, 2D, or 2F should pass if 2A, 2C, or 2E pass, respectively.
Fun fact about the test harness: the const-correctness tests check whether proper use of the HashMap class as a client compiles. It also uses some advanced C++ metaprogramming techniques to check that improper use of the HashMap class as a client does not compile. In past quarters, students had to come up with the headers of these operators themselves, so this would check whether they have const-correctness issues.operator[] has some extra functionality compared to the at function.Any good STL-compliant class must have correct special member functions.. Recall that there are six major special member functions:
The first two are implemented for you; your job is to implement the last four.
Specifically, the copy operations should create an identical copy of the provided HashMap, while the move operations should move the contents of the provided HashMap to this (the instance of the HashMap upon which the move operation is called). Avoid memory leaks and perform your copy/move as efficiently and safely as possible.
We are requiring you to use the member initializer list (colon after constructor) whenever possible for efficiency.
We do not provide the headers, so you will have to add them to the .h file yourself.
Hints
Test 4A, 4B, and 4C create copies of your HashMap using the copy constructor and copy assignment operator, and verifies their correctness. In addition, there are tests for edge cases such as self-assignment. Tests 4D, 4E, and 4F try to move HashMaps from one to another using the move constructor and move assignment, and verifies correctness just like test 4A.
Note that the move operations and time tests may pass even if you haven't implemented the move operations (recall that if no copy/move operations are declared, the compiler generates default ones, which will pass the move but not copy tests). See the test_settings.cpp for more information. The tests use some compiler directives (#pragma) to silence compiler warnings about self-assignment. You can safely ignore those.
Test 4G and Test 4H time your move operations to verify that you are actually moving the contents of the HashMaps rather than copying the contents. The tests try the move operations on HashMaps of different sizes, and verify that the runtime of the move operations is O(1), not O(n). You can see the results of the time test printed in the test harness.
This section is optional but shouldn't take that long and should be interesting, as well as help you understand efficiency.
We've written some performance tests for you. Follow the instructions below.
std::unordered_map! That's certainly unexpected. There are many more HashMap implementations which can perform faster than our HashMap. Check out this link.
For the checkpoint, you should submit Milestones 1 and 3 in two files: hashmap.cpp and short_answer.txt. If you worked with a partner, please have one partner submit the assignment. There will be a place on Paperless for you to enter in the other partner's SUNet ID so that you both get credit.
When you're finished, submit three files: hashmap.cpp, hashmap.h, and short_answer.txt on Paperless! Follow the same instructions for partner work as above.
And finally... Good luck! :)
All of these are optional. Some may require supplemental material or additional reading.
The insert member function creates many unnecessary copies. At the function call, it first creates a temporary copy of the key and the mapped value, then it creates a temporary pair containing that key and the mapped value. Inside the insert function, it creates a copy of this pair (since it's passed by const l-value reference), and then it assigns the value_type in a node to be a copy of this pair.
You've solved a similar problem when learning about move semantics. There is a more advanced move semantics problem called perfect forwarding, which uses a combination of variadic templates, r-values, and a new function std::forward. Read about emplacement here, and then implement the member function try_emplace that the std::unordered_map has.
If you were disappointed that you did not get to implement your own iterator class, you should try creating a more advanced iterator class! Instead of iterating through the elements in the order they are stored in the HashMap (i.e. unordered), you should iterate in the order of insertion. This is a tricky linked list problem, as there are two linked lists - one for the buckets, and one to keep track of order insertion. Java has a LinkedHashMap data structure that C++ does not have. Implement one!
You should have realized in the benchmark tests that one main drawback of our current HashMap is that it does not automatically resize. Implement that feature in your HashMap! Experiment with what kind of rehash policy works best on a variety of workloads.
Implement any features our HashMap is missing. Check out the std::unordered_map page for more ideas!
Implement the same interface, but backed by a binary search tree instead! To make iterator traversal easier, add a "parent" pointer to each node in addition to the "left" and "right" children.
In the past we've had students implement Kd-trees, Gap Buffers, and Merkle Trees. Find one you really like, implement it, and add some iterator support!