Assignment 2: HashMap

Checkpoint due: Thursday, November 12 at 11:59 PM

Rest of assignment due: Wednesday, November 18 at 11:59 PM


According to veteran C++ programmers, there are two projects which bring together all the knowledge that a proficient C++ programmer should have:

  1. Implementing a STL-compliant template class; and
  2. Implement a macro to hash string literals at compile-time.


Saturday is game night. (Credit: https://xkcd.com/426/.)

In this assignment, we will be putting #1 on your resume by building a STL-compliant HashMap.

Recall that the Map abstract data type stores key → value pairs, and provides efficient lookup of the value of any key. There are two versions of this in the STL: std::map, traditionally implemented using a balanced binary search tree, and std::unordered_map (introduced in C++11), which uses a clever technique called hashing and supports the most map operations in constant time (!).

In this assignment, you will be given a minimal implementation of a HashMap (the name of this data structure in the Stanford C++ library) that you could have written in CS 106B. The starter code is functional, but is not "STL-compliant"; for example, copy and move semantics are not supported yet. Your goal is to extend it so it becomes an STL-compliant, industrial strength, robust, and blazingly fast data structure.

You will not necessarily write a lot of code on this assignment (~90 lines), but some of your time will be spent on understanding the starter code given to you, and reasoning about design choices and common C++ idioms. You may optionally work with a partner. We recommend having a partner to discuss the design questions. If you choose to work with a partner, we ask that you and your partner submit a single version of the assignment among both of you. This means that you will ideally be working with your partner on a shared version of the code and short answer questions. Feel free to start a shared Google Doc, a private GitHub repository, or anything else that you'd need to keep track of a shared version of the code. You may also choose to schedule video chat sessions with each other to work on the assignment.

Assignment Timeline

The starter code passes 7 of the 14 required test cases. To pass the assignment, your job is to get the remaining 7 test cases (4 in milestone 2, and 3 in milestone 3) to pass, and to reasonably answer all 10 short answer questions.

To help you stay on track for this assignment, we will require you to submit a checkpoint approximately one week before the deadline. For the checkpoint, please have Milestones 0 and 2 finished.

In the original version of the assignment, there were 10 milestones, and completing all milestones would help you accomplish #1. In the interest of time, and constrained by the fact that we only have 10 weeks, you only need to complete milestones 2, 3, and 4. We encourage you to go beyond the minimum requirements to create a truly STL-compliant template class, either now or in the future.

Only Milestones 0, 2, 3, and 4 are required, but we are happy to support you in the other milestones:

0

Read about hashing. Examine starter code/test harness.

1* 10-15

Complete the implementation of rehash.

2 40-50

Overload 4 operators and make Hashmap const-correct.

Checkpoint due Thursday, November 12 at 11:59 PM

3 30-40

Implement 4 special member functions.

4

Answer 10 short answer questions about your code.

Final assignment due Wednesday, November 18 at 11:59 PM

5 15

Implement initializer_list and range constructors.

6 120

Implement an iterator class.

Beyond??

See extensions.

Tips for the Assignment

  • Milestone 0 requires knowledge of linked lists, which CS 106B covers in week 7. You should have plenty of time to complete the assignment due in week 9.
  • The starter code comments are thorough, more so than this handout. Read them!
  • The handout may seem vague as in describing what you are supposed to do. We don't provide even the function prototypes. This is intentional. Designing the class definition itself is a valuable learning activity, and we don't want to constrain you to one particular design. You need to consider how a client might use your HashMap, and the test harness will try every reasonable way of using your class. Ask us if you are ever unsure about how to proceed.

Starter Code

Please download the starter code here. The starter code contains the following files:

  • hashmap.h: This file contains the function prototypes for the minimal HashMap, and also contains very thorough comments about each function. The prototype for rehash() is provided here, but you will need to add your function prototypes for any function that you write for milestone 2 and 3.
  • hashmap.cpp: This file contains the implementation of the HashMap, and also contains very thorough comments about implementation details. The skeleton for rehash() is provided here, and you will need to declare your own functions for milestones 2 and 3..
  • student_main.cpp: When the test harness is turned off (see below), the student_main() function in this file will be run. You can treat this as if you were writing your own main function to test your program.
  • test_settings.cpp: This file contains macros which lets you on the test harness. If you turn on the test harness (#define RUN_TEST_HARNESS 1), test cases will be run. If it is turned off (#define RUN_TEST_HARNESS 0), the student_main function will be run. This is analogous to using NO_TESTS vs. ALL_TESTS in CS 106B's SimpleTest.
    The file also lets you turn on or off individual test cases. Test cases that are turned on will result in a PASS or FAIL when run. Test cases turned off always result in a SKIP (which is effectively a FAIL). The reason you will turn off test cases is that the test cases for milestones 2 and 3 may call functions that you have not declared or implemented, so turning them on will generate compiler errors. Only turn on a test case after you have implemented that function, as explained in the test_settings.cpp file. You may get compiler errors as well if your function prototype is incorrect (likely because of const correctness, or parameter/return value type issues).
  • tests.cpp: This file contains the code for each test case. You are welcome to look at this, add some map.debug() statements, or even add your own test cases. We will test your code on an unaltered version of this file, so be careful about changing this file.

Milestone 0: Background Reading and Examine Starter Code

You will implement the equivalent of the Stanford HashMap, using a hash table that resolves collisions using external separate chaining. Luckily for us, CS 106B covers hashing and external hash tables with chaining! Before reading further, familiarize yourself with hashing by reading the resources below (if you're in CS 106B or recently took it, feel free to skip the CS 106B resources).

The Map Interface

Hashing

Note: some authors insert a node to the end of a bucket's linked list. We will always insert into the front, since order doesn't matter and inserting to the front is faster and easier.

Reading the Starter Code Documentation

We could have asked you to implement a HashMap yourself given you knowledge from CS 106B. However, since practicing linked list algorithms is not the main goal of CS106L, we've given you the starter code.

Read through the starter code's interface (.h) and template implementation (.cpp) files to understand what has already been implemented. This handout you are reading is fairly sparse, and the reason is that there is a lot of documentation provided in the starter code. You should be able to identify the standard map interface (constructor, insert(), at(), contains(), etc.) as well as the hash table implementation (_bucket_arrays, node*'s, etc.).

You can call map.debug() inside the test cases to see a visual representation of the map printed to the console. You can call this function in your student_main function without the test harness, or inside the test cases. This will be useful when you begin milestone 1.

There's nothing you need to turn in after completing this milestone, but please do make sure that you're comfortable with the above content before moving on.

Optional Milestone 1: Implement rehash()

This milestone is optional. It's a challenging linked list problem that will be great practice for CS 106B assignments and exams.

Please don't make this your final personal project in CS 106B. :D

Implement the public member function rehash(), whose expected behavior is documented in the .h file. Your implementation of rehash() must not allocate or leak memory; instead, it should reuse existing nodes.

The purpose of this milestone is to get you familiarized with the starter code and with working in a template class. It should also serve as a good review of the linked lists algorithms you learned in CS 106B. Our solution is 11 lines of code long. If you are unsure about the hash functions, we recommend looking at the implementations of insert() and erase().

Once you are done, turn on the two tests for Milestone 1:

  • Test 1A checks the external correctness of the rehash function, by verifying that your HashMap is still functional even if we rehash to various bucket sizes, including extremely large or small values. We also interweave the calls to rehash() with calls to erase() and insert(). Finally, there is a check to see if you correctly throw an exception when rehash(0) is called (this was done for you in the very first line of rehash()).
  • Test 1B roughly checks the internal correctness of the rehash() function. The idea is that buckets with longer linked lists should take longer to search, so we time various calls to contains() to verify that the sizes of the linked lists are roughly correct. This test is fairly rough as it depends on the speed of your computer, but we give it a large enough leeway that you should be able to pass the test. It is fine if this test fails with a small probability.

Milestone 2: Operator Overloading and Const-Correctness

In this section, you will first implement four operators, then make your HashMap const-correct. After implementing an operator, turn on that operator's tests. There is one test for each category of operator, and one for const-correctness.

Indexing ([])

The index operator accepts a key, and returns a reference to its mapped value. The index operator should support auto-insertion—if a key is not found, then a new key/mapped pair is inserted with the given key and a default value of the mapped type.

Stream Insertion (<<)

The stream insertion operator writes the contents of a HashMap into an output stream. The format resembles the format used by the Stanford HashMap, as seen below:

The pairs can be in any order, since this is, after all, an unordered map.
There should be a comma and a space between each pair, but not before the first pair or after the last pair.
HashMap<string, int> map;
cout << map << endl;		// prints "{}"
map.insert({"Anna", 2});
cout << map << endl;		// prints "{Anna:2}"
map.insert({"Avery", 3});
cout << map << endl;		// prints either "{Avery:3, Anna:2}" 
                            // or "{Anna:2, Avery:3}"
The stream insertion operator will need to support chaining.

Equality (==) and Inequality (!=)

Two HashMaps are considered equal if they have the exact same key/mapped pairs. The internal order that they are stored, or the number of buckets, does not matter.

Const Correctness

Finally, make your HashMap class const-correct. You will need to decide which functions (including those part of the starter code) and operators need to be modified.

Hint

For some parts of this milestone, you may find yourself having to add a friend in hashmap.h. This friend may be a template function, and since it's not a member function of the HashMap, we would need to explicitly write out the template declaration for this function in hashmap.h. Note that we can't reuse the same K, M, and H in this template declaration because these template parameters are already in use in the class declaration. Instead, we can use names other than K, M, and H for our function's template declaration.
If you find compiler errors appear after enabling Test 2D, this means that either one of your prototypes is incorrect (eg. wrong const-ness), or your HashMap class is not const-correct and needs to be fixed.

Milestone 3: Implement Special Member Functions

Any good STL-compliant class must have correct special member functions.. Recall that there are six major special member functions:

  • Default constructor (implemented for you)
  • Destructor (implemented for you)
  • Copy constructor
  • Copy assignment operator
  • Move constructor
  • Move assignment operator

The first two are implemented for you; your job is to implement the last four. Specifically, the copy operations should create an identical copy of the provided HashMap, while the move operations should move the contents of the provided HashMap to this. Avoid memory leaks and perform your copy/move as efficiently and safely as possible.

Efficiency Tips

We are requiring you to use the member initializer list (colon after constructor) whenever possible for efficiency.

Test 3A creates copies of your HashMap using the copy constructor and copy assignment operator, and verifies their correctness. In addition, there are tests for edge cases such as self-assignment. Test 3B tries to move HashMaps from one to another using the move constructor and move assignment, and verifies correctness just like test 3A. Note that Test 3B passes even if you only implemented the copy operations (recall that if no move operations are declared, the compiler defaults to the copy operations). Test 3C times your move operations to verify that you are actually moving the contents of the HashMaps rather than copying the contents.

The tests use some compiler directives (#pragma) to silence compiler warnings about self-assignment. You can safely ignore those.

Milestone 4: Answer Short Answer Questions

Answer the following questions in the file named "short_answers.txt". These questions do not require long answers or even complete sentences. We will grade for reasonable correctness and effort—see an example below:

Example

Consider the test case B_move_ctor_assignment. You should see a line with std::move(map1). Then, in the next line, we move from map1 to create move_constructed. Why is this valid?

Answer: std::move marks map eligible to be moved, but does not actually move. A function taking an r-value (eg. move constructor) does the moving. map1 is not passed to such a function, so we can still move from map1.

1. Templates and Template Classes

  1. In the clear() function, the for-each loop contains an auto&. Explain what the deduced type of that auto is and why the ampersand is necessary.
  2. Take a look at the code inside the test F_custom_hash_function() of the file tests.cpp. What is the purpose of decltype, and is there a way to write that line of code without using decltype? Explain.
  3. STL containers store elements of type value_type, and for your HashMap this value_type is a std::pair<const K, M>;. What would be the problem in the HashMap class if value_type were instead std::pair<K, M>?
  4. Take a look at the function check_map_equal() inside tests.cpp. You might notice that in the test cases, we use check_map_equal() to compare a HashMap and a std::map, but never to compare two HashMaps (we instead use operator==). Why can't we use check_map_equal() to compare two HashMaps?

2. Operator Overloading and Const Correctness

  1. Why did you implement both a const and non-const version of at(), but only a non-const version of operator[]? (unlike in Vector where operator[] also had a const version)
  2. Out of all the operators you overloaded, operator<< is the most interesting.
  3. Did you implement operator<< as a member, a friend, or neither? Why?
  4. We had you overload operator<< as an exercise in operator overloading, but none of the STL collections have operator<< overloaded. Why do you think the STL designers made this decision?

3. Special Member Functions and Move Semantics

  1. Consider a class MyClass which has three private members - a size_t, a function object, and a std::vector<int>. In this case the rule of zero applies and you should not write your own special member functions. In your HashMap class the rule of five applies - the starter code had the destructor, and you implemented the other four special member functions. Why does the rule of zero apply to MyClass?
  2. Why does your HashMap class follow the rule of five and not the rule of zero, if your HashMap private members are also a size_t, a function object, and a std::vector?
  3. In your move constructor or move assignment operator, why did you have to std::move each member, even though the parameter (named rhs or other) is already an r-value reference?

4. RAII

  1. Is your HashMap class RAII-compliant? If so, explain. If not, give a specific instance in which this could be a problem, and explain how you would make your class RAII-compliant.

Submitting the Checkpoint

For the checkpoint, you should submit Milestone 2 in two files: hashmap.cpp and hashmap.h. If you worked with a partner, please have one partner submit the assignment. There will be a place on Paperless for you to enter in the other partner's SUNet ID so that you both get credit.

Submitting the Final Project

When you're finished, submit three files: hashmap.cpp, hashmap.h, and your short-answer questions on Paperless! Follow the same instructions for partner work as above.

And finally... Good luck! :)

Extensions: Creating a Fully STL-Compliant HashMap

All of these are optional. Some may require supplemental material or additional readings. We provide an autograder for milestones 5 and 6. These tests are slightly less polished, but if you find any bugs, please let us know!

Milestone 5: Implement an initializer_list constructor and a range constructor.

Add a constructor that allows you to construct your HashMap with an std::initializer_list. Also add a constructor that accepts a range (in the form of two iterators) to initialize your HashMap. This is a fairly easy extension that can be implemented in around 15 lines of code.

Milestone 6: Design an Iterator Class for your HashMap

Using your response to question 7 in Milestone 4, implement an iterator class for your HashMap. Then, add support for const_iterator, and make sure that you can create const iterators, const_iterator, and const const_iterator. To reduce the repeated code, you may need to do some additional reading about type_traits and template metaprogramming. Talk to us if you want some guidance on this. Once you are done, change HashMap.cpp to use iterators. The code is much simpler with iterators!

More ideas (originally these were part of the assignment):

  • Implement iterator based versions of find() and erase() (milestone 7)
  • Implement r-value overloads of various member functions (milestone 8)
  • Incorporate smart pointers into the linked lists (milestone 9)
  • Implement try_emplace(), which will require knowledge of variadic templates and perfect forwarding. (originally milestone 10)
  • Equip the HashMap with a custom memory allocator.
  • Make some functions constexpr, add some compile-time optimizations.
  • Implement a LinkedHashMap, where iterators traverse the map in the order the elements were inserted.
  • Support automatic rehashing when the load factor is too high.
  • And beyond? Every so often, we think of something new that makes our class more complete. The most recent addition is a converting constructor from iterator to const_iterator. If you come up with anything interesting, talk to us. You're done! We enjoyed teaching you this quarter. Thanks for the fun quarter, and have a great Thanksgiving/winter break!