Assign4: PQ Heap
Now you are ready for the pièce de résistance of the assignment — implementing the priority queue on top of a binary heap. After you complete this portion of the assignment, you will be able to run a number of really cool demos that we've provided, which show the power of applying your efficient PQueue implementation to analyzing real-world data. But before we can get these demos to work, we need to put in the tough, but rewarding work of implementing our own class from scratch!
ADTs for the win
One of the neat things about a C++ class as an abstraction is that you can swap out its inner workings for a totally different implementation, with relatively minimal impact on the client. For example, you can change up how you internally store and organize the data to improve performance or to use a more compact data representation structure, while your clients can continue using the same interface, unaware and unaffected.
For this task you will implement the PQHeap class using a binary heap. This version of the priority queue has the same operational behavior as the PQSortedArray that you worked on earlier, but it runs with much higher efficiency. This will allow the priority queue client applications that you just implemented to effectively scale to much higher input sizes than was possible with the slower, less efficient PQSortedArray.
The PQHeap class
The PQHeap class has the same public interface as the PQSortedArray:
class PQHeap {
public:
PQHeap();
~PQHeap();
void enqueue(DataPoint element);
DataPoint dequeue();
DataPoint peek() const;
bool isEmpty() const;
int size() const;
void clear();
void printDebugInfo();
void validateInternalState();
private:
int getParentIndex(int curIndex);
int getLeftChildIndex(int curIndex);
int getRightChildIndex(int curIndex);
};
In the private section, we have provided declarations of four very important helper functions that we strongly recommend implementing. The validateInternalState function will work similarly to what you saw in the first part of this assignment, except now it will be responsible for enforcing that the min-heap property holds true for all parent/children pairs in the heap. The three indexing functions are provided as a good, clean way to abstract away the slightly hairy math necessary to find the parent/child indices for a given element. Again, while we will not strictly require you to, we strongly recommend implementing and using these helper functions – it will make your development and debugging processes substantially smoother. The remainder of the private section (including all member variables) is left for you to define as part of your implementation.
The pqheap.h file includes documentation of the expected behavior of each operation, including the required handling for error conditions, such as attempting to peek or dequeue from an empty queue. You should read through the provided documentation in this file before beginning your implementation.
A review of binary heaps
Rather than repeat the background information on binary heaps here, please review the July 21st lecture for the explanation of the data structure and its operations. The rest of this handout assumes familiarity with that content.
The dequeue operation needs to be able to quickly retrieve the frontmost element, but keeping the entire array in fully sorted order as PQSortedArray did is actually a bit of overkill in its approach. The binary heap is a partially-ordered data structure optimized for quick access to the frontmost element (and those near the front) and gives less attention to sorting those elements all the way in the back. A binary heap is ordered in a much weaker sense than a sorted array, but its form of ordering is still sufficient for highly efficient performance of the enqueue and dequeue operations.
The binary min-heap is stored as a flattened array. In the array, the root element occupies index 0. For an element at index i, its two children (if they exist) are at index 2*i + 1 and index 2*i + 2. If the element has a parent, the parent is located at index (i - 1) / 2.
To make sure you understand how binary heaps and their operations work, answer the following questions in short_answer.txt:
- Q12. Draw the binary heap formed by inserting the following nine
DataPoints into an empty binary heap. Specifically, insert those data points in the other that's shown below. You only need to show your final answer, not your intermediate steps. Yes, we know that we're asking you to draw pictures in a text file (so here's a useful tool for "drawing" the binary heaps).
{ "R", 4 }
{ "A", 5 }
{ "B", 3 }
{ "K", 7 }
{ "G", 2 }
{ "V", 9 }
{ "T", 1 }
{ "O", 8 }
{ "S", 6 }
There are many binary heaps you can form that contain these elements, but only one of them will be what you get by tracing the algorithm.
- Q13. Draw the binary heap that results after two calls to the
dequeueoperation on the heap you drew in Q9. - Q14. Draw the array representation of the binary heap that you drew in Q12.
Specification for PQHeap
Here are the details of the specific implementation we require:
- The underlying data organization technique is a binary min-heap, stored as a flattened array. The array should follow the indexing scheme explained in the section above.
- The
PQHeapstores elements of typeDataPoint, which stores some data as a string and a priority as an integer representing the element's priority in the data structure. As before, numerically smaller values of the priority field are treated as "higher" priority and must be dequeued first. - You must use a dynamically-allocated array as your underlying data storage mechanism. The array should be allocated with a small initial capacity (10 elements, defined as a constant in the code) and capacity should double on each resize operation, when necessary.
- You are responsible for proper allocation and deallocation of memory. Take care to properly deallocate any memory that is no longer is needed, such as when enlarging the array or in the class destructor.
- The
enqueueanddequeueoperations must run inO(log N)time whereNrepresents the number of elements currently stored in the priority queue. - Do not use a
sortfunction or library to arrange the elements, nor any collections or algorithms from the STL C++ framework. - The C++ standard libraries contain functions
std::push_heapandstd::pop_heap. For the purposes of this assignment, please refrain from using those functions. - Do not create any temporary or auxiliary data structures, other than when you are resizing to a larger array on an enqueue operation. Do not use any Stanford collection classes (such as Vector or Map) in your PQHeap implementation.
- Declare your necessary private member variables in pqheap.h along with any private helper functions. You should not modify the provided public member headers in any way.
- Implement the bodies of all member functions and the constructor/destructor in
pqheap.cpp. Empty implementation skeletons have been provided there for you.
Advice
Do not write all functions and try to test the entire class in one big hairy mess. Pick a single task to work on in isolation and test as you go. Here is a suggested plan of attack:
- First write the helper functions, the most important of which is
validateInternalState. Investing in a solid internal watchdog now will pay off in much saved grief later. - The indexing helpers are a clean way to abstract out the slightly messy calculation of the indexes. How about also including a check to stop your code from wrongly attempting to access an index that is out of bounds? (As you may remember from the warmup, there is no error detection from C++ itself about such mistakes, so designing in your own vigilance is good!)
- When ready to start on the public operations, choose an order to implement that is conducive to testing. Although simple operations like
sizeorpeekseem appealing as a starting point, they would be tough to test without first having a workingenqueue. In fact, since most everything depends onenqueue, that seems like your likely first step. - Break down
enqueueinto its subtasks, perhaps first insert into array that doesn't yet expand. When you get to a needed step likeswaporbubble, bust out a private helper function for it. - Implement and test each operation like crazy. Hand-construct small targeted test cases. Trace on paper, then confirm the correctness of your code using debugger and validate function.
- After the basic
enqueueworks, extend to support expanding the array. Follow with more testing on a wide range of inputs, including at scale. - ow is a good time to implement the simpler operations, like
size,peekandclear. Test! - Then you're ready for
dequeue, the final big task. Some additional private helpers may be worthwhile. Tracing and testing, use your validate watchdog and debugger. - With all operations individually vetted, you're up to testing interleaved operations and stress-sized inputs in your final push. Congratulations!
Writing your own tests
We never tire of repeating ourselves on the importance of thorough testing. The more care that you put into your testing strategy and thoughtful choices in test cases, the higher your confidence that your code is truly bug-free. The binary heap code has many complex interactions, and sneaky bugs can lurk in the shadows, it will take your finest testing to shake those out into the light of day.
- We want you in the driver's seat on testing your
PQHeapclass and have intentionally started you with very few provided tests.You will need to add many student tests of your own. - Review the test coverage of
PQSortedArrayas a starting point. You are welcome to re-use any of those tests (both the provided ones and your own tests). Copy in a test case and make whatever modifications are needed to adapt fromPQSortedArraytoPQHeap. For each test you borrow, take the time to understand what exactly it is testing, as well as what it is not being tested. It is those omissions for which you need to devise additional test cases to ensure your coverage is fully comprehensive. - Many of our provided tests use the empty string as the
namefield of theDataPoint, since we only care about the priority value. Feel free to do the same in your own tests.
Break the speed barrier
Return to your role as client of the Priority Queue and change the the functions pqSort and topK in pqclient.cpp to use your shiny new PQHeap in place of that pokey old PQSortedArray. The only change required is the type of the variable declaration – let's hear it for the power of abstraction! Re-run your previous time trial tests, increasing the sizes to a range appropriate for this zippy implementation.
The action of pqSort using PQHeap is akin to the sorting algorithm heapsort, a classic O(NlogN) performer. Streaming top-k should operate in time O(NlogK), using space proportional to K. That's some impressive work you've done!
Answer the following questions in short_answer.txt:
- Q15. Run timing trials and provide your results that confirm that pqSort runs in time O(NlogN) and topK in O(NlogK).
Extensions
Looking to go further with the priority queue? There are lots of directions you could take this project!
- Add a merge operation. One additional feature often needed when working with priority queues is the ability to merge two priority queues into one combined result. Extend your class to support such a merge member function. A rather uninspired way to do this is to add a loop that dequeues from one and enqueues to another, but ideally, merging should take advantage of the internal implementation to operate efficiently. Add timing trials for this operation and report what efficiency you can achieve for merge operation for each of the different implementations.
- A double-ended priority queue supports both
dequeueMinanddequeueMaxoperations. Extend your implementation(s) to add this facility. Which implementations can be adapted to provide to efficient access to both the min and max value? - Research and implement an alternative priority queue implementation. One of the reasons we love priority queues is because that are so many possible implementation strategies, each with different strengths and weaknesses. Here are a few other data structures you might research (in alphabetical order): binomial heaps, calendar queues, d-ary heaps, Fibonacci heaps, fishspears, leftist heaps, pairing heaps, skew heaps, skip lists, and weak heaps (and there are many others!). Code up one of these alternatives and share with us what you learned in the process.