Assign4: PQ Heap


Now you are ready for the pièce de résistance of the assignment — implementing the priority queue on top of a binary heap. After you complete this portion of the assignment, you will be able to run a number of really cool demos that we've provided, which show the power of applying your efficient PQueue implementation to analyzing real-world data. But before we can get these demos to work, we need to put in the tough, but rewarding work of implementing our own class from scratch!

ADTs for the win

One of the neat things about a C++ class as an abstraction is that you can swap out its inner workings for a totally different implementation, with relatively minimal impact on the client. For example, you can change up how you internally store and organize the data to improve performance or to use a more compact data representation structure, while your clients can continue using the same interface, unaware and unaffected.

For this task you will implement the PQHeap class using a binary heap. This version of the priority queue has the same operational behavior as the PQSortedArray that you worked on earlier, but it runs with much higher efficiency. This will allow the priority queue client applications that you just implemented to effectively scale to much higher input sizes than was possible with the slower, less efficient PQSortedArray.

The PQHeap class

The PQHeap class has the same public interface as the PQSortedArray:

class PQHeap {
public:

    PQHeap();
    ~PQHeap();

    void enqueue(DataPoint element);
    DataPoint dequeue();
    DataPoint peek() const;

    bool isEmpty() const;
    int size() const;
    void clear();

    void printDebugInfo();

private:

    void validateInternalState();
    int getParentIndex(int curIndex);
    int getLeftChildIndex(int curIndex);
    int getRightChildIndex(int curIndex);
};

In the private section, we have provided declarations of four very important helper functions that we strongly recommend implementing. The validateInternalState function will work similarly to what you saw in the first part of this assignment, except now it will be responsible for enforcing that the min-heap property holds true for all parent/children pairs in the heap. The three indexing functions are provided as a good, clean way to abstract away the slightly hairy math necessary to find the parent/child indices for a given element. Again, while we will not strictly require you to, we strongly recommend implementing and using these helper functions – it will make your development and debugging processes substantially smoother. The remainder of the private section (including all member variables) is left for you to define as part of your implementation.

The pqheap.h file includes documentation of the expected behavior of each operation, including the required handling for error conditions, such as attempting to peek or dequeue from an empty queue. You should read through the provided documentation in this file before beginning your implementation.

A review of binary heaps

Rather than repeat the background information on binary heaps here, please review the July 21st lecture for the explanation of the data structure and its operations. The rest of this handout assumes familiarity with that content.

The dequeue operation needs to be able to quickly retrieve the frontmost element, but keeping the entire array in fully sorted order as PQSortedArray did is actually a bit of overkill in its approach. The binary heap is a partially-ordered data structure optimized for quick access to the frontmost element (and those near the front) and gives less attention to sorting those elements all the way in the back. A binary heap is ordered in a much weaker sense than a sorted array, but its form of ordering is still sufficient for highly efficient performance of the enqueue and dequeue operations.

The binary min-heap is stored as a flattened array. In the array, the root element occupies index 0. For an element at index i, its two children (if they exist) are at index 2*i + 1 and index 2*i + 2. If the element has a parent, the parent is located at index (i - 1) / 2.

To make sure you understand how binary heaps and their operations work, answer the following questions in short_answer.txt:

  • Q9. Draw the binary heap formed by inserting the following nine DataPoints into an empty binary heap. Specifically, insert those data points in the other that's shown below. You only need to show your final answer, not your intermediate steps. Yes, we know that we're asking you to draw pictures in a text file (so here's a useful tool for "drawing" the binary heaps).
{ "R", 4 }
{ "A", 5 }
{ "B", 3 }
{ "K", 7 }
{ "G", 2 }
{ "V", 9 }
{ "T", 1 }
{ "O", 8 }
{ "S", 6 }

There are many binary heaps you can form that contain these elements, but only one of them will be what you get by tracing the algorithm.

  • Q10. Draw the binary heap that results after two calls to the dequeue operation on the heap you drew in Q9.
  • Q11. Draw the array representation of the binary heap that you drew in Q10.

Specification for PQHeap

Here are the details of the specific implementation we require:

  • The underlying data organization technique is a binary min-heap, stored as a flattened array. The array should follow the indexing scheme explained in the section above.
  • The PQHeap stores elements of type DataPoint, which stores some data as a string and a priority as an integer representing the element's priority in the data structure. As before, numerically smaller values of the priority field are treated as "higher" priority and must be dequeued first.
  • You must use a dynamically-allocated array as your underlying data storage mechanism. The array should be allocated with a small initial capacity (10 elements, defined as a constant in the code) and capacity should double on each resize operation, when necessary.
  • You are responsible for proper allocation and deallocation of memory. Take care to properly deallocate any memory that is no longer is needed, such as when enlarging the array or in the class destructor.
  • The enqueue and dequeue operations must run in O(log N) time where N represents the number of elements currently stored in the priority queue.
  • Do not use a sort function or library to arrange the elements, nor any collections or algorithms from the STL C++ framework.
  • The C++ standard libraries contain functions std::push_heap and std::pop_heap. For the purposes of this assignment, please refrain from using those functions.
  • Do not create any temporary or auxiliary data structures, other than when you are resizing to a larger array on an enqueue operation. Do not use any Stanford collection classes (such as Vector or Map) in your PQHeap implementation.
  • Declare your necessary private member variables in pqheap.h along with any private helper functions. You should not modify the provided public member headers in any way.
  • Implement the bodies of all member functions and the constructor/destructor in pqheap.cpp. Empty implementation skeletons have been provided there for you.

Advice

  • Debug as you develop. Don't implement all of the functions and try to test the entire class at once! Instead, work on one operation and test thoroughly before moving on. Don't move on until enqueue is working correctly in all necessary cases. When writing dequeue, proceed in the same fashion. You may want to make use of printDebugInfo to dump the internal structure for debugging purposes.
  • Test thoroughly. We have been pushing this all quarter, but it never hurts to repeat it. The code has some complex interactions, and it is essential that you take time to identify and test all the various cases. Our provided tests are pretty comprehensive for this project, but you should also add student tests of your own.
  • Careful with deallocation. Memory deallocation can sometimes be a cure worse that the disease, as small memory mistakes can be disproportionately punished. We recommend getting the entire class working correctly without deleting anything, and then go back and carefully add deallocation.

Break the speed barrier

Return to your role as client of the Priority Queue and change the the functions pqSort and topK in pqclient.cpp to use your shiny new PQHeap in place of that pokey old PQSortedArray. The only change required is the type of the variable declaration – let's hear it for the power of abstraction! Re-run your previous time trial tests, increasing the sizes to a range appropriate for this zippy implementation.

The action of pqSort using PQHeap is akin to the sorting algorithm heapsort, a classic O(NlogN) performer. Streaming top-k should operate in time O(NlogK), using space proportional to K. That's some impressive work you've done!

Answer the following questions in short_answer.txt:

  • Q12. Run timing trials and provide your results that confirm that pqSort runs in time O(NlogN) and topK in O(NlogK).

Extensions

Looking to go further with the priority queue? There are lots of directions you could take this project!

  • Add a merge operation. One additional feature often needed when working with priority queues is the ability to merge two priority queues into one combined result. Extend your class to support such a merge member function. A rather uninspired way to do this is to add a loop that dequeues from one and enqueues to another, but ideally, merging should take advantage of the internal implementation to operate efficiently. Add timing trials for this operation and report what efficiency you can achieve for merge operation for each of the different implementations.
  • A double-ended priority queue supports both dequeueMin and dequeueMax operations. Extend your implementation(s) to add this facility. Which implementations can be adapted to provide to efficient access to both the min and max value?
  • Research and implement an alternative priority queue implementation. One of the reasons we love priority queues is because that are so many possible implementation strategies, each with different strengths and weaknesses. Here are a few other data structures you might research (in alphabetical order): binomial heaps, calendar queues, d-ary heaps, Fibonacci heaps, fishspears, leftist heaps, pairing heaps, skew heaps, skip lists, and weak heaps (and there are many others!). Code up one of these alternatives and share with us what you learned in the process.