Assign5: PQ Heap


Now you are ready for the pièce de résistance of the assignment — implementing the priority queue as a binary heap.

ADTs for the win

One of the neat things about a C++ class as an abstraction is that you can swap out its innards for a totally different implementation, such to improve the performance or use a more compact representation, while your clients continue on using the same interface unaware and unaffected.

For this task you will implement the PQHeap class using a binary heap. This version of the priority queue has the same operational behavior as the PQSortedArray you worked on earlier, but runs with much higher efficiency. This will allow the priority queue client applications to scale to much higher input sizes than was possible with the much slower PQSortedArray.

The PQHeap class

The PQHeap class has the same public interface as the PQSortedArray:

class PQHeap {
  public:
   
    PQHeap();
    ~PQHeap();
  
    void enqueue(int priority);
    int dequeue();
    int peek() const;
    bool isEmpty() const;
    int size() const;
    void clear();

  private:
};

The private section is left for you to define as part of your implementation.

The pqheap.h file includes documentation of the expected behavior of each operation, including the required handling for error conditions such as attempting to peek an empty queue.

Specification for PQHeap

Rather than repeat the background information of binary heaps, please review the May 13th lecture for the explanation of the data structure and its operations. The rest of this handout assumes familiarity with that content.

The dequeue operation needs to be able to quickly retrieve the frontmost element, but keeping the entire array in fully sorted order as PQSortedArray did is actually a bit of overkill in its approach. The binary heap is a partially-ordered data structure optimized for quick access the frontmost element (and those near the front) and gives less attention to sorting those elements all the way in the back. A binary heap is ordered in a much weaker sense than a sorted array, but its form of ordering is still sufficient for highly efficient performance of the enqueue and dequeue operations.

Here are the details of the specific implementation we require:

  • The data structure is a binary min-heap, stored as a flattened array. In the array, the root element occupies index 1 and index 0 is unused. For a parent element at index i, its two children are at index 2*i and index 2*i + 1.
  • The PQHeap stores integer elements, where the integer itself is the element priority. The smaller value is treated as higher priority and will be dequeued first.
  • You must use a raw array, allocated using new int[]. The array should be allocated of a small initial capacity (10 elements) and capacity should double on each resize.
  • You are responsible for proper allocation and deallocation of memory. Take care to properly deallocate any memory that is no longer is needed, such as when enlarging the array or in the class destructor.
  • The enqueue and dequeue operations must run in O(log N) time where N is the size of the priority queue.
  • Do not use a sort function or library to arrange the elements, nor any collections or algorithms from the STL C++ framework.
  • Do not create any temporary or auxiliary data structures, other than when you are resizing to a larger array on an enqueue operation. Do not use any Stanford collection classes (such as Vector or Map) in your PQHeap implementation.
  • Declare your necessary private member variables in pqheap.h along with any private helper functions. You should not modify the provided public member headers in any way.
  • Implement the bodies of all member functions and constructors in pqheap.cpp.

Advice

  • Debug as you develop. Don't implement all of the functions and try to test the entire class at once! Instead, work on one operation and test thoroughly before moving on. Don't move on until enqueue is working correctly in all necessary cases. When writing dequeue, proceed in the same fashion. You may want to make use of printDebugInfo to dump the internal structure for debugging purposes.
  • Test thoroughly. We have been pushing this all quarter, but it never hurts to repeat it. The code has some complex interactions and it is essential that you take time to identify and test all the various cases. Our provided tests are pretty comprehensive for this project, but you should also add student tests of your own.
  • Careful with deallocation. Memory deallocation can sometimes be a cure worse that the disease, as small memory mistakes can be disproportionately punished. We recommend getting the entire class working correctly without deleting anything, and then go back and carefully add deallocation.

Break the speed barrier

Return to your role as client of the Priority Queue and change the the functions pqSort and topK in pqclient.cpp to use your shiny new PQHeap in place of that pokey old PQSortedArray. The only change required is the type of the variable declaration, let's here for the power of abstraction! Re-run your previous time trial tests, increasing the sizes to a range appropriate for this zippy implementation.

The action of pqSort using PQHeap is akin to the sorting algorithm heapsort, a classic O(NlogN) performer. Streaming top-k should operate in time O(NlogK), using space proportional to K. That some impressive work you've done!

Answer the following questions in short_answer.txt:

  • Q10. Run timing trials and provide your results that confirm that pqSort runs in time O(NlogN) and topK in O(NlogK).

Extensions

Looking to go further with the priority queue? There are lots of directions you could take this project!

  • Add a merge operation. One additional feature often needed when working with priority queues is the ability to merge two priority queues into one combined result. Extend your class to support such a merge member function. A rather uninspired way to do this is to add a loop that dequeues from one and enqueues to another, but ideally, merging should take advantage of the internal implementation to operate efficiently. Add timing trials for this operation and report what efficiency you can achieve for merge operation for each of the different implementations.
  • A double-ended priority queue supports both dequeueMin and dequeueMax operations. Extend your implementation(s) to add this facility. Which implementations can be adapted to provide to efficient access to both the min and max value?
  • Research and implement an alternative priority queue implementation. One of the reasons I love priority queues is because that are so many possible implementation strategies, each with different strengths and weaknesses. Here are a few other data structures you might research (in alphabetical order): binomial heaps, calendar queues, d-ary heaps, Fibonacci heaps, fishspears, leftist heaps, pairing heaps, skew heaps, skip lists, and weak heaps (and there are many others!). Code up one of these alternatives and share with us what you learned in the process.