PQ Heap


Now you are ready for the pièce de résistance — implementing the priority queue as a binary heap. After you complete this task, you will be able to run some really cool demos that we've provided, which show the power of applying your efficient PQueue implementation to analyzing real-world data. But before we can get these demos to work, we need to put in the tough, but rewarding work of implementing our own class from scratch!

ADTs for the win

One of the fantastic features of C++ classes as an abstraction is that you can swap out its inner workings for a totally different implementation, with minimal impact on the client. For example, you can change how your internal algorithms to rev up performance or switch a more compact data representation to save space, all the while your clients continue using the same interface, and receive the benefits of your improvements without lifting a finger.

For this task you will implement the PQHeap class using a binary heap. This version of the priority queue has the same operational behavior as the PQSortedArray that you worked on earlier, but it operates much more efficiently. This will allow priority queue client applications to scale to much larger inputs than was possible with the slower PQSortedArray.

The PQHeap class

The PQHeap class has the same public interface as the PQSortedArray:

class PQHeap {
public:

    PQHeap();
    ~PQHeap();

    void enqueue(DataPoint element);
    DataPoint dequeue();
    DataPoint peek() const;

    bool isEmpty() const;
    int size() const;
    void clear();

    void printDebugInfo();
    void validateInternalState();

private:

    int getParentIndex(int curIndex);
    int getLeftChildIndex(int curIndex);
    int getRightChildIndex(int curIndex);
};

The starting code for the PQHeap declares a few suggested helper functions. The debugging helpers printDebugInfo and validateInternalState are intended to serve a similar purpose as they did in PQSortedArray, e.g. dumping the internal state and verifying internal consistency of the min-heap ordering. The three indexing functions manage the math for computing the parent/child indexes.

While we do not strictly require that you use all of these helper functions, we very strongly recommend it and guarantee that following our recommendation will make your development and debugging processes substantially smoother.

The remainder of the private section (including all member variables) is left for you to define as part of your implementation.

The pqheap.h file includes documentation of the expected behavior of each operation, including the required handling for error conditions, such as attempting to peek or dequeue from an empty queue. You should read through the provided documentation in this file before beginning to code.

Binary heap highlights

For an introduction to the binary heap data structure and its operations, refer to Wednesday's lecture on heaps. The rest of this handout assumes familiarity with that content and does not repeat it.

The dequeue operation needs to be able to quickly retrieve the frontmost element, but keeping the entire array in fully sorted order as PQSortedArray did is actually a bit of overkill in its approach. The binary heap is a partially-ordered data structure optimized for quick access to the frontmost element (and those near the front) and gives less attention to sorting those elements all the way in the back. A binary heap is ordered in a much weaker sense than a sorted array, but its form of ordering is still sufficient for highly efficient performance of the enqueue and dequeue operations.

The binary min-heap you will use should be stored in a flattened array. The root element is placed at index 0. For a given element at index i, its two children (if they exist) are at index 2*i + 1 and index 2*i + 2 and its parent is located at index (i - 1) / 2.

Confirm your understanding of binary heap operations by answering the following questions in short_answer.txt:

Q12. Start with an empty binary heap and enqueue the nine DataPoints in the order shown below and show the result. You only need to show the final heap, not intermediate steps. Draw the heap as tree-like diagram with root element on top, its two children below, and so on. Yes, we know that we're asking you to draw pictures in a text file (we love the AsciiFlow tool for "drawing" in text).

{ "R", 4 }
{ "A", 5 }
{ "B", 3 }
{ "K", 7 }
{ "G", 2 }
{ "V", 9 }
{ "T", 1 }
{ "O", 8 }
{ "S", 6 }

Confirm for yourself that while there are many possible arrangements of these elements into a valid binary heap, there is only one that will result from enqueueing these values in this particular order.

Q13. Make two calls to dequeue on the above binary heap and draw the updated result.

Q14. Draw the array representation of the binary heap above. Label each element with its array index.

Implementation requirements

  • Declare your necessary private member variables in pqheap.h along with any private helper functions. Do not modify the provided public member prototypes in any way.
  • Implement the bodies of all member functions and the constructor/destructor in pqheap.cpp. Empty implementation skeletons have been provided there for you.
  • The underlying data organization technique is a binary min-heap, stored as a flattened array. The root of the heap is stored at index 0.
  • The array elements are of type DataPoint, which packages a string name with an integer priority. A smaller integer value for priority indicates a more urgent priority than a larger integer value. The DataPoint with the minimum integer value for priority is treated as the most urgent and is the one retrieved by peek/dequeue.
  • You must use a dynamically-allocated array as your underlying data storage mechanism. The array should be allocated with a small initial capacity (10 elements, defined as a constant in the code) and capacity should double on each resize operation, when necessary.
  • The class is responsible for proper allocation and deallocation of dynamic memory. Take care to properly deallocate any memory that is no longer is needed, such as when outgrowing a piece of memory or in the class destructor.
  • The enqueue and dequeue operations must run in O(log N) time where N is the count of elements in the priority queue. The bubble operations can operate either iteratively or recursively, your choice.
  • Your PQHeap implementation must not use any library sorting routine nor use any classes or algorithms from the C++ STL framework. It also must not use any Stanford collection classes (such as Vector or Map).

Strategic advice

(Re)-read our advice on debugging memory errors to remind yourself of the need to bring your A-game.

Do not write all functions and try to test the entire class in one big hairy mess. Pick a single task to work on in isolation and test as you go. Here is a suggested plan of attack:

  • First write the helper functions, the most important of which is validateInternalState. Investing in a solid internal watchdog now will pay off in much saved grief later.
  • The indexing helpers are a clean way to abstract out the slightly messy calculation of the indexes. How about also including a check to stop your code from wrongly attempting to access an index that is out of bounds? (As you may remember from the warmup, there is no error detection from C++ itself about such mistakes, so designing in your own vigilance is good!)
  • When ready to start on the public operations, choose an order to implement that is conducive to testing. Although simple operations like size or peek seem appealing as a starting point, they would be tough to test without first having a working enqueue. In fact, since most everything depends on enqueue, that seems like your likely first step.
  • Break down enqueue into its subtasks, perhaps first insert into array that doesn't yet expand. When you get to a needed step like swap or bubble, bust out a private helper function for it.
  • Implement and test each operation like crazy. Hand-construct small targeted test cases. Trace on paper, then confirm the correctness of your code using debugger and validate function.
  • After the basic enqueue works, extend to support expanding the array. Follow with more testing on a wide range of inputs, including at scale.
  • Now is a good time to implement the simpler operations, like size, peek and clear. Test!
  • Then you're ready for dequeue, the final big task. Some additional private helpers may be worthwhile. Tracing and testing, use your validate watchdog and debugger.
  • With all operations individually vetted, you're up to testing interleaved operations and stress-sized inputs in your final push. Congratulations!

Writing your own tests

We never tire of repeating ourselves on the importance of thorough testing. The more care that you put into your testing strategy and thoughtful choices in test cases, the higher your confidence that your code is truly bug-free. The binary heap code has many complex interactions, and sneaky bugs can lurk in the shadows, it will take your finest testing to shake those out into the light of day.

  • We want you in the driver's seat on testing your PQHeap class and have intentionally started you with very few provided tests. You will need to add many student tests of your own.
  • Review the test coverage of PQSortedArray as a starting point. You are welcome to re-use any of those tests (both the provided ones and your own tests). Copy in a test case and make whatever modifications are needed to adapt from PQSortedArray to PQHeap. For each test you borrow, take the time to understand what exactly it is testing, as well as what it is not being tested. It is those omissions for which you need to devise additional test cases to ensure your coverage is fully comprehensive.
  • Many of our provided tests use the empty string as the name field of the DataPoint, since we only care about the priority value. Feel free to do the same in your own tests.

Break the speed barrier

Finally, jump back to your role as client and open up pqclient.cpp. Edit the functions pqSort and topK to use your shiny new PQHeap in place of that pokey old PQSortedArray. The only change required is the type of the variable declaration and like magic, these algorithms are now turbo-charged.  Let's hear it for the power of abstraction! Re-run your previous time trial tests, increasing the sizes to a range appropriate for this zippy implementation.

The action of pqSort using PQHeap is akin to the sorting algorithm heapsort, a classic O(NlogN) performer. Streaming top-k should operate in time O(NlogK), using space proportional to K. That's some impressive work you've done!

Answer the following questions in short_answer.txt:

Q15. Run timing trials and provide your results that confirm that pqSort runs in time O(NlogN) and topK in O(NlogK).

Extensions

Looking to go further with the priority queue? There are lots of directions you could take this project!

  • Add a merge operation. One additional feature often needed when working with priority queues is the ability to merge two priority queues into one combined result. Extend your class to support such a merge member function. An uninspired way to do this is to add a loop that dequeues from one and enqueues to another, but merging should instead take advantage of the internal implementation to operate efficiently. Add timing trials for this operation and report what efficiency you can achieve for merge operation for each of the different implementations.
  • A double-ended priority queue supports both dequeueMin and dequeueMax operations. Extend your implementation(s) to add this facility. Which implementations can be adapted to provide to efficient access to both the min and max value?
  • Research and implement an alternative priority queue implementation. One of the reasons we love priority queues is because that are so many possible implementation strategies, each with different strengths and weaknesses. Here are a few other data structures you might research (in alphabetical order): binomial heaps, calendar queues, d-ary heaps, Fibonacci heaps, fishspears, leftist heaps, pairing heaps, skew heaps, skip lists, and weak heaps (and there are many others!). Code up one of these alternatives and share with us what you learned in the process.