PQ Heap


Assignment written by Julie Zelenski

ADTs for the win

Now you are ready for the pièce de résistance — implementing the priority queue as a binary heap. One of the fantastic features of C++ classes as an abstraction is that you can swap out its inner workings for a totally different implementation, with minimal impact on the client. For example, you can change your internal algorithms to rev up performance or switch a more compact data representation to save space, all the while your clients continue using the same interface, and receive the benefits of your improvements without lifting a finger.

For this task you will implement the PQHeap class using a binary heap. This version of the priority queue has the same operational behavior as the PQArray that you worked on earlier, but it operates much more efficiently. This will allow priority queue client applications to scale to much larger inputs than was possible with the slower PQArray.

The PQHeap class

The PQHeap class has the same public interface as the PQArray:

class PQHeap {
public:

    PQHeap();
    ~PQHeap();

    void enqueue(DataPoint element);
    DataPoint dequeue();
    DataPoint peek() const;

    bool isEmpty() const;
    int size() const;
    void clear();

    void printDebugInfo(string msg) const;
    void validateInternalState() const;

private:

    int getParentIndex(int child) const;
    int getLeftChildIndex(int parent) const;
    int getRightChildIndex(int parent) const;
};

The starting code for the PQHeap declares a few suggested helper functions. The debugging helpers printDebugInfo and validateInternalState are intended to serve a similar purpose as they did in PQArray, e.g. dumping the internal state and verifying internal consistency of the min-heap ordering. The three private indexing functions manage the math for computing the parent/child indexes.

While we do not strictly require that you use all of these helper functions, we very strongly recommend it and guarantee that following our recommendation will make your development and debugging processes substantially smoother.

The remainder of the private section (including all member variables) is left for you to define as part of your implementation.

The pqheap.h file includes documentation of the expected behavior of each operation, including the required handling for error conditions, such as attempting to peek or dequeue from an empty queue. You should read through the provided documentation in this file before beginning to code.

Binary heap highlights

For an introduction to the binary heap data structure and its operations, refer to the lecture on heaps. The rest of this handout assumes familiarity with that content and does not repeat it.

The dequeue operation needs to be able to quickly retrieve the frontmost element, but keeping the entire array in fully sorted order as PQArray did is actually a bit of overkill in its approach. The binary heap is a partially-ordered data structure optimized for quick access to the frontmost element (and those near the front) and gives less attention to sorting those elements all the way in the back. A binary heap is ordered in a much weaker sense than a sorted array, but its form of ordering is still sufficient for highly efficient performance of the enqueue and dequeue operations.

The binary min-heap you will use should be stored in a flattened array. The root element is placed at index 0. For a given element at index i, its two children (if they exist) are at index 2*i + 1 and index 2*i + 2 and its parent is located at index (i - 1) / 2.

Confirm your understanding of binary heap operations by answering the following questions in short_answer.txt:

Q11. Start with an empty binary heap and enqueue the nine DataPoints in the order shown below and show the result. You only need to show the final heap, not intermediate steps. Draw the heap as tree-like diagram with root element on top, its two children below, and so on. Yes, we know that we're asking you to draw pictures in a text file (we love the AsciiFlow tool for "drawing" in text).

{ "R", 4 }
{ "A", 5 }
{ "B", 3 }
{ "K", 7 }
{ "G", 2 }
{ "V", 9 }
{ "T", 1 }
{ "O", 8 }
{ "S", 6 }

Confirm for yourself that while there are many possible arrangements of these elements into a valid binary heap, there is only one that will result from enqueueing these values in this particular order.

Q12. Make two calls to dequeue on the above binary heap and draw the updated result.

Q13. Draw the array representation of the binary heap above. Label each element with its array index.

Implementation requirements

  • Declare your necessary private member variables in pqheap.h along with any private helper functions. Do not modify the provided public member prototypes in any way.
  • Implement the bodies of all member functions and the constructor/destructor in pqheap.cpp. Empty implementation skeletons have been provided there for you.
  • The underlying data organization technique is a binary min-heap, stored as a flattened array. The root of the heap is stored at index 0.
  • The array elements are of type DataPoint, which packages a string label with a numeric priority. A smaller value for priority indicates a more urgent priority than a larger value. The DataPoint with the minimum double value for priority is treated as the most urgent and is the one retrieved by peek/dequeue.
  • You must use a dynamically-allocated array as your underlying data storage mechanism. The array should be allocated with a small initial capacity (10 elements, defined as a constant in the code) and capacity should double on each resize operation, when necessary.
  • The class is responsible for proper allocation and deallocation of dynamic memory. Take care to properly deallocate any memory that is no longer is needed, such as when outgrowing a piece of memory or in the class destructor.
  • The enqueue and dequeue operations must run in O(log N) time where N is the count of elements in the priority queue. The bubble operations can operate either iteratively or recursively, your choice.
  • Your PQHeap implementation must not use any library sorting routine nor use any classes or algorithms from the C++ STL framework. It also must not use any Stanford collection classes (such as Vector or Map).

Strategic advice

(Re)-read our advice on debugging memory errors to remind yourself of the need to bring your A-game.

Do not write all functions and try to test the entire class in one big mess. Pick a single task to work on in isolation and test as you go. Here is a suggested plan of attack:

  • Start with a careful reading of the code for PQArray. Be sure to review the provided code and tests, as well as the code and tests that you added. Both PQHeap and PQArray implement the same Priority Queue abstraction and both use a dynamic array under the hood. You will find that some PQHeap operations can reuse the same or similar code as PQArray, while other operations will need a distinct implementation that is specific to the binary heap. Determine which portions of the code and tests can be usefully incorporated into PQHeap and bring those in now.
  • We recommend you begin with the most important of the helper functions: validateInternalState. Writing this function will prompt you to think through the structure of a valid binary heap and write the code to confirm the object's state is internally consistent. Investing in a solid internal watchdog now will pay off in much saved grief later.
  • The indexing helpers are a clean way to abstract out the slightly messy calculation of the indexes. Implement the helpers to return a distinctive sentinel value to indicate if the requested parent or child index is not valid (i.e. trying to get parent of the root index, or access child of index that has no children). This sentinel will simplify the stopping conditions for your bubble up/down operation.
  • When ready to implement the swapElements helper, be sure to check out and reuse the provided version in PQArray.
  • When ready to start on the public operations, choose an order to implement that is conducive to testing. Although simple operations like size or peek seem appealing as a starting point, they would be tough to test without first having a working enqueue. In fact, since most everything depends on enqueue, that seems like your likely first step.
  • Break down enqueue into its subtasks, perhaps first insert into an array that doesn't yet expand. When you get to a needed step like swapElements or bubble, bust out a private helper function for it.
  • Implement and test each operation like crazy. Hand-construct small targeted test cases. Trace on paper, then confirm the correctness of your code using the debugger.
  • After the basic enqueue works, extend to support expanding the array. Follow with more testing on a wide range of inputs, including at scale.
  • Now is a good time to implement the simpler operations, like size, peek and clear. Test!
  • Then you're ready for dequeue, the final big task. Some additional private helpers may be worthwhile. More tracing and testing; use your validate watchdog and debugger.
  • With all operations individually vetted, you're up to testing interleaved operations and stress-sized inputs in your final push. Congratulations!

Writing your own tests

We never tire of repeating ourselves on the importance of thorough testing. The more care that you put into your testing strategy and thoughtful choices in test cases, the higher your confidence that your code is truly bug-free. The binary heap code has many complex interactions, and sneaky bugs can lurk in the shadows. It will take your finest testing to shake those out into the light of day.

  • We want you in the driver's seat on testing your PQHeap class and have intentionally started you with very few provided tests. You will need to add many meaningful student tests of your own. What most matters for test coverage is not simply quantity, but quality.
  • Review the test coverage of PQArray as a starting point. You are welcome to re-use any of those tests (both the provided ones and your own tests). Copy in a test case and make whatever modifications are needed to adapt from PQArray to PQHeap. For each test you borrow, take the time to understand what exactly it is testing, as well as what it is not being tested. It is those omissions for which you need to devise additional test cases to ensure your coverage is fully comprehensive.
  • You must include at least two meaningful student tests for every function to ensure that the coverage is robust and comprehensive (you will likely need more than two tests to ensure that your implementation is fully correct). Each test case should include a comment which documents what this test case is targeting and how it complements the rest of your tests.
  • Many of our provided tests use the empty string as the label field of the DataPoint, since we only care about the priority value. Feel free to do the same in your own tests.

Break the speed barrier

Finally, jump back to your role as client and open up pqclient.cpp. Edit the functions pqSort and topK to use your shiny new PQHeap in place of that pokey old PQArray. The only change required is the type of the variable declaration and like magic, these algorithms are now turbo-charged.  Let's hear it for the power of abstraction! Re-run your previous time trial tests, increasing the sizes to a range appropriate for this zippy implementation.

The action of pqSort using PQHeap is akin to the sorting algorithm heapsort, a classic O(NlogN) performer. Streaming top-k should operate in time O(NlogK), using space proportional to K. That's some impressive work you've done!

Answer the following questions in short_answer.txt:

Q14. Re-run the timing trials on pqclient.cpp and provide your results that confirm that pqSort runs in time O(NlogN) and topK in O(NlogK).

Extensions

Looking to go further with the priority queue? There are lots of directions you could take this project!

  • Add a merge operation. One additional feature often needed when working with priority queues is the ability to merge two priority queues into one combined result. Extend your class to support such a merge member function. An uninspired way to do this is to add a loop that dequeues from one and enqueues to another, but merging should instead take advantage of the internal implementation to operate efficiently. Add timing trials for this operation and report what efficiency you can achieve for merge operation for each of the different implementations.
  • A double-ended priority queue supports both dequeueMin and dequeueMax operations. Extend your implementation(s) to add this facility. Which implementations can be adapted to provide to efficient access to both the min and max value?
  • Research and implement an alternative priority queue implementation. One of the reasons we love priority queues is because that are so many possible implementation strategies, each with different strengths and weaknesses. Here are a few other data structures you might research (in alphabetical order): binomial heaps, calendar queues, d-ary heaps, Fibonacci heaps, fishspears, leftist heaps, pairing heaps, skew heaps, skip lists, and weak heaps (and there are many others!). Code up one of these alternatives and share with us what you learned in the process.