Assign4: PQ Sorted Array


An Introduction to Priority Queues

A standard queue processes elements in the first-in, first-out ("FIFO") manner typical of ordinary waiting lines. New elements are added to the back of the queue and the next element to be processed is taken from the front. Queues can be handy, but a FIFO strategy isn't always what's needed.

A hospital emergency room, for example, needs to schedule patients according to priority. A patient with a more critical problem will pre-empt others that arrived earlier, allowing their problems to be addressed earlier. The data structure that we need in order to enable this behavior is a priority queue. The highest priority element in the queue is considered frontmost and will be dequeued first. As we will see later on in the assignment there are many practical applications for a priority queue, both inside and outside of computer science!

Defining the PQSortedArray class

In order to provide this desired functionality, we turn to harness the power of classes in C++ and define our own ADT class called PQSortedArray. This class will be defined and used much in the same way that we have seen so far with collection classes like the Queue, Vector, and Set. The only difference is that this time around, instead of relying on the Stanford libraries to define the behavior of these classes, we take the power into our own hands! The whole purpose of designing a class in this case is to hide the messy details (keeping the elements in sorted order, always removing the one with highest priority, etc.) inside the implementation and to present a clean abstraction for use by the client.

While our eventual goal will be to implement a binary heap (covered in Tuesday's lecture) as the data organization strategy for our underlying Priority Queue implementation, we are going to start out by using a slightly more straighforward underlying data organization approach. This will get us warmed up and thinking about the intricacies of class interfaces and dynamic memory and will help get your hands dirty with the nitty gritty of working with data stored in arrays.

In particular, we are going to choose to store our elements in an array, while enforcing the property that the elements will always be in sorted order by priority. In this way, it is quick to find the frontmost element that should be dequeued, as it will always be either the first or the last element in the array, depending on whether we store the elements in increasing or decreasing order of priority. Thus, the class that we will be implementing in this portion of the exercise is called PQSortedArray.

Below, we have provided the interface for the PQSortedArray class. Note that the public interface of the Priority Queue class is identical to that of the standard Queue. The only operational difference is that the highest priority element is treated as frontmost for peek/dequeue rather than the element that was enqueued first. You should also note that the private member variables of the class look very similar to the private member variables that we defined in class on Monday when implementing the OurVector class. For this reason, we strongly recommend making sure that you've watched and understood Monday's lecture before proceeding.

In an ideal world, we would be able to write this class in such a way that it would be able to hold generic data types that could be specified by the client, as is allowed by the Stanford C++ libraries (that is, you get to choose whether the data structure holds int, double, string, etc). However, we will not cover how to do so in this class, as it is a little bit beyond the scope of what we've learned. Thus, we are going to make the executive decision that our priority queue will store instances of the DataPoint struct, which bundles together a string holding the "data" we want to store and an integer representing the priority of that data. We introduced the DataPoint struct in the warmup exercise, but we've reproduced the struct declaration below (and you can also find the declaration and extensive comments in datapoint.h).

struct DataPoint {
    string name;
    int priority;
}

Important Note: The priority member of the struct represents the data point's priority. In the whole assignment, smaller integer values represent "higher" priority than larger integer values. Thus, an element with the lowest integer value of priority in the whole collection should always be peeked/dequeued first.

class PQSortedArray {
public:
    PQSortedArray();
    ~PQSortedArray();

    void enqueue(DataPoint element);
    DataPoint dequeue();
    DataPoint peek() const;

    bool isEmpty() const;
    int size() const;
    void clear();

    void printDebugInfo();

private:
    DataPoint* elements;   
    int allocatedCapacity; 
    int numItems;          

    void validateInternalState();
};

Implementing the PQSortedArray class

In addition to the class definition defined above, we have also provided a mostly-complete implementation of the PQSortedArray class in pqsortedarray.cpp. Your task will be to complete the final, currently missing component (function) of the class, which is the function to enqueue new elements into the queue (enqueue). However, before implementing this function, you will need to diligently read through the provided class definition and partial implementation (located in pqsortedarray.h and pqsortedarray.cpp) to build an understanding of the data organization mechanism underlying this version of the priority queue. The rest of this section will walk you through some specifications and examples to aid you in this process.

As mentioned before, the PQSortedArray internally stores a collection of DataPoint structs in an array (represented by the class member variable elements). Importantly, these array elements must be stored in decreasing sorted order of priority value, which means that the elements with the largest priority values exist on the left-hand side of the array and the elements with the smallest priority values exist on the right-hand side of the array. To illustrate this, below is a representation of what the internal elements array would look like if the data points {"Nick", -3}, {"Kylie", 10}, and {"Trip", 106} were enqueued into an empty priority queue.

elements: name: "Trip"
priority: 106
name: "Kylie"
priority: 10
name: "Nick"
priority: -3
? ? ? ? ? ? ?
index:  0   1   2   3   4   5   6   7   8   9 

The benefit of this method of data organization is that it makes the dequeue/peek operations very easy to implement. We know that the element with the smallest priority value will always be at the very end of the array, which means that the two operations can easily know where to look by simply calculating the index of the last currently occupied slot in the array (based on the value of the member variable numItems). You should examine the provided implementations of dequeue and peek to confirm that they operate in exactly this manner.

Your job will be to implement the enqueue function to correctly insert elements into the array to maintain this property. For example, if we started with the above example and then called enqueue({"Julie", 50}), we would expect the internal state of elements to be appropriately updated as such:

elements: name: "Trip"
priority: 106
name: "Julie"
priority: 50
name: "Kylie"
priority: 10
name: "Nick"
priority: -3
? ? ? ? ? ?
index:  0   1   2   3   4   5   6   7   8   9 

Thus, in general, the enqueue operation must find the correct position in the array to place the new element and move the other elements aside to make room for it. Your main task will be to find a generalized, robust way to do so, sharpening your array manipulation and dynamic memory skills along the way.

The other important job of enqueue is to grow the array (by doubling its allocated size) when it is filled to capacity. As seen in Monday's lecture example of the OurVector class, the task of enlarging the array fits nicely in a helper function. You should add a declaration of the helper function to the private section of the class interface and define it as part of the class implementation. You can then call the helper from enqueue when appropriate.

Maintaing the sorted nature of the elements in the array (arranged in decreasing order of priority values) is of the utmost importance throughout this whole exercise. Thus, we have also provided an implementation of a private helper function named validateInternalState, which traverses the elements array and enforces the sorted contraint, raising an error if it encounters any elements whose priorities violate required order. We strongly recommend calling this function at the end of your enqueue function (so much so, that we've even included this line in this lcoation in the starter code). Understanding how to use private helper functions to enforce the internal validity of your data storage mechanism will come in handy when you get to later parts of the assignment.

Notes and Additional Requirements

  • You should not to change the code for any other functions other than enqueue. However, you are allowed to add any additional private helper functions that you might find useful (i.e. to resize the array, insert elements into the correct location, etc.)
  • The OurVector class from lecture and RingBufferQueue from section are good examples to use as a model of a collection class that stores elements using a dynamically-sized array. We highly recommend reviewing these examples before implementing the array resizing component of enqueue.
  • When enlarging the array, you should allocate the new size to be double the previous capacity.
  • It may be helpful to note that the class includes a printDebugInfo function that prints the contents of the internal array for debugging purposes. You can sprinkle in calls to it to as needed to help your debugging. For example, you could consider developing a test case that enqueues one value, followed by a call to printDebugInfo to see how the internal state of the array has been updated. If all goes well, then you can add more calls to the enqueue operation with debug printing in between and observe how things are being handled behind the scenes.

Testing PQSortedArray

With a correct implementation of the missing enqueue function, you should now have a fully working priority queue. We have provided an extensive list of tests to validate the behavior of your newly-completed class. As always, you should also add tests of your own to make sure that the combined set of tests is comprehensive and that they tests all scenarios in which a client might use your class. You should work through these tests and resolve any test failures until all the tests pass and you are confident that your implementation is fully correct.

Timing Analysis

Once you've finalized the functionality of your code, the last step will be to do a timing analysis on the efficiency of your new data structure. First, predict what you expect the Big O of enqueue and dequeue to be (hint: these predictions should probably match up with the stated requirements for the functions that we've provided in pqsortedarray.h). Then, follow the below instructions to do a timing analysis of the two functions.

Ideally, we would we able to directly measure the amount of time that it would take to run a single enqueue/dequeue operation on queues of different sizes, and then look for trends in these times to make a conclusion about the Big-O of each function. However, a single invocation of one of these operations runs so quickly that we don't have access to precise enough timers to note down exactly how long these individual operations take to run. Thus, we must run many, many operations all at once and observe the total time it took all operations to run. Thus, we have provided you with two helpful functions in the starter code named fillQueue and emptyQueue, which run n total enqueue/dequeue operations respectively. Helpful hint: Note that in this case, n represents both the total number of operations to do and the final (worst case) size of the queue – think carefully about how this might impact the timing results you observe.

You should use the SimpleTest timing operations to measure the execution time over 4 or more different size inputs for each of these functions. You should choose sizes so that the largest operation completes in under a minute or so. Note that dequeues generally run much faster than enqueues, so you may have to independently choose different sets of sizes for the enqueue and dequeue operations.

Answer these questions in short_answer.txt:

  • Q5. Give the results from your time trials and explain how they support your Big-O prediction for enqueue and dequeue.
  • Q6: If the PQSortedArray stored the elements in order of lowest-to-highest instead of highest-to-lowest, what impact would this have on the Big-O runtimes of enqueue and dequeue?

As a side note, observe that both the fillQueue and emptyQueue functions take their respective PQSortedArray instances by reference – this is because we have disabled the copy constructor for the PQSortedArray class, which prohibits any operations that might try to make a copy of the data structure (like passing by value or returning it from a function). Take CS106L to learn more about copy constructors!