Assign5: Sorting Linked Lists

The power of lists

One of the advantages of a linked list over an array is its flexibility and efficiency when rearranging elements. Unlike arrays, which occupy contiguous memory locations and require shuffling (remember how arduous it was to shift all the elements over by one when implementing PQSortedArray?), reordering an element in a linked list is simply a matter of rewiring a few pointers. A task like sorting, which rearranges many elements in the collection, can have a distinct advantage when operating on a linked list instead of an array-backed data structure. For this portion of the assignment, you are to write the functions:

void mergeSort(ListNode*& list)
void quickSort(ListNode*& list)

which each rearrange the nodes of list into sorted order. Over the course of the assignment, you will implement both the Mergesort and Quicksort sorting algorithms, which were covered in Wednesday's lecture. If you haven't yet watched that lecture, we recommend starting there first.

This part of the assignment may seem daunting, but by breaking it down into small, testable steps, you’ll find the tasks much more manageable. Start by fully thinking through your strategy and ensure you thoroughly understand the algorithm you are implementing. We’ve outlined smaller steps below to help break down the coding down into more specific milestones:

Implement the three suggested helper functions for writing tests: deallocateList, createList, and checkListVectorEquality.
Work on mergeSort:
- First write and thoroughly test the split helper function.
- Then write and thoroughly test the merge helper function.
- Put everything together in mergeSort, and write a comprehensive set of test cases for your overall function.
Work on quickSort:
- First write and thoroughly test the partition helper function.
- Then write and thoroughly test the concatenate helper function.
- Put everything together in quickSort, and write a comprehensive set of test cases for your overall function.
Write timing tests for runtime analysis.

In previous assignments, we have provided all or most of the testing infrastructure that you needed in order to validate the efficacy of the functions that you were asked to implement. You developed your testing skills throughout those assignments by supplementing our provided tests with tests of your own to build a comprehensive, thorough suite of tests. On this assignment, you will also be responsible for developing most of the testing infrastructure necessary in order to test the functionality of both sorting algorithms. In the next section, we're going to walk you through some guidance on how to go about writing comprehensive tests and the helper functions that will help you implement those tests.

Testing tips

Utility functions

Before you can start writing test cases, we strongly recommend implementing some linked list utility functions that will make testing and debugging your code a much smoother process. To that end, we have provided prototypes for four recommended helper functions in the starter code. Descriptions of these helper functions are provided below:

void printList(ListNode* front) This is a helpful function to have for debugging purposes. Given a pointer to the front of a linked list, this function prints out the data contents of the linked list in a nicely formatted manner. This function is provided for you fully-implemented and working.
void deallocateList(ListNode* front) This function is given a pointer to the front of a linked list and is responsible for freeing all the memory associated with the linked list. This function will be helpful to avoid having memory leaks in the tests that you write.
ListNode* createList(Vector<int> values) This function is given a list of integer values and builds a linked list that contains exactly those values in the provided order. The return value of this function is a pointer to the beginning of the newly-created linked list. In the entire sorting portion of this assignment, this should be the only function in which you need to use the new keyword.
bool checkListVectorEquality(ListNode* front, Vector<int> v) Given a linked list and a vector, this function compares the two data structures to check to see if their contents are exactly equivalent. If the two structures contain exactly the same values in exactly the same order, this function should return true – otherwise, it returns false.

The last three functions listed above have been defined but not implemented in the starter code. Although not a strict requirement, we strongly recommend implementing all of these functions before trying to write any additional student tests. We have provided you with one test in the starter code that tests the functionality of these helper functions. Feel free to modify or expand that test as you see fit.

Note: The code that you need to write for these three functions may look very similar to code that we've seen before in lecture and section. While you are more than welcome to pull from these sources of code when implementing these functions, you should cite your sources appropriately.

An incremental and iterative testing process

Although the overall goal is that your sort functions work according to their top-level specifications (i.e. confirming a list is rearranged into sorted order), test cases that operate at this top-level will not be helpful early in your development.

Instead, you should start by constructing student tests of your own that can be used to test your helper functions. For example, after you have written your split/partition helpers (defined in the next section), add test cases that specifically exercise the helper to verify its correctness. Similarly, the merge/concatenate helper functions (also defined in the next section) can be tested on their own. Once you have confirmed the correctness of your helpers, you can move on to testing their use in the context of the sort operation.

End-to-end tests

The final step in developing a testing strategy for this assignment will be figuring out how to put all these helper functions together to structure an end-to-end test. To help you out with that, we have provided you with the "skeleton" of a STUDENT_TEST that shows an example of how to put together all of the helper functions into a useful test. We strongly recommend using this provided test as a template upon which to base all of your student tests for this portion of the assignment.

Once you get to writing end-to-end tests for your overall sorting functions, we recommend including tests that address the following situations in your test suite:

Sorting an empty list
Sorting a single-element list
Sorting a list in already sorted or reverse sorted order
Sorting a list that contain some duplicate elements or all duplicate elements
Sorting a list that contains positive and negative numbers
Sorting a list of random values, randomly organized
Sorting a very long list

You should also take care to avoid memory leaks in your tests by properly deallocating all ListNodes used temporarily during a test. Although your each sorting function should not allocate or deallocate nodes, your test cases may need to allocate and deallocate memory for testing purposes. Any memory that is allocated for the test case should be deallocated at the end of the test case.

An important component of the grading for this assignment will be the quality of your test cases. We want you to take this opportunity to show us that you have learned from all of the different testing exercises and provided tests that you've seen on the first four assignments!

Exploring different sorting algorithms

Rather than repeat the background information on sorting algorithms, please review the July 29 lecture for the explanation and diagrams of the sorting algorithms. The rest of this handout assumes familiarity with that content.

Both Mergesort and Quicksort are examples of recursive "divide and conquer" algorithms. Each divides the input list into two sublists, recursively sorts the sublists, and then joins the sorted sublists into one combined whole. The two algorithms differ in what work is required to divide into sublists and what needs to happen in the join step. Mergesort is referred to as "easy-divide-hard-join" because it has a simple division step and a complex join. Quicksort is "hard-divide-easy-join" because the bulk of its work goes into the complex division step with a trivial join to follow.

Mergesort

The classic Mergesort algorithm divides the input list into two sublists, recursively sorts the two lists, and applies a binary merge to join the two sorted lists into one combined whole.

Linked lists are a natural for this sort. Each level of the recursion moves elements to one of two sides and later rejoins the sides, the flexible wiring of the list supports this with ease. Also note that the existing nodes can be re-distributed into sublists. This is another advantage over the array/vector version which copies the sublists over and over again.

In the split step, rather than separating into a front half and a back half, we want you to do an even-odd distribution, where every other element goes to one side. The 1st, 3rd, 5th, … elements are placed in one sublist, the 2nd, 4th, and so on are placed in the other. This split method not only gives you practice with linked list rewiring, but it also will not negatively affect our runtime (we'll leave that as an exercise for you to convince yourself!). Note that due to how merge sort works, the order that the elements are placed within the sublists is of no consequence; you may do whatever is most convenient for your algorithm.

Both the split and merge helper functions must be implemented iteratively. Although it is possible and even quite elegant to write them recursively, the cost of one stack frame per element being processed is much too high to bear and the function would be unable to handle larger lists. However, this does not mean that the overall mergeSort function should be iterative. In fact, the sort function itself will still need to be recursive!

Quicksort

The classic Quicksort algorithm divides the input list into sublists that are less than or greater than a chosen pivot. After recursively sorting the sublists, they are joined in sequence.

Linked lists also work well for Quicksort. Distributing the elements into sublists is a more straightforward task than the contortions required to rearrange those same elements within an array.

The partition step is where a lot of the magic happens. Choose the first element of the list as the pivot. Divide the elements into three sublists: those elements that are strictly less than the pivot, those that are equal to the pivot, and those that are strictly greater. Note that the existing nodes are re-distributed into the sublists, there is no need to allocate new nodes or make copies. The order that the elements are placed into the sublist is of no consequences, you may do whatever is most convenient for your algorithm.

The less and greater sublists are recursively sorted. The sublist of elements equal to the pivot is already trivially sorted.

The concatenate step attaches the three sorted sublists in one combined list by adding the missing links needed to concatenate them.

Similar to the reasons that mergeSort's helper functions needed to be iterative, for the partition and concatenate helpers must also be implemented iteratively, not recursively, because they examine every element in a given list. This is necessary to ensure that you can process larger lists without exhausting the call stack.

Requirements and suggestions for both algorithms

Your sort must operate solely by rewiring the links between existing ListNodes. Do not allocate or deallocate ListNodes. Do not replace/swap the ListNode data values.
Do not use a sort function or library to arrange the elements, nor any algorithms from the STL C++ framework.
Do not create any temporary or auxiliary data structures (array, Vector, Map, etc.).
We highly recommend that you decompose out two helper functions per sorting algorithm: split and merge for mergeSort and partition and concatenate for quickSort.
Mergesort and Quicksort operate recursively in that they divide the input list into sublists; the sublists are recursively sorted and then joined. However, their helper function steps do not need to be recursive, and if they are recursively implemented, your sort would be incapable of sorting longer lists because these operations would exceed the limits on the size of the call stack. Your sort should be capable of sorting lists of effectively any size. For this reason, you must implement your helper functions using iteration.

Answer the following questions in short_answer.txt:

Q6: The prototypes for both sort functions take a ListNode* by reference. Explain why the pointer itself needs to be passed by reference and what the consequences would be if it were not.

Analyzing runtime

Both Mergesort and Quicksort should run in O(NlogN) time in the common case. Our Vector class uses the C++ std::sort to sort its internal array which is also an O(NlogN) algorithm (backed by quicksort). You can prepare the same input sequence once in array form and again as a linked list, and then time the operation of Vector sort versus your linked list sort on that sequence to compare the relative performance of sorting linked lists versus array. What do you predict will be the outcome of the great sorting race? Try it and find out!

Answer the following questions in short_answer.txt:

Q7. Run timing trials and provide your results that confirm that your algorithm runs in O(NlogN) .
Q8. Run the provided timing trials that compare your linked list quicksort to a Vector sort on the same sequence and report the results. Who wins and why?

Advice

Be sure you have completed both warmup exercises before starting on this task. If you ran into any trouble on the warmup, get those confusions resolved first. You want to have top-notch skills for using the debugger to examine linked structures and be well-versed in what to expect from memory errors and how to diagnose them.
Correct use of memory and pointers requires much careful attention to detail. You'll likely benefit from drawing a lot of diagrams to help you keep track.
To be sure your code is doing what you intend, you'll likely need to step in the debugger and examine the state of your lists as you go. In fact, you might find it most convenient to run in the debugger full time as you work on this project, as this gives you the best tools for analyzing what is going on in your program at any given moment. This is information you cannot get any other way.
Draw out an intentional design from the start and follow a systematic development process, and you can absolutely triumph. But hacking up an attempt operating with incomplete understanding and hoping you can prod the code into working is likely to lead to headache and heartache.

Extensions

Here are some ideas for further exploration of list sorting:

Add a smarter choice of pivot for Quicksort. A poor pivot (e.g. min or max of sequence) can significantly degrade performance. Confirm this by constructing an input to Quicksort that triggers this worst cast and run time trials to observe it degenerating to O(N^2). There are many neat appraoches to combat this effect, see https://stackoverflow.com/questions/164163/quicksort-choosing-the-pivot/ for starters.
Research and implement an alternative sorting algorithm. There are lots of interesting options! To name a few:
- Selection Sort
- Insertion Sort
- Bubble Sort (but Obama recommends against)
- Brick Sort
- Comb Sort
- Shaker Sort
- Gnome Sort
- Block Sort
- Shell Sort
- Radix Sort
- Spreadsort
- Natural Merge Sort
- Patience Sort
Some of these are more suited to linked lists than others, so do some background research on the algorithm before deciding whether it would be a good one to tackle.