Assign6: Linked list sort

Linked list sorting

One of the advantages of a linked list over an array is its flexibility and efficiency when rearranging elements. Unlike arrays, which occupy contiguous memory locations and require shuffling (remember how arduous it was to shift all the elements over by one when implementing PQSortedArray?), reordering an element in a linked list is simply a matter of rewiring a few pointers. A task like sorting, which rearranges many elements in the collection, can have an distinct advantage when operating on a linked list instead of an array.

For this task, you are to write the function

void sort(ListNode*& list)

which rearranges the nodes of list into sorted order. You have the choice of implementing either the Mergesort or the Quicksort algorithm to do the sort.

In addition to implementing the sort function, you will write any helper functions you need along with test code that helps you test your work.

Sorting algorithms

Rather than repeat the background information on sorting algorithms, please review the May 15 lecture for the explanation and diagrams of the sorting algorithms. The rest of this handout assumes familiarity with that content.

Both Mergesort and Quicksort are examples of recursive "divide and conquer" algorithms. Each divides the input list into two sublists, recursively sorts the sublists, and then joins the sorted sublists into one combined whole. The two algorithms differ in what work is required to divide into sublists and what needs to happen in the join step. Mergesort is referred to as "easy-split-hard-join" because it has a simple division step and a complex join. Quicksort is "hard-split-easy-join" because the bulk of its work goes into the complex division step with a trivial join to follow.

Mergesort

The classic Mergesort algorithm divides the input list into two sublists, recursively sorts the two lists, and applies a binary merge to join the two sorted lists into one combined whole.

Linked lists are a natural for this sort. Each level of the recursion moves elements to one of two sides and later rejoins the sides, the flexible wiring of the list supports this with ease. Also note that the existing nodes can re-distributed into sublists. This is another advantage over the array/vector version which copies the sublists over and over again.

In the split step, rather than separating into front half and back half, we want you to do an even-odd distribution, where every other element goes to one side. The 1st, 3rd, 5th, … elements are placed in one sublist, the 2nd, 4th, and so on are placed in the other. The order that the elements are placed into the sublist is of no consequences, you may do whatever is most convenient for your algorithm.

The merge operation must be implemented iteratively. Although it is possible and even quite elegant to solve recursively, the cost of one stack frame per element being processed is much too high to bear and the function would be unable to handle larger lists. For similar reasons, the divide step must also be implemented iteratively.

Quicksort

The classic Quicksort algorithm divides the input list into sublists that are less than or greater than a chosen pivot. After recursively sorting the sublists, they are joined in sequence.

Linked lists also work well for Quicksort. Distributing the elements into sublists is a more straightforward task that the contortions required to rearrange those same elements within an array.

The partition step is where a lot of the magic happens. Choose the first element of the list as the pivot. Divide the elements into three sublists: those elements that are strictly less than the pivot, those that are equal to the pivot, and those that are strictly greater. Note that the existing nodes are re-distributed into the sublists, there is no need to allocate new nodes or make copies. The order that the elements are placed into the sublist is of no consequences, you may do whatever is most convenient for your algorithm.

The less and greater sublists are recursively sorted. The sublist of elements equal to the pivot is already trivially sorted.

The join step attaches the three sorted sublists in one combined list by adding the missing links need to concatenate them.

As noted above, an operation such as partition that examines every element, must be implemented iteratively, not recursively. This is necessary to ensure you can process larger lists without exhausting the call stack.

Notes for either sort

Your sort must operate solely by re-wiring the links between existing ListNodes. Do not allocate nor deallocate ListNodes. Do not replace/swap the ListNode data values.
Do not use a sort function or library to arrange the elements, nor any algorithms from the STL C++ framework.
Do not create any temporary or auxiliary data structures (array, Vector, Map, etc.).
We highly recommend that you decompose out two helper functions, one for the split portion of the algorithm and another for the join portion of the algorithm.
Mergesort and Quicksort operate recursively in that they divide the input list into sublists; the sublists are recursively sorted and then joined. The split and join steps do not need to be recursive and if they are recursively implemented, your sort would be incapable of sorting longer lists because these operations would exceed the limits on the size of call stack. Your sort should be capable of sorting lists of effectively any size. For this reason, you must implement your helper functions using iteration.

Answer the following questions in short_answer.txt:

Q5: The prototype for the sort function takes a ListNode* by reference. Explain why the pointer itself need to be passed by reference and what would be the consequence if it were not.

Testing

For this assignment, you will take on much greater responsibility for writing your own test cases. The only provided test case is a sample time operation to get you started.

Although the overall goal is that your sort function work according to its top-level specification, e.g. confirming the list is rearranged into sorted order, test cases that operate at top-level will not helpful early in your development.

Instead, you should start by constructing student tests of your own that can be used to test your helper functions. For example, after you have written your split or partition helper, add test cases that specifically exercise the helper to verify its correctness. Similarly the merge or join helper can be tested on its own. Once you have confirmed the correctness of your helpers, you can move on to testing their use in the context of the sort operation.

Another recommendation is that you consider writing test helper functions that translate from Vector to linked list or vice versa, or perhaps compare a linked list to a Vector. This allows you to construct test cases in convenient form as Vector and translate/compare to linked list as needed. If you are looking for guidance on how to do so, we highly recommend checking out the testing code used for this week's section problems. If you do incorporate this code into your submission, be sure to properly cite.

Some test cases to be sure that you include in your test suite would be:

Sorting an empty list
Sorting a single-element list
Sorting a list in already sorted or reverse sorted order
Sorting a list that contain some duplicate elements or all duplicate elements
Sorting a list that contains positive and negative numbers
Sorting a list of random values, randomly organized
Sorting a very long list

You should also take care to avoid memory leaks in your tests by properly deallocating all ListNodes used temporarily during a test. Although your sort function does not allocate nor deallocate nodes, your test cases may need to allocate and deallocate memory for testing purposes. Any memory that is allocated for the test case should be deallocated at the end of the test case.

An important component of the grading for this assignment will be the quality of your test cases. Show us that you have learned from the example we have set in earlier assignments with our provided tests!

Timing your sort

Both Mergesort and Quicksort should run in O(NlogN) time in the common case. Our Vector class uses the C++ std::sort to sort its internal array which is also an O(NlogN) algorithm. You can prepare the same input sequence once in array form and again as a linked list, and then time the operation of Vector sort versus your linked list sort on that sequence to compare the relative performance of sorting linked lists versus array. What do you predict will be the outcome of the great sorting race? Try it and find out!

Answer the following questions in short_answer.txt:

Q6. Run timing trials and provide your results that confirm that your algorithm runs in O(NlogN) .
Q7. Run timing trials compare your linked list sort to Vector sort on the same sequence and report the results. Who wins and why?

Advice

Mergesort has some commonalities with the multiMerge task from Assignment 4. If you chose to do that task, familiarity could be a reason to choose Mergesort this time. Alternatively, you may want to go with Quicksort for the novelty of learning something new. It is entirely your choice.
Both algorithms are tricky and require very careful management of linked list wiring, but we would rate Quicksort as a bit more of a challenge, so that may be a factor to consider in whether you want to step up (or stay away!).
Be sure you have completed both warmup exercises before starting on this task. If you ran into any trouble on the warmup, get those confusions resolved first. You want to have top-notch skills for using the debugger to examine linked structures and be well-versed in what to expect from memory errors and how to diagnose them.
Correct use of memory and pointers requires much careful attention to detail. You'll likely benefit from drawing a lot of diagrams to help you keep track.
To be sure your code is doing what you intend, you'll likely need to step in the debugger and examine the state of your lists as you go. In fact, you might find it most convenient to run in the debugger full time as you work on this project, as this gives you the best tools for analyzing what is going on in your program at any given moment. This is information you cannot get any other way.
Our solution for each sort is less than a page of code, but this is a page of code you will work hard for. It is dense stuff! Start by fully thinking through your strategy and ensure you thoroughly understand the algorithm you implementing. Draw out an intentional design from the start and follow a systematic development process and you can absolutely triumph. But hacking up an attempt operating with incomplete understanding and hoping you can prod the code into working is likely to lead to headache and heartache.

Extensions

Here are some ideas for further exploration of list sorting:

Add a smarter choice of pivot for Quicksort. A poor pivot (e.g. min or max of sequence) can significantly degrade performance. Confirm this by constructing an input to Quicksort that triggers this worst cast and run time trials to observe it degenerating to O(N^2). There are many neat appraoches to combat this effect, see https://stackoverflow.com/questions/164163/quicksort-choosing-the-pivot/ for starters.
Research and implement an alternative sorting algorithm. There are lots of interesting options! To name a few:
- Selection Sort
- Insertion Sort
- Bubble Sort (but Obama recommends against)
- Brick Sort
- Comb Sort
- Shaker Sort
- Gnome Sort
- Block Sort
- Shell Sort
- Radix Sort
- Spreadsort
- Natural Merge Sort
- Patience Sort
Some of these are more suited to linked lists than others, so do some background research on the algorithm before deciding whether it would be a good one to tackle.