Go back to this week's lab writeup here.
This optional problem is an opportunity to further exercise your understanding of optimization.
1) Stack vs. Heap Allocation (20 minutes)
Stack allocation is preferable to heap allocation for a multitude of reasons (type-safety, automatic allocation/deallocation, etc.), while also offering significant performance advantages. Let's use callgrind to profile a sorting program to explore the relative runtime cost of using the stack versus the heap.
The isort
program implements insertion sort. When sorting the array, it will need temporary space when swapping array elements and the program explores the performance impact of allocating that temporary space using a fixed-size stack buffer, a variable-length stack buffer, or dynamically allocating from the heap.
-
Review the C code in
isort.c
, talk it over with your group and make some predictions about the relative performance of the different functions. -
Run under callgrind and annotate the output:
valgrind --tool=callgrind --toggle-collect='swap_*' ./isort callgrind_annotate --auto=yes callgrind.out.pid
-
All three swap functions have the same instruction count in the three memcpy calls; the issue to examine is the instructions spent in setting up the tmp memory and cleaning it up.
- Find the instruction counts for the
swap_fixedstack
andswap_varstack
functions. The lines corresponding to the function opening/closing brace and variable declaration are annotated with counts for function entry/exit. Q: What is the difference in the number of instructions executed on entry and exit toswap_fixedstack
vsswap_varstack
? - Compare the disassembly for these two functions. Q: How many instructions are required to set up and clean up a constant-length stack array? How many additional instructions are needed for a variable-length stack array? Does the count of additional instructions in the disassembly match the count of executed instructions reported by callgrind?
- Now look at the instruction counts for
swap_heap
. The annotation will indicate a small-ish number for the instructions that set up for the call to malloc/free and a much larger number of instructions executed within the malloc and free library functions. Q: About how many instructions does each call to malloc incur? Each call to free? How does that compare to the cost of allocating and deallocating space on the stack?