Assignment 7: Heap allocator

Due: Wed Jun 6 11:59 pm
Ontime bonus 5%. Grace period for late submissions until Thu Jun 7 11:59 pm

Assignment by Julie Zelenski
based on idea from Randy Bryant & David O'Hallaron (CMU)

Learning goals

In completing this assignment, you will:

  • appreciate the complexity and tradeoffs in implementing a heap allocator
  • further level up your pointer-wrangling and debugging skills
  • profile your code to find inefficiencies and work to streamline them
  • bring together all of your CS107 skills and knowledge into a satisfying capstone experience-- you can do it!

Overview

You've been using the heap all quarter, now is your chance to go inside and build your own version of malloc, realloc, and free! The key goals for a heap allocator are:

  • correctness: services any combination of well-formed requests
  • tight utilization: compact use of memory, low overhead, recycles
  • fast: can service requests quickly

There are a wide variety of designs that can meet these goals, with various tradeoffs to consider. Correctness is non-negotiable, but is the next priority on conserving space or improving speed? A bump allocator can be crazy-fast but chews through memory with no remorse. Alternatively, an allocator might pursue aggressive recycling and tight packing to squeeze into a small memory footprint but execute a lot of instructions to achieve it. A industrial-strength production allocator aims to find a sweet spot without sacrificing one goal for the other. This assignment is an exploration of implementing a heap allocator and balancing those tradeoffs.

Getting started

Check out a copy of your cs107 repository with the command:

    git clone /afs/ir/class/cs107/repos/assign7/$USER assign7

The starter project contains various C files and supporting Makefile and custom_tests files. In the samples subdirectory, you'll find a collection of allocator scripts for testing.

Code study: test harness

In order to put a heap allocator through its paces, you will need a good testing regimen. Our provided test_harness.c is a big help! It allows you to test an allocator on different workloads supplied as allocator scripts.

An allocator script is a file that contains a sequence of requests in a simple text-based format. The three request types are a (allocate) r(reallocate) and f (free). Each request has an id-number that can be referred to in a subsequent realloc or free.

a id-number size
r id-number size
f id-number

A script file containing:

a 0 24
a 1 100
f 0
r 1 300
f 1

is converted into these calls to your allocator:

void *ptr0 = mymalloc(24);
void *ptr1 = mymalloc(100);
myfree(ptr0);
ptr1 = myrealloc(ptr1, 300);
myfree(ptr1);

The harness will parse a script file and translate it into requests. While executing the script, it attempts to verify that each request was correctly satisfied.

Read over test_harness.c so that you understand how it it operates. You don't need to dig into the parts that parse a script file, but do carefully review the code that executes the script and checks correctness. In order to use the test harness to its fullest, you need to understand its features. Verify your understanding by checking in with yourself on the following questions:

  • When making a malloc request, what checks does the test harness perform to verify the request was properly serviced? What checks does it use on free? On realloc?
  • How does the test harness verify that no memory blocks overlap one another?
  • What value does the test harness write to the payload bytes of an allocated block? For what purpose is it writing to the payload?
  • How does the test harness calculate the allocator's average utilization?
  • In addition to its own checks to verify constraints that can be externally validated, the test harness also make calls to validate_heap. validate_heap is a function that you will write that does a scan of your internal heap data structures. At what point does the test harness call validate_heap? How does the test harness respond when validate_heap returns false? What command-line flag configures the test harness to skip the calls to validate_heap?
  • The test harness calls the allocator's myinit function before executing each script. The function is expected to wipe the slate clean and start fresh. If myinit doesn't do its job, you may encounter bugs caused by ghost values leftover from the previous script. Keep this in mind for when implementing myinit in your heap allocator!
  • The arguments to myinit are the bounds of the heap segment. Run test_bump under gdb and print out those boundaries. At what address does the heap segment start? What is the heap segment size? Remember these values, you are going to be seeing a lot of activity in this region.

If there is anything you don't understand about the test harness, please ask! You want to have a good understanding of how this code operates so you can use it effectively when testing your allocator.

Code study: bump allocator

The bump.c file contains an implementation of a bump allocator. We discussed this strategy in lecture as one of the simplest possible approaches.

The bump allocator is wanton in terms of utilization, but quite fast (unsurprising given how little code it requires). Here are a few experiments to try out on to observe how it performs:

  • Read over the C code for the bump allocator. Disassemble its mymalloc and myfree functions and count the number of assembly instructions. This gives you the static instruction count.
  • Open the allocator script samples/pattern-repeat.script and peruse its contents. The script makes 1000 requests: 500 repeated pairs of a call to mymalloc followed by a call to myfree.
  • A profiler is a tool that counts instructions executed at runtime, this is the dynamic instruction count. Read our guide on using the Valgrind callgrind profiling tool.
  • Use the callgrind profiler to get the dynamic instruction count of the bump allocator when servicing this script:

    valgrind --tool=callgrind --toggle-collect='my*' ./test_bump samples/pattern-repeat.script
    

    What is the relationship between the static and dynamic instruction counts? Do you see how they match up?

  • Running callgrind writes a callgrind output file named callgrind.out.pid (where pid is a process number). Run the annotator on this output file to see the C source labeled with the instruction counts.

    callgrind_annotate --auto=yes callgrind.out.pid
    

    No part of the bump allocator code will stand out as a particular "hot spot". The operations are all pretty streamlined.

  • Review the C code for myrealloc and its generated assembly. The allocator script samples/pattern-realloc.script makes 1000 requests: 100 calls to mymalloc calls and 900 calls to myrealloc. Use the same profile/annotate steps on this script file. This annotation will report a definitive "hot spot" taking a large number of instructions. What is that time-consuming operation? What makes it so expensive?

  • The bump allocator has no per-block overhead, but its inability to recycle memory means its utilization is expected to be abysmal. What does the test harness report as the calculated utilization when servicing these allocator scripts?

The bump allocator can service a malloc/free in just a handful of instructions, but its utilization is the pits. In your own allocator, you will aim to strike a balance of good results on both utilization and speed rather than sacrificing one for the other.

1) Implement implicit free list

Now it's your turn! The file implicit.c is ready and waiting for you to implement your own recycling allocator that puts the naive bump allocator's memory-chomping approach to shame.

The specific features that your implicit free list allocator must support:

  • Headers track block information (size, status in-use or free)
  • Free blocks are recycled and reused for subsequent malloc requests if possible
  • Malloc searches heap for free blocks via an implicit list (i.e. traverses block-by-block)
  • The validate_heap function verifies the heap data structures are intact (i.e. all of heap segment is accounted for, headers are well-formed and uncorrupted)

This allocator won't be that zippy of a performer (an implicit free list is poky at best) but recycling freed nodes should improve utilization over bump . The implicit version doesn't take advantage of opportunities to more aggressively recycle memory (such as coalescing freed blocks and in-place realloc); those features are coming up next in the explicit version.

The bulleted list above indicates the minimum specification that you are required to implement. Further details are intentionally unspecified as we leave these design decisions up to you; this means it is your choice how to structure your header, whether you search using first-fit/next-fit/best-fit/worst-fit, and so on. It can be fun to play around with these decisions to see the various effects. Any reasonable and justified choices will be acceptable for grading purposes.

2) Implement explicit free list

Building on what you learned from writing the implicit version, use the file explicit.c to develop a high-performance allocator that adds an explicit free list and includes support for coalescing and in-place realloc.

The specific features that your explicit free list allocator must support:

  • Block headers and recycle freed nodes (can copy from your implicit version)
  • Explicit free list managed as a doubly-linked list, uses first 16 bytes of payload of freed block for next/prev pointers
    • Note the header should not be enlarged to add fields for the pointers. Since the list only consists of free blocks, the economical approach is to store those pointers in the otherwise unused payload.
  • Malloc searches explicit list of free blocks
  • A freed block is coalesced with neighbor blocks to the right that are also free. Coalescing a pair of blocks must operate in O(1) time. You do not have to support coalesce to the left.
  • Realloc resizes block in-place whenever possible, e.g. if resize smaller or neighboring block(s) to right are free and can be absorbed.
  • The validate_heap function verifies the heap data structures are intact (implicit and explicit structures)

The bullet points represent our requirements, all further decisions not specifically mandated are yours to determine. The required features result in further small gains in utilization and a big boost in speed over the implicit version. You will end with a pretty snappy allocator that is also strong on utilization -- a more impressive accomplishment!

If you have a working explicit version that meets the requirements and are looking for an extra challenge, focus on optimizing to further streamline the performance and raise the utilization. Use callgrind to find the hot spots and work to minimize them. Get help from gcc by editing the CFLAGS in your Makefile and then add your own efforts to optimize what gcc cannot. How much improvement do your efforts make? Use your readme to tell us about your adventures in optimization -- we'd love to hear about it!

Readme

In your readme, we ask that you write a paragraph about each of your two allocators (implicit and explicit) . Describe the design decisions you made for the allocator and why you made those choices. Summarize its overall performance characteristics of the allocator and identify its strengths and weaknesses. Share an amusing anecdote or interesting insight that arose while designing, developing, testing, debugging, or optimizing that allocator.

As you finish your readme, we have one final question to ask you:

What would you like to tell us about your quarter in CS107? Of what are you particularly proud: the effort you put into the quarter? the results you obtained? the process you followed? what you learned along the way? Tell us about it! We are not scoring this question, but wanted to give you a chance to share your closing thoughts before you move on to the next big thing.

Common allocator requirements

Requirements that apply to both allocators:

  • The interface should match the standard libc allocator. Carefully read the malloc man page for what constitutes a well-formed request and the required handling to which your allocator must conform. Note there are a few oddballs (malloc(0) and the like) to take into account. Ignore esoteric details from the NOTES section of the man page.
  • There is no requirement on how to handle improper requests. If a client reallocs a stack address, frees an already freed pointer, overruns past the end of an allocated block, or other such incorrect usage, your response can be anything, including crashing or corrupting the heap. We will not test on these cases.
  • An allocated block must be at least as large as requested, but it is not required to be exactly that size. The maximum size request your design must accommodate is 1 << 30 bytes. If asked for a block larger than the max size, your allocate can return NULL, just as it would for any request that cannot be serviced. Every allocated block must be aligned to an address that is a multiple of the ALIGNMENT constant (8). It's your choice whether small requests are rounded up to some minimum size.
  • Your allocator cannot invoke any memory-management functions. By "memory-management" we specifically mean those operations that allocate or deallocate memory, so no calls to malloc, realloc, free, calloc, sbrk, brk, mmap, or related variants. The use of other library functions is fine, e.g. memmove and memset can be used as they do not allocate or deallocate memory.
  • Your allocator may use a small amount of static global data, limited to at most 500 bytes. This restriction dictates the bulk of the heap housekeeping must be stored within the heap segment itself.
  • Your validate_heap function will be called from our test harness when checking correctness to verify the internal consistency of the heap data structures after servicing each request. This is here to help you with debugging. The validity check will not be made during any performance testing, so there is no concern about its efficiency (or lack thereof).

Advice/FAQ

Don't miss out on the companion advice page:
Go to advice/FAQ page

Grading

Here is the tentative point breakdown:

Functionality

  • Sanity/sample scripts (35 points) Correct results on the default sanity check tests and published sample scripts.
  • Comprehensive/performance (55 points) Fulfillment of the specific implicit/explicit requirements and correct servicing of additional sequences of mixed requests. The performance expectations are fairly modest and intended to verify that the allocator correctly implements the required features.
  • Robustness (20 points) Handling of required unusual/edge conditions (request too large, malloc 0 bytes, etc.)
  • Clean compile (2 points) We expect your code to compile cleanly without warnings.

Readme questions

  • readme.txt. (10 points) The readme questions will be graded on the thoughtfulness and completeness of your answers.

Code quality (buckets weighted to contribute ~20 points)

  • We expect your allocator code to be clean and readable. We will look for descriptive names, helpful comments, and consistent layout. We expect your code to show thoughtful design and appropriate decomposition. Control flow should be clear and direct. We expect common code to be factored out and unified. Complex tasks or tricky expressions that are repeated should be decomposed into shared helpers.
  • The code review will also assess the thoroughness and quality of your validate_heap routines.

On-time bonus (+5%)

The on-time bonus for this assignment is 5%. Submissions received by the due date earn the on-time bonus. The bonus is calculated as a percentage of the point score earned by the submission.

Special attention to the Honor Code

An allocator submission is expected to be your original work. Your code should be designed, written, and debugged by you independently. The B&O textbook contains background reading on heap allocators, including sample code. You are allowed to review the code in the textbook as a special case, but must not otherwise study other heap allocator code nor incorporate code from outside sources into your submission nor discuss code-level specifics with others. If you accidentally stumble across allocator code written by someone else, we expect you to immediately about-face from it and return to designing and writing your own allocator.

A student who is retaking the course may not re-use the design, code, or documentation from any previous effort from a previous quarter. The best approach to avoiding problems is to discard all work from the past, as it can be a temptation that you're better off without.

Finish and submit

When finished, be sure to submit your files for grading.

The on-time deadline is Wed 11:59pm. There is an abbreviated grace period of one day, allowing late submissions until Thurs 11:59pm. Once the grace period expires, no further late submissions will be accepted. Cutting it too close can land you on the wrong side of the boundary with no recourse other than to discard all your hard work into the rubbish bin. Do not let this happen to you -- submit early, submit often, so you'll have an insurance policy against any unexpected last-minute mishap!

Check in with post-task self-check and it's definitely time to celebrate! Congratulations on an awesome quarter!

Here I am at the end of the road and at the top of the heap. — Pope John XXIII