Due: Fri Dec 6 11:59 pm
No late submissions accepted.
Assignment by Julie Zelenski, with modifications by Nick Troccoli
based on an idea from Randy Bryant & David O'Hallaron (CMU)
Learning Goals
This assignment gives you a chance to implement a core piece of functionality that you've relied on all quarter - a heap allocator! This assignment will help you:
- appreciate the complexity and tradeoffs in implementing a heap allocator
- further develop your pointer and debugging skills
- profile your code to find inefficiencies and work to streamline them
- bring together all of your CS107 skills and knowledge
Overview
You've been using the heap all quarter, and now you get to go under the hood and build your own version of malloc, realloc, and free! The key goals for a heap allocator are:
- correctness: services any combination of well-formed requests
- tight utilization: compact use of memory, low overhead, recycling memory
- fast: services requests quickly
There are a wide variety of designs that can meet these goals, with various tradeoffs to consider. Correctness is of course essential, but is the next priority to conserve space or improve speed? A bump allocator can be crazy-fast but chews through memory with no remorse. Alternatively, an allocator might pursue aggressive recycling and packing to squeeze into a small memory footprint, but execute a lot of instructions to achieve it. An industrial-strength allocator aims to find a sweet spot without sacrificing one goal for the other.
In this assignment, you will implement two different heap allocator designs; an implicit free list allocator and an explicit free list allocator. This assignment leaves room for you to experiment and decide between different possible approaches for implementing these allocators to balance various tradeoffs - beyond the requirements listed in this spec, you are free to design your allocators in the best way you see fit!
To get started on this assignment, clone the starter project using the command
git clone /afs/ir/class/cs107/repos/assign7/$USER assign7
The starter project contains the following:
test_harness.c: a provided testing program that lets your run your heap allocators on different test cases and catch potential errors. You do not need to modify this file, but will read over part of this code as part of a code reading exercise.bump.c: a provided implementation of a bump allocator. This implementation is not robust, but serves as an example of how to implement a heap allocator. You do not need to modify this file, but will read over this code as part of a code reading exercise.implicit.c,explicit.candMakefile: files where you will add your implementations for your implicit and explicit free list allocators, respectively, and the Makefile to compile them.allocator.h: a file containing function prototypes for functionality in each of your allocators, and some helpful constants. You do not need to modify this file, but can use these constants automatically in your code.segment.h, andsegment.c: supporting files needed for programs to use your heap allocator, that provide the heap segment size and heap segment start. You do not need to modify these files.debug_break.h: a provided file with a helpful function for debugging that allows you to call a function to simulate hitting a GDB breakpoint where that function is called. You do not need to modify this file.custom_tests: a file where you may, but are not required to, add tests for your implementations.readme.txt: a file where you should write up information about your heap allocator designs.my_optional_program.c: an empty C program configured to use your heap allocator - you can optionally add code to this program to try out your finished heap allocators! When you compile this program, it will create 3 versions you can run, one for each allocator type, each namedmy_optional_program_[ALLOCATOR TYPE].samples: a symbolic link to the shared directory for this assignment. It contains:- various
.scripttext files: these files are provided test cases for testing your heap allocators with the providedtest_harnessprogram. See later in the spec for more details. SANITY.iniandsanity.py: files to configure and run sanity check. You can ignore these files.
- various
tools: contains symbolic links to thesanitycheckandsubmitprograms for testing and submitting your work.
A note about the Honor Code: An allocator submission is expected to be your original work. Your code should be designed, written, and debugged by you independently. The B&O textbook chapter 9.9 contains background reading on heap allocators, including sample code. You are allowed to review this code (but note that its structure is somewhat incompatible with our framework, the code is not that readable, and it uses stylistically-poor preprocessor macros). We recommend using this as a starting point, and then writing your own. If you do end up taking some inspiration from this code, you may do so given an appropriate citation. All other code is entirely off-limits -- you must not otherwise study other heap allocator code nor incorporate code from outside sources into your submission nor discuss code-level specifics with others. A student who is retaking the course may not re-use the design, code, or documentation from any previous effort from a previous quarter.
Recommended Approach
The steps below are invaluable to approaching this assignment - we highly recommend following them closely!
Tip: check out the "Code Quality" section for strategies to use to help you write better, cleaner implementations! Also, submit after completing each step below! This way you have incremental submissions as you work.
-
Review the past two lectures on heap allocators, along with chapter 9.9 of the textbook, for resources on heap allocators. The textbook suggests some different design decisions than we will use in lecture and on this assignment - these discrepencies are noted in this spec.
-
Complete the 2 code study exercises to understand how to run and test heap allocators for this assignment, and how to implement a bump allocator. End goal: know how to run various test scripts, understand what the test harness checks for, and understand what functions you must implement in order to create a functional heap allocator.
-
Sketch out the design you will use for your implicit allocator. Don't start coding until you have a full design for how you will implement it! Some parts to think through may include:
- how will you define your block headers?
- how will you ensure that returned blocks are correctly aligned?
- how will you traverse the blocks to find a free block to reuse?
- what common tasks will you perform frequently? How can you structure your code to reduce redundancy?
- how can you use structs and typedefs to help you access your data?
- what checks can you perform in your
validate_heapfunction?
-
Start working on the implicit heap allocator by taking the code from the provided bump allocator and editing
mymallocto place a header on each block. Don't make any other changes - what you will have is a bump allocator with block headers! Test thoroughly usingvalidate_heapby having it traverse those headers from heap start to end. Test on short scripts, then larger ones. End goal: you should have a thoroughly-tested, correct implementation of implicit free list block headers. -
Update the implicit allocator
myfreefunction to mark free block headers as available. Test it by maintaining a global count of in-use and free blocks, and confirm this matches the counts summed when you traverse the blocks invalidate_heap. End goal: you should have a thoroughly-tested, correct implementation of block headers being updated when blocks are freed. -
Update the implicit allocator
mymallocfunction to search for a free block to reuse rather than always allocating memory at the end of the heap. Test it via a sequence of requests to alloc/free of the same size block, which should recycle memory. End goal: you should have a thoroughly-tested, correct implementation of block recycling. -
Finish up any loose ends in your implicit allocator, such as ensuring robustness for edge cases, etc. Use tools to measure performance, such as
callgrindand the test harness, and try to optimize your implementation. End goal: you should have a completed implicit free list allocator! -
Next, turn to your explicit free list allocator. Your code from your implicit allocator is a great starting point to build on, which is why it's vital to have a correct and efficient implementation for that before moving on. Copy your code from your implicit allocator as a starting point - it should pass all the tests out of the gate! (but obviously will not yet implement explicit allocator features). Approach the explicit free list in a similar manner to what we recommend above - sketch out your design, think through various important considerations, and implement it in small, manageable chunks, testing thoroughly at each step.
Assignment Support: Through helper hours, Piazza and email, our focus will be on supporting you so that you can track down your own bugs. Please ask us how to best use tools, what strategies to consider, and advice about how to improve your debugging process, but we expect you to take ownership of your own code and drive your own debugging. For this reason, and also because of the wide variety of implementations possible, while we are happy to answer code-level questions, we will not be able to look at code during helper hours for this assignment. We will also only answer debugging questions if you have implemented your validate_heap function to narrow in on the issue you're having - this function is a huge asset, and we want to encourage you to take advantage of it!
Code Study: Test Harness
In order to put a heap allocator through its paces, you will need a good testing regimen. Our provided test_harness.c and allocator scripts can help!
Allocator Scripts
An allocator script is a file that contains a sequence of requests in a compact text-based format. The three request types are a (allocate) r(reallocate) and f (free). Each request has an id-number that can be referred to in a subsequent realloc or free.
a id-number size
r id-number size
f id-number
A script file containing:
a 0 24
a 1 100
f 0
r 1 300
f 1
is converted into these calls to your allocator:
void *ptr0 = mymalloc(24);
void *ptr1 = mymalloc(100);
myfree(ptr0);
ptr1 = myrealloc(ptr1, 300);
myfree(ptr1);
We provide a variety of different test scripts in the samples folder, named to indicate the "flavor":
- example scripts have workloads targeted to test a particular allocator feature. These scripts are very small (< 10 requests), easily traced, and especially useful for early development and debugging. These are much too small to be useful for performance measurements.
- pattern scripts were mechanically constructed from various pattern templates (e.g. 500 mallocs followed by 500 frees or 500 malloc-free pairs). The scripts make enough requests (about 1,000) that they become useful for measuring performance, albeit on a fairly artificial workload.
- trace scripts contain real-world workloads capturing by tracing executing programs. These scripts are large (> 5,000 requests), exhibit diverse behaviors, and useful for comprehensive correctness testing. They also give broad measurement of performance you might expect to get "out in the wild".
You are also highly encouraged to create your own!
Test Harness
The test_harness.c program is a provided program that can check for correctness with these script files. The harness will parse a script file and translate it into requests. While executing the script, it attempts to verify that each request was correctly satisfied. When you compile using make, it will create 3 different compiled versions of this program to test with, one using each type of heap allocator: test_bump, test_implicit and test_explicit. You can specify one or more script files as command line arguments to the test harness and it will run and evaluate on each of them. Read over test_harness.c to understand how it operates so you can use it to the fullest. You don't need to dig into the parts that parse a script file towards the end, but do carefully review the earlier code that executes each script and checks correctness.
Verify your understanding by checking in with yourself on the following questions:
- When making a
mallocrequest, what checks does the test harness perform to verify the request was properly serviced? What checks does it use onfree? Onrealloc? - How does the test harness verify that no memory blocks overlap one another?
- What value does the test harness write to the payload bytes of an allocated block? For what purpose is it writing to the payload?
- How does the test harness calculate the allocator's average utilization?
- In addition to its own checks to verify constraints that can be externally validated, the test harness also make calls to
validate_heap.validate_heapis a function that you will write in each of your allocators that does a scan of your internal heap data structures (more about how to implement this later). At what point does the test harness callvalidate_heap? How does the test harness respond whenvalidate_heapreturns false? What command-line flag configures the test harness to skip the calls tovalidate_heap? - The test harness calls the allocator's
myinitfunction before executing each script. The function is expected to wipe the allocator's slate clean and start fresh. Ifmyinitdoesn't do its job, you may encounter bugs caused by ghost values left over from the previous script. Keep this in mind for when implementingmyinitin your heap allocator! - The arguments to
myinitare the bounds of the heap segment. Runtest_bumpunder gdb and print out those boundaries. At what address does the heap segment start? What is the heap segment size? Remember these values, you are going to be seeing a lot of activity in this region. - Something that may come in handy is breaking at the test harness's execution of a specific line number in a test script. How can we use conditional breakpoints to do this? Where might be a good place to add a conditional breakpoint that triggers when we execute line X?
If there is anything you don't understand about the test harness, please ask! You want to have a good understanding of how this code operates so you can use it effectively when testing your allocator.
Code Study: Bump Allocator
The bump.c file contains an implementation of a bump allocator. We discussed this strategy in lecture as one of the simplest possible approaches; it is extremely fast, but has very poor memory utilization. The implementation we provide is not robust to all edge cases and does not perform all required error checking, but is provided to give you an idea of how you might go about implementing a heap allocator. Here are a few experiments to try out to observe how it performs:
- Read over the C code for the bump allocator. Disassemble its
mymallocandmyfreefunctions and count the number of assembly instructions. This gives you the static instruction count. - Open the allocator script
samples/pattern-repeat.scriptusingemacsand peruse its contents. The script makes 1000 requests: 500 repeated pairs of a call tomymallocfollowed by a call tomyfree. -
Use the callgrind profiler, discussed in lab8, to examine the dynamic instruction count of the bump allocator when running this script:
valgrind --tool=callgrind --toggle-collect=mymalloc --toggle-collect=myrealloc --toggle-collect=myfree ./test_bump samples/pattern-repeat.scriptWhat is the relationship between the static and dynamic instruction counts? Do you see how they match up?
-
Run the annotator, also mentioned in lab8, on this output file to see the C source labeled with the instruction counts.
callgrind_annotate --auto=yes callgrind.out.pidNo part of the bump allocator code will stand out as a particular "hot spot". The operations are all pretty streamlined.
-
Review the C code for
myreallocand its generated assembly. The allocator scriptsamples/pattern-realloc.scriptmakes 1000 requests: 100 calls tomymalloccalls and 900 calls tomyrealloc. Use the same profile/annotate steps on this script file. This annotation will report a definitive "hot spot" taking a large number of instructions. What is that time-consuming operation? What makes it so expensive? - The bump allocator has no per-block memory overhead, but its inability to recycle memory means its utilization is expected to be abysmal. What does the test harness report as the calculated utilization when servicing these allocator scripts?
The bump allocator can service a malloc/free in just a handful of instructions, but its utilization is extremely poor. In your own allocators, you will aim to strike a balance of good results on both utilization and speed rather than sacrificing one for the other.
Implementing Your Own Allocators
Now it's your turn! Your main task on this assignment is to implement your own implicit and explicit allocators as discussed in class and in the textbook.
General Requirements
The following requirements apply to both allocators:
- The interface should match the standard libc allocator. Carefully read the
mallocman page for what constitutes a well-formed request and the required handling to which your allocator must conform. Note there are a few oddballs (malloc(0)and the like) to take into account. Ignore esoteric details from the NOTES section of the man page. - There is no requirement on how to handle improper requests. If a client reallocs a stack address, frees an already freed pointer, overruns past the end of an allocated block, or other such incorrect usage, your response can be anything, including crashing or corrupting the heap. We will not test on these cases.
- An allocated block must be at least as large as requested, but it is not required to be exactly that size. The maximum size request your design must accommodate is specified as a constant in
allocator.h. If asked for a block larger than the max size, your allocater can return NULL, just as it would for any request that cannot be serviced. You should not assume the value of this constant beyond that it will not be larger than the max value of asize_t. Every allocated block must be aligned to an address that is a multiple of theALIGNMENTconstant, also inallocator.h. You may assume that this value is 8, and it is fine for your code to work only with an alignment of 8. However, you should still use the constant instead of hardcoding the value. This alignment applies to payloads that are returned to the client, not to internal heap data such as headers. It's your choice whether small requests are rounded up to some minimum size. myinit's job is to properly initialize the allocator's state. It should returntrueif the initialization was successful, orfalseif the parameters are invalid / the allocator is unable to initialize. For the parameters passed in, you may assume that the heap starting address is a non-NULL value aligned to theALIGNMENTconstant, and that the heap size is a multiple ofALIGNMENT. You should not assume anything else about the parameters, such as that the heap size is large enough for the heap allocator to use.- Your allocator cannot invoke any memory-management functions. By "memory-management" we specifically mean those operations that allocate or deallocate memory, so no calls to
malloc,realloc,free,calloc,sbrk,brk,mmap, or related variants. The use of other library functions is fine, e.g.memmoveandmemsetcan be used as they do not allocate or deallocate memory. - Your allocator may use a small amount of static global data, limited to at most 500 bytes. This restriction dictates the bulk of the heap housekeeping must be stored within the heap segment itself.
- The test harness looks for various externally-visible problems (blocks that are unaligned/overlapping/outside segment, failure to preserve payload data, etc); your allocator should have no operational or execution errors on any of the sample scripts.
- You must have an implemented
validate_heapfunction that thoroughly checks the heap data structures and state and returns whether or not any problems were found (see next section).
validate_heap
validate_heap is called by the test harness periodically to check for issues. It is a crucial way to reliably distinguish between a valid and invalid heap configuration and pinpoint the exact moment the heap got out of whack. Instead of trying to manually retrace the steps for how a corrupted heap got that way, you can instead jump right in and trace the critical call in context to observe exactly how and when things went astray. When implementing validate_heap, don't repeat the checks the test harness already does for you, but instead augment them by reviewing the internal consistency of the heap. You should write your implementation of validate_heap as you implement each allocator feature - you should not just go back and add it at the end! As your heap data structures become more complex, validate_heap should become more sophisticated to match them.
To provide some examples, your validate_heap might walk the entire heap and verify such things as:
- Does the housekeeping information (location, size, free status) for each block appear reasonable? Is all of the heap segment accounted for?
- Is every block in the free list marked as free? Is every block marked free listed on the free list? Is each listed only once?
- Have adjacent free blocks been coalesced according to your policy?
- Are redundancies in the data structures consistent, e.g. does the count of free blocks match the length of the free list, or the total bytes in-use match the sum of in-use block sizes?
A strong implementation looks for potential issues and reports only on failures. You should not, for example, dump a printout of the entire heap to then manually look through for potential issues.
Tip 1: We provide a breakpoint() function (see debug_break.h) that will force a stop in the debugger. You may want to call this function from your validate_heap when it detects something is incorrect.
Tip 2: The more comprehensive and thorough checks done by validate_heap, the more useful it will be to you. A good validate_heap will also be slow -- do not be at all concerned about this, this function is a development/debugging aid, there is no desire for it to be efficient and it will not be invoked during any performance measurements.
1) Implement An Implicit Free List Allocator
Now you're ready to implement your first heap allocator design. The specific features that your implicit free list allocator must support are:
- Headers that track block information (size, status in-use or free) - you must use the header design mentioned in lecture that is 8 bytes big, using any of the least significant 3 bits to store the status
- Free blocks that are recycled and reused for subsequent malloc requests if possible
- A malloc implementation that searches the heap for free blocks via an implicit list (i.e. traverses block-by-block).
Your implicit free list allocator is not required to:
- implement any coalescing of freed blocks (the explicit allocator will do this)
- support in-place realloc (the explicit allocator will do this); for realloc requests you may satisfy the request by moving the memory to another location
- use a footer, as done in the textbook
- resize the heap when out of memory. If your allocator runs out of memory, it should indicate that to the client.
This allocator won't be that zippy of a performer but recycling freed nodes should certainly improve utilization over the bump allocator.
The bulleted list above indicates the minimum specification that you are required to implement. Further details are intentionally unspecified as we leave these design decisions up to you; this means it is your choice whether you search using first-fit/next-fit/best-fit/worst-fit, and so on. It can be fun to play around with these decisions to see the various effects. Any reasonable and justified choices will be acceptable for grading purposes.
2) Implement An Explicit Free List Allocator
Building on what you learned from writing the implicit version, use the file explicit.c to develop a high-performance allocator that adds an explicit free list as specified in class and in the textbook and includes support for coalescing and in-place realloc.
The specific features that your explicit free list allocator must support:
- Block headers and recycling freed nodes as specified in the implicit implementation (you can copy from your implicit version)
- An explicit free list managed as a doubly-linked list, using the first 16 bytes of each free block's payload for next/prev pointers
- Note that the header should not be enlarged to add fields for the pointers. Since the list only consists of free blocks, the economical approach is to store those pointers in the otherwise unused payload.
- Malloc should search the explicit list of free blocks
- A freed block should be coalesced with its neighbor block to the right if it is also free. Coalescing a pair of blocks must operate in O(1) time. You should perform this coalescing when a block is freed.
- Realloc should resize a block in-place whenever possible, e.g. if the client is resizing to a smaller size, or neighboring block(s) to its right are free and can be absorbed. Even if an in-place realloc is not possible, you should still absorb adjacent free blocks as much as possible until you either can realloc in place, or can no longer absorb and must reallocate elsewhere.
Your explicit free list allocator is not required to (but may optionally):
- coalesce free blocks to the left or otherwise merge blocks to the left
- coalesce more than once for a free block. In other words, if you free a block and there are multiple free blocks directly to its right, you are only required to coalesce once for the first two. However, for realloc you should support in place realloc which may need to absorb multiple blocks to its right.
- resize the heap when out of memory. If your allocator runs out of memory, it should indicate that to the client.
This allocator should result in further small gains in utilization and a big boost in speed over the implicit version. You will end up with a pretty snappy allocator that is also strong on utilization -- an impressive accomplishment!
The bulleted list above indicates the minimum specification that you are required to implement. Further details are intentionally unspecified as we leave these design decisions up to you; this means it is your choice whether you search using first-fit/next-fit/best-fit/worst-fit, and so on. It can be fun to play around with these decisions to see the various effects. Any reasonable and justified choices will be acceptable for grading purposes.
Performance and Efficiency
Note: do not use Valgrind memtool (the memory tool we have used up to this point) on this assignment. It looks only at the standard heap allocator, not your custom one, and its complaints may needlessly worry you. You can run the test harness under Valgrind callgrind (the profiling tool) for measuring instruction counts.
Utilization
You can achieve higher utilization by more effectively packing blocks into the heap segment so as to accommodate more payload data and correspondingly have less overhead/fragmentation. The test harness computes the utilization per-script and reports the average utilization over a set of scripts. The per-script utilization is calculated as the ratio of the peak payload divided by the total segment space used to accommodate that payload data. Utilization ranges from 0% to 100%. The achieved utilization depends on the script; a certain tough script might have utilization down near 30%, while another script might achieve a near perfect packing of 95%. If utilization averages out at > 50% overall, you're doing great!
Instruction Count
Another important goal for an allocator is that it services requests quickly. Callgrind is just the tool for the job; it allows you to count the number of instructions executed (lower is better).
To get measurements for your allocator, run the test harness under callgrind and use the toggle-collect flag to limit the data collection to your allocator functions like this:
valgrind --tool=callgrind --toggle-collect=mymalloc --toggle-collect=myrealloc --toggle-collect=myfree ./test_bump samples/pattern-mixed.script
The above command runs the bump allocator on the named script and asks callgrind to count the total number of instructions executed within the my- functions. It will write a callgrind.out.pid (pid is the number of the process) file to be used with callgrind_annotate to get fine-grained information about the work done per-function and per-line.
callgrind_annotate --auto=yes callgrind.out.pid
Performance
What are some of the expectations about the performance for the different allocators?
Bump. Given it has no variation in the work it does to service a request, both malloc and free are O(1) for bump. We clocked the bump allocator processing each malloc in about 35 instructions at -O0 (and reduced to just under 10 when compiled -O2, wow!). These counts are untouchably speedy, but the consequence of its fast service is extremely poor utilization.
Implicit free list. To service a malloc request, this allocator searches the implicit list. Instruction counts will vary depending on how full the heap is and where the free blocks are found. In the best case scenario (free block found right away), a malloc request can be serviced in about 40 instructions. In the worst-case (search most/all of heap to find block), the count rises linearly relative to the number of blocks in heap. Searching a heap of 1000 blocks can take many thousands of instructions -- ouch! free is a quick constant-time operation, accounting for about 10 instructions per call. The implicit free list can achieve average utilization in the most satisfying 60% range over the pattern/trace scripts, as these workloads benefit from the recycling that implicit does. Hitting above 50% utilization is a very respectable achievement for any allocator!
Explicit free list. In its best-case scenario (free block found right away), malloc can be about 50 instructions. In the worst-case, search bogs down to O(N), but this is linear in the number of free blocks, not the total number of blocks. Coalescing a pair of blocks should be O(1), likely on the order of about 30 instructions. This work may be counted in your free/malloc/realloc depending on your design. For some workloads, the more aggressive recycling will lead to even higher utilization for explicit than implicit, but on average, it will be in a comparable range.
Realloc is a slippery operation to measure. If a block is being resized in place, it can be fast to service (about 30 instructions), but otherwise it incurs sum of counts for malloc+memcpy+free. Copying the payload data likely dominates the total cost, so the count of instructions varies based on block size.
The numbers above give you a general sense of what is possible, but are not intended as absolute requirements. These counts were taken from a simple implementation of the minimum requirements with little effort at optimizing other than compiling at -O2. Your instruction counts might be a bit higher, and that's fine. If you work to get your counts lower, even better!
Note: Rather than get hung up on absolute numbers, focus your attention instead on what you can learn from relative comparisons. For example, comparing the performance of two different allocators on the same workload allows you to observe the contrast between two approaches. Comparing an allocator to itself before/after a design/code change allows you to measure the impact of that change. Comparing an allocator to itself on a variety of workloads shows its strengths/weakness in different scenarios.
Compiler Optimizations
Once you have completed and thoroughly tested your implementation, feel free to change the top part of the Makefile to tell gcc to more aggressively optimize your code. While still in active development, using -O0 generates code that is much easier to debug, but once you're working on performance, experiment with more aggressive levels and options, such as -O2. We will compile your allocator using the setting from your submitted Makefile, so be sure to submit with the flags configured to work best for your code. Remember, however, that optimizing transformations make the assembly much less of a literal rendition of the C, so stepping through it may appear to jump through the code erratically and show unexpected results.
Code Quality
- Write helper functions. There will be a few vital expressions that are heavily-repeated, such as going back and forth between a payload pointer and its header or advancing from one header to the next. Even though these expressions are one-liners, they are complicated, dense, pointer-laden one-liners. Move these expressions into tiny shared helpers so you can write them correctly once!
- Pointee types should communicate the truth. If you know the type of a pointer, declare it with its specific type. Use
void*only if the pointee type is not known or can vary. Conversely, don't declare what should be avoid*pointer with a specific pointee type, as this misleads the reader and compiler into thinking it's something that it's not. - Structs and typedefs are your friend. defining structs for groups of data can help make your code cleaner and easier to read. Plus, you can cast to a struct to make it easier to manipulate data. Moreover, you can use
typedefby itself to declare a custom variable type that is really something else, e.g.typdef int custom_type;, to declarecustom_typethat is really anint. - Be wary of typecasts. Typecasts are not to be thrown about indiscriminately. Your typecast defeats the type system and coerces the compiler to go along with it without complaint. Before introducing one, consider: why is this cast necessary? what happens if it is not present? is there a safer way to communicate my intention? how might I work within the type system rather than against it?
- Decompose for the win. A page-long malloc or realloc can be overwhelming. Break it down into tasks: finding a block, updating the housekeeping, etc.
- Names matter. Choose good names for your fields and functions. A vague name such as
sizeforces you to repeatedly ponder whether it tracks the payload size or total block size (included the header), and what units it uses (bytes? etc.). Always use descriptive variable names. - Avoid co-mingling of signed/unsigned types. The standard malloc interface takes its arguments as a
size_t, which is anunsigned long. Careless co-mingling of signed and unsigned types and being sloppy about bitwidth can lead to unexpected troubles (as you saw in the recent assignment), so be thoughtful and consistent. - Coding for efficiency. Our grading priorities typically de-emphasize efficiency. We designate a few points for meeting the "reasonable efficiency" benchmark and focus more on hacky algorithms, code duplication, lack of decomposition, unnecessary special cases, and other choices that make for ugly code. The priorities shift somewhat when efficiency is in play. The ideal is streamlined code that is also clean and high-quality, but should you have to sacrifice a bit on quality in the name of performance, only do so if the performance gain you're getting is measurably important and there is no elegant way to get the same boost.
Readme
In your readme, we ask that you write a paragraph about each of your two allocators (implicit and explicit) . Describe the design decisions you made for the allocator and why you made those choices. Summarize its overall performance characteristics of the allocator and identify its strengths and weaknesses. Tell us briefly about how you attempted to optimize your allocator. Share an amusing anecdote or interesting insight that arose while designing, developing, testing, debugging, or optimizing that allocator!
As you finish your readme, we have one final question to ask you:
What would you like to tell us about your quarter in CS107? Of what are you particularly proud: the effort you put into the quarter? the results you obtained? the process you followed? what you learned along the way? Tell us about it! We are not scoring this question, but wanted to give you a chance to share your closing thoughts before you move on to the next big thing.
Sanity Check
Custom sanity check is configured to gather instruction count statistics. It runs the allocator under callgrind to get the total instruction count and divides by the number of requests to report the average instruction count per request. In your custom_tests file, each line should be an allocator test on a script:
test_implicit samples/pattern-recycle.script
test_explicit samples/pattern-recycle.script
Running tools/sanitycheck custom_tests will report the average instruction count for each such test listed in your custom_tests file.
Submitting
Once you are finished working and have saved all your changes, check out the guide to working on assignments for how to submit your work. We recommend you do a trial submit in advance of the deadline to allow time to work through any snags. You may submit as many times as you would like; we will grade the latest submission. Submitting a stable but unpolished/unfinished version is like an insurance policy. If the unexpected happens and you miss the deadline to submit your final version, this previous submit will earn points. Without a submission, we cannot grade your work.
There is no grace period for this assignment.
We would also appreciate if you filled out this homework survey to tell us what you think once you submit. We appreciate your feedback!
Grading
Here is the tentative point breakdown for this assignment:
Functionality (112 points)
- Sanity/sample scripts (35 points) Correct results on the default sanity check tests and published sample scripts.
- Comprehensive/performance (55 points) Fulfillment of the specific implicit/explicit requirements and correct servicing of additional sequences of mixed requests. The performance expectations are fairly modest and intended to verify that the allocator correctly implements the required features.
- Robustness (20 points) Handling of required unusual/edge conditions (request too large, malloc 0 bytes, etc.)
- Clean compile (2 points) We expect your code to compile cleanly without warnings.
Readme questions
- readme.txt. (10 points) The readme questions will be graded on the thoughtfulness and completeness of your answers.
Code quality (buckets weighted to contribute ~20 points)
- We expect your allocator code to be clean and readable. We will look for descriptive names, helpful comments, and consistent layout. We expect your code to show thoughtful design and appropriate decomposition. Control flow should be clear and direct. We expect common code to be factored out and unified. Complex tasks or tricky expressions that are repeated should be decomposed into shared helpers. See the style guide on the assignments page for more information.
- The code review will also assess the thoroughness and quality of your
validate_heaproutines.
Post-Assignment Check-in
How did the assignment go for you? We encourage you to take a moment to reflect on how far you've come and what new knowledge and skills you have to take forward. Once you finish this assignment, you will have put all your hard work this quarter in CS107 towards implementing a crucial tool in program execution, and one that at the start of the quarter you took for granted. Completing this assignment requires use of pointer skills, memory manipulation, assembly optimization, and more - an amazing accomplishment. It's definitely time to celebrate! Congratulations on an awesome quarter!
To help you gauge your progress, for each assignment/lab, we identify some of its takeaways and offer a few thought questions you can use as a self-check on your post-task understanding. If you find the responses don't come easily, it may be a sign a little extra review is warranted. These questions are not to be handed in or graded. You're encouraged to freely discuss these with your peers and course staff to solidify any gaps in you understanding before moving on from a task. They could also be useful as review before the exams.
- What are the main challenges in making malloc fast? What makes for a fast free?
- Explain the difference between internal and external fragmentation. Which is the greater threat to utilization?
- True or False: If you have built a super-fast malloc and free, implementing realloc in terms of those operations will also be super-fast.
- Throughput and utilization are often viewed in opposition. How might improved throughput come at the expense of lowered utilization (or vice versa). Are there choices that lead to wins in both?
Frequently Asked Questions (FAQ)
Is there a minimum payload size?
No. For some allocator designs, it will be convenient to round up small requests to a minimum payload of 8 or 16 bytes, but there is no requirement to do so.
Some of my heap housekeeping seems to get randomly overwritten with a repeated pattern, such as 0x2a2a2a2a. What gives?
After allocating a block, the test harness memsets a pattern over the entire payload, e.g. repeats 0x2a or 0x08 in every payload byte. (That byte is the request id in the script being processed). If this pattern appears where you think it should not, this means there are bytes that test harness believes is in-use payload yet your allocator thinks those bytes store heap housekeeping. Perhaps an in-use block was incorrectly left on the free list, a block coalesce/split was bungled, or a block is wrongly both in-use and free? Identifying when the corruption came about sounds like a job for validate_heap!
My allocator gets a segmentation fault dereferencing an invalid address, but when I access that same address within gdb, it seems fine. Why?
gdb doesn't heed the protection bits on memory segments, so it can access read-protected locations without complaint, while accessing these same locations in a running program will cause a segmentation fault. If your allocator is faulting due to trying to access a read-protected location, it could be due to accessing part of the heap segment that was previously mapped but has since been marked unreadable. This symptom is highly suggestive that your myinit doesn't correctly re-set the state of the allocator back to a clean slate.