Assignment 7 advice

Written by Julie Zelenski

The importance of validate_heap

One of your jobs is to implement the function validate_heap to scrutinize your heap data structures and squawk loudly should any problem be found. To check correctness, the test harness makes a call to validate_heap after every allocator request. If you implement validate_heap to reliably distinguish between a valid and invalid heap configuration, the monitoring from the test harness can pinpoint the exact moment the heap got out of whack. Instead of trying to postmortem a corrupted heap with wild guessing about how and when it got that way, you can instead jump right in and trace the critical call in context to observe exactly how and when things went astray. As a debugging aid this is priceless, but depends on you yourself having usefully implemented validate_heap!

Review the code in test_harness.c to see its external checks of heap correctness (e.g. one check verifies addresses are aligned and within heap, another verifies allocated blocks don't overlap). You don't need to repeat these checks, instead validate_heap will augment them by reviewing the internal consistency of the heap. The implementation of validate_heap should be written alongside your allocation functions. As your heap data structures become more complex, validate_heap becomes more sophisticated to match them.

For example, your validate_heap might walk the entire heap and verify such things as:

  • Does the housekeeping information (location, size, free status) for each block appear reasonable? Is all of the heap segment accounted for?
  • Is every block in the free list marked as free? Is every block marked free listed on the free list? Is each listed only once?
  • Have adjacent free blocks been coalesced according to your policy?
  • Are redundancies in the data structures consistent, e.g. does count of free blocks match length of free list, total bytes in-use match sum of in-use block sizes?

Do not have validate_heap spew a dump of the entire heap for you to wade through in an attempt to manually validate the contents. Running commentary about things that are working fine is of no value and can obscure issues that require attention. Instead have validate_heap traverse the structures, look for trouble, and report only on failures (e.g an in-use block is found on the freelist). In this way, validate_heap will directly bring your attention to exactly those issues that require your intervention and otherwise keep a silent vigilance. We provide a breakpoint() function (see debug_break.h) that will force a stop in the debugger. You may want to call this function from your validate_heap when it detects something is out of whack so as to draw your attention to the problem and give you a chance to further poke around.

The more comprehensive and thorough checks done by validate_heap, the more useful it will be to you. A good validate_heap will also be slow -- do not be at all concerned this, this function is a development/debugging aid, there is no desire for it to be efficient and it will not be invoked during any performance measurements.

It can be very difficult to try to debug an allocator without decent validation, so do yourself a favor and invest in yours from the get-go. If you somehow manage to get your allocator working without one and then throw together a lame version on your way to submit to appease us in grading, you will have missed the entire point.

Strategies and code quality

  • Develop and test incrementally. Your work is made so much harder if you attempt to write/test/debug the entire thing in one go! Instead, code and test in small, incremental steps. Here is an example plan:

    • Start on implicit by taking the code from bump allocator and edit mymalloc to just place a header on each block. You can still allocate each block at the heap end like bump and leave myrealloc/myfree as-is, no recycling yet. What you have is a bump allocator with a block header. Implement validate_heap to traverse those headers from heap start to end. Test on trivial scripts, then larger ones. Don't move on until you have confirmed your headers are properly maintained.
    • Identify your next small step, say, having myfree update the header to mark available. How might you test this? Maintain a running count of in-use and free blocks. If that matches the counts summed when traversing blocks in validate_heap, you're good to move on!
    • Next step could be to rework mymalloc to look for a free block to reuse rather than tack onto the heap end. Verify a sequence of requests to alloc/free the same size block will recycle memory.
    • And now you're basically done with implicit!
    • Your strategy for explicit should breakdown the tasks in stepwise fashion. At each step, keep the amount of code that you are actively working to a small, manageable chunk and thoroughly test before moving on to the next.
  • Write helper functions. There will be a few vital expressions that are heavily-repeated, such as going back and forth between a payload pointer and its header or advancing from one header to the next. Even though these expressions are one-liners, they are complicated, dense, pointer-laden one-liners that you don't want to have to repeatedly stare down to ensure it's correct each time. Move these expressions into tiny shared helpers. Write it once, get it right, and then use the helper everywhere for readability and reliability. If it was tricky to write once, you surely don't want to write it more than once!

  • Pointee types should communicate the truth. If you know the type of a pointer, by all means, declare it with its specific type. Use void* only if the pointee type is not known or can vary. Conversely, don't declare what should be a void* pointer with a specific pointee type. Declaring a void* as a char* misleads the reader into thinking this pointer is a C-string and allows it to be mis-used as such with no compiler warning.
  • Be wary of typecasts. Typecasts are not to be thrown about indiscriminately. Your typecast defeats the type system and coerces the compiler to go along with it without complaint. With this power comes great responsibility. Before introducing one, consider: why is this cast necessary? what happens if it is not present? is there a safer way to communicate my intention? how might I work within the type system rather than against it? (i.e. correcting a mismatch in types rather than covering up with a cast). You also never need to cast anything to a void*, since it is the universal recipient and is compatible with any pointer type (such as cast is redundant and can be removed). Add typecasts exactly and only where you must. If your code repeatedly uses an expression that contains a necessarily unavoidable typecast, consider placing that expression into a shared helper to be invoked everywhere rather than repeat it.
  • Decompose for the win. A page-long malloc or realloc can be dense code that makes your head swim. Break it down into tasks: finding a block, update the housekeeping, etc. If needed, use inlining to keep down costs of function calls. (See gcc docs on inline functions)
  • Names matter. Choose good names for your fields and functions. A vague name such as size forces you to repeatedly ponder whether it tracks the payload size or total block size (included header). It also doesn't include any notion of units -- is that size expressed in number of bytes or number of words or something else? A name that clearly and unambiguously communicates its purpose will be a great help.
  • Avoid co-mingling signed/unsigned. The standard malloc interface takes its arguments as a size_t, which is an unsigned long. Careless co-mingling signed and unsigned types and being sloppy about bitwidth can lead to unexpected troubles (as you saw in the recent assignment), so be thoughtful and consistent.
  • Coding for efficiency. Our grading priorities typically de-emphasize efficiency. We designate a few points for meeting the "reasonable efficiency" benchmark and are quick to ding for hacky algorithms, code duplication, lack of decomposition, unnecessary special cases, and other choices that make for ugly code. The priorities shift somewhat when efficiency is in play. The ideal is streamlined code that is also clean and high-quality, but should have you to sacrifice a bit on quality in the name of performance, only do so if the performance gain you're getting is measurably important and there is no elegant way to get the same boost.

Allocator scripts

An allocator script file contains a sequence of requests in a simple text-based format. The format is documented in the assignment writeup. The test harness executes an allocator script and evaluates the correctness of the result. Measuring the instruction count and utilization when executing a script allows you to evaluates the allocator performance on that particular workload.

Sample scripts

The sample scripts we provide are each named to indicate its "flavor":

  • example scripts have workloads targeted to test a particular allocator feature. These scripts are very small (< 10 requests), easily traced, and especially useful for early development and debugging. These are much too small to be useful for performance measurements.
  • pattern scripts were mechanically constructed from various pattern templates (e.g. 500 mallocs followed by 500 frees or 500 malloc-free pairs). The scripts make enough requests (~1,000) that they become useful for measuring performance, albeit on a fairly artificial workload.
  • trace scripts contain real-world workloads capturing by tracing executing programs. These scripts are large (> 5,000 requests), exhibit diverse behaviors, and useful for comprehensive correctness testing. They also give broad measurement of performance you might expect to get "out in the wild".

Correctness

The test harness looks for various externally-visible problems (blocks that are unaligned/overlapping/outside segment, failure to preserve payload data, etc); your allocator should have no operational or execution errors on any of the sample scripts.

Utilization

Higher utilization is achieved by more effectively packing blocks into the heap segment so as to accommodate more payload data and correspondingly less overhead/fragmentation. The test harness computes the utilization per-script and reports the average utilization over a set of scripts. The per-script utilization is calculated as the ratio of the peak payload divided by the total segment space used to accommodate that payload data. Utilization varies ranges from 0% to 100%. The achieved utilization depends on how challenging or easy the specific workload is for your allocator. A certain tough script might have utilization down in the 30%, another script might achieve a near perfect packing of 95%. If utilization averages out at > 50% overall, you're doing great!

Instruction count

Another important goal for an allocator is that it service requests quickly. In order to know how your allocator is doing on that front, you need to take measurements. Callgrind is just the tool for the job. Using callgrind allows you to count the number of instructions executed (lower is better)

To get measurements for your allocator, run the test harness under callgrind and use the toggle-collect flag to limit the data collection to functions whose name begins with "my" like this:

valgrind --tool=callgrind --toggle-collect='my*' ./test_bump samples/pattern-mixed.script

The above command runs the bump allocator on the named script and asks callgrind to count the total number of instructions executed within the my functions. It will write a callgrind.out.pid (pid is the number of the process) file to be used with callgrind_annotate to get fine-grained information about the work done per-function and per-line.

callgrind_annotate --auto=yes callgrind.out.pid

Performance expectations

What are some of the expectations about the performance for the different allocators?

Bump. Given it has no variation in the work it does to service a request, both malloc and free are O(1) for bump. We clocked the bump allocator processing each malloc in ~35 instructions at -O0 (and reduced to just under 10 when compiled -O2, wow!). These counts are untouchably speedy, but the consequence of its fast service is pathetically low utilization.

Implicit free list. To service a malloc request, this allocator searches the implicit list. Instruction counts will vary depending on full the heap is and where the free blocks are found. In the best case scenario (free block found right away), a malloc request can be serviced in ~40 instructions. In worst-case (search most/all of heap to find block), count rises linearly relative to number of blocks in heap. Searching a heap of 1000 blocks can take many thousands of instructions -- ouch! free is a quick constant-time operation, accounting for ~10 instructions per call. The implicit free list can achieve average utilization in the most satisfying ~60% range over the pattern/trace scripts, as these workloads benefit from the recycling that implicit does. Hitting above 50% utilization is a very respectable achievement for any allocator!

Explicit free list. In its best-case scenario (free block found right away), malloc can be some ~50 instructions. In worst-case, search bogs down to O(N), but this is linear in the number of free blocks, not total number of blocks. Coalescing a pair of blocks should be O(1), likely on the order of ~30 instructions. This work may be counted in your free/malloc/realloc depending on your design. For some workloads, the more aggressive recycling will lead to even higher utilization for explicit than implicit, but on average, it will be in a comparable range.

Realloc is a slippery operation to measure. If a block is being resized in place, it can be fast to service (~30 instructions), but otherwise it incurs sum of counts for malloc+memcpy+free. Copying the payload data likely dominates the total cost, the count of instructions varies based on block size.

The numbers above give you a general sense of what is possible, but are not intended as absolute requirements. These counts were taken from a simple implementation of the minimum requirements with little effort at optimizing other than compiling at -O2. Your instruction counts might be a bit higher and that's fine. If you work to get your counts lower, even better! There is lots of room for you to show us up!

Think relative, not absolute

Rather than get hung up on absolute numbers, focus your attention instead on what you can learn from relative comparisons. For example, comparing the performance of two different allocators on same workload allows you to observe the contrast between two approaches. Comparing an allocator to itself before/after a design/code change allows you to measure the impact of that change. Comparing an allocator to itself on a variety of workloads shows its strengths/weakness in different scenarios.

Common questions about assign7


Are we allowed to adopt the allocator code from the textbook? Are we allowed to adopt code from elsewhere?

We plan for you to invest your hard-earned CS107 superpowers in your own creation. The discussion of heap allocator design in B&O Ch 9 is quite thorough and you should absolutely carefully read the chapter to get your bearings, but the specific code given in 9.9 is not that pleasing. Its structure is somewhat incompatible with our framework, the code is not that readable, and its clumsy use of preprocessor macros is to be avoided. Our recommendation is that you read the code as a starting point, but plan to write your own. if you do end up taking some inspiration from this code, we have a specific carve-out that allows you to do so that given an appropriate citation. All other code is entirely off-limits -- you must not review or adopt code for your allocator from any other resource.

What does it mean that every address must be aligned to a multiple of 8?

Each payload must be placed such that its address has last hex digit of 0 or 8. Addresses 0xa5b8 and 0xa5b0 are aligned; 0xa5b4 or 0xa5b1 are not. The alignment requirement applies to the base address of the payload; whatever headers/footers you have placed around it are your own business. The textbook indicates the alignment multiple is 16 for 64-bit code, but we are aligning to 8, the size of largest primitive type. You can choose to align to 16 (or any larger multiple of 8) for convenience if you like.

What is the largest payload size the allocator must handle?

The theoretical maximum request for a 64-bit address space is beyond enormous but in practice, the max size will be constrained by other in-use segments. Your design can plan that max request size tops out at 1 << 30 bytes. A request larger than max can be returned unserviced (i.e. return NULL).

Is there a minimum payload size?

No. For some allocator designs, it will be convenient to round up small requests to a minimum payload of 8 or 16 bytes, but there is no requirement to do so.

My validate_heap routine is very slow. Should I be concerned?

No. In order to do its job well, validate_heap should be thorough and comprehensive, which requires time to execute. This function is a development/debugging aid, there is no desire for it to be efficient and it will not be invoked during any performance measurements. Look back to the code study to remind yourself of how test harness can be invoked to skip the validate step when desired.

I'm getting an ALLOCATOR_FAILURE from the test harness. What does this mean?

The test harness has some simple checks to look for heap trouble. It checks that each malloc/realloc call returns an address within the heap segment that is correctly aligned and the new block doesn't overlap any other currently in-use block. It writes a identifying byte to the payload an allocated block. When asked to realloc/free a block, it verifies the payload contents have remained intact. It also calls your validate_heap after each request. If any of these checks turn up trouble, it reports an ALLOCATOR_FAILURE to draw attention to the problem. Review the code in test_harness.c to understand the symptom for the particular error being reported to you.

Some of my heap housekeeping seems to get randomly overwritten with a repeated pattern, such as 0x2a2a2a2a. What gives?

After allocating a block, the test harness memsets a pattern over the entire payload, e.g. repeats 0x2a or 0x08 in every payload byte. (That byte is the request id in the script being processed). It does not write to any bytes before/after the block, only on the payload bytes. If this pattern appears where you think it should not, this means there are bytes that test harness believes is in-use payload yet your allocator thinks those bytes store heap housekeeping. Perhaps an in-use block was incorrectly left on the free list, a block coalesce/split was bungled, a block is wrongly both in-use and free? Identifying when the corruption came about sounds like a job for validate_heap!

When I valgrind the test harness, it warns about "set address range perms: large range". What gives?

The default Valgrind tool (memtool) is designed to spy on the actions of the real malloc/free and will be paying no attention to what you are doing over in your own heap segment. Using memtool in this case is not only unhelpful, its complaints may needlessly worry you. Do not use Valgrind memtool on this assignment. You can run the test harness under Valgrind callgrind (the profiling tool) for measuring instruction counts.

My allocator gets a segmentation fault dereferencing an invalid address, but when I access that same address within gdb, it seems fine. Why?

gdb doesn't heed the protection bits on memory segments, so it can access read-protected locations without complaint, while accessing these same locations in a running program will cause a segmentation fault. If your allocator is faulting due to trying to access a read-protected location, it could be due to accessing part of the heap segment that was previously mapped but has since been marked unreadable. This symptom is highly suggestive that your myinit doesn't correctly re-set the state of the allocator back to a clean slate.

GDB seems confused about my code; control flow jumps around, variables are missing. What gives?

Are you attempting to debug on optimized code? The debugger will do its best, but the optimizing transformations make the assembly much less of a literal rendition of the C, so stepping may appear to jump through the code erratically, statements may have been elided, variables optimized out of existence so unknown to gdb, and so on. You'll have your best luck doing your debugging on code compiled with little or no optimization. Some bugs only show up when compiled with optimization, in which case you have no choice. In those cases, it may be more helpful to follow the asm than try to work from the C code. You can set the optimization level by editing CFLAGS in the Makefile.

My allocator works on all scripts except for <insert-some-case-here> (e.g. crashes on line 1428 of binky.script). Can someone please look at my code and find what's wrong or tell me how to fix?

Sounds like you have a bug! Don't look for someone else to wave a magic wand, you've got this! Put to work those fabulous debugging chops you have been building up all quarter. Get that program under your gdb microscope, devise experiments to run, gather data, draw pictures, disassemble, poke around in memory, and trace to find how and where your code is going astray. One particularly useful leverage point is the validate_heap function (See section above on "The importance of validate_heap").

In office hours/forum/email, our focus will be on supporting you so that you can track down your own bug. Please do ask us how to best use tools, what strategies to consider, and advice about how to improve your debugging process, but expecting someone else to diagnose your bug by description/symptom? Not so much. The staff will not look at your allocator code nor attempt to trace/review/debug it by inspection. We expect you to take ownership of your own code and drive your own debugging. You're ready!

What gcc flags are allowed?

Anything goes. Edit CFLAGS in the Makefile to configure your desired options. While still in active development, using -0O generates code that is much easier to debug, but once you're working on performance, experiment with more aggressive levels and options. We will compile your allocator using the setting from your submitted Makefile, so be sure to submit with the flags configured to work best for your code.

How can I tell what a given script does?

Open the script file in your text editor and take a look! A script may begin with a comment that summarizes the scenario presented by this script. You can also peruse the entire sequence to see the pattern of requests in more detail.

Can I write my own test scripts?

Yes! We encourage you to write your own scripts in extend your testing. Use our scripts as a template. You can make your own simple scripts using a text editor and generate larger, complex scripts programmatically. (We created our pattern scripts using python scripts.)

Can I use custom sanity check to measure performance?

Yes! Custom sanity check is configured to gather instruction count statistics. It runs the allocator under callgrind to get the total instruction count and divides by the number of requests to report the average instruction count per request. In your custom_tests file, each line should be an allocator test on a script:

test_implicit samples/pattern-recycle.script 
test_explicit samples/pattern-recycle.script

Running tools/sanitycheck custom_tests will report the average instruction count for each such test listed in your custom_tests file.