Assignment 7 advice

Written by Julie Zelenski

Choosing your design

You are to implement a correct allocator that runs efficiently, but the what/how of that implementation is completely open-ended. Unlike our previous assignments, which dictated nearly every detail, this project is yours to spec. There is no task list of what to implement next, no mandate about how to structure the code, no prescribed design to guide you, just lots of open questions and tradeoffs to weigh. The experience can be liberating, but also a bit disorienting. Here are just a few prompts to consider:

Section 9.9 of Bryant and O'Hallaron contains helpful reading on heap implementation strategies and presents partial code for an allocator case study. This allocator uses boundary tags and an implicit free list and demonstrates some worthwhile motifs such as abstracting the pointer arithmetic/typecasts (although we strongly recommend using inline functions instead of macros) and strategically installing a few extra blocks to avoid making special cases. This design is a moderate performer, the code is missing some key pieces, and I know of at least one bug, so I would recommend it as code you can learn a lot from, but be wary of adopting from it without careful scrutiny of what you're getting.
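For instance, the motif of abstracting the pointer arithmetic into inline functions might look like the sketch below. The header layout here (a size_t whose low bit marks in-use status, relying on sizes being multiples of 8) is one hypothetical choice for illustration, not the textbook's exact code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical header: an 8-byte size_t whose low bit marks in-use.
// Sizes are multiples of 8, so the low 3 bits are free for flags.
typedef size_t header_t;

static inline header_t *payload_to_header(void *payload) {
    return (header_t *)((char *)payload - sizeof(header_t));
}

static inline void *header_to_payload(header_t *hdr) {
    return (char *)hdr + sizeof(header_t);
}

static inline size_t block_size(header_t *hdr) {
    return *hdr & ~(size_t)0x7;   // mask off the flag bits
}

static inline bool is_in_use(header_t *hdr) {
    return *hdr & 0x1;
}

static inline void set_header(header_t *hdr, size_t size, bool in_use) {
    *hdr = size | (in_use ? 1 : 0);
}

// Next block in an implicit list: step over this header plus its payload.
static inline header_t *next_block(header_t *hdr) {
    return (header_t *)((char *)hdr + sizeof(header_t) + block_size(hdr));
}
```

Centralizing the casts and masking this way means a change to the header layout touches only these few functions, and unlike macros, inline functions are type-checked and debugger-friendly.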

A much simpler starting point is to consider the code given in allocator.c. This is a naive but largely functional implementation of the allocator functions. The code is so short and its approach so simple that you can easily read and thoroughly understand it line-by-line. It's a very long way from being an efficient implementation, but you could take an incremental approach and evolve it forward, one improvement at a time. Analyze what about its approach is most problematic, figure out a way to make it better (how about re-using freed memory, or not requiring a whole page for a 2-byte request?), implement the improvement, measure the gain, and repeat.
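As one illustration of such a step, here is a hedged sketch of adding first-fit reuse of freed blocks to a bump-style allocator. The heap array, header struct, and all names here are invented for this example; they are not the starter code.

```c
#include <assert.h>
#include <stddef.h>

// Toy fixed-size heap for illustration; a real allocator manages a mapped segment.
#define HEAP_BYTES 4096
static unsigned char heap[HEAP_BYTES];
static size_t heap_used = 0;   // bytes handed out so far (bump pointer)

typedef struct { size_t size; int free; } hdr;

static size_t roundup8(size_t sz) { return (sz + 7) & ~(size_t)7; }

void *mymalloc(size_t requested) {
    size_t needed = roundup8(requested);
    // The improvement: first-fit scan over existing blocks for a free one
    // that is big enough, instead of always extending.
    size_t offset = 0;
    while (offset < heap_used) {
        hdr *h = (hdr *)(heap + offset);
        if (h->free && h->size >= needed) {
            h->free = 0;
            return (unsigned char *)h + sizeof(hdr);
        }
        offset += sizeof(hdr) + h->size;
    }
    // No fit: fall back to bump allocation at the end.
    if (heap_used + sizeof(hdr) + needed > HEAP_BYTES) return NULL;
    hdr *h = (hdr *)(heap + heap_used);
    h->size = needed;
    h->free = 0;
    heap_used += sizeof(hdr) + needed;
    return (unsigned char *)h + sizeof(hdr);
}

void myfree(void *ptr) {
    if (ptr == NULL) return;
    hdr *h = (hdr *)((unsigned char *)ptr - sizeof(hdr));
    h->free = 1;   // block becomes available for reuse, but is never coalesced
}
```

This one change improves utilization on churn-heavy workloads; measuring before and after would tell you whether the next step (splitting large blocks, coalescing neighbors) is worth the added complexity.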

The design process can be fun and creative but often requires some iteration until your design solidifies. Be sure to get an early start so that you have sufficient time to brainstorm alternatives, weigh trade-offs, make decisions, experiment, react and rework as necessary to land a successful final result. Also be sure to keep up with regular Mercurial commits to allow for easy trial-and-error exploration. If you commit before trying an experiment, you will have a known good snapshot to revert to should it not work out.

Design choices and performance opportunities

The sample scripts reflect a range of scenarios for how the allocator might be used. Comparing relative performance across the different scripts shows you the situations in which your allocator operates efficiently and those in which it does not, helping you identify opportunities for improvement. For example, a bottleneck in realloc will show up as poorer performance on the scripts that make heavy use of realloc relative to those that make little or none. Sometimes improving the results for one particular script comes at the expense of degrading others, so you'll have to weigh the results on balance. The per-script results tell you where you have strengths and weaknesses, but your allocator will be evaluated on aggregate performance, not any one script in isolation.

Some suggestions to help get you thinking about designing for performance:

Measuring and optimizing

Once you have a solid design that is correctly implemented, you can then work to reduce the constant factors by applying code optimization techniques. The first step in any optimization effort is measuring. The alloctest results give a high-level view of your allocator performance as a whole, but you'll need more fine-grained measurements to make effective optimizations in the code itself. Run alloctest under callgrind to profile the code and obtain instruction counts and cache hit/miss counts. Use the report to identify code passages that are heavily-traveled and/or particularly cache-unfriendly. These critical passages are where to focus your attention for maximum gain. Even just a slight decrease in the instruction count or cache miss rate for a high-traffic area can pay off handsomely in absolute results.
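As a concrete starting point, a profiling run might look like the following. The script path is illustrative; callgrind names its output file with the process id of the run.

```shell
# Run alloctest under callgrind; --cache-sim=yes adds cache hit/miss simulation
valgrind --tool=callgrind --cache-sim=yes ./alloctest samples/example.script

# Summarize instruction counts (and cache misses) per function and source line
callgrind_annotate --auto=yes callgrind.out.<pid>
```

The annotated listing shows per-line instruction counts, which makes the heavily-traveled passages jump right out.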

A few notes on optimization:

Script files for alloctest

The alloctest harness runs on script files. A script file contains a sequence of allocator requests in a simple text-based format. The three request types are a (allocation), r (reallocation), and f (free). Each allocation result is assigned an id-number that can be referred to in a subsequent realloc or free.

a id-number size
r id-number size
f id-number

A script file containing:

a 0 24
a 1 100
f 0
r 1 300
f 1

is translated by alloctest into these calls to your allocator:

void *ptr0 = mymalloc(24);
void *ptr1 = mymalloc(100);
myfree(ptr0);
ptr1 = myrealloc(ptr1, 300);
myfree(ptr1);

You can open any script file in your text editor. A script can begin with a comment that summarizes the scenario presented by this script. You can also peruse the entire sequence to see the pattern of requests in more detail. You can make your own simple scripts using a text editor and generate larger, complex scripts programmatically.
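For instance, a tiny generator could write a churn-style script for you. This is a sketch; the filename, the "#" comment syntax, and the request pattern are invented for illustration, so check them against the sample scripts.

```c
#include <assert.h>
#include <stdio.h>

// Write a script of nrequests allocations with every other block freed
// immediately, in the alloctest a/r/f text format. Returns 0 on success.
int write_script(const char *path, int nrequests) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL) return -1;
    fprintf(fp, "# alternating allocate/free of varied sizes\n");
    for (int id = 0; id < nrequests; id++) {
        fprintf(fp, "a %d %d\n", id, 8 + (id % 64) * 8);   // sizes 8..512
        if (id % 2 == 1) fprintf(fp, "f %d\n", id);        // free odd ids
    }
    fclose(fp);
    return 0;
}
```

Varying the size distribution and the free pattern lets you manufacture workloads that stress whatever behavior you want to measure.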

The importance of validate_heap

You are to implement validate_heap to scrutinize your heap data structures and squawk loudly should any problem be found. When checking correctness, the alloctest harness makes a call to validate_heap after every allocator request. If validate_heap is implemented to identify when the heap goes from valid to invalid, the monitoring from alloctest can pinpoint the exact moment the heap got out of whack. Instead of trying to postmortem a corrupted heap with wild guessing about how and when it got that way, you can instead jump right in and trace the critical call in context to see how it went astray. This debugging aid is priceless, but depends on you having usefully implemented validate_heap!

Review the code in alloctest.c to see its external checks of heap correctness (e.g. one check verifies addresses are aligned and within the heap, another verifies allocated blocks don't overlap). You don't need to repeat these checks; instead, validate_heap augments them by reviewing the internal consistency of the heap. Write the implementation of validate_heap alongside your allocation functions; as your heap data structures become more complex, validate_heap becomes more sophisticated to match them.

For example, validate_heap might walk the entire heap and verify such things as block sizes and statuses being self-consistent, every free-list entry actually being marked free, and every byte of the segment being accounted for.
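As one hedged sketch, assuming a hypothetical implicit-list design with a size-and-status header on each block (the layout and the globals segment_start/segment_bytes are invented for illustration; adapt the checks to your own structures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct { size_t size; int in_use; } blockhdr;

static void *segment_start;     // base of the heap segment
static size_t segment_bytes;    // total bytes in the segment

bool validate_heap(void) {
    char *cur = segment_start;
    char *end = (char *)segment_start + segment_bytes;
    size_t accounted = 0;
    while (cur < end) {
        blockhdr *h = (blockhdr *)cur;
        if (h->size == 0 || h->size % 8 != 0) {
            printf("validate_heap: bad block size %zu at %p\n", h->size, (void *)h);
            return false;
        }
        if (cur + sizeof(blockhdr) + h->size > end) {
            printf("validate_heap: block at %p extends past heap end\n", (void *)h);
            return false;
        }
        accounted += sizeof(blockhdr) + h->size;
        cur += sizeof(blockhdr) + h->size;
    }
    if (accounted != segment_bytes) {
        printf("validate_heap: accounted bytes %zu != segment size %zu\n",
               accounted, segment_bytes);
        return false;
    }
    return true;   // report only failures; stay silent when all is well
}
```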

Do not have validate_heap spew a dump of the heap data structures for you to wade through in an attempt to manually validate the contents. Instead have validate_heap traverse the structures, directly look for trouble, and report only on failures (e.g. an in-use block found on the free list). There is no need for running commentary about things that are working fine. In this way, validate_heap will bring your attention directly to exactly those issues that require your intervention and otherwise keep a silent vigilance.

The more comprehensive and thorough the checks done by validate_heap, the more useful it will be to you. A good validate_heap will also be slow -- do not be at all concerned about this; this function is a development/debugging aid, there is no desire for it to be efficient, and it will not be invoked during any performance measurements.

It can be very difficult to debug an allocator without decent validation, so do yourself a favor and invest in yours from the get-go. If you somehow manage to get your allocator working without one and then throw together a lame version just before submitting to appease us in grading, you will have missed the entire point.

Code quality

Our grading priorities heretofore have put little emphasis on efficiency. For previous assignments, we designated only a few points for meeting the "reasonable efficiency" benchmark, and we were quick to ding hacky algorithms, code duplication, lack of decomposition, unnecessary special cases, and other choices that make for ugly code, even if they offered some efficiency advantage. The priorities change for this assignment, where efficiency is the focus. This doesn't mean you should set out to write something ghastly, but it does shift the trade-offs. Obviously the best outcome is streamlined code that is also clean and high-quality, but when the two are in opposition, compromise is inevitable. When choosing to sacrifice a bit of quality in the name of performance, only do so if the performance gain is measurably important and there is no elegant way to get the same boost.

Program facts

Some facts that might help you gauge what to expect.

Frequently asked questions about assign7

Is it required to work with a partner?

No, partnering is completely optional. In past quarters, about a third to half of the class has chosen to work individually and the remainder have paired up. Working in a team has advantages (two heads are better than one) and disadvantages (scheduling, coordinating, arguing). The average score has been roughly the same for partnered and individual submissions, although there was smaller variance in the partnered group (i.e. solo submissions have more outcomes at the extreme edges). Read our advice for successful partnership if considering the partnered route.

Are we allowed to adopt the allocator code from the textbook? Are we allowed to adopt code from elsewhere?

We plan for you to invest your hard-earned CS107 superpowers in your own creation, but if you do need to borrow from the text to get started, we have a specific carve-out that allows you to incorporate Ch 9.9 code from the text with appropriate citation. All other code is entirely off-limits -- you must not review or adopt code for your allocator from any other resource.

Must my allocator <insert-technique-here>? ("maintain an explicit free list", "use a per-node header", "implement coalescing", ...)

No. You get to make all design choices for yourself!

What does it mean that every address must be aligned to a multiple of 8?

Each payload must be placed such that its address ends in a hex digit of 0 or 8. Addresses 0xa5b8 and 0xa5b0 are aligned; 0xa5b4 and 0xa5b1 are not. The alignment requirement applies to the base address of the payload; whatever headers/footers you place around it are your own business. The textbook indicates the alignment multiple is 16 for 64-bit code, but we are aligning to 8, the size of the largest primitive type. You can choose to align to 16 (or any larger multiple of 8) for convenience if you like.
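Rounding a size up to the alignment multiple is a one-liner with bit masking (this works for any power-of-two multiple). A small sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGNMENT 8

// Round sz up to the nearest multiple of mult; mult must be a power of two.
static inline size_t roundup(size_t sz, size_t mult) {
    return (sz + mult - 1) & ~(mult - 1);
}

// True if the address is a multiple of ALIGNMENT.
static inline bool is_aligned(void *p) {
    return ((uintptr_t)p % ALIGNMENT) == 0;
}
```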

What is the largest payload size the allocator must handle?

The theoretical maximum request for a 64-bit address space is beyond enormous, but in practice the max size will be constrained by other in-use segments. Your design can assume that payload size tops out at INT_MAX bytes. A request larger than the max should go unserviced (i.e. return NULL).

Is there a minimum payload size?

No. For some allocator designs, it will be convenient to round up small requests to a minimum payload of 8 or 16 bytes, but there is no requirement to do so.

My validate_heap routine is very slow. Should I be concerned?

No. In order to do its job well, validate_heap should be thorough and comprehensive, which takes time to execute. This function is a development/debugging aid; there is no desire for it to be efficient, and it will not be invoked during any performance measurements.

I'm getting an ALLOCATOR_ERROR from alloctest. What does this mean?

When testing correctness, alloctest runs some simple checks to look for heap trouble. It checks that each malloc/realloc call returns an address within the heap segment that is correctly aligned and that the new block doesn't overlap any other currently in-use block. It writes an identifying byte throughout the payload of an allocated block. When asked to realloc/free a block, it verifies the payload contents have remained intact. It also calls your validate_heap after each request. If any of these checks turn up trouble, it reports an ALLOCATOR_ERROR to draw attention to the problem. Review the code in alloctest.c to understand the symptom behind the particular error being reported to you.

Some of my heap housekeeping seems to get randomly overwritten with a repeated pattern, such as 0x2a2a2a2a. What gives?

After allocating a block, alloctest copies a pattern over the entire payload, e.g. repeats 0x2a or 0x08 in every payload byte. It does not write before/after the block, only the payload bytes. If this pattern appears where you think it should not, there are bytes that alloctest believes are in-use payload yet your allocator thinks store heap housekeeping. Perhaps an in-use block was incorrectly left on the free list, a block coalesce/split was bungled, or a block is wrongly both in-use and free? Identifying when the corruption came about sounds like a job for validate_heap!

When I valgrind alloctest, it warns about "set address range perms: large range". What gives?

The default Valgrind tool (Memcheck) is designed to spy on the actions of the real malloc/free and pays no attention to what you are doing over in your own heap segment. Using Memcheck on alloctest is not only unhelpful, its complaints may needlessly worry you. Do not use Valgrind's Memcheck on this assignment. You can run alloctest under Valgrind's callgrind (the profiling tool) for help in finding performance bottlenecks.

My allocator gets a segmentation fault dereferencing an invalid address, but when I access that same address within gdb, it seems fine. Why?

gdb doesn't heed the protection bits on memory segments, so it can access read-protected locations without complaint, while accessing those same locations in a running program will cause a segmentation fault. If your allocator is faulting on a read-protected location, it could be accessing part of the heap segment that was previously mapped but has since been marked unreadable. This symptom strongly suggests that your myinit doesn't correctly reset the state of the allocator back to a clean slate.
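A hedged sketch of the clean-slate idea follows; the signature and the particular globals (segment_start, segment_size, free_list, nused) are placeholders for whatever state your design actually keeps, since the harness may re-init between script runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static void *segment_start;   // base of the heap segment in use
static size_t segment_size;   // bytes available in the segment
static void *free_list;       // head of an explicit free list (if any)
static size_t nused;          // bytes currently handed out

// Reset EVERY piece of allocator state, not just the segment bounds;
// stale values left over from a previous run cause exactly the
// "fine in gdb, faults when running" symptom described above.
bool myinit(void *heap_start, size_t heap_bytes) {
    segment_start = heap_start;
    segment_size = heap_bytes;
    free_list = NULL;   // forget any free list from a previous run
    nused = 0;          // reset all counters too
    return true;
}
```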

GDB seems confused about my code; control flow jumps around, variables are missing. What gives?

Are you attempting to debug optimized code? The debugger will do its best, but the optimizing transformations make the assembly much less a literal rendition of the C, so stepping may appear to jump through the code erratically, statements may have been elided, variables may be optimized out of existence and thus unknown to gdb, and so on. You'll have your best luck debugging code compiled with little or no optimization. Some bugs only show up when compiled with optimization, in which case you have no choice; it may then be more helpful to follow the asm than to work from the C code.

My allocator works on all scripts except for <insert-some-case-here> (e.g. crashes on line 1428 of binky.script). Can someone please look at my code and find what's wrong or tell me how to fix?

Sounds like you have a bug! Don't look for someone else to wave a magic wand, you've got this! Put to work those fabulous debugging chops you have been building up all quarter. Get that program under your gdb microscope, devise experiments to run, gather data, draw pictures, disassemble, poke around in memory, and trace to find how and where your code is going astray. One particularly useful leverage point is the validate_heap function (see the section above on "The importance of validate_heap").

In office hours/forum/email, our focus will be on supporting you so that you can track down your own bug. Please do ask us how to best use tools, what strategies to consider, and advice about how to improve your debugging process, but asking us to diagnose your bug by description/symptom? Not so much. The staff will not look at your allocator code nor attempt to trace/review/debug it by inspection. We expect you to take ownership of your own code and drive your own debugging. You're ready!

My allocator has abysmal performance on the tiny scripts. Should I be concerned?

No. These scripts are much too small to produce meaningful performance results. They are intended only as a first step in early development and debugging. We will not test allocator performance on such small scripts and neither should you. Figure that a script needs more than 500 requests before it is even worth measuring.

My performance measurements vary from run to run on the same scripts, with no code changes. Is this normal?

Utilization is generally stable on the same inputs, but throughput varies with CPU/cache contention, so expect to see small ups and downs from run to run, and even larger discrepancies when the system is under heavy load (use uptime to view activity/load). All myth hosts run identical software, but some have skimpier hardware (e.g. a 4MB L2 cache versus 6MB, a slower clock), which can affect results. You can view a host's specs in /proc/cpuinfo if you're curious. During grading, we will measure performance for all submissions on the same isolated, quiescent myth host under the same conditions to ensure consistency.

Am I guaranteed the same allocator performance on the grading scripts that I get for the sample scripts?

No. The mix of grading scripts will be similar, but not identical, to the published samples. Design your allocator to perform well in a variety of situations rather than tailoring it to exactly one set of known inputs; creating some scripts of your own is a good way to broaden the mix of workloads you test. It is typical to have some scenarios that play to your best-case handling and others that don't, so ups and downs are expected, but far outliers can dominate the aggregate, so keep an eye on your extreme cases: try to bring your worst case closer to the median, and be wary of depending on your best case to make up for the others. If your performance is consistent across most or all of the sample scripts, swapping a few out for others should produce only minor change in overall results, and you gain confidence that your results will replicate across a wider variety of scenarios; wider variability script-to-script means your performance may swing more significantly when we mix things up in grading. You can estimate your own worst-case outcome by calculating how your allocator would fare on a mix constructed from the published samples with your top few best performers removed and replaced by copies of your weakest ones.

What gcc flags are allowed?

Anything goes. Edit ALLOCATOR_EXTRA_CFLAGS in the Makefile to configure your desired options. The initial setting is -Og, which generates code that is more debugger-friendly; that's the right choice while you're still in active development, but once you're working on performance, experiment with more aggressive levels and options. We will compile your allocator using the setting from your submitted Makefile, so be sure to submit with the flags configured to work best for your code.