NOTE: this website is out of date. This is the course web site from a past quarter. If you are a current student taking the course, you should visit the current class web site instead. If the current website is not yet visible by going to cs107.stanford.edu, it may be accessible by visiting this link until the new page is mounted at this address. Please be advised that courses' policies change with each new quarter and instructor, and any information on this out-of-date page may not apply to you.
General Concepts
What is the difference between 1 << 32 and 1L << 32?
Without the L, the 1 is treated as a signed int, meaning that the shift by 32 would be undefined, as we are shifting by the bitwidth or greater. With the L suffix, we would get a long with a 1 shifted by 32 places.
Practice Final 1
Problem 2
Could we have the condition be count != 0 instead of count > 0?
Because count is a size_t, yes, the if test could have been count != 0 instead. This is because size_t’s are unsigned longs, so there's no chance it'll store a negative value.
Why does “0x400577 <+17>: shr %rsi” result in count/2? Why is there no second argument to shr?
When there is no second argument to shr, it's implied to be a 1. Since it’s right shifting by 1 (on an unsigned variable), it is effectively dividing by 2.
Where in the assembly code are we setting x to arr? How do we know that %r12 is x?
Here are the set of instructions leading up to the jmp associated with the for loop:
The declaration and initialization of x needs to be associated with one of the lines preceding that jmp, but because only the first argument passed to mystery is a void * (the others are size_t, int, and function pointer), that should tell you that either mov %rdi,%r13 or mov %rdi,%r12 is in place for the void *x = ________ line, and since both realize void *x = arr;, that would be my first guess as to what that line of C code should be.
As to why r12 is being used instead of r13? You'll notice that r13 looks to be a copy of whatever the first argument is, but r13 isn't itself changed every again. It's used in read-only capacity, whereas r12 is updated by the instruction at offset +62. Something that's read-only could be a variable, but something that's updated almost certainly needs to be a variable. And because the updated values of r12 are repeatedly passed at the first argument of the comparison function, that's even more evidence suggesting r12 is being used on behalf of x.
Problem 3
How does the print statement in line 19 ultimately go on to print the password?
Note that get_realpw() doesn't possess a return value, but instead stores the result in the buffer that was passed in, i.e. the 16-byte realpw. But the real issue is that strncpy() will copy only 16 bytes from the user's input password (userpw ) into the userpwcopy buffer, with no guarantee that a null-terminator \0 will be copied over. However, to be a proper, null-terminated string, userpwcopy must have a null-terminator somewhere in its 16 bytes; otherwise, %s in the printf will keep going in search of a null-terminator, \0 printing out more characters than it intends--even beyond the bounds of the array realpw.
Furthermore, as realpw and userpwcopy are both buffers / arrays in the function authenticate() , their memory regions will both reside in that function's local stack memory: to draw out a stack / memory diagram:
-- bottom of authenticate's stack memory
[16 bytes of realpw, characters of real password written going up]
...
[16 bytes of userpwcopy, characters of user attempt going up]
%rsp -- top of authenticate's stack memory (growing downwards)
Hence, by printing userpwcopy, without a null-terminator, %s will keep printing characters / bytes going upward in the memory, going beyond the userpwcopy buffer, and infiltrating into the buffer space of realpw, resulting in the real password being printed as well.
Problem 4
In Part (A), for get_size, instead of the casting to an int *, could we do (headerT *)curr->payloadsz & mask;?
Yes this would work.
Assume curr points to address x. Casting curr as an int * then dereferencing it means we go to address x and read 4 bytes. Casting curr as a headerT * means we go to address x. When we access the payloadsz field, we are reading it as an int so we read 4 bytes starting at address x.
Same result either way. However, get_size might receive a pointer to a footer, so while this will work, curr may be actually pointing to a headerT.
In the myfree function, why do we subtract one from the header pointer? does this free function get passed in a pointer to the payload?
Yes, that's precisely the case with all free functions... the client gets void *'s from malloc and passes void *'s to free, and those pointers are assumed to be base addresses of payloads.
Since ptr is a pointer to the payload, we want to move sizeof(headerT) bytes back to reach the header, which is achieved by the -1.
Part C: If a block was indeed allocated (second to least significant bit turned on), is_reallocated would return 0x2 (the & mask operation would turn off all bits except the second the least significant bit), right? If this is the case, since is_reallocated is meant to return a bool, are we assuming that 0 is false and any other integer is true?
Yes! 0 evaluates to false, anything else evaluates to true.
In myrealloc, why does it & with 0x2, not 0x3?
You could check the allocation bit as well, but you're permitted to assume a valid heap data structure and perfect client use, so that if bit 1 is set, then bit 0 better be as well. You would never see an unallocated block passed to realloc if the client is using heap memory properly.
Practice Final 2
Problem 1
Why does this print "Tessi"?
The char* name wasn't actually changed by the input += pos; operation inside the buggy_substring function, since input is a local variable. However, this does change the value of input inside the function, so when input[len] = '\0'; is run, that will put a null terminator in the pos + len position of the input string, and this change persists after the function returns.
Why couldn't we have just done substring(&name, 3, 2)?
Due to name being a stack array, &name evaluates to the same value as name. However, with ptr being a pointer, &ptr actually evaluates to the address at which ptr lives. Using ptr, we can actually get a char **.
Problem 2
How do we calculate bytes_to_move? What does (*p_nelems * width) mean?
bytes to move = total # of bytes - # of bytes before min
*pnelems*width is the total # of bytes. To calculate the number of bytes before min, we can subtract the address of the base of the array from the address of min. subtracting the addresses leaves us with the desired raw number of bytes.
In shortest(), why do we pass &min to extract_min?
Since extract_min is a generic function, we need to work with pointers to the elements we are storing, rather than the elements themselves. Even if the element type we are storing is a pointer, we need to pass in a pointer to that.
In this case, we need to pass the address of our variable min, so that extract_min can directly write to the bytes of min.
Problem 3
Why don’t we have local += 3 in the code? Wouldn't lea 0x3(%rdx) effectively add 3 to that variable?
Since local is signed, this is the "fix up" necessary before the right shift is applied, since the compiler is not using the division instruction.
Because it's possible that the value of local is negative. lea adds 3 to local and puts it in %eax, but then tests to see whether local is nonnegative or negative with that test instruction. If it's positive, it moves the nonnegative value of local into eax; otherwise the value of local + 3 already in eax is retained. Regardless of what ended up in eax—be it (the nonnegative) local or (the negative) local + three more, the right shift by 2 bits divide the actual value of local by 4 and leave it in %eax to serve as the return value.
Would having written return (local >= 0 ? local : local + 3) >> 2; be acceptable as well?
Yes! That’s actually equivalent.
Problem 5
For part B, I'm having difficulty understanding the solution. Is it possible to explain this with a diagram?
Here's a (crudely drawn) stack diagram. After concat returns, we return to main, and everything below the blue rsp pointer is no longer considered part of the stack. The next line in main is a function call to printf, which will store its own information on the stack, starting from the blue pointer and writing downwards.
In the first case, where result was 10 bytes, we see that the result is likely to be written by printf's stack frame. in the second case, because result is so far down, printf is unlikely to be writing 4096 bytes of data onto the stack. so the value of result far down on the stack remains unchanged.
(Diagram pending)

For C, I understand how the assembly instructions with the printf stack frame are garbling the contents. But, I don't understand why the code prints that garbage?
Note that the return value is a pointer to a local variable. It's true that the local variable was perfectly sized to accommodate the full string, but it still returns a pointer to stack space that goes away once concat returns. The characters in that buffer are still there even though concat returns, but as soon as printf gets called, its own stack frame overlays that same memory and updated it, clobbering the LelandStanfordJuniorUniversity characters with whatever it is that printf writes to its own local variables.
Problem 6
How does prev = *(void **)to_remove manage to remove to_remove from the free list?
The general idea is that we get a pointer to the block prior to the one to remove, and set it to point to the block after the one we want to remove - so we rewire the list to "skip over" the block we wish to remove. Here’s an example you might find useful: consider the case where the block to remove is at the front of the free list. prev would initially be NULL, we enter the loop and set prev = &freelist (so it's pointing to the head pointer), and immediately break because freelist == to_remove. Then, we do *prev = *(void **)to_remove which sets freelist to now be to_remove's next pointer.
Practice Final 3
What is endian?
Little-endian means the bytes of a multi-byte figure are stored in reverse order, e.g., an int variable called x storing the number 1 would lay down 0x1 0x0 0x0 and 0x0 in that order in memory. A bid endian system would store them in forward order: 0x0 0x0 0x0 0x1. I mentioned endianness is passing a few times, but it was never formally introduced, so we wouldn't have expected you know know anything about it when answering a question like this.
Practice Final 4
Problem 1
When linking two nodes together, why can we do assignment with ‘=’ and not use strcpy/memcpy? How do I know when I can use memcpy?
In general memcpy will work in place of using assignment if we are copying one value to another location, i.e. we can write an expression like
x = y
as this - we pass in &x and &y because memcpy takes in pointers to the src and dest values:
memcpy(&x, &y, sizeof(x))
These are functionally equivalent - though we tend to prefer assignment as it can be less prone to error as it relies on variable types vs. working with raw bytes.
What is the intuition for building the linked list "backwards" from the n-1th item in the array to the front?
The idea is that before we can finish the ith node, we need to have built the (i + 1)th node since its address is stored in node i. This is why when solving this iteratively, we start with n - 1.
On the line *(char **)(node + strlen(strings[i]) + 1) = head How can we cast node + strlen(strings[i]) + 1 to (char **) when (node + strlen(strings[i]) + 1) is a char *
Remember that everything is just 1s and 0s at the end of the day. What we're doing here is making a node that has space for the string, a null terminator, and then a pointer to head (the start of the previous node)
So let's say we're in the middle of this list. Our strings[i] is "hello" and our head is 0x4000.
First, we malloc 14 bytes.
* 5 bytes for "hello"
* 1 byte for the null-terminator
* 8 bytes for the head pointer 0x4000
Then we strcpy in "hello\0" which fills 6 of the bytes.
Lastly, we need to copy in head. We want that to start at the 7th byte (after the null terminator).
* node is a char * right now, which serves us, as we want to do pointer arithmetic that lets us work byte-by-byte (+1 adds 1 byte, instead of +1 adding 8 bytes in the case of a char **)
* When we do node + strlen(strings[i]) + 1, that gives us a char * pointing to the first byte after the null terminator (node + 5 + 1 in our case). This is the beginning of the 8 bytes that will hold our head pointer.
* So now that we have that particular address (which is, again, just a number), we don't need it to be considered a char * anymore--because we want to write a char * into it. So we say, okay, now we want to consider this address as pointing to a char *. Which is to say, this is a char **.
* Now we've got a char ** pointing to exactly where we want to write head! So we dereference and set it equal to head (and the type matches,char * = char *).
Something to note here: an address is just a number—a pointer to somewhere in memory. It doesn't remember what type it's supposed to be pointing to. So if we treat a pointer as a char * and then later treat it as a char **, it won't complain!
Problem 2
Could you re-explain the solution to part (b)? Why doesn’t the optimized version need to push %r12?
By convention, some registers are caller owned and others are callee owned. For the former, this means that when a function is going to use any register $r, it must restore the value that was in $r before it returns so that it doesn't affect the parent function that called it (e.g. imagine if the caller function was using $r to store the iteration of a for loop - if this wasn't restored it would be an issue). For callee owned registers , the responsibility is on the parent function to save the value of $r before calling a helper function that might overwrite that register's value.
To actually save the values in registers, functions just push the value onto their stack. Then before they exit, the pop from their stack and restore the value onto the register.
Because the code does not use the register $r12, there is no reason for it to have to save its value. This is why the optimized version does not save $r12 by pushing its value onto its stack
Could you re-explain the solution to part (d)? Why doesn’t the optimized version need to push %r12? What does callq do that jmpq doesn’t, and why can the optimized version use jmpq instead of callq?
%rsp contains the address of the instruction address immediately following the call to ella.
Remember that the callq instruction pushes the address of the next onto the stack so that just as ella starts executing, that saved return address occupies the lowest 8 bytes of the stack, and %rsp points to the base address of it. Just before retq is executed, %rsp has been restored to point to the exact same location—to the saved return address. The retq instruction consumes the return address, increments rsp by 8 bytes, and then overwrites rip with the saved return address it just popped off.
Practice Final 5
Problem 2
What’s happening with the i and cinnamon variables? Could you provide an alternative explanation?
So, %rbx = i and 0x18(%rsp) = cinnamon. It's tricky, since i starts out as cinnamon in the for-loop. Here's how we figured out the difference!
On lines <+14> and <+17>, we set %rbx = cardamon[0] - mustard. Then, on line <+20>, we save that %rbx value into the stack at 0x18(%rsp). So initially, %rbx and 0x18(%rsp) are the same (hence, i = cinnamon at the beginning of the for-loop).
So now we need to figure out which is which.
We know the code states that cinnamon = allspice(...). If we look what happens with the %rax value set by calling allspice on line <+87>, we see that we store that %rax value into the stack at 0x18(%rsp) (which happens on line <+92>). So we see here that the return value of allspice is saved into 0x18(%rsp), and we know from the C code that we are setting cinnamon = allspice(...). So this is one clue that cinnamon is the local variable stored at 0x18(%rsp)!
So then what about i? Well, here are some tips for recognizing that %rbx represents i.
In the conditional representing whether we stay in the loop or not (lines <+33> and <+36>), we are using %rbx in our comparison. Which is interesting, since we don't really change %rbx much.
In fact, the only time we change %rbx is at the very end of our loop, at line <+97>.
So this pattern—%rbx is only updated once, at the very end of the loop, and it is only referenced at the beginning of the loop when we see if we're still using the loop—should make us think that perhaps %rbx is used to control the loop!
Lastly, one last detail for telling the two apart.
When we call printf , we pass in &cinnamon as our third parameter.
The only way to get the address of a local variable is if the local variable is stored in memory somewhere (e.g. the stack). We can't get the address of a register.
So if %rbx = cinnamon, then we have no way to access &cinnamon! However, since 0x18(%rsp) = cinnamon , we can access &cinnamon just by looking at %rsp + 0x18.
Problem 3
In (b), why would *start == info.start guarantee that it's a function pointer?
From the solution
* The stack frames of all functions are, well, stacked at higher addresses, and each stack frame is separated from the one below by a return address.
The function is basically going through the stack as if it is an array void *s and checking if the current address is a valid return address (in other words does it fall within a known function’s memory range?). The reason we use *start > info.start to avoid function pointers is because it wouldn't make sense for the return address to be the start of a function. To see this, recall that a return address in assembly is where we should continue running from after we finish the current function. In other words, it takes us back to where we were in the caller function after we finish running the callee function. I'm going to borrow some assembly from binary bomb below
It’s important to note that it wouldn't make sense for a callee function to return to address 000000000040229a, since we would need to have at least one instruction where we actually call it using callq. This is why we say that if *start is 40229a, it is probably being used to store a function pointer, not a return address. For backtrace, we just care about the return addresses since we want to know what functions were called up to this point
Problem 4
Why do we have to use 1L here?
Using 1L ensures that we use the 64-bit value, which is necessary for correctly manipulating the 64th bit in a size_t value. If you use just 1, you are using a 32-bit value, and it might not work correctly for 64-bit bitwise operations. So by using you ensure 1L that your mask operates on the correct bit width and avoids potential issues with incorrect masking or shifting.
Why do HEAP_SIZE and the size_t size variable need to be divided by sizeof(size_t)? Similarly, why do we have size/sizeof(size_t)?
Note that active_start is of type size_t *, but that HEAD_SIZE is in bytes. We could cast active_start to be a (char *) and add HEAD_SIZE to it to compute the address of the end, but the solution elects to explicityly scale HEAD_SIZE down by a factor of sizof(size_t) knowing that pointer arithmetic against a size_t * will automatically scale it back up again. Remember that the offsets in pointer arithmetic are automatically multiplied by the size of the figure being addressed.
*(size_t *)(*payload_curr) is different than *payload_curr! *payload_curr evaluates to the size_t residing at the address stores in payload_curr. That size_t is then interpreted as the address of another size_t with that size_t * cast, and then dereferenced. As written, the solution involves one more dereference that what you suggested as an alternative.
Can you clarify to me how the solution for question 4d achieves its purpose? I don't understand how it is rerouting the pointers in the inactive heap to point to the right thing, especially if the free spaces are put somewhere else in the inactive heap, so wouldn't there need to be an offset that counts how much free space was traversed in the active heap, and then apply that as the offset in the inactive heap?
The first two lines compute the boundaries of the inactive heap, and the while loop test ensures that we only pay attention to allocated nodes in the inactive heap and therefore need to be replicated in the active one. My guess is that you more or less figured this part out and that you're interested in the meat of the while loop body.
Part of what's omitted above is presented below, which looks at each word's worth of payload within an allocated node:
while (payload_curr < payload_end) {
if (within_active((size_t *)(*payload_curr))) {
// still omitted for the moment
}
payload_curr++;
}
Because payload_curr is a size_t *, levying a ++ against it advances the number it stores by 8 bytes. *payload_curr evaluates to the size_t that resides at the address stored in payload_curr, and by casting it to a size_t *, we're entertaining the possibility that it's a valid heap address. (We rely on the within_active to return true if and only if that's true.) It's true that the eight bytes there are incidentally a value that just looks like a heap address, but the problem says we should assume that anything that looks to be an address within the heap actually is one.
The line is bold is the crucial one. If the address is actually within the heap, we assume the eight bytes at that address comprise the offset placed there in part c of the problem. That offset is in bytes, so we need to divide that offset by sizeof(size_t) in the next line knowing the pointer arithmetic will automatically multiply it back by sizeof(size_t). The second line:
Why do we have to get the memory address of the pointers for (b), but not for (a)?
The swap() function's first two arguments are the addresses of what needs to be swapped.
* For part (a), we want to swap 1, 2, 3, with 8, 9, 10, so we pass in the address of the 1 (ptr1 already points here) and the address of the 8 (ptr2 points to the 6 to start, so we need to add 2 to point to the 8).
* For part (b), the easiest way to do this is just to swap the values of the two pointers themselves. So we pass in the addresses of ptr1 and ptr2.
If you need to use the address of a value, you'd use & (address-of operator) to obtain this address. If you have an address, or pointer storing an address, and you want the value located at that address, you'd use the * (dereference or indirection operator) to get this value.
Assembly #1
In the function, I'm not sure what imul &0x31, %esi is doing?
That's the compiler being smart about the part (c) line:
can just be substituted with a constant 49, i.e. 0x31.
Now, you might wonder how we know what eliza[3] is not equal to 0x31*6*burr[0], since esi seems to be the register storing eliza[3]. It's a little tricky, but basically, esi is immediately reused to store the result of the 4-element multiplication. If it was the case that eliza[3] == 0x31 * 6 * burr[0], then you'd need a second multiplication by 0x31 to multiply by eliza[0] * eliza[1] for line (c), but there is none.
This document and its content are copyright Stanford University, 2025. All rights reserved. Any redistribution, reproduction, transmission, or storage of part or all of the contents in any form is prohibited without the authors' expressed written permission. NOTICE RE UPLOADING TO WEBSITES: This content is protected and may not be shared, uploaded, or distributed. (without expressed written permission).