Final Exam Review - FAQ

General Concepts

Crash Recovery

  • How does the block cache impact durability?
You should think of the block cache as improving performance at the cost of durability. The most durable approach would be to always read and write directly from the disk, but this would be very slow. That said, a block cache with synchronous writes is better for durability than one with delayed writes, because there is less delay between changing the data and writing the new data back to disk. However, synchronous writes will be slower than delayed writes.


  • In write-ahead logging, when do we decide when to write the parts of the log to disk?
When adding information to the write-ahead log, both the log data and the corresponding actual changes will be in the block cache. Thus, we can create a dependency between a metadata change and its log entries: before the blocks containing the metadata change are flushed to disk, all blocks containing that change's commit record must be flushed first. This ensures that, if the computer crashes at any point while flushing these blocks out of the block cache, we can maintain consistency, because we can now guarantee that the operations making up a commit are either fully played through or not played through at all.


Multiprocessing

  • Does waitpid(-1, NULL, 0) return -1 if run in the child? If so, why?
Yes - it returns -1 if there are no children to wait on (which would be the case for the child, assuming it didn't spawn any processes itself).


Multithreading & Synchronization

  • Can a thread be in the queue for 2 different mutexes at once?
No, this shouldn't be possible. Imagine a case where a thread goes to lock two locks back-to-back:
lock(l1)
lock(l2)
The thread will block at the first line until it acquires l1 before proceeding to try to lock l2.


  • Does scheduling and dispatching only ever refer to threads? Or could it refer to processes too?
Threads, not processes, are the unit of execution the OS considers to run: the OS switches between threads, and scheduling chooses which thread to run next, rather than which process. A process contains one or more threads, and can be thought of as a “thread container”. By switching from a thread in one process to a thread in another, we are also “switching processes” in a sense, but the OS is still choosing which thread to run next and switching between threads. (As a side note, we saw briefly in the modern technologies lecture that with something like gang scheduling, the process a thread belongs to can factor into when the OS schedules it, but the OS is still ultimately choosing threads to run.)


  • When to use unique_lock (as opposed to manual locking/unlocking)?
The unique_lock locks the mutex automatically when created and unlocks it when it goes out of scope. So if you would otherwise just unlock the mutex at the end of the function, it makes sense to use unique_lock. In the specific case where you need to unlock the mutex in the middle of a function, it doesn't make sense to use unique_lock.


  • Does context-switching refer to switching processes? I thought switching threads wouldn't change the virtual address space.
Context-switching is always between threads - however, it's possible for us to switch from running a thread in one process to running a thread in a different process, with a different virtual address space.


Scheduling

  • In a round-robin scheduler, if only one thread remains and its remaining execution time is an exact multiple of the time slice, will it continue running without interruption or will it still be preempted at each time slice boundary?

    For example, suppose the time slice is 2 ms and we have threads A (2 ms), B (6 ms), C (2 ms). Would the execution order be (1) A->B->C->B->B, or (2) A->B->C->B?
The behavior is that timer interrupts will still occur as normal even if there is only one thread to run. That thread will just get the CPU again.

The way you write the answer is somewhat arbitrary. We'll try our best to avoid ambiguity on the exam, but if you find an ambiguous situation, it's always a good idea to ask and/or state your assumptions on the page!


Full Practice Final Exam

Problem 1

  • Why are there 5 printed lines (instead of 4)?
The first cout is printed only by the parent. The second cout is printed by both parent and child. The final cout is printed by both parent and child. The break ends the while loop in the child, but the child still executes the final cout.


  • Why doesn't counters[getpid()] = 5 change the dictionary value in the parent process?
When we call fork(), we have a parent process and a child process. Each process has its own copy of memory. Modifications made by the child process only affect the child's memory and don't affect the parent's memory. So when the child process does counters[getpid()] = 5, it only changes the child's copy of the counters map, not the parent's.


  • Doesn’t the child process always have PID 0?
The PID of the child isn't 0, but rather 0 is a sentinel value returned by fork to indicate that we are the child.


  • If the waitpid loop were placed after the cout statement instead, is it true that we wouldn't be able to guarantee which process prints last—the parent or the child?
Yes, if the waitpid loop is after the cout, you will not be able to guarantee the printing order.


  • Why is there one guaranteed output?
When the child enters the “while true” loop, the call to waitpid will immediately return -1, since the child has no children. This will cause the child to break and print out counters. In the parent, the parent must wait on waitpid until the child finishes, in which case waitpid will return 111.


Problem 2

  • Do we need to ensure FIFO order?
This depends on whether condition variables are FIFO, since that determines which waiting thread wakes up first. However, we don't need to explicitly ensure FIFO ordering in our solution.


  • How do I interpret the half of the total weight limit rule (second bulleted requirement)?
The weight limit rule gives each direction half of the weight limit by default. This means that, even if the total weight limit is not exceeded, a car cannot simply cross the bridge if its direction has already exceeded half of the total capacity, unless the other direction is not using its half.


  • I used arrays to store states (e.g. int crossing_weights[2];). Is there any issue with solving it this way?
This is fine, as long as the indexing in the array is correct!


Questions regarding Solution

  • Could we have replaced (current_load_0 + weight + current_load_1 + waiting_load_1 > capacity) with (current_load_1 + waiting_load_1 >= capacity/2)?
We need to be sure that there's enough available capacity for the new car. If we just did (current_load_1 + waiting_load_1 >= capacity/2), then inside the while loop, we'd know that direction 1 is underutilized and that there's spare capacity for direction 0, but we wouldn't know that there's enough capacity for THIS car to cross.


  • Why can’t we just keep track of a weight_left variable, initially set to weight_limit, and each car checks if their weight is less than weight_left before proceeding?
There is a possibility that even if weight_left - weight > 0 , the car needs to wait, depending on how much weight is waiting on the other side; hence, you need to keep track of cars waiting on both sides.


Problem 3

  • Why is it not allowed to have one writer and one reader?
Having one reader and one writer can lead to a race condition where the reading thread reads some location in memory while the writing thread updates that memory location at the same time. The goal of this lock is to protect against this by forcing either only readers or only one writer. Since the only way for reader locks to be released is for a reader to call release, a writer needs to wait for there to be no readers in order to gain the writer lock.


  • [3a] Does the alternative implementation using .lock() and .unlock() fail because it unlocks too early (at Line 5)? In contrast, does the original implementation using unique_lock ensure that the lock remains held until all lines have executed by going out of scope at the end of the function?
Yes to both of those questions. An easier way to think about code with unique_locks is to replace each unique_lock creation with a .lock() and place the corresponding .unlock() at the end of the function. That way, creating these two unique_locks is essentially like locking and unlocking two locks.


  • Why is it fine to unlock our stateLock before our updates to read in release()? Wouldn’t this cause a race condition?
It's fine because of the state the lock must be in when release is called. First, note that writing threads that call release return at the end of the if statement, so they never enter the section that runs without stateLock held. If you're a reading thread calling release, then you know that writeState must be either Pending or Ready.

Now, the only way that a second thread can potentially change writeState and cause a race condition at this point is if a thread finishes acquireWriter - but for that to happen, it needs to pass the first while loop and then see that numReaders == 0. Going back to the original, reading thread calling release, however, numReaders must be positive until the first thread finishes release() since the release function only waits until the very end to decrement numReaders. Therefore, we see that it is impossible for the second thread calling acquireWriter to update writeState until the first thread returns from release(), meaning that there is no race condition.


  • [3d] I’m confused by the “better implementation idea” in the solution.
For this “better implementation idea,” we'd essentially replace the numReaders instance variable we currently have with a set. That way, instead of incrementing or decrementing numReaders, we'd add/remove the current thread's pointer to/from the set. Then, if we were previously a reader and wanted to upgrade, we would wait until the only remaining element in the set is the thread that's trying to upgrade.


  • [3d] The solution states that one possible advantage is "using a thread’s support for priorities, you can give threads that are trying to upgrade higher priority, so they get the processor before lower priority threads do." Why isn’t this possible to do with the current implementation?
We'd need some way to differentiate threads acquiring the lock as a writer between 1) threads that are already readers and 2) threads that aren't. The current implementation has no such differentiation, which we'd need in order to prioritize the upgrading threads (upgrading meaning you're already a reader and want to become a writer, i.e. case 1)).


Problem 4

  • How would videos automatically lead to larger file sizes needed?
Videos require larger file sizes because a video is essentially many back-to-back images over time.


  • How would someone approach this type of question in terms of thinking about access patterns?
Access patterns refer to how we would generally interact with data in a file system. In this case, the file system is being used for a video streaming site. When a video is uploaded, we write to the disk, and when someone watches it, we read the data from the disk. Since we would expect a video to be watched multiple times, reads will be more common than writes. Furthermore, videos are larger files in general when compared to, say, text files, so we would also expect large file sizes. Lastly, a video is usually watched/uploaded sequentially, so we would expect the reads and writes to generally be sequential.


Problem 5

  • [5b] What does “working set” in the solution refer to?
The working set is all pages actively being used by a thread.


Problem 6

  • [6a] What does size refer to in “virtual addresses must be same size as physical addresses” / What is the “width of the page table entries”?
“Size” / “width of the page table entries” means the number of bits that make up the address. For instance, the size of the address 0x12345678 is 32 bits (it's helpful to keep in mind that each hex character is 4 bits and two hex characters make a byte).


CS110 Practice Midterm 1

Problem 1

  • [1a] Why is approach 1 more cache-friendly?
Approach 1 is much denser and is stored in contiguous memory, so if you're using a cache that stores single blocks at a time, one cached block tracks the allocated status of 512 * 8 = 4096 blocks, whereas in the 2nd approach each cached block only stores the index of one free block.


  • [1a] Why does the free list approach have zero memory footprint/overhead?
Approach 2 proposes storing 4 bytes in each unused block. It has “zero footprint” in the sense that, when we do eventually allocate the block, we can just overwrite those 4 bytes that previously stored the next free block. On the other hand, a bitmap has to be a fixed size and take up space separate from any file data.


CS110 Practice Midterm 2

(No frequently asked questions for this exam!)

CS111 Spring 2021 First Midterm

Problem 1

  • [1d] Is the solution implying that two processes can share code if their code segments overlap?
Yes; however, as the answer also mentions, this isn't a practical approach, but we *are* saying that hypothetically it can happen if the code segments overlap. In that case, both processes will have their own virtual memory but will share the same physical memory for the code segment (we can imagine their virtual address spaces being mapped so that the code segment overlaps in physical memory).


Problem 4

  • Could another approach be to create a mutex such that only one function is modifying x at a time?
No – ensuring that only one thread can modify x at a time doesn't guarantee that t1 modifies x before t2. The key point is that enforcing the ordering in which threads modify x requires more than just ensuring that only one thread can modify it at a time.


  • Could another approach be to place the while (!mod_done) { t1_modified_x.wait() } loop into the thread t2 function right before the computation?
Yes - since x is passed by reference, a change in the value of x in thread 1 will also be reflected in both the main thread and thread 2, so you could create thread 2, update its function to also take mod_done by reference, then have thread 2 wait until the computation is finished.


  • Could we technically have just called t1.join() before spawning t2 in order to ensure a deterministic ordering of execution?
While this would work, this would mean that we would not benefit from the parallelism of having the threads run at the same time, since the threads would be running one after the other.


  • Why do we place m.lock() in func1 right before mod_done = 1; instead of x += y;?
Since mod_done is accessed both in the while loop in the main thread and in func1, we want to make sure we hold the lock before executing mod_done = 1 in func1. We don't need to lock before x += y, because at that point x is only accessed in func1 (we haven't created thread 2 yet).


CS111 Spring 2021 Second Midterm

Problem 1

  • [1c] What is a hard link?
You're not responsible for knowing this, but a hard link is a mapping between a directory entry and an inode. We didn't discuss it, but it turns out that you can have multiple different directory entries referring to the same inode, which is what the solution is referring to when it says that "hard links would be difficult to implement."


CS140 Practice Final Exam

Problem 5

  • What are “access patterns”?
Access patterns mean the way that we access the data - for instance, reading from front to back, or accessing random bits throughout, etc. It's not about the size of the files, but rather the patterns/ways that we access those files.


CS110 Practice Final 2

(No frequently asked questions for this exam!)

CS110 Practice Final 3

Problem 4

  • [4f] In the solution, why is it “technically not” true that a thread calling smutex::lock before any others will get the lock on the smutex first?
Just because a thread calls lock first, it’s not necessarily true that it will be the first one to acquire the lock. Within lock(), there is an air gap between initializing the CV and acquiring the lock. There is a chance that another thread that calls lock AFTER you can both initialize its CV and acquire the lock within that air gap.


  • [4f] Why is it the case that the smutex::m implementation locks for a very, very short window as compared to arbitrary mutexes? And how does that affect FIFO ordering?
Imagine that there are two threads (A and B) and thread A calls smutex::lock() before thread B. Also, let's say a different thread (thread C) currently holds the mutex smutex::m. The "air gap" that affects the problem is what happens when we create the unique lock object, which is the same as locking smutex::m. When thread C releases smutex::m, there is technically no guarantee that thread A will get the mutex before thread B, even though it had tried to lock the mutex first.

However, in this specific problem, when a thread tries to lock smutex::m, it is very unlikely that another thread is holding that lock because the lock is only held while we enqueue the CV and check the condition, which does not take very long (remember that the call to wait releases the lock). Thus, the way smutex is implemented is likely (but not guaranteed) to be FIFO because there probably won't be any competition to get the lock.

This latter part is providing more elaboration to the answer, but to get credit, you only had to realize that C++ locks do not guarantee FIFO, so this system technically does not guarantee FIFO.


  • [4f] Is it also valid to provide a series of steps that would cause threads to not be given the lock in FIFO order? For instance, since the unique mutex is not the first line of the function, lock() could be called in one thread, the condition variable could be instantiated, but then it’s switched to another thread that calls lock() and obtains the unique mutex before the original thread?
Yes! There are a few key possible ideas that could cause this. The one the answer key is referring to is the "non-guarantee" that we are trying to target with the smutex, specifically that a regular mutex isn't obligated to maintain FIFO ordering (and the unique lock is locking a regular mutex here). In other words, even though we are implementing our own smutex to try and guarantee FIFO ordering, because our implementation itself relies on a regular mutex, which doesn't provide that guarantee, ours ultimately won't provide that guarantee either. This issue is separate from the order of execution getting to the mutex locking line, since even if two threads lock the mutex in the same order they called smutex::lock, there would be no guarantee they would get ownership in this order.


Ethics Material Practice Problem

Problem 3

  • Could another indirect stakeholder be the trucking company that contracts the software to its truck drivers but doesn't actually use the software themself (i.e. trucking company could also be impacted by lawsuits, etc.)?
Sure! This list of indirect stakeholders isn't exhaustive, but it comes down to who is affected by the system without directly using it.