CS109 Winter 2021. February 24-27. By Chris Piech, Jerry Cain, Freya, and Doris
This exam must be completed individually. It is a violation of the Stanford Honor Code to communicate with any other humans about this exam (other than CS109 course staff), to solicit solutions to this exam, or to share your solutions with others.
The take-home exams are open-book: open lecture notes, handouts, textbooks, course lecture videos, and internet searches for conceptual information (e.g., Wikipedia). Consultation of other humans in any form or medium (e.g., communicating with classmates, asking questions on sites like Chegg or Stack Overflow) is prohibited. All work done with the assistance of any external material in any way (other than provided CS109 course materials) must include citation (e.g., ``Referred to Wikipedia page on $X$ for Question 2.''). Copying solutions is unacceptable, even with citation. If by chance you encounter solutions to the problem, navigate away from that page before you feel tempted to copy.
If you become aware of any Honor Code violations by any student in the class, your commitments under the Stanford Honor Code obligate you to inform course staff. Please remember that there is no reason to violate your conscience to complete a take-home exam in CS109.
Everything in this problem is real and is up to date as of 2021.
When a patient has eye inflammation, eye doctors "grade" the inflammation. When "grading" inflammation they randomly look at a single 1 millimeter by 1 millimeter square in the patient's eye and count how many "cells" they see.
There is uncertainty in these counts. If the true average number of cells for a given patient's eye is 6, the doctor could get a different count (say 4, or 5, or 7) just by chance. As of 2021, modern eye medicine does not have a sense of uncertainty for their inflammation grades! In this problem we are going to change that. At the same time we are going to learn about poisson distributions over space.
You are cooking a big pot of vegetables and the instructions say to "stir the pot". If you don't stir the pot some vegetables will be under-cooked and some will be overcooked.
How much heat a vegetable gets depends on if it is on the bottom of the pot or not. If it is on the bottom of the pot it will get 5 units of heat per minute. If it is not, it will get 1 unit of heat per minute.
Each time you stir, each vegetable, independently, has a 1/5 chance of ending up at the bottom of the pot. For each stir, the position of a vegetable is independent of previous position and of the position of other vegetables.
Aside: This independence assumption doesn't perfectly match the real world -- for example in the real world it wouldn't be possible for all vegetables to end up at the bottom of the pot. While the independence assumption is wrong, it has a very small impact on the final answer, and it makes for a more straightforward quiz.
A massive online Stanford class has sections with 10 students each. Each student in our population has a 50% chance of identifying as female, 47% chance of identifying as male and 3% chance of identifying as non-binary. Even though students are assigned randomly to sections, a few sections end up having a very uneven gender distribution just by chance. You should assume that the population of students is so large that the percentages of students who identify as male / female / non-binary are unchanged, even if you select students without replacement.
At the start of the quarter we assign students to sections based on their time preferences. Let's consider a "greedy" section assignment algorithm and compute the expectation of the number of people who we are unable to fit in a section ("unassigned").
A class has $k=12$ students and $r=4$ sections. Each section can have $m=3$ people max (though they can have less, even zero students). In this question assume each person gives exactly one preference and that a student's preference is equally likely to be any of the $r$ sections.
The greedy algorithm we are going to analyze goes through students one-by-one (arbitrarily). When considering a student, it checks if there is space in their preferred section. If so they are assigned to that section. Otherwise they are “unassigned”. Note: this problem is meant to be a challenge.
# keep track of how big each section is
section_counts = [0 for i in range(r)]
# count how many students go unassigned
n_unassigned = 0
# loop over students in an arbitrary order
for student in students:
# each student has a single section preference
section_pref = student['pref']
# does their preferred section have space?
if section_counts[section_pref] < m:
section_counts[section_pref] += 1
else:
n_unassigned += 1
# this is what we are interested in.
print(n_unassigned)
That's all folks! Optionally include a meme about the CS109 material. Of course the meme should be in line with the Stanford Fundamental Standard. This question is worth zero points and is just for fun.
As an aside each of these problems are drawn from real world phenomena. Problem 1: in 2004 a group of international uveitis specialists created a standardization of grading eye inflammation. Until this quiz it has been used without thinking about probability. The next step in this line of thought is to think of $\lambda$ as a random variable, allowing you to be explicit about your uncertainty after observation, and to do important things, like know the probability that a patient has actually gotten worse over a period of time. Later in CS109 we will talk about thinking of parameters like $\lambda$ as random variables. Problem 2: Chris and Jerry spent a lot of time cooking this last year. And thinking about probability at the same time. Problem 3: Code in Place was a massive section-based class to teach the first half of Cs106A that Stanford offered as a community service project in the time of Covid-19. It was so large that, even though gender was split very evenly, a large number of sections ended up being all female or all male just by chance. Everyone was surprised. Problem 4: The algorithm used to assign CS109 students to section is constantly being improved. We don't use this particular one, but we do use something similar. In practice our expectation of unassigned students is very close to 0. Can you think of an algorithm for CS109 section assignments?