Quiz 2

CS109 Winter 2021. February 24-27. By Chris Piech, Jerry Cain, Freya, and Doris

Take-Home Quiz information

Each quiz will be a 72-hour open-book, open-note exam. We have designed this quiz to approximate about 3 hours of active work (it is about half as long as an in-person exam designed for 3 hours), however as we have learned, some people will chose to spend more time.

You can submit multiple times; we will only grade the last submission you submit before 2:30pm (Pacific time) on Saturday, February 27$^\text{th}$. No late submissions can be accepted. When uploading, please assign pages to each question.
You should upload your submission as a PDF to Gradescope. We provide a LaTeX template if you find it useful, but we will accept any legible submission. You may also find the CS109 Probability LaTeX reference useful: https://www.overleaf.com/project/5f650a577489e90001f065be
Course staff assistance will be limited to clarifying questions of the kind that might be allowed on a traditional, in-person exam. If you have questions during the exam, please ask them as private posts via our discussion forum. We will not have any office hours for answering quiz questions during the quiz, and we can't answer any questions about course material while the quiz it out.
For each problem, briefly explain/justify how you obtained your answer at a level such that a future CS109 student would be able to understand how to solve the problem. If it's not fully clear how you arrived at your answer, you will not receive full credit. It is fine for your answers to be a well-defined mathematical expression including summations, products, factorials, exponents, and combinations, unless the question specifically asks for a numeric quantity or closed form. Where numeric answers are required, fractions are fine.

Honor Code Guidelines for Take-Home Quizzes

This exam must be completed individually. It is a violation of the Stanford Honor Code to communicate with any other humans about this exam (other than CS109 course staff), to solicit solutions to this exam, or to share your solutions with others.

The take-home exams are open-book: open lecture notes, handouts, textbooks, course lecture videos, and internet searches for conceptual information (e.g., Wikipedia). Consultation of other humans in any form or medium (e.g., communicating with classmates, asking questions on sites like Chegg or Stack Overflow) is prohibited. All work done with the assistance of any external material in any way (other than provided CS109 course materials) must include citation (e.g., ``Referred to Wikipedia page on $X$ for Question 2.''). Copying solutions is unacceptable, even with citation. If by chance you encounter solutions to the problem, navigate away from that page before you feel tempted to copy.

If you become aware of any Honor Code violations by any student in the class, your commitments under the Stanford Honor Code obligate you to inform course staff. Please remember that there is no reason to violate your conscience to complete a take-home exam in CS109.

Include this sentence in your submission: "I acknowledge and accept the letter and spirit of the Honor Code: <name>." where <name> is your full name.

1. Better Evaluation of Eye Disease [25 points]

Everything in this problem is real and is up to date as of 2021.

When a patient has eye inflammation, eye doctors "grade" the inflammation. When "grading" inflammation they randomly look at a single 1 millimeter by 1 millimeter square in the patient's eye and count how many "cells" they see.

There is uncertainty in these counts. If the true average number of cells for a given patient's eye is 6, the doctor could get a different count (say 4, or 5, or 7) just by chance. As of 2021, modern eye medicine does not have a sense of uncertainty for their inflammation grades! In this problem we are going to change that. At the same time we are going to learn about poisson distributions over space.

A 1x1mm sample used for inflammation grading. Inflammation is graded by counting cells in a randomly chosen 1mm by 1mm square. This sample has 5 cells.

(9 points) Explain, as if teaching, why the number of cells observed in a 1x1 square is governed by a poisson process. Make sure to explain how a binomial distribution could approximate the count of cells. Explain what $\lambda$ means in this context. Note: for a given person's eye, the presence of a cell in a location is independent of the presence of a cell in another location. 100 word limit. Pictures not necessary, but allowed.
Answer:
(8 points) For a given patient the true average rate of cells is 5 cells per 1x1 sample. What is the probability that in a single 1x1 sample the doctor counts 4 cells?
In addition to providing an expression above, please compute a numeric answer:
(8 points) For a given patient the true average rate of cells is 5 cells per 1mm by 1mm sample. In an attempt to be more precise, the doctor counts cells in two different, larger 2mm by 2mm samples. Assume that the occurrences of cells in one 2mm by 2mm samples are independent of the occurrences in any other 2mm by 2mm samples. What is the probability that she counts 20 cells in the first samples and 20 cells in the second?
In addition to providing an expression above, please compute a numeric answer:

2. Stirring the Pot [25 points]

You are cooking a big pot of vegetables and the instructions say to "stir the pot". If you don't stir the pot some vegetables will be under-cooked and some will be overcooked.

How much heat a vegetable gets depends on if it is on the bottom of the pot or not. If it is on the bottom of the pot it will get 5 units of heat per minute. If it is not, it will get 1 unit of heat per minute.

Each time you stir, each vegetable, independently, has a 1/5 chance of ending up at the bottom of the pot. For each stir, the position of a vegetable is independent of previous position and of the position of other vegetables.

Aside: This independence assumption doesn't perfectly match the real world -- for example in the real world it wouldn't be possible for all vegetables to end up at the bottom of the pot. While the independence assumption is wrong, it has a very small impact on the final answer, and it makes for a more straightforward quiz.

(8 points) You stir the pot a single time and then let the vegetables cook for 10 minutes. What is the variance of heat on a vegetable?
In addition to providing an expression above, please compute a numeric answer:
(9 points) You stir the pot once every minute for 10 minutes. What is the variance of heat on a vegetable?
In addition to providing an expression above, please compute a numeric answer:
(8 points) Let's explore what this problem would look like with continuous numbers. In the continuous version, when you stir the pot, each vegetable has a distance from the bottom which is equally likely to be any real valued number between 0 and 1. The heat received by a vegetable, $H$, with distance $d$ is: $H(d) = 1 - d^2$. You stir the pot a single time and then let the vegetables cook for 10 minutes. Show an expression that you would need to solve in order to calculate the expectation of heat. Your expression should include an integral. Then, solve the expression to get a numeric answer.
In addition to providing an expression above, please compute a numeric answer:

3. Gender Composition of Sections [20 points]

A massive online Stanford class has sections with 10 students each. Each student in our population has a 50% chance of identifying as female, 47% chance of identifying as male and 3% chance of identifying as non-binary. Even though students are assigned randomly to sections, a few sections end up having a very uneven gender distribution just by chance. You should assume that the population of students is so large that the percentages of students who identify as male / female / non-binary are unchanged, even if you select students without replacement.

(6 points) Write an expression for the exact probability that a section has 0, 1, 9 or 10 people who identify as female.
Answer:
(8 points) Forty students are randomly selected to be in a single review session. Use an approximation to efficiently estimate the probability of there being more than 10 and less than 30 people who identify as female in the review session.
In addition to providing an expression above, please compute a numeric answer:
(6 points) A given section has 5 people who identify as female, 3 who identify as non-binary and 2 who identify as male. What is the probability of this exact composition?

In addition to providing an expression above, please compute a numeric answer:

4. Section Assignment Algorithm [20 points]

At the start of the quarter we assign students to sections based on their time preferences. Let's consider a "greedy" section assignment algorithm and compute the expectation of the number of people who we are unable to fit in a section ("unassigned").

A class has $k=12$ students and $r=4$ sections. Each section can have $m=3$ people max (though they can have less, even zero students). In this question assume each person gives exactly one preference and that a student's preference is equally likely to be any of the $r$ sections.

The greedy algorithm we are going to analyze goes through students one-by-one (arbitrarily). When considering a student, it checks if there is space in their preferred section. If so they are assigned to that section. Otherwise they are “unassigned”. Note: this problem is meant to be a challenge.

Algorithm: Greedy Section Time Assignment

# keep track of how big each section is
section_counts = [0 for i in range(r)]
# count how many students go unassigned
n_unassigned = 0
# loop over students in an arbitrary order
for student in students:
  # each student has a single section preference
  section_pref = student['pref']
  # does their preferred section have space?
  if section_counts[section_pref] < m:
     section_counts[section_pref] += 1
  else: 
     n_unassigned += 1
# this is what we are interested in.
print(n_unassigned)

(10 points) What is the probability that the very last person considered by the algorithm is unassigned?
In addition to providing an expression above, please compute a numeric answer:
(10 points) What is the expected number of people who will be unassigned? You do not need to provide an exact answer, instead you can leave your answer as an expression.
Answer:

5. Optional Meme [0 points]

That's all folks! Optionally include a meme about the CS109 material. Of course the meme should be in line with the Stanford Fundamental Standard. This question is worth zero points and is just for fun.

As an aside each of these problems are drawn from real world phenomena. Problem 1: in 2004 a group of international uveitis specialists created a standardization of grading eye inflammation. Until this quiz it has been used without thinking about probability. The next step in this line of thought is to think of $\lambda$ as a random variable, allowing you to be explicit about your uncertainty after observation, and to do important things, like know the probability that a patient has actually gotten worse over a period of time. Later in CS109 we will talk about thinking of parameters like $\lambda$ as random variables. Problem 2: Chris and Jerry spent a lot of time cooking this last year. And thinking about probability at the same time. Problem 3: Code in Place was a massive section-based class to teach the first half of Cs106A that Stanford offered as a community service project in the time of Covid-19. It was so large that, even though gender was split very evenly, a large number of sections ended up being all female or all male just by chance. Everyone was surprised. Problem 4: The algorithm used to assign CS109 students to section is constantly being improved. We don't use this particular one, but we do use something similar. In practice our expectation of unassigned students is very close to 0. Can you think of an algorithm for CS109 section assignments?