$\DeclareMathOperator{\p}{Pr}$ $\DeclareMathOperator{\P}{Pr}$ $\DeclareMathOperator{\c}{^C}$ $\DeclareMathOperator{\or}{ or}$ $\DeclareMathOperator{\and}{ and}$ $\DeclareMathOperator{\var}{Var}$ $\DeclareMathOperator{\E}{E}$ $\DeclareMathOperator{\std}{Std}$ $\DeclareMathOperator{\Ber}{Bern}$ $\DeclareMathOperator{\Bin}{Bin}$ $\DeclareMathOperator{\Poi}{Poi}$ $\DeclareMathOperator{\Uni}{Uni}$ $\DeclareMathOperator{\Exp}{Exp}$ $\DeclareMathOperator{\N}{N}$ $\DeclareMathOperator{\R}{\mathbb{R}}$ $\newcommand{\d}{\, d}$

Quiz 3


CS109 Winter 2021. March 17 to 19. By Chris Piech, Jerry Cain, and Katie Creel

Take-Home Quiz information

Each quiz will be an open-book, open-note exam. We have designed this quiz to approximate about 3 hours of active work (it is about half as long as an in-person exam designed for 3 hours), however as we have learned, some people will chose to spend more time.
  1. You can submit multiple times; we will only grade the last submission you submit before 11:59pm Anywhere on Earth on Friday, March 19$^\text{th}$. No late submissions can be accepted. When uploading, please assign pages to each question.
  2. You should upload your submission as a PDF to Gradescope. We provide a LaTeX template if you find it useful, but we will accept any legible submission. You may also find the CS109 Probability LaTeX reference useful: https://www.overleaf.com/project/5f650a577489e90001f065be
  3. Course staff assistance will be limited to clarifying questions of the kind that might be allowed on a traditional, in-person exam. If you have questions during the exam, please ask them as private posts via our discussion forum. We will not have any office hours for answering quiz questions during the quiz, and we can't answer any questions about course material while the quiz it out.
  4. For each problem, briefly explain/justify how you obtained your answer at a level such that a future CS109 student would be able to understand how to solve the problem. If it's not fully clear how you arrived at your answer, you will not receive full credit. It is fine for your answers to be a well-defined mathematical expression including summations, products, factorials, exponents, and combinations, unless the question specifically asks for a numeric quantity or closed form. Where numeric answers are required, fractions are fine.

Honor Code Guidelines for Take-Home Quizzes

This exam must be completed individually. It is a violation of the Stanford Honor Code to communicate with any other humans about this exam (other than CS109 course staff), to solicit solutions to this exam, or to share your solutions with others.

The take-home exams are open-book: open lecture notes, handouts, textbooks, course lecture videos, and internet searches for conceptual information (e.g., Wikipedia). Consultation of other humans in any form or medium (e.g., communicating with classmates, asking questions on sites like Chegg or Stack Overflow) is prohibited. All work done with the assistance of any external material in any way (other than provided CS109 course materials) must include citation (e.g., "Referred to Wikipedia page on $X$ for Question 2.''). Copying solutions is unacceptable, even with citation. If by chance you encounter solutions to the problem, navigate away from that page before you feel tempted to copy.

If you become aware of any Honor Code violations by any student in the class, your commitments under the Stanford Honor Code obligate you to inform course staff. Please remember that there is no reason to violate your conscience to complete a take-home exam in CS109.

Include this sentence in your submission: "I acknowledge and accept the letter and spirit of the Honor Code: {name}." where {name} is your full name.

1. Fairness in GaussHouse [25 points]

You have recently taken over the management of admissions to The GaussHouse, a mansion for content creators. The house specializes in producing educational short videos about CS109 content. Prospective residents have to apply and be selected to live in GaussHouse. Admission should be based on ability to make CS109 educational content, but is it?

The previous manager left behind data from thousands of applications from last year. You are curious about whether the selection process was biased towards Generation Z (people under 24). In addition to the age category, you also have access to whether or not an applicant had previously produced and uploaded a probability related short video, which we will call Content for short.

Gen-Z Content Accepted Probability
0 0 1 .002
0 1 1 .0075
1 0 1 .0005
1 1 1 .011
0 0 0 .023
0 1 0 .2675
1 0 0 .0995
1 1 0 .589
  1. (3 points) What is the probability that an applicant is Gen-Z?
    Answer:
  2. (3 points) Given that an applicant was accepted, what is the probability that the applicant was Gen-Z?
    Answer:
  3. (3 points) Given that an applicant was accepted, what is the probability that the applicant was not Gen-Z?
    Answer:
  4. (8 points) Next let us calculate a few fairness tests. Recall that in our class on fairness we defined random variables $A$ for a demographic variable, and $R$ for result. $R=1$ is a positive result, in this case acceptance.
    1. Independence Fairness means that acceptance rates should be the same for all groups. It is defined as: \begin{equation} P(R=1|A=a) = P(R=1|A=b) \end{equation} Did the previous admissions to GaussHouse satisfy independence with respect to age?
      Answer:
    2. Relaxed Independence Condition is defined as: \begin{equation} \frac{P(R=1|A=a)}{P(R=1|A=b)} \geq 1 - \varepsilon \end{equation} Where $A=a$ is the event that an applicant is from the group which is less represented in the applicant pool and $A=b$ is the event that an applicant is from the more represented group. Did previous admissions satisfy relaxed independence with respect to age for $\varepsilon = .2$?
      Answer:
  5. (4 points) Your house wants to update their admission system, and one of their goals is to satisfy "fairness through unawareness". One option is to hide an applicant's age from people who review applications. You notice that applications still have people's names. US census data makes it very clear that names change substantially in popularity over time. How could the inclusion of an applicant's name impact fairness through unawareness even if age is removed?
    Answer:
  6. (4 points) You want to ensure that future admissions to GaussHouse are fair. Which of the types of fairness that we talked about in class (fairness through unawareness; independence; relaxed independence; separation) would you choose and why? In your answer you should explain in words what each of the four types of fairness means in this context. Base your answer on what you think would be the best fairness standard for this context, not on what current US discrimination law requires. A precise, well justified opinion will receive full credit.
    Answer:

2. Learning to Play Darts [25 points]

The Game of Darts requires a single player to throw a physical dart toward a circular board with hopes of striking the board as close to its center as possible. For our purposes, we'll assume that the center of the circular board is the origin---that is, the coordinate (0, 0)---and that the radius of the dart board is 9 inches. Someone new to the game throws darts that rarely hit anywhere near the center. Rather, the novice throws darts that strike the board according to independent random variables $X$ and $Y$.

Now, even novices have a sense for left-to-right centering, so $X \sim \mathcal{N}(0, 4)$. But they struggle to understand gravity's pull on the dart, so $Y$ isn't distributed as a Gaussian, because rather as something more complicated: $Y \sim 10 - \textrm{Exp}(\frac{1}{12})$. This problem takes interest in computing various statistics about dart throws and ultimately predicting how far a dart lands from the center of the board when guided by these two independent distributions.

  1. (6 points) Compute $P(X \leq 0, 9 \leq Y \leq 10)$. This is the probability that the dart lands above and to the left of the top of the entire dart board.
    Show all of your work, but include a numeric answer for the probability of interest.
  2. (5 points) Without relying on any calculus at all, derive values for $E[X^2]$, $E[Y]$, and $E[Y^2]$.
    Show all of your work, but include numeric answers for each of the three expected values.
  3. (6 points) If you plan to throw 10000 darts over the course of, say, a week of intense practice, you would expect the average y coordinate of all throws to itself conform to a probability distribution. What is that probability distribution? You should assume all throws are independent. Specify the model and all necessary parameters.
    Show all of your work, but be sure to present the model and its parameter or parameters.
  4. (8 points) Consider the random variable $X_{\textrm{sq}} = X^2$. First, compute the cumulative distribution function $F_{X_{\textrm{sq}}}(a)$, which is the probability that $X_{\textrm{sq}} \leq a$ for some value of $a$ in the support of $X_{\textrm{sq}}$. From that, compute $f_{X_{\textrm{sq}}}(a)$, which is the PDF of $X_{\textrm{sq}}$. Your CDF can be defined in terms of $\Phi$, but your PDF should be written as a continuous function on a.
    Show all of your work, and be sure to clearly identify your $F_{X_{\textrm{sq}}}(a)$ and $f_{X_{\textrm{sq}}}(a)$ functions.

    3. Lambda Estimation for Eye Evaluation[25 points]

    This quarter in CS109 we are helping our eye doctor friends think about probability for evaluation of inflammation in the eye. In our last quiz we learned that when a patient has eye inflammation, eye doctors "grade" the inflammation. When "grading" inflammation they look at a single, randomly-chosen 1 millimeter by 1 millimeter square in the patient's eye and count how many "cells" they see. The number of cells that a doctor counts, $X$, is governed by a Poisson process where the parameter $\lambda$ is the true average rate of cells in a 1mm by 1mm square, $X \sim \text{Poi}(\lambda)$. For example, a patient could have a true inflammation rate of $\lambda = 4$, then in a particular 1x1 sample, the chance that a doctor observes $X=3$ would be the probability of a Poisson with rate $\lambda = 4$ producing a 3.

    In Quiz 2 we estimated the probability of different counts given the true rate $\lambda$. In this quiz we are going to think about our belief regarding the true rate $\lambda$ based on an observed count. This is going to require us to think of the true rate as a random variable.

    Inflammation prior: Based on millions of historical patients, doctors have learned that the prior probability density function of true rate of cells is: \begin{align*} f(\lambda) = K \cdot \lambda \cdot e ^{-\frac{\lambda}{2}} \end{align*} Where $K$ is a normalization constant and $\lambda$ must be greater than 0.
    1. (8 points) A doctor takes a single sample and counts 4 cells. Give an equation for the updated probability density of $\lambda$. Use the "Inflammation prior" as the prior probability density over values of $\lambda$. Your probability density may have a constant term.
      Answer:
    2. (7 points) A doctor takes a single sample and counts 4 cells. What is the Maximum A-Posteriori estimate of $\lambda$? Use the "Inflammation prior".
      In addition to providing an expression above, please compute a numeric answer:
    3. (3 points) Explain, in words, the difference between the two estimates of lambda in part (a) and part (b).
      Answer:
    4. (8 points) A patient comes on two separate days. The first day the doctor counts 5 cells, the second day the doctor counts 4 cells. Based only on this observation, and treating the true rates on the two days as independent, what is the probability that the patient's inflammation has gotten better (in other words, that their $\lambda$ has decreased)? For full credit provide an expression to calculate the probability exactly. For most of the credit provide an explanation for how you could approximate the probability.
      Answer:

      4. Predicting Drug Effectiveness [25 points]

      You have been given a grant to work with the FDA to build a machine learning model that can predict how effective new drugs will be at curing diseases. A drug's effectiveness is the probability that it cures a patient.

      The FDA gives you a mountain of historical data of drugs. The data contains $n$ drugs all of which are in the exact same format: $x^{(i)}, p^{(i)}$ where $x^{(i)}$ is an $m$ dimensional vector of 0s and 1s describing features of the drug and $p^{(i)}$ is the observed probability that the drug cures the disease when given to a patient. Note that $p^{(i)}$ must be in the range 0 to 1 as it is a probability.

      Your decide to build a model, inspired by Logistic Regression, called Logistic-Beta. Logistic-Beta uses the features of a drug $x$ and predicts how likely it will be that a drug has a given effectiveness value $p$. To do so, our model first generates two values, $a$ and $b$. These values are then interpreted as the parameters of a Beta distribution: \begin{gather*} a = 1+ \sum_{j=0}^m x_j \cdot \theta_j \\ b = 1+\sum_{j=0}^m x_j \cdot \phi_j \\ Y \sim \text{Beta}(a, b) \end{gather*}

      What is likelihood of a single drug with features $x$ and effectiveness $p$? It is the probability density that the Beta random variable $Y$, which is computed using $x$, takes on the value $p$. The model has $2m+2$ learnable parameters across two equal sized vectors: $\theta$ and $\phi$.

      1. (8 points) Imagine you have $m=2$ features per drug and have finished training your model. You have learned parameters $\phi = [-1,0, 3]$ and $\theta = [2,1,2]$. For a new drug with features $x = [1, 1]$ what is your model's belief that the effectiveness is greater than 0.8? You may use a computer, but must still show your work.
        Answer:
      2. (8 points) Write an expression for the log likelihood of the $n$ drugs in the dataset.
        Answer:
      3. (9 points) Compute all derivatives that you would need in order to learn the parameters $\theta$ and $\phi$ via gradient descent.
        Answer: