Jan 27th, 2021

By the end of lecture, you should understand variance, how to compute it, what a Bernoulli trial is, and what a Binomial distribution is.

https://www.gradescope.com/courses/226051/assignments/969374

**Q:** wait so it's the average of all the squared distances between the mean and …?

**A1: ** And every given value that X can take on!

**Q:** what is the ( x - mu)^2 in the table?

**A1: ** In this case the table doesn’t actually display/encode the Variance itself. We derive the variance from the pmf, which is what the table is displaying :)

**Q:** how does property 1 make intuitive sense?

**A1: ** Intuitively, we are saying that variance is the average of all squared values that X can take on (E[X^2]) minus the square of the average of all non-squared values that X can take on ((E[X])^2), i.e. Var(X) = E[X^2] - (E[X])^2. This gives us a second way to measure how far, on average, each point in our distribution is from the mean.
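
To see that property 1 agrees with the definition, here is a minimal sketch that computes the variance both ways from a pmf. The fair six-sided die and the helper name `expectation` are assumptions chosen for illustration, not something from the slides.

```python
from fractions import Fraction

# Hypothetical pmf for illustration: a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)]: weight each g(x) by P(X = x) and add the results."""
    return sum(g(x) * p for x, p in pmf.items())

mu = expectation(pmf)                                       # E[X]
var_definition = expectation(pmf, lambda x: (x - mu) ** 2)  # E[(X - mu)^2]
var_shortcut = expectation(pmf, lambda x: x ** 2) - mu ** 2  # E[X^2] - (E[X])^2

print(mu, var_definition, var_shortcut)  # 7/2 35/12 35/12
```

Using `Fraction` keeps the arithmetic exact, so the two results match exactly rather than only up to rounding.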

**Q:** why is expectation linear?

**A1: ** Expectation is linear because it is just a weighted sum: each value is multiplied by its probability and the results are added up, so constants can be pulled out of the sum. Variance contains a squared term, so it is not linear.
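
Written out for a discrete random variable X and constants a and b (a short sketch of the weighted-sum argument, with the sum taken over every value x that X can take on):

```latex
E[aX + b] = \sum_x (ax + b)\, P(X = x)
          = a \sum_x x\, P(X = x) + b \sum_x P(X = x)
          = a\,E[X] + b
```

The last step uses the fact that the probabilities sum to 1.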

**Q:** I don't understand the significance of Var(aX), could this be translated into more tangible terms?

**A1: ** You can think of a as a scaling constant: multiplying X by a stretches the spread of the distribution, and because variance is measured in squared units, Var(aX) = a^2 Var(X). The reason we can drop the constant +b term in Var(aX + b) is that it just shifts the entire distribution without changing its spread.
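
A quick numerical check of Var(aX + b) = a^2 Var(X), as a sketch. The pmf and the values of `a` and `b` below are made up for illustration, and `variance` and `transform` are hypothetical helper names.

```python
from fractions import Fraction

def variance(pmf):
    """Var(X) = E[(X - E[X])^2] for a pmf given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def transform(pmf, a, b):
    """pmf of Y = a*X + b (assumes a != 0, so distinct x give distinct y)."""
    return {a * x + b: p for x, p in pmf.items()}

# Made-up pmf: X takes the values 0, 1, 2.
pmf_x = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
a, b = 3, 10

var_x = variance(pmf_x)
var_scaled_shifted = variance(transform(pmf_x, a, b))  # Var(3X + 10)
var_shifted_only = variance(transform(pmf_x, 1, b))    # Var(X + 10)

print(var_x, var_scaled_shifted, var_shifted_only)  # 1/2 9/2 1/2
```

Shifting by 10 leaves the variance at 1/2, while scaling by 3 multiplies it by 9.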

**Q:** got it thank you!

**A1: ** Of course!

**Q:** what does he mean by second moment?

**A1: ** E[(X - E[X])^2] is the second central moment, whereas E[X^2] is known as the second raw moment. In this class, when we say second moment, we mean the second central moment, i.e. the variance.

**Q:** is variance the same as the second moment? i just joined.

**A1: ** For our purposes in this class, yes!

**Q:** why isn’t the variance for bernoulli p(1-p)^2? following the definition of variance

**A1: ** Hard to answer through Q&A, but here is a link to a few proofs deriving the variance of a Bernoulli:
https://proofwiki.org/wiki/Variance_of_Bernoulli_Distribution
If you have questions about the steps, feel free to stop by Jerry's or one of the TAs' office hours :)
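
For reference, the definition-based calculation can be sketched as follows, assuming X = 1 with probability p and X = 0 with probability 1 - p, so that E[X] = p:

```latex
\mathrm{Var}(X) = E\big[(X - p)^2\big]
                = (1 - p)^2 \cdot p + (0 - p)^2 \cdot (1 - p)
                = p(1 - p)\,\big[(1 - p) + p\big]
                = p(1 - p)
```

The expression p(1-p)^2 is only the first term of the sum (the X = 1 outcome); adding the X = 0 term is what collapses it to p(1-p).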

**Q:** What does the “~” in, for example, X~Bin(n,p) stand for/mean?

**A1: ** You can read “~” as “distributed as”. So X ~ Bin(n,p) means:
We have some random variable X that is distributed as a Binomial distribution with n independent trials, where each trial has probability p of success.

**Q:** How would we read X~Bin(X)? Is it “A binomial random variable of X”?

**A1: ** You can read “~” as “distributed as”. So X ~ Bin(n,p) means:
We have some random variable X that is distributed as a Binomial distribution with n independent trials, where each trial has probability p of success.

**Q:** I think the slide should say Bin(p) random variables? first sentence on this slide

**A1: ** Ber(p) is correct in this case. What we mean is that a Binomial random variable counts the successes in n independent Ber(p) trials occurring one after the other in succession.
Ex: when I flip a fair coin I have 50% probability of H or T.
This is a Bernoulli, since it is either a success or failure.
If I ask the question, what is the distribution of the number of heads in 5 independent coin flips, that is the sum of 5 independent Bernoulli random variables, which has a binomial distribution.
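
The coin-flip example can be checked by brute force. This is a sketch under the assumption that heads is coded as 1 and tails as 0; the function names are hypothetical. It enumerates every sequence of n Bernoulli outcomes, adds up the probability of each sequence by its number of heads, and compares the result with the Bin(n, p) formula.

```python
from itertools import product
from math import comb, isclose

def binomial_pmf(n, p):
    """P(X = k) for k = 0..n, where X ~ Bin(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def heads_pmf_by_enumeration(n, p):
    """Distribution of the sum of n independent Ber(p) outcomes (1 = heads)."""
    pmf = [0.0] * (n + 1)
    for flips in product([0, 1], repeat=n):
        k = sum(flips)                       # number of heads in this sequence
        pmf[k] += p**k * (1 - p)**(n - k)    # probability of this exact sequence
    return pmf

n, p = 5, 0.5
formula = binomial_pmf(n, p)
enumerated = heads_pmf_by_enumeration(n, p)

print(formula)  # [0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125]
print(all(isclose(f, e) for f, e in zip(formula, enumerated)))  # True
```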

**Q:** what are p^0, p^1, …? (from the 3 coin flips slide)

**A1: ** p0=p1=p2=0.5. Aka we are flipping the same fair coin which has a 50/50 probability, three times!

**Q:** What does w.p mean?

**A1: ** w.p. = with probability!

**Q:** Variance is not bounded below or above (ie it can be higher than 1 or lower than 0)?

**A1: ** 1.) A variance cannot be negative since we square terms in the definition.
2.) Keep in mind Variance is a measure of the spread of a random variable and the support of that RV could be any number. You can have a situation as follows:
X = avg. number of min a person sleeps
Y = avg number of seconds a person sleeps
In either case Var(X) and Var(Y) can be greater than 1. Additionally, the variance of Y will be bigger than the variance of X since we changed our units from minutes to seconds: Y = 60X, so Var(Y) = 60^2 Var(X) = 3600 Var(X).
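
A small sketch of the unit change, using made-up sleep durations (the numbers are purely illustrative) and the standard-library `statistics.pvariance`:

```python
from statistics import pvariance

# Made-up sleep durations, in minutes, for five hypothetical people.
sleep_minutes = [420, 450, 480, 510, 390]
sleep_seconds = [60 * m for m in sleep_minutes]  # same data, different units

var_minutes = pvariance(sleep_minutes)
var_seconds = pvariance(sleep_seconds)

print(var_minutes, var_seconds, var_seconds / var_minutes)
```

The ratio of the two variances is 3600 = 60^2, and both are far larger than 1.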

**Q:** how would you model shaking the board

**A1: ** By shaking the board, do you mean physically shaking it left/right? If so, this would be hard to model since you’re adding much more uncertainty to the system.

**Q:** Why did we do 5 choose k for the plinko game?

**A1: ** We chose 5 for the 5 choose k since the total height of the pyramid is equal to 5. No matter what, our journey will always look like L,R,R,L,L or R,R,L,L,L etc. (length 5)

**Q:** yeah! for the board

**A1: ** The RV is a Binomial since each pin in the board is a Bernoulli random variable. So hitting multiple pins in succession can be modelled as a Binomial. Does that answer your question?

**Q:** I understand the 5 but why k? I don’t get why that formula will actually lead to bucket k. Does it have to do with the count of lefts and rights?

**A1: ** Yes exactly, you can think of it as all the possible ways to choose which k of the 5 steps are rights (the remaining steps are lefts).

**Q:** oh i was asking why we made the success “right” was that just because we can decide which one is a success?

**A1: ** In other words, choosing all the rights from the lefts is the same as choosing all the lefts from the rights, i.e. (n Choose k) = (n Choose n-k)

**A2: ** Exactly! The problem is very similar if we decide left is a success. We just chose right in this case, but the other choice is an analogous problem.
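
The counting argument for the board can be checked by listing every path. This is a sketch that assumes the bucket index equals the number of rights along a length-5 path; `ROWS` and `paths_per_bucket` are names made up for illustration.

```python
from itertools import product
from math import comb

ROWS = 5  # height of the board in the lecture example

# Tally how many of the 2**5 left/right paths end in each bucket,
# assuming bucket k is reached by exactly k rights.
paths_per_bucket = [0] * (ROWS + 1)
for path in product("LR", repeat=ROWS):
    paths_per_bucket[path.count("R")] += 1

print(paths_per_bucket)                               # [1, 5, 10, 10, 5, 1]
print([comb(ROWS, k) for k in range(ROWS + 1)])       # [1, 5, 10, 10, 5, 1]
```

The list reads the same forwards and backwards, which is the (n Choose k) = (n Choose n-k) symmetry: counting rights or counting lefts gives the same numbers.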

**Q:** for the instance that we are doing nCn, will (1 - p) be raised to the 0 power?

**A1: ** Yep!
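
Spelled out with the Bin(n, p) formula, the k = n case looks like this (a one-line sketch):

```latex
P(X = n) = \binom{n}{n}\, p^{n} (1 - p)^{0} = 1 \cdot p^{n} \cdot 1 = p^{n}
```

That is, all n trials must succeed, and there is only one way for that to happen.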