Lecture 7: Variance, Bernoulli, and Binomial

Jan 27th, 2021

Lecture Materials

Learning Goals

By the end of lecture, you should understand variance, how to compute it, what a Bernoulli trial is, and what a Binomial distribution is.

Reading

Variance, Bernoulli, Binomial

Concept Check

https://www.gradescope.com/courses/226051/assignments/969374

Questions & Answers

Q: Wait, so it's the average of all the squared distances between the mean and …?

A1: And every value that X can take on, with each squared distance weighted by the probability of that value!

Q: What is the (x - mu)^2 in the table?

A1: In this case the table doesn’t display the variance itself; it displays the pmf. We derive the variance from the pmf by averaging (x - mu)^2 over every value x, weighting each by its probability :)
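As a sketch of how the variance falls out of a pmf table, the snippet below stores a hypothetical table (a fair six-sided die, not necessarily the lecture's example) as a Python dict and applies the definition Var(X) = E[(X - mu)^2] directly. The helper names `expectation` and `variance` are illustrative, not from the lecture; exact `Fraction`s avoid floating-point rounding.

```python
from fractions import Fraction

# Hypothetical pmf table for a fair six-sided die: value -> probability.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf):
    """E[X]: weighted sum of each value times its probability."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - mu)^2], computed straight from the definition."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

print(expectation(pmf))  # 7/2
print(variance(pmf))     # 35/12
```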

Q: How does property 1 make intuitive sense?

A1: Property 1 says Var(X) = E[X^2] - E[X]^2: the average of the squared values X can take on, minus the square of the average value. It is just the definition E[(X - mu)^2] with the square expanded: E[(X - mu)^2] = E[X^2] - 2mu E[X] + mu^2 = E[X^2] - mu^2, since E[X] = mu. So it measures the same spread around the mean, in a form that is often easier to compute.
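A quick numeric check that property 1, Var(X) = E[X^2] - E[X]^2, gives the same number as the definition E[(X - mu)^2]. The pmf here is a made-up example, and exact fractions keep the comparison free of rounding error.

```python
from fractions import Fraction

# Hypothetical pmf: X is 0, 1, or 2 with probabilities 1/4, 1/2, 1/4.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu = sum(x * p for x, p in pmf.items())                  # E[X]
second_moment = sum(x ** 2 * p for x, p in pmf.items())  # E[X^2]

var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2]
var_property_1 = second_moment - mu ** 2                          # E[X^2] - E[X]^2

print(var_definition, var_property_1)  # 1/2 1/2
```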

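Q: Why is expectation linear?

A1: Expectation is linear because it is just a weighted sum of each value times its probability, and constants factor out of a sum: E[aX + b] = sum_x (ax + b) p(x) = a sum_x x p(x) + b sum_x p(x) = aE[X] + b, since the probabilities sum to 1. Variance contains a squared term, so it is not linear: Var(aX + b) = a^2 Var(X).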
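A small sketch of linearity on a hypothetical pmf: computing E[aX + b] directly matches aE[X] + b, while squaring does not pass through the expectation (E[X^2] differs from E[X]^2), which is why variance is not linear. The helper name `expect` and the constants a = 4, b = -2 are illustrative choices.

```python
from fractions import Fraction

# Hypothetical pmf: X is 1, 3, or 6 with probabilities 1/2, 1/3, 1/6.
pmf = {1: Fraction(1, 2), 3: Fraction(1, 3), 6: Fraction(1, 6)}

def expect(pmf, g=lambda x: x):
    """E[g(X)]: weighted sum of g(x) times P(X = x) over the support."""
    return sum(g(x) * p for x, p in pmf.items())

a, b = 4, -2
print(expect(pmf, lambda x: a * x + b))  # E[aX + b]  = 8
print(a * expect(pmf) + b)               # a E[X] + b = 8
print(expect(pmf, lambda x: x ** 2))     # E[X^2]     = 19/2
print(expect(pmf) ** 2)                  # E[X]^2     = 25/4, not the same
```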

Q: I don't understand the significance of Var(aX). Could this be translated into more tangible terms?

A1: Multiplying X by a constant a stretches (or shrinks) the spread of the distribution by a factor of |a|. Since variance is measured in squared units, the variance gets scaled by a^2: Var(aX + b) = a^2 Var(X). We can drop the +b term because adding a constant just shifts the entire distribution; it doesn’t change the spread.
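A sketch that transforms a hypothetical pmf into the pmf of Y = aX + b and recomputes the variance: shifting leaves it unchanged, and scaling by a multiplies it by a^2 (so converting minutes to seconds, a = 60, multiplies it by 3600). The helper names `variance` and `affine` are illustrative.

```python
from fractions import Fraction

def variance(pmf):
    """Var(X) = E[(X - mu)^2] for a pmf given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def affine(pmf, a, b):
    """pmf of Y = aX + b: each value moves, its probability comes along.
    Probabilities are accumulated in case several x map to the same y (e.g. a = 0)."""
    out = {}
    for x, p in pmf.items():
        y = a * x + b
        out[y] = out.get(y, 0) + p
    return out

pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # Var(X) = 1/2

print(variance(affine(pmf, 1, 10)))  # shift only: 1/2
print(variance(affine(pmf, 3, 0)))   # scale by 3: 9/2
print(variance(affine(pmf, 60, 0)))  # minutes to seconds: 1800
```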

Q: Got it, thank you!

A1: Of course!

Q: What does he mean by second moment?

A1: E[X^2] is the second moment (also called the second raw moment), whereas E[(X - E[X])^2] is the second central moment, which is exactly the variance. The two are linked by Var(X) = E[X^2] - E[X]^2.

Q: Is variance the same as the second moment? I just joined.

A1: Not quite: variance is the second central moment, E[(X - E[X])^2]. It equals the second moment E[X^2] only when E[X] = 0.
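A short sketch contrasting raw and central moments on a hypothetical pmf: the second raw moment E[X^2] and the second central moment (the variance) differ by exactly E[X]^2, and the first central moment is always 0. The function names are illustrative.

```python
from fractions import Fraction

def raw_moment(pmf, n):
    """n-th (raw) moment E[X^n]."""
    return sum(x ** n * p for x, p in pmf.items())

def central_moment(pmf, n):
    """n-th central moment E[(X - E[X])^n]."""
    mu = raw_moment(pmf, 1)
    return sum((x - mu) ** n * p for x, p in pmf.items())

# Hypothetical pmf: X is 1, 3, or 6 with probabilities 1/2, 1/3, 1/6.
pmf = {1: Fraction(1, 2), 3: Fraction(1, 3), 6: Fraction(1, 6)}

print(raw_moment(pmf, 2))                           # E[X^2]  = 19/2
print(central_moment(pmf, 2))                       # Var(X)  = 13/4
print(raw_moment(pmf, 2) - raw_moment(pmf, 1) ** 2) # also 13/4
```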

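Q: Why isn’t the variance for Bernoulli p(1-p)^2, following the definition of variance?

A1: The key step is that the definition sums over both values X can take. p(1-p)^2 is only the X = 1 term, (1 - p)^2 p; the X = 0 term adds (0 - p)^2 (1 - p) = p^2(1 - p), and together they give p(1 - p)[(1 - p) + p] = p(1 - p). It is hard to go through every step in Q&A, but here is a link to a few proofs deriving the variance of a Bernoulli: https://proofwiki.org/wiki/Variance_of_Bernoulli_Distribution If you have questions about the steps, feel free to stop by Jerry's or one of the TAs' office hours :)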
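A numeric check of the Bernoulli variance for a few values of p, summing the definition E[(X - p)^2] over both outcomes. The helper name is illustrative, and p(1-p)^2 is printed alongside to show that it only captures the X = 1 term.

```python
from fractions import Fraction

def bernoulli_variance_by_definition(p):
    """E[(X - p)^2] for X ~ Ber(p), summing over BOTH outcomes X = 1 and X = 0."""
    mean = p
    term_x1 = (1 - mean) ** 2 * p        # X = 1 happens with probability p
    term_x0 = (0 - mean) ** 2 * (1 - p)  # X = 0 happens with probability 1 - p
    return term_x1 + term_x0

for p in [Fraction(1, 2), Fraction(1, 10), Fraction(3, 4)]:
    # columns: p, definition, p(1-p), p(1-p)^2
    print(p, bernoulli_variance_by_definition(p), p * (1 - p), p * (1 - p) ** 2)
```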

Q: What does the “~” in, for example, X~Bin(n,p) stand for/mean?

A1: You can read “~” as “is distributed as”. So X ~ Bin(n, p) means we have some random variable X that follows a Binomial distribution with n independent trials, where each trial has probability p of success.
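A minimal sketch of the pmf of X ~ Bin(n, p), P(X = k) = (n choose k) p^k (1 - p)^(n - k), using Python's `math.comb` (Python 3.8+). `binomial_pmf` is a hypothetical helper name, and Bin(5, 0.5) is just an example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): choose which k of the n trials succeed."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Example: X ~ Bin(5, 0.5), e.g. the number of heads in 5 fair coin flips.
dist = [binomial_pmf(k, 5, 0.5) for k in range(6)]
print(dist)  # [0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125]
```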

Q: How would we read X~Bin(X)? Is it “A binomial random variable of X”?

Q: I think the slide should say Bin(p) random variables? (first sentence on this slide)

A1: Ber(p) is correct in this case. What we mean is that a Binomial random variable counts the successes in n independent Ber(p) trials performed one after the other. Ex: when I flip a fair coin, I have a 50% probability of H and a 50% probability of T. Each flip is a Bernoulli, since it is either a success or a failure. If I ask, “What is the distribution of the number of heads in 5 independent coin flips?”, the answer combines 5 independent Bernoulli trials, which gives a Binomial distribution.
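To see the "n independent Ber(p) trials" view concretely, the sketch below builds the exact distribution of the success count one trial at a time and compares it with the Binomial pmf formula. The function name and the choice n = 5, p = 1/2 are illustrative.

```python
from fractions import Fraction
from math import comb

def sum_of_bernoullis(n, p):
    """Exact pmf of X1 + ... + Xn for independent Xi ~ Ber(p), built one trial at a time."""
    dist = {0: Fraction(1)}  # before any trials, the count is 0 with probability 1
    for _ in range(n):
        new = {}
        for count, prob in dist.items():
            new[count + 1] = new.get(count + 1, 0) + prob * p  # this trial succeeds
            new[count] = new.get(count, 0) + prob * (1 - p)    # this trial fails
        dist = new
    return dist

n, p = 5, Fraction(1, 2)
conv = sum_of_bernoullis(n, p)
binom = {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}
print(conv == binom)  # True
```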

Q: What are p_0, p_1, …? (from the 3 coin flips slide)

A1: p_0 = p_1 = p_2 = 0.5. In other words, we are flipping the same fair coin, with 50/50 odds, three times!
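A sketch that enumerates all 8 outcomes of three independent flips with p_0 = p_1 = p_2 = 1/2, multiplies the per-flip probabilities, and groups the outcomes by number of heads. The result matches (3 choose k)/8.

```python
from fractions import Fraction
from itertools import product

# p_0 = p_1 = p_2 = 1/2: the same fair coin, flipped three times.
p = [Fraction(1, 2)] * 3

heads_dist = {}
for outcome in product([0, 1], repeat=3):      # 1 = heads, 0 = tails
    prob = Fraction(1)
    for flip, p_i in zip(outcome, p):
        prob *= p_i if flip == 1 else 1 - p_i  # independent flips: multiply
    k = sum(outcome)                           # number of heads in this outcome
    heads_dist[k] = heads_dist.get(k, 0) + prob

print(heads_dist)
# → {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```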

Q: What does w.p. mean?

A1: w.p. = with probability!

Q: Is variance unbounded below and above (i.e., can it be higher than 1 or lower than 0)?

A1: (1) A variance cannot be negative, since it is an average of squared distances, each of which is nonnegative. (2) It has no upper bound: variance measures the spread of a random variable, and the support of that RV could be any numbers. For example, let X = the average number of minutes a person sleeps and Y = the average number of seconds a person sleeps. Both Var(X) and Var(Y) may well be greater than 1. Additionally, since Y = 60X, the variance of Y is 60^2 = 3600 times the variance of X: changing units from minutes to seconds scales every value by 60 and the variance by 60^2.

Q: How would you model shaking the board?

A1: By shaking the board, do you mean physically shaking it left/right? If so, this would be hard to model, since you’re adding much more uncertainty to the system.

Q: Why did we use 5 choose k for the Plinko game?

A1: We use 5 in (5 choose k) because the total height of the pyramid is 5. No matter what, our journey is always a sequence of exactly 5 bounces, like L,R,R,L,L or R,R,L,L,L.
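A rough simulation sketch, assuming each pin independently sends the ball right with probability 1/2 and that the bucket index equals the number of right bounces (both are assumptions about the board, not details taken from the lecture). The simulated bucket frequencies should land close to the Bin(5, 0.5) probabilities.

```python
import random
from collections import Counter
from math import comb

ROWS = 5       # height of the pyramid: every path makes 5 bounces
P_RIGHT = 0.5  # assumed: each pin bounces the ball right with probability 1/2

def drop_ball(rng):
    """One run down the board: each pin is an independent Ber(P_RIGHT) trial.
    The bucket index is the number of 'right' bounces."""
    return sum(rng.random() < P_RIGHT for _ in range(ROWS))

rng = random.Random(109)  # fixed seed so the run is reproducible
trials = 100_000
counts = Counter(drop_ball(rng) for _ in range(trials))

for k in range(ROWS + 1):
    simulated = counts[k] / trials
    exact = comb(ROWS, k) * P_RIGHT ** k * (1 - P_RIGHT) ** (ROWS - k)
    print(f"bucket {k}: simulated {simulated:.4f}, Bin(5, 0.5) {exact:.4f}")
```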

Q: Yeah! For the board.

A1: The RV is Binomial because each pin on the board acts as a Bernoulli random variable: the ball bounces either left or right. So hitting multiple pins in succession, and counting the bounces in one direction, can be modeled as a Binomial. Does that answer your question?

Q: I understand the 5, but why k? I don’t get why that formula will actually lead to bucket k. Does it have to do with the count of lefts and rights?

A1: Yes, exactly! Landing in bucket k means exactly k of the 5 bounces went right, so (5 choose k) counts all the possible ways to choose which bounces are the rights; the rest are lefts.
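A brute-force sketch of that counting argument: list all 2^5 = 32 left/right paths and count how many contain exactly k rights. Each count matches (5 choose k).

```python
from itertools import product
from math import comb

ROWS = 5
paths = list(product("LR", repeat=ROWS))  # every possible sequence of 5 bounces
ways = {k: sum(1 for path in paths if path.count("R") == k) for k in range(ROWS + 1)}

print(ways)  # {0: 1, 1: 5, 2: 10, 3: 10, 4: 5, 5: 1}
print(all(ways[k] == comb(ROWS, k) for k in range(ROWS + 1)))  # True
```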

Q: Oh, I was asking why we made the success “right”. Was that just because we get to decide which one is a success?

A1: In other words, choosing which k of the n bounces are rights is the same as choosing which n - k are lefts, i.e., (n choose k) = (n choose n-k).

A2: Exactly! The problem is very similar if we decide left is a success. We just chose right in this case, but the other choice is an analogous problem.
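A sketch of why the choice of "success" doesn't matter: on a hypothetical biased board where P(right) = 3/10, "k rights" is the same event as "n - k lefts", so the pmf with success = right at k equals the pmf with success = left at n - k. `binomial_pmf` is an illustrative helper.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 5
p_right = Fraction(3, 10)  # hypothetical biased board, so the symmetry isn't trivially 1/2
p_left = 1 - p_right

for k in range(n + 1):
    # "k rights" (success = right) is the same event as "n - k lefts" (success = left)
    assert comb(n, k) == comb(n, n - k)
    assert binomial_pmf(k, n, p_right) == binomial_pmf(n - k, n, p_left)
print("same distribution, just relabeled")
```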

Q: When we compute n choose n (k = n), will (1 - p) be raised to the 0 power?

A1: Yep! With k = n, the factor (1 - p)^(n - k) becomes (1 - p)^0 = 1, and (n choose n) = 1, so P(X = n) = p^n.
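A tiny check of the k = n case, using 5 fair coin flips as an example: both (n choose n) and (1 - p)^0 equal 1, so only p^n survives.

```python
from fractions import Fraction
from math import comb

n, p = 5, Fraction(1, 2)  # e.g. 5 fair coin flips
k = n                     # the "all successes" case, n choose n

# comb(n, n) == 1 and (1 - p) ** 0 == 1, so only p ** n survives.
prob = comb(n, k) * p ** k * (1 - p) ** (n - k)
print(prob)  # 1/32
```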