Lecture 21: MLE: Maximum Likelihood Estimation

March 3rd, 2021

Lecture Materials

Learning Goals

To understand how to treat a distribution's parameters as unknowns, and then determine what values those parameters should be assigned so that the distribution optimally matches the data samples.

Reading

None!

Concept Check

https://www.gradescope.com/courses/226051/assignments/1070164

Questions & Answers

Q: Does the beta distribution getting tighter after more samples have to do with the Chernoff bound?

A1: live answered

Q: Since spring quarter course registration opens up this weekend, I was wondering if you have any general advice for what courses would be good follow-ups to CS109 (particularly ones with a lot of statistics). Thanks!

A1: Great question! Generally CS221 is the most direct successor, when you are ready. CS161 is another fantastic class. Then there is CS228: Probabilistic Graphical Models. It’s filled with profound math which takes the best ideas in CS109 to the next level. I haven’t checked whether those are offered in Spring.

Q: Usually we look at negative log likelihoods, right, since then we can use gradient descent?

A1: coming right up in a probability course near you! (this one by monday)
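
As a rough illustration of the idea in this question, here is a minimal sketch that minimizes the negative log likelihood of coin-flip data with gradient descent. The Bernoulli model, the sample data, the learning rate, and the names `neg_log_likelihood` and `fit_bernoulli` are all assumptions chosen for the example, not material from the lecture.

```python
import math

def neg_log_likelihood(p, data):
    """Negative log likelihood of 0/1 data under a Bernoulli(p) model."""
    return -sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)

def fit_bernoulli(data, lr=0.01, steps=2000):
    """Estimate p by gradient descent on the average negative log likelihood."""
    n = len(data)
    ones = sum(data)
    p = 0.5  # arbitrary starting guess
    for _ in range(steps):
        # Derivative of the negative log likelihood with respect to p.
        grad = -(ones / p) + (n - ones) / (1 - p)
        p -= lr * grad / n
        # Keep p strictly inside (0, 1) so the logs stay defined.
        p = min(max(p, 1e-6), 1 - 1e-6)
    return p

data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical sample: 7 ones out of 10
p_hat = fit_bernoulli(data)
```

For this model the answer can also be found in closed form (the fraction of ones in the sample), which makes it a convenient check that the descent lands in the right place.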

Q: So we would check because we really just know that’s an extremum and don’t know whether it’s a min or a max?

A1: exactly

Q: how do we know there aren’t any local optima or are we ok with the idea that we might not be at the global optimum?

A1: live answered

Q: Can we compute the Hessian to avoid maximums?

A1: you can! In a few days we will learn “gradient descent” which is really the bread and butter algorithm for AI. It never ends up in a maximum :)

Q: have the panelists watched the classic "lion king 1.5?"

A1: I’ve seen Lion King 1 and 2… what is this 1.5? That sounds like some Harry Potter 9 and 3/4 sort of magic… :)

Q: can you go over what unbiased vs biased means in probability?

A1: Good question. In stats theory, “unbiased” means that your guess, thought of as a random variable, is correct in expectation (any single guess could be wrong, but in expectation it’s right!)
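
To make "correct in expectation" concrete, here is a small sketch that computes the exact expected value of two variance estimators by enumerating every possible sample of three fair coin flips. The fair-coin setup and the function names are assumptions made for illustration. The divide-by-n estimate is the one maximum likelihood gives for a Gaussian's variance, and the divide-by-(n-1) estimate is the usual sample variance.

```python
from itertools import product

TRUE_VARIANCE = 0.25  # variance of a fair coin flip, p * (1 - p) with p = 0.5

def variance_estimates(sample):
    """Return (divide-by-n estimate, divide-by-(n-1) estimate) for one sample."""
    n = len(sample)
    mean = sum(sample) / n
    squared_devs = sum((x - mean) ** 2 for x in sample)
    return squared_devs / n, squared_devs / (n - 1)

def expected_estimates(n):
    """Average both estimators over all 2**n equally likely samples of size n."""
    samples = list(product([0, 1], repeat=n))
    mle_total = 0.0
    unbiased_total = 0.0
    for sample in samples:
        mle, unbiased = variance_estimates(sample)
        mle_total += mle
        unbiased_total += unbiased
    return mle_total / len(samples), unbiased_total / len(samples)

expected_mle, expected_unbiased = expected_estimates(3)
```

Here the divide-by-(n-1) estimator averages to the true variance, while the divide-by-n estimator averages to (n-1)/n times the true variance, so it is biased low for small samples.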

Q: Is this an example of “overfitting” or is it just not having enough data?

A1: exactly :D

Q: Is the fact that the uniform is not nicely differentiated related to the fact that the uniform distribution is not in the exponential family of distributions?

A1: I think it’s because of the discontinuity (which is what Jerry just said). But what you say isn’t wrong: the exponential family is generally nicely differentiated.
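
One way to see the discontinuity is to write the likelihood out. The sketch below assumes a Uniform(α, β) model with n independent samples x_1, …, x_n; the symbols α and β are chosen here for illustration and may not match the lecture's notation.

```latex
\[
L(\alpha, \beta) = \prod_{i=1}^{n} f(x_i \mid \alpha, \beta) =
\begin{cases}
\left( \dfrac{1}{\beta - \alpha} \right)^{n} & \text{if } \alpha \le x_i \le \beta \text{ for all } i, \\[6pt]
0 & \text{otherwise.}
\end{cases}
\]
% Wherever the likelihood is nonzero, the partial derivatives of the log likelihood never vanish:
\[
\frac{\partial}{\partial \beta} \log L = -\frac{n}{\beta - \alpha} < 0,
\qquad
\frac{\partial}{\partial \alpha} \log L = \frac{n}{\beta - \alpha} > 0 .
\]
% So the maximum sits on the boundary of the feasible region, right where L jumps to 0:
\[
\hat{\alpha} = \min_i x_i, \qquad \hat{\beta} = \max_i x_i .
\]
```

Setting the derivative to zero has no solution here, so the maximum has to be found by reasoning about the constraint instead.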