Lecture 4: Conditioning And Bayes

Jan 20th, 2021

Lecture Materials

Slides PDF

Concept Check

Recording

Learning Goals

You should know how to use conditional probabilities, the chain rule, the law of total probability, and Bayes’ theorem.

Reading

Conditional Probability, Law of Total Probability, Bayes Theorem

Concept Check

https://www.gradescope.com/courses/226051/assignments/951270

Questions & Answers

Q: Why doesn't the E have || around it?

A1: E = {(1, 3), (2, 2), (3, 1)} is the event itself, a set of outcomes, whereas |E| = 3 is the size of that set.

Q: Why wouldn’t (2, 2) be distinct, when we have both (1, 3) and (3, 1)?

A1: Outcomes are ordered pairs (die 1, die 2). That convention is what makes all 36 outcomes equally likely, and it records which die produced which number. (1, 3) and (3, 1) are different ordered pairs, but swapping the dice in (2, 2) gives back the same pair, so it appears only once.
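A minimal Python sketch of the ordered-pair convention, assuming two fair six-sided dice. The outcomes listed above are exactly the ones whose dice sum to 4, so E is taken here to be "the dice sum to 4".

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (die 1, die 2), so (1, 3) and (3, 1) are different outcomes.
sample_space = list(product(range(1, 7), repeat=2))

# Event E: the two dice sum to 4.
E = [outcome for outcome in sample_space if sum(outcome) == 4]

# With equally likely outcomes, P(E) = |E| / |S|.
p_E = Fraction(len(E), len(sample_space))

print(E)       # [(1, 3), (2, 2), (3, 1)]
print(len(E))  # 3
print(p_E)     # 1/12
```

Note that (2, 2) is generated only once by `product`, which is the point of the answer above.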

Q: So EF is shorthand for E intersection F?

A1: That’s exactly right, yes. You’ll often see EF to mean the intersection of E and F, and you’ll even see E + F to mean the union of E and F. Those notations are at least as common as the formal ones, because + and concatenation are easier to type on a keyboard than the real symbols for intersection and union.

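Q: What would happen if E = F and P(F) = 0? Is P(E | F) then 1, or is it still undefined?

A1: Under the definition P(E | F) = P(EF) / P(F), it is still undefined, since that ratio is 0 / 0. There’s an intuitive argument for it being 1, since E is just another name for F: whenever F happens, E happens too.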

Q: Wait, so is n each person? It says # people who watched the movie.

A1: n is the number of people on Netflix, and n(E) is just being defined to be the number of people on Netflix who happened to watch Life is Beautiful.
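A short Python sketch of estimating probabilities from counts in this style. All counts below are hypothetical, made up for illustration (they are not real Netflix data), and F is a hypothetical second movie.

```python
from fractions import Fraction

# Hypothetical counts for illustration only (not real Netflix data).
n = 50_000_000     # n: number of people on Netflix
n_E = 2_000_000    # n(E): people who watched Life is Beautiful
n_F = 5_000_000    # n(F): people who watched a second, hypothetical movie F
n_EF = 1_000_000   # n(EF): people who watched both

def estimate(count, total):
    """Estimate a probability as a ratio of counts."""
    return Fraction(count, total)

p_E = estimate(n_E, n)              # P(E) is approximately n(E) / n
p_E_given_F = estimate(n_EF, n_F)   # P(E | F) is approximately n(EF) / n(F)

# Same answer as the definition P(E | F) = P(EF) / P(F), since the n's cancel.
assert p_E_given_F == estimate(n_EF, n) / estimate(n_F, n)

print(p_E, p_E_given_F)  # 1/25 1/5
```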

Q: What’s an example of conditional probability with unequally likely events?

A1: It might also come from a series of experiments where you flip a coin 1000 times and observe heads 512 times. You can’t say with 100% confidence that p is 0.512, but it’s a reasonable guess and the value of p you’d use.

A2: It generally applies when you don’t formally know the relative probabilities, e.g. when a coin is biased and you conjecture and assume that p = 0.512.
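A small Python sketch of a conditional probability over unequally likely outcomes, assuming two independent flips of a coin with P(heads) = 0.512. The events are hypothetical choices: E = "both flips are heads", F = "at least one flip is heads". Because the outcomes are not equally likely, each one is weighted by its probability instead of simply being counted.

```python
from fractions import Fraction
from itertools import product

p = Fraction(512, 1000)  # assumed P(heads), e.g. estimated from 512 heads in 1000 flips

def prob(outcome):
    """Probability of an ordered sequence of independent flips ('H' or 'T')."""
    result = Fraction(1)
    for flip in outcome:
        result *= p if flip == "H" else 1 - p
    return result

outcomes = list(product("HT", repeat=2))
F = [o for o in outcomes if "H" in o]   # at least one head
EF = [o for o in F if o == ("H", "H")]  # both heads (which implies at least one head)

# P(E | F) = P(EF) / P(F), summing outcome probabilities rather than counting outcomes.
p_E_given_F = sum(prob(o) for o in EF) / sum(prob(o) for o in F)
naive = Fraction(len(EF), len(F))  # counting: only valid if outcomes were equally likely

print(p_E_given_F)  # 32/93
print(naive)        # 1/3
```

Here P(E | F) simplifies to p / (2 - p), which differs from the 1/3 you would get by counting outcomes.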

Q: It seems like P(E|F)/P(E) is some sort of update factor – is there a specific name for it?

A1: Good question. I’ll ask Chris after lecture whether he’s heard of a name for it.

A2: Not that I know of. I just think of it as a scaling factor: how the probability of F changes once we know that E happened.
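The algebra behind the scaling-factor view, using only the definition of conditional probability (standard derivation, assuming P(E) > 0 and P(F) > 0):

```latex
P(F \mid E) = \frac{P(EF)}{P(E)} = \frac{P(E \mid F)\,P(F)}{P(E)} = P(F) \cdot \frac{P(E \mid F)}{P(E)},
\qquad
\frac{P(E \mid F)}{P(E)} = \frac{P(EF)}{P(E)\,P(F)} = \frac{P(F \mid E)}{P(F)}.
```

The ratio is symmetric in E and F, and it equals 1 exactly when E and F are independent.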

Q: What was the final answer for the last example?

A1: I missed the actual numeric result, but I’ll email you the number after lecture. He wrote out all of the pieces that needed to be added, multiplied, and divided, but he may not have fully computed it.

A2: Just emailed you with the math. Email me back if you still have questions.

Q: I LOVE these intuition slides!!

A1: Very powerful, yes!

Q: To clarify, is the benefit of testing in the SARS example the accuracy of a negative test result?

A1: I *just* typed something in the chat about this. Any test result moves the probability in one direction: either toward concluding you likely don’t have it, or toward concluding you might really have it despite a lack of symptoms, in which case further testing is warranted.
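A minimal Python sketch of how a positive or negative result updates the probability of infection via Bayes’ theorem. The prevalence, sensitivity, and false-positive rate below are hypothetical values chosen for illustration, not the numbers from the SARS example in lecture.

```python
# Hypothetical test characteristics, for illustration only (not the lecture's numbers).
prior = 0.005          # P(D): fraction of the population that has the disease
sensitivity = 0.98     # P(T | D): positive test given disease
false_positive = 0.01  # P(T | D^c): positive test given no disease

def posterior(prior, p_obs_given_d, p_obs_given_not_d):
    """Bayes' theorem, with P(observation) expanded by the law of total probability."""
    numerator = p_obs_given_d * prior
    evidence = numerator + p_obs_given_not_d * (1 - prior)
    return numerator / evidence

p_d_given_pos = posterior(prior, sensitivity, false_positive)
# A negative result has P(T^c | D) = 1 - sensitivity and P(T^c | D^c) = 1 - false_positive.
p_d_given_neg = posterior(prior, 1 - sensitivity, 1 - false_positive)

print(f"P(D | positive) = {p_d_given_pos:.4f}")
print(f"P(D | negative) = {p_d_given_neg:.6f}")
```

With these made-up numbers, a positive result raises the probability from 0.5% to roughly 33%, and a negative result lowers it to about 0.01%: both results carry information.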

Q: How do we know P(O | L5)? I missed it, sorry.

A1: All of the P(O | Li) are assumed to be known. He said so on slide 53 (which I had to go back and find :)).

Q: What is O again?

A1: O is the observation. Conditioning on it, as in P(Li | O), gives the probability of each location *after* making that observation.

Q: Can you pair the inclusion-exclusion principle with the generalized law of total probability, so that it would still work if the events L1 … L9 were not mutually exclusive?

A1: Absolutely, as long as L1 … L9 still cover the whole sample space. It’d be a computational and notational headache, but it’s certainly the right thing to do if you need an exact answer.
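A toy Python check of the idea, using one roll of a fair die as a hypothetical stand-in for the location example: two overlapping events L1 and L2 cover the sample space, and O is an observation event (all three are made up for illustration). Summing P(O ∩ Li) double-counts the overlap; subtracting P(O ∩ L1 ∩ L2), as inclusion-exclusion prescribes, recovers P(O). Each P(O ∩ Li) could equally be written P(O | Li) P(Li).

```python
from fractions import Fraction

# Toy sample space: one roll of a fair six-sided die (hypothetical stand-in for locations).
S = set(range(1, 7))
L1 = {1, 2, 3, 4}   # overlapping cover: L1 union L2 = S, but L1 intersect L2 = {3, 4}
L2 = {3, 4, 5, 6}
O = {2, 3, 5}       # the observation event

def P(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

direct = P(O)
naive_sum = P(O & L1) + P(O & L2)  # double-counts outcomes in O, L1, and L2 at once
inclusion_exclusion = P(O & L1) + P(O & L2) - P(O & L1 & L2)

print(direct, naive_sum, inclusion_exclusion)  # 1/2 2/3 1/2
```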

Q: I’m a little confused about how to deal with the normalizing constant in the location or the spam example. How come we could expand it?

A1: Do you mean the P(E) in the denominator? Regardless of context, P(E) can be expanded using the law of total probability: P(E) = P(E | F) P(F) + P(E | F^C) P(F^C). Here that was useful, because we were given values for P(E | F), P(E | F^C), etc., so we could compute P(E) piece by piece.
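A Python sketch of the normalizing constant in a nine-location setting, assuming a uniform prior over L1 … L9 and made-up likelihoods P(O | Li); none of these numbers come from the lecture. The denominator P(O) is expanded with the law of total probability, and dividing by it is exactly what makes the posteriors sum to 1.

```python
# Hypothetical numbers for illustration: a uniform prior over nine locations L1..L9
# and made-up likelihoods P(O | Li) for a single observation O.
priors = [1 / 9] * 9
likelihoods = [0.02, 0.05, 0.10, 0.30, 0.60, 0.30, 0.10, 0.05, 0.02]

# Law of total probability: P(O) = sum_i P(O | Li) P(Li), valid because the Li partition the space.
p_obs = sum(lik * prior for lik, prior in zip(likelihoods, priors))

# Bayes: P(Li | O) = P(O | Li) P(Li) / P(O). Dividing by P(O) rescales so the posteriors sum to 1.
posteriors = [lik * prior / p_obs for lik, prior in zip(likelihoods, priors)]

for i, post in enumerate(posteriors, start=1):
    print(f"P(L{i} | O) = {post:.3f}")
```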

Q: I’m confused: with the factor of 23, if P(C) were something like 0.5, wouldn’t the probability come out to about 11?

A1: In isolation, that’s how the math would work out, except that the other probabilities in the original problem couldn’t be what they are if P(C) were really 0.5.
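One way to see why P(C) cannot be 0.5 here, assuming the 23 is the update factor k that multiplies P(C) to give a conditional probability P(C | A) (the conditioning event A is an assumption; any event works): the result is a probability, so it is at most 1, which caps P(C).

```latex
1 \ge P(C \mid A) = k \cdot P(C)
\quad\Longrightarrow\quad
P(C) \le \frac{1}{k} = \frac{1}{23} \approx 0.043
```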

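Q: Is P(C | A^c) + P(C | A) going to be 1? Using this, could you not solve for P(C)?

A1: Not in general; those two conditionals can add up to more or less than 1. (The pair that always sums to 1 is P(C | A) + P(C^c | A).) What you can use to solve for P(C) is the law of total probability: P(C) = P(C | A) P(A) + P(C | A^c) P(A^c). Equivalently, P(C intersect A) + P(C intersect A^c) is just P(C).

Q: Can you explain why P(C intersect A) + P(C intersect A^c) is just P(C)? (Follow-up to the most recent answer.)

A1: Think of it in terms of sets. C = (C intersect A) unioned with (C intersect A^c), and the two pieces are disjoint: every outcome in C either also falls in A or falls outside of A, never both. Probabilities of disjoint events add, so P(C) = P(C intersect A) + P(C intersect A^c).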
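A quick numeric check on one roll of a fair die makes the distinction concrete (the events A and C below are hypothetical choices). The two conditionals P(C | A) and P(C | A^c) need not sum to 1 or to P(C), while the intersections, or equivalently the conditionals weighted by P(A) and P(A^c), add up to P(C).

```python
from fractions import Fraction

S = set(range(1, 7))  # one roll of a fair die (hypothetical example)
A = {1, 2}
A_c = S - A           # complement of A
C = {1, 2, 3}

def P(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

def P_given(event, given):
    """P(event | given) = P(event intersect given) / P(given)."""
    return P(event & given) / P(given)

# Sum of the two conditionals: neither 1 nor P(C) in general.
print(P_given(C, A) + P_given(C, A_c))                   # 5/4
# The intersections of C with the partition {A, A^c} add up to P(C).
print(P(C & A) + P(C & A_c), P(C))                       # 1/2 1/2
# Law of total probability: weight each conditional by P(A) or P(A^c).
print(P_given(C, A) * P(A) + P_given(C, A_c) * P(A_c))   # 1/2
```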

Q: Is there an intuition for the Monty Hall problem? I understand the calculations, but it’s just hard to believe intuitively.

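A1: If you pick A and the prize isn’t behind door A, the host is obligated to reveal the other door without the prize, so the one remaining closed door must hold it. Switching therefore wins exactly when your first pick was wrong, which happens with probability 2/3. In effect, switching gets you both of the doors you didn’t originally pick, not just one.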
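If the 2/3 still feels wrong, a simulation can be convincing. A minimal Python sketch, assuming the standard rules: the prize is placed uniformly at random, the host knows where it is, and the host always opens a non-prize door other than yours (choosing at random when two qualify).

```python
import random

def play(switch, rng):
    """Play one round of Monty Hall with three doors; return True if the player wins."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=20_000, seed=0):
    """Fraction of simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print(f"stay:   {win_rate(switch=False):.3f}")  # close to 1/3
print(f"switch: {win_rate(switch=True):.3f}")   # close to 2/3
```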

Q: Can we apply a similar approach if only some k of the n envelopes are opened?

A1: Yes. It’s still information that can be used to decide whether or not to switch, and the probability of winning by switching still goes up, though not as much as when every other empty envelope is opened.

Q: Why is it the original number of envelopes minus 1?

A1: This accounts for the chance that the prize was not in the envelope you chose: there are n - 1 envelopes other than the one you picked where the prize could be, so that chance is (n - 1)/n.
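A Python sketch of the k-envelope generalization, under stated assumptions: the host knows where the prize is and opens k empty envelopes among the n - 1 you didn’t pick (0 ≤ k ≤ n - 2), and you then switch to one of the n - 1 - k remaining unopened envelopes uniformly at random. The function names `p_win_switch` and `simulate` are hypothetical helpers for illustration. The (n - 1)/n factor is the "original number of envelopes minus 1" piece.

```python
import random
from fractions import Fraction

def p_win_switch(n, k):
    """Exact P(win by switching) under the assumptions stated above."""
    # (n - 1)/n: the prize is among the n - 1 envelopes you did not pick.
    # 1/(n - 1 - k): given that, it is equally likely to be in any unopened one of them.
    return Fraction(n - 1, n) * Fraction(1, n - 1 - k)

def simulate(n, k, trials=20_000, seed=0):
    """Monte Carlo estimate of P(win by switching) under the same assumptions."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        prize = rng.randrange(n)
        pick = rng.randrange(n)
        others = [e for e in range(n) if e != pick]
        # The host opens k empty envelopes among the ones you did not pick.
        opened = set(rng.sample([e for e in others if e != prize], k))
        remaining = [e for e in others if e not in opened]
        wins += rng.choice(remaining) == prize
    return wins / trials

for k in (0, 4, 8):
    print(k, p_win_switch(10, k), round(simulate(10, k), 3))
```

With k = 0 switching wins with probability 1/n, the same as staying; any k ≥ 1 makes switching better, and k = n - 2 gives the full (n - 1)/n.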