For example, we have the complement rule:
P(A^{c}) = 100% - P(A).
A special case of the complement rule is that
P({}) = 0%,
because P(S) = 100%, and S^{c} = {}.
An event A that has probability one is said to be certain or sure. S is certain.
The union of two events, AUB, can be broken up into three disjoint sets:
elements of A that are not in B (AB^{c}),
elements of B that are not in A (A^{c}B), and
elements of both A and B (AB).
Together, these three sets contain every element of AUB. Therefore, the chance that either A or B occurs is
P(AUB) = P(AB^{c} U A^{c}B U AB).
The three sets on the right are disjoint, so the third axiom implies that
P(AUB) = P(AB^{c}) + P(A^{c}B) + P(AB).
On the other hand,
P(A) = P(AB^{c} U AB) = P(AB^{c}) + P(AB),
because AB^{c} and AB are disjoint. Similarly,
P(B) = P(A^{c}B U AB) = P(A^{c}B) + P(AB),
because A^{c}B and AB are disjoint. Adding, we find
P(A) + P(B) = P(AB^{c}) + P(A^{c}B) + 2×P(AB).
This would be P(AUB), but for the fact that P(AB) is counted twice, not once. It follows that, in general,
P(AUB) = P(A) + P(B) - P(AB).
Again, while this is a true statement, it is not one of the axioms of probability. In the special case that AB = {}, this reduces to one of the axioms, because, as we saw in the preceding paragraph, P({}) = 0%. It follows that
P(AUB) <= P(A) + P(B),
because, by axiom 1, P(AB) >= 0.
Moreover, because taking a union can only include additional outcomes,
P(AUB) >= P(A), and
P(AUB) >= P(B).
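The identities above can be checked numerically. The following is a minimal Python sketch, assuming an outcome space of equally likely outcomes (one roll of a fair die); the events A and B and the helper `prob` are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Illustrative assumption: outcome space of one roll of a fair die,
# with every outcome equally likely.
S = frozenset(range(1, 7))
A = frozenset({1, 2, 3})   # hypothetical event: at most three spots
B = frozenset({2, 4, 6})   # hypothetical event: an even number of spots

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

# The three disjoint pieces of the union: AB^c, A^cB, and AB.
pieces = [A - B, B - A, A & B]

p_union = prob(A | B)
print(p_union == sum(prob(piece) for piece in pieces))  # additivity over disjoint pieces
print(p_union == prob(A) + prob(B) - prob(A & B))       # P(AUB) = P(A) + P(B) - P(AB)
print(p_union <= prob(A) + prob(B))                     # P(AUB) <= P(A) + P(B)
print(prob(S - A) == 1 - prob(A))                       # complement rule
```

Each line prints True for this choice of A and B; exact fractions are used so that the comparisons are not affected by rounding.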
Probability is analogous to area or volume or mass. Consider the unit square, which has length unity on each side. Its total area is 1 (= 100%). Let's call the square S, just like outcome space. Now consider regions inside the square S (subsets of S). The area of any such region is at least zero, the area of S is 100%, and the area of the union of two regions is the sum of their areas, if they do not overlap (i.e., if their intersection is empty). These facts are direct analogues of the axioms of probability, and we shall often use this model to get intuition about probability.
A further analogy that I find useful is to consider the square S to be a dartboard. A trial or experiment consists of throwing a dart at the board once. The event A occurs if the dart sticks in the set A. The event AB occurs if the dart sticks in both A and B on that one toss. Clearly, AB cannot occur unless A and B overlap--the dart cannot stick in two places at once. AUB occurs if the dart sticks in either A or B (or both) on that one throw. A and B need not overlap for AUB to occur.
This analogy is also useful for thinking about logical implication. If A is a subset of B, the occurrence of A implies the occurrence of B; we shall sometimes say that A implies B. In the dartboard model, the dart cannot stick in A without sticking in B as well, so if A occurs, B must occur also. If A implies B, AB=A, so P(AB)=P(A). If AB = {}, A implies B^{c} and B implies A^{c}: if the dart sticks in A it did not stick in B, and vice versa. If A implies B, then if B does not occur A cannot occur either: B^{c} implies A^{c}, so B^{c} is a subset of A^{c}.
For in-between cases, the conditional probability of A given B is defined to be
P(A|B) = P(AB)/P(B),
provided P(B) > 0.
Now suppose that A and B are not disjoint. Then if we learn that B occurred, we can restrict attention to just those outcomes that are in B, and disregard the rest of S, so we have a new outcome space that is just B. We need P(B) = 100% to consider B an outcome space; we can make this happen by dividing all probabilities by P(B). For A to have occurred in addition to B requires that AB occurred, so the conditional probability of A given B is P(AB)/P(B), just as we defined it above.
Example. We deal two cards from a well shuffled deck. What is the conditional probability that the second card is an Ace (event A), given that the first card is an Ace (event B)? This is P(AB)/P(B) by definition. The (unconditional) chance that the first card is an Ace is 100%/13 = 7.7%, because there are 13 possible faces for the first card, and all are equally likely. The chance that both cards are Aces is as follows: from the four Aces (one in each suit), we need to pick two; there are _{4}C_{2} = 6 ways that can happen. The total number of ways of picking two cards from the deck is _{52}C_{2} = 52×51/2 = 1326, so the chance that the two cards are both Aces is (6/1326)×100% = 0.5%. The conditional probability that the second card is an Ace given that the first card is an Ace is thus (6/1326)/(1/13) × 100% = 5.9%, computed from the exact fractions rather than from the rounded percentages. As we might expect, it is somewhat lower than the chance that the first card is an Ace, because we know one of the Aces is gone. We could approach this more intuitively as well: given that the first card is an Ace, the second card is an Ace too if it is one of the three remaining Aces among the 51 remaining cards. These possibilities are equally likely if the deck was shuffled well, so the chance is 3/51 × 100% = 5.9%.
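The card example can also be verified by brute-force enumeration. The sketch below is one way to do it in Python, assuming every ordered pair of distinct cards is equally likely; encoding cards as (rank, suit) pairs with rank 1 standing for the Ace is an arbitrary choice made here, not something from the text.

```python
from fractions import Fraction
from itertools import permutations

# Model the deck as 52 (rank, suit) pairs; rank 1 stands for the Ace (an assumption).
deck = [(rank, suit) for rank in range(1, 14) for suit in range(4)]

# All ordered (first card, second card) deals, assumed equally likely.
deals = list(permutations(deck, 2))

def is_ace(card):
    return card[0] == 1

n_first_ace = sum(1 for first, second in deals if is_ace(first))
n_both_aces = sum(1 for first, second in deals if is_ace(first) and is_ace(second))

p_b = Fraction(n_first_ace, len(deals))    # P(B): first card is an Ace
p_ab = Fraction(n_both_aces, len(deals))   # P(AB): both cards are Aces
p_a_given_b = p_ab / p_b                   # P(A|B) = P(AB)/P(B)

print(p_b, p_ab, p_a_given_b)   # prints: 1/13 1/221 1/17
```

The last value, 1/17, equals 3/51, in agreement with the intuitive argument.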
Independent events bear a special relationship to each other. Independence is a very precise point between being disjoint (so that one event implies that the other did not occur), and one event being a subset of the other (so that one event implies the other).
Recap:
If A and B are independent, so are A and B^{c}, A^{c} and B, and A^{c} and B^{c}.
Example: Suppose I have a box with four tickets in it, labeled 1, 2, 3, and 4. I stir the tickets and then pick one, stir them again without replacing the ticket I got, and pick another. Consider the event A = {I get the ticket labeled 1 on the first draw} and the event B = {I get the ticket labeled 2 on the second draw}. Are these events dependent or independent?
Solution: The chance that I get the 1 on the first draw is 25%. The chance that I get the 2 on the second draw is 25%. The chance that I get the 2 on the second draw given that I get the 1 on the first draw is 33%, which is much larger than the unconditional chance that I draw the 2 the second time. Thus A and B are dependent.
Now suppose that I replace the ticket I got on the first draw and stir the tickets again before drawing the second time. Then the chance that I get the 1 on the first draw is 25%, the chance that I get the 2 on the second draw is 25%, and the conditional chance that I get the 2 on the second draw given that I drew the 1 the first time is also 25%. A and B are thus independent if I draw with replacement.
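The two versions of the ticket example can be compared by listing every equally likely pair of draws. This is a minimal Python sketch; the helper name `event_probabilities` is hypothetical, and independence is tested by comparing P(B|A) with P(B), as in the solution above.

```python
from fractions import Fraction
from itertools import permutations, product

tickets = [1, 2, 3, 4]

def event_probabilities(draws):
    """Return (P(A), P(B), P(AB)) for A = {1 on the first draw} and
    B = {2 on the second draw}, assuming every pair in `draws` is equally likely."""
    n = len(draws)
    p_a = Fraction(sum(1 for first, second in draws if first == 1), n)
    p_b = Fraction(sum(1 for first, second in draws if second == 2), n)
    p_ab = Fraction(sum(1 for first, second in draws if first == 1 and second == 2), n)
    return p_a, p_b, p_ab

without_replacement = list(permutations(tickets, 2))  # 12 ordered pairs, no repeats
with_replacement = list(product(tickets, repeat=2))   # 16 ordered pairs

for label, draws in [("without replacement", without_replacement),
                     ("with replacement", with_replacement)]:
    p_a, p_b, p_ab = event_probabilities(draws)
    p_b_given_a = p_ab / p_a
    print(label, "P(B) =", p_b, "P(B|A) =", p_b_given_a,
          "independent:", p_b_given_a == p_b)
```

Without replacement, P(B|A) is 1/3 while P(B) is 1/4; with replacement both are 1/4.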
Example: Two fair dice are rolled independently; one is blue, the other is red. What is the chance that the number of spots that show on the red die is less than the number of spots that show on the blue die?
Solution: The event that the number of spots that show on the red die is less than the number that show on the blue die can be broken up into mutually exclusive events, according to the number of spots that show on the blue die. The chance that the number of spots that show on the red die is less than the number that show on the blue die is the sum of the chances of those simpler events. If only one spot shows on the blue die, the number that show on the red die cannot be smaller, so the probability is zero. If two spots show on the blue die, the number that show on the red die is smaller if the red die shows exactly one spot. Because the number of spots that show on the blue and red dice are independent, the chance that the blue die shows two spots and the red die shows one spot is (1/6)(1/6) = 1/36. If three spots show on the blue die, the number that show on the red die is smaller if the red die shows one or two spots. The chance that the blue die shows three spots and the red die shows one or two spots is (1/6)(2/6) = 2/36. If four spots show on the blue die, the number that show on the red die is smaller if the red die shows one, two, or three spots; the chance that the blue die shows four spots and the red die shows one, two, or three spots is (1/6)(3/6) = 3/36. Proceeding similarly for the cases that the blue die shows five or six spots gives the ultimate result:
P(red die shows fewer spots than the blue die) = 1/36 + 2/36 + 3/36 + 4/36 + 5/36 = 15/36.
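Alternatively, one could just count the ways: there are 36 equally likely possibilities, which can be written in a square table with one die indexing the rows and the other indexing the columns. In 15 of the 36 cells the red die shows fewer spots than the blue die, which again gives 15/36.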
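The count for the two-dice example can be reproduced by brute force. The following Python sketch, which assumes the 36 (blue, red) pairs are equally likely, enumerates them and also repeats the case-by-case sum over the value of the blue die.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
outcomes = list(product(faces, faces))   # all (blue, red) pairs

# Direct count of the pairs in which red shows fewer spots than blue.
favorable = [(blue, red) for blue, red in outcomes if red < blue]
p_by_counting = Fraction(len(favorable), len(outcomes))

# Case-by-case sum: P(blue = b) × P(red < b), using independence of the dice.
p_by_cases = sum(Fraction(1, 6) * Fraction(b - 1, 6) for b in faces)

print(len(favorable), len(outcomes), p_by_counting, p_by_cases)   # prints: 15 36 5/12 5/12
```

Both routes give 15/36, which reduces to 5/12.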
Hint: to solve this problem, you need to evaluate an expression of the form
1 - (1-x)^{n},
where x is nearly zero and n is very large. You can find the answer approximately using the following result:
(1-x)^{n} = 1 + n×(-x) + (n×(n-1)/2)×(-x)^{2} + . . . + _{n}C_{k}×(-x)^{k} + . . . + (-x)^{n}.
The function (1-x)^{n} is called a binomial; the fact that the coefficient of x^{k} in the expansion of (1-x)^{n} is, up to sign, _{n}C_{k} is the reason that _{n}C_{k} is sometimes called a binomial coefficient. When x is very small, x^{2}, x^{3}, . . . are much smaller still (and, as long as n×x is itself small, they shrink faster than _{n}C_{k} grows), so the terms involving higher powers of x than x^{1} are effectively negligible. That is, when x is nearly zero,
(1-x)^{n} is approximately 1-n×x, so 1 - (1-x)^{n} is approximately n×x.
Using that approximation is equivalent to ignoring the possibility that the sentence is typed more than once. The probability that the sentence is typed more than once is tiny compared to the chance that the sentence is typed exactly once, which is already quite small.
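To get a feel for how good the approximation is, one can compare n×x with the exact value of 1 - (1-x)^{n}. The Python sketch below uses exact rational arithmetic; the particular values of x and n are hypothetical, chosen only so that x is small, n is large, and n×x is still small.

```python
from fractions import Fraction

# Hypothetical values for illustration only: x small, n large, n*x still small.
x = Fraction(1, 10**9)
n = 10**4

exact = 1 - (1 - x) ** n   # exact rational arithmetic, no rounding error
approx = n * x             # first-order approximation n×x

# n×x slightly overstates the exact value; measure by how much.
relative_error = (approx - exact) / exact

print(float(exact), float(approx), float(relative_error))
```

For these values the relative error is roughly n×x/2, so the approximation is accurate to several significant digits.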
P(AB) = P(A|B)×P(B).
This is called the multiplication rule.
Example: A deck of cards is shuffled well, then two cards are drawn. What is the chance that both cards are aces?
P(card 1 is an Ace and card 2 is an Ace) = P(card 2 is an Ace | card 1 is an Ace)×P(card 1 is an Ace) = 3/51 × 4/52 = 0.5%.
You can see that the multiplication rule can save you a lot of time!
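As a check that the multiplication rule agrees with direct counting of hands, here is a short Python sketch using exact fractions (the variable names are illustrative).

```python
from fractions import Fraction
from math import comb

# Multiplication rule: P(AB) = P(A|B) × P(B).
p_first_ace = Fraction(4, 52)                # P(card 1 is an Ace)
p_second_ace_given_first = Fraction(3, 51)   # P(card 2 is an Ace | card 1 is an Ace)
p_both_multiplication = p_second_ace_given_first * p_first_ace

# Direct counting: ways to choose 2 of the 4 Aces over ways to choose 2 of 52 cards.
p_both_counting = Fraction(comb(4, 2), comb(52, 2))

print(p_both_multiplication, p_both_counting)   # prints: 1/221 1/221
```

Both computations give 1/221, about 0.5%.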
Example: Suppose there is a 50% chance that you catch the 8:00am bus. If you catch the bus, you will be on time. If you miss the bus, there is a 70% chance that you will be late. What is the chance that you will be late?
P(late) = P(miss the bus and late)
= P(late|miss the bus) × P(miss the bus)
= 70% × 50% = 35%.
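The bus calculation can be written out with both cases made explicit: the event "late" splits into the disjoint events "catch the bus and late" and "miss the bus and late", and the first has probability zero. A minimal Python sketch, with illustrative variable names:

```python
from fractions import Fraction

p_catch = Fraction(1, 2)                # chance of catching the 8:00am bus
p_miss = 1 - p_catch                    # complement rule
p_late_given_catch = Fraction(0)        # if you catch the bus, you are on time
p_late_given_miss = Fraction(7, 10)     # 70% chance of being late if you miss it

# Multiplication rule on each piece, then add the two disjoint pieces.
p_late = p_late_given_catch * p_catch + p_late_given_miss * p_miss

print(p_late, float(p_late))   # prints: 7/20 0.35
```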
Example: Suppose that 10% of a given population has benign chronic flatulence. Suppose that there is a standard screening test for benign chronic flatulence that has a 90% chance of correctly detecting that one has the disease, and a 10% chance of a "false positive" (erroneously reporting that one has the disease when one does not). We pick a person at random from the population (so that everyone has the same chance of being picked) and test him/her. The test is positive. What is the chance that the person has the disease?
Solution: We shall combine several things we have learned. Let D be the event that the person has the disease, and T be the event that the person tests positive for the disease. The problem statement told us that P(D) = 10%, P(T|D) = 90%, and P(T|D^{c}) = 10%. We want P(D|T) = P(DT)/P(T). By the multiplication rule,
P(DT) = P(T|D) × P(D) = 90% × 10% = 9%.
The probability of D^{c}T is, by the multiplication rule and the complement rule,
P(D^{c}T) = P(T|D^{c}) × P(D^{c}) = P(T|D^{c}) × (100% - P(D)) = 10% × 90% = 9%.
By one of the axioms,
P(T) = P(DT) + P(D^{c}T) = 9% + 9% = 18%,
because DT and D^{c}T are mutually exclusive. Finally, plugging in the definition of P(D|T) gives
P(D|T) = P(DT)/P(T) = 9%/18% = 50%.
Because only a small fraction of the population actually have benign chronic flatulence, the chance that a positive test result for someone selected at random from the population is a false positive is 50%, even though the test is 90% accurate.
This problem illustrates Bayes' Rule:
P(A|B) = P(B|A) × P(A) / ( P(B|A)×P(A) + P(B|A^{c}) × P(A^{c}) ).
The numerator on the right is just P(AB), computed using the multiplication rule. The denominator is just P(B), computed by partitioning B into the mutually exclusive sets AB and A^{c}B, and finding the probability of each of those pieces using the multiplication rule.
Bayes' Rule is useful to find the conditional probability of A given B in terms of the conditional probability of B given A, which is the more natural thing to measure in some problems. For example, in the disease-screening problem just above, the natural way to calibrate a test is to see how well it does at detecting a certain thing (e.g., a disease) when the thing is present, and to see how poorly it does at raising false alarms when the thing is not really present. These are, respectively, the conditional probability of detecting the thing given that the condition is present, and the conditional probability of incorrectly raising an alarm given that the thing is not present. However, the interesting quantity for an individual is the conditional chance that he or she has the disease, for example, given that the test raised an alarm.
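Bayes' Rule is easy to express as a small function. The sketch below is one possible Python rendering; the function name `bayes` and its parameter names are assumptions made here. It is applied first to the screening example, and then to a hypothetical 1% prevalence (not from the text) to show how strongly the answer depends on P(A).

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B) from P(B|A), P(A), and P(B|A^c) using Bayes' Rule."""
    numerator = p_b_given_a * p_a                            # P(AB), multiplication rule
    denominator = numerator + p_b_given_not_a * (1 - p_a)    # P(B) = P(AB) + P(A^cB)
    return numerator / denominator

# Screening example: P(T|D) = 90%, P(D) = 10%, P(T|D^c) = 10%.
p_d_given_t = bayes(Fraction(90, 100), Fraction(10, 100), Fraction(10, 100))
print(p_d_given_t)   # prints: 1/2

# Hypothetical variation: the same test, but only 1% of the population has the disease.
p_rare = bayes(Fraction(90, 100), Fraction(1, 100), Fraction(10, 100))
print(p_rare)        # prints: 1/12
```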