A major use of probability in statistical inference is the updating of probabilities when certain events are observed. The updated probability of event A after we learn that event B has occurred is the conditional probability of A given B.
Example 2.1.1 is typical of the following situation. An experiment is performed for which the sample space S is given (or can be constructed easily) and the probabilities are available for all of the events of interest. We then learn that some event B has occurred, and we want to know how the probability of another event A changes after we learn that B has occurred. In Example 2.1.1, the event that we have learned is B={one of the numbers drawn is 15}. We are certainly interested in the probability of A={the numbers 1, 14, 15, 20, 23, and 27 are drawn}.
If we know that the event B has occurred, then we know that the outcome of the experiment is one of those included in B. Hence, to evaluate the probability that A will occur, we must consider the set of those outcomes in B that also result in the occurrence of A. As sketched in Figure 2.1, this set is precisely the set A∩B. It is therefore natural to calculate the revised probability of A according to the following definition.
Figure 2.1: The outcomes in the event B that also belong to the event A
For convenience, the notation in Definition 2.1.1 is read simply as "the conditional probability of A given B." Equation (2.1.1) indicates that Pr(A∣B) is computed as the proportion of the total probability Pr(B) that is represented by Pr(A∩B): intuitively, the proportion of B that is also part of A.
Definition 2.1.1 for the conditional probability Pr(A∣B) is worded in terms of the subjective interpretation of probability in 1.2 Interpretations of Probability. (2.1.1) also has a simple meaning in terms of the frequency interpretation of probability. According to the frequency interpretation, if an experimental process is repeated a large number of times, then the proportion of repetitions in which the event B will occur is approximately Pr(B) and the proportion of repetitions in which both the event A and the event B will occur is approximately Pr(A∩B). Therefore, among those repetitions in which the event B occurs, the proportion of repetitions in which the event A will also occur is approximately equal to Pr(A∩B)/Pr(B), which is precisely the value that (2.1.1) assigns to Pr(A∣B).
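The frequency interpretation can be checked numerically. The following is a minimal Python sketch; the events chosen are illustrative, not from the text. With two fair dice, let B be the event that the sum is at least 9 (10 of the 36 outcomes) and A the event that the first die shows 6 (4 of those 10 outcomes also lie in A), so Pr(A∣B)=(4/36)/(10/36)=0.4.

```python
import random

random.seed(0)

n = 100_000
count_B = 0   # repetitions in which B occurs
count_AB = 0  # repetitions in which both A and B occur
for _ in range(n):
    die1, die2 = random.randint(1, 6), random.randint(1, 6)
    A = (die1 == 6)         # A: the first die shows 6
    B = (die1 + die2 >= 9)  # B: the sum is at least 9
    count_B += B
    count_AB += (A and B)

# Among the repetitions in which B occurred, the proportion in which A
# also occurred approximates Pr(A∩B)/Pr(B) = 0.4.
print(count_AB / count_B)
```

The simulated proportion settles near 0.4 as n grows, in agreement with (2.1.1).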
The Multiplication Rule for Conditional Probabilities
In some experiments, certain conditional probabilities are relatively easy to assign directly. In these experiments, it is then possible to compute the probability that both of two events occur by applying the next result that follows directly from (2.1.1) and the analogous definition of Pr(B∣A).
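As a small numerical sketch of the multiplication rule (the box and its contents here are hypothetical, not from the text): draw two balls without replacement from a box with 3 red and 2 blue balls, and let B = {first ball red} and A = {second ball red}. The conditional probability Pr(A∣B) is easy to assign directly, and the rule gives Pr(A∩B).

```python
from fractions import Fraction

# Hypothetical setup: 3 red and 2 blue balls, two draws without replacement.
pr_B = Fraction(3, 5)          # Pr(B): first ball red
pr_A_given_B = Fraction(2, 4)  # Pr(A|B): 2 reds remain among 4 balls

# Multiplication rule: Pr(A∩B) = Pr(B) Pr(A|B)
pr_A_and_B = pr_B * pr_A_given_B
print(pr_A_and_B)  # 3/10
```

Note that Pr(A∣B) was assigned directly from the composition of the box after the first draw, which is exactly the situation the multiplication rule is designed for.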
Note: Conditional Probabilities Behave Just Like Probabilities
In all of the situations that we shall encounter in this text, every result that we can prove has a conditional version given an event B with Pr(B)>0. Just replace all probabilities by conditional probabilities given B, and replace all conditional probabilities given other events C by conditional probabilities given C∩B. For example, Theorem 1.5.3 says that Pr(Ac)=1−Pr(A). It is easy to prove that Pr(Ac∣B)=1−Pr(A∣B) if Pr(B)>0. (See the exercises in this section.) Another example is Theorem 2.1.3, which is a conditional version of the multiplication rule, Theorem 2.1.2. Although a proof is given for Theorem 2.1.3, we shall not provide proofs of all such conditional theorems, because their proofs are generally very similar to the proofs of the unconditional versions.
The probability of an event A can be calculated by partitioning the sample space into the two events B and Bc, namely, Pr(A)=Pr(B)Pr(A∣B)+Pr(Bc)Pr(A∣Bc). This result easily generalizes to larger partitions, and when combined with Theorem 2.1.1 it leads to a very powerful tool for calculating probabilities.
Typically, the events that make up a partition are chosen so that an important source of uncertainty in the problem is reduced if we learn which event has occurred.
Partitions can facilitate the calculations of probabilities of certain events.
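The law of total probability can be sketched numerically. The numbers below are illustrative assumptions, not from the text: suppose a box has one of two compositions, B1 with probability 1/3 and B2 with probability 2/3, and that a sampled bolt is long (event A) with conditional probability 1/10 or 3/10 under the respective compositions.

```python
from fractions import Fraction

# Hypothetical partition B1, B2 of the sample space.
pr_B = {1: Fraction(1, 3), 2: Fraction(2, 3)}            # Pr(Bj)
pr_A_given_B = {1: Fraction(1, 10), 2: Fraction(3, 10)}  # Pr(A|Bj)

# Law of total probability: Pr(A) = sum over j of Pr(Bj) Pr(A|Bj)
pr_A = sum(pr_B[j] * pr_A_given_B[j] for j in pr_B)
print(pr_A)  # 7/30
```

Each term Pr(Bj)Pr(A∣Bj) comes from the multiplication rule applied within one partition event; the partition makes the terms disjoint, so they simply add.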
Figure 2.2: The intersections of A with events B1,…,B5 of a partition in the proof of Theorem 2.1.4
Note: Conditional Version of Law of Total Probability
The law of total probability has an analog conditional on another event C, namely, Pr(A∣C) = ∑j=1..k Pr(Bj∣C)Pr(A∣Bj∩C), where B1,…,Bk is a partition of the sample space with Pr(Bj∩C)>0 for all j.
Augmented Experiment: In some experiments, it may not be clear from the initial description of the experiment that a partition exists that will facilitate the calculation of probabilities. However, there are many such experiments in which such a partition exists if we imagine that the experiment has some additional structure. Consider the following modification of Examples 2.1.8 and 2.1.9.
In Example 2.1.11, there is only one box of bolts, but we believe that it has one of two possible compositions. We let the events B1 and B2 determine the possible compositions. This type of situation is very common in experiments.
The events B1,B2,…,B11 in Example 2.1.12 can be thought of in much the same way as the two events B1 and B2 that determine the mixture of long and short bolts in Example 2.1.11. There is only one box of bolts, but there is uncertainty about its composition. Similarly, in Example 2.1.12, there is only one group of patients, but we believe that it has one of 11 possible compositions determined by the events B1,B2,…,B11. For B1,B2,…,B11 to be called events, they must be subsets of the sample space for the experiment in question. That will be the case in Example 2.1.12 if we imagine that the experiment consists not only of observing the numbers of successes and failures among the patients but also of potentially observing enough additional patients to be able to compute p, possibly at some time very far in the future. Similarly, in Example 2.1.11, the two events B1 and B2 are subsets of the sample space if we imagine that the experiment consists not only of observing one sample bolt but also of potentially observing the entire composition of the box.
Throughout the remainder of this text, we shall implicitly assume that experiments are augmented to include outcomes that determine the values of quantities such as p. We shall not require that we ever get to observe the complete outcome of the experiment so as to tell us precisely what p is, but merely that there is an experiment that includes all of the events of interest to us, including those that determine quantities like p.
Definition 2.1.3 is worded somewhat vaguely because it is intended to cover a wide variety of cases. Here is an explicit application to Example 2.1.12.
In the remainder of the text, there will be many experiments that we assume are augmented. In such cases, we will mention which quantities (such as p in Example 2.1.13) would be determined by the augmented part of the experiment even if we do not explicitly mention that the experiment is augmented.
We shall conclude this section by discussing a popular gambling game called craps. One version of this game is played as follows: A player rolls two dice, and the sum of the two numbers that appear is observed. If the sum on the first roll is 7 or 11, the player wins the game immediately. If the sum on the first roll is 2, 3, or 12, the player loses the game immediately. If the sum on the first roll is 4, 5, 6, 8, 9, or 10, then the two dice are rolled again and again until the sum is either 7 or the original value. If the original value is obtained a second time before 7 is obtained, then the player wins. If the sum 7 is obtained before the original value is obtained a second time, then the player loses.
We shall now compute the probability Pr(W), where W is the event that the player will win. Let the sample space S consist of all possible sequences of sums from the rolls of dice that might occur in a game. For example, some of the elements of S are (4,7), (11), (4,3,4), (12), (10,8,2,12,6,7), etc. We see that (11)∈W but (4,7)∈Wc, etc. We begin by noticing that whether or not an outcome is in W depends in a crucial way on the first roll. For this reason, it makes sense to partition W according to the sum on the first roll. Let Bi be the event that the first roll is i for i=2,…,12.
Theorem 2.1.4 tells us that Pr(W) = ∑i=2..12 Pr(Bi)Pr(W∣Bi). Since Pr(Bi) for each i was computed in Example 1.6.5, we need to determine Pr(W∣Bi) for each i. We begin with i=2. Because the player loses if the first roll is 2, we have Pr(W∣B2)=0. Similarly, Pr(W∣B3)=0=Pr(W∣B12). Also, Pr(W∣B7)=1 because the player wins if the first roll is 7. Similarly, Pr(W∣B11)=1.
For each first roll i∈{4,5,6,8,9,10}, Pr(W∣Bi) is the probability that, in a sequence of dice rolls, the sum i will be obtained before the sum 7 is obtained. As described in Example 2.1.5, this probability is the same as the probability of obtaining the sum i when the sum must be either i or 7. Hence, Pr(W∣Bi) = Pr(Bi)/(Pr(Bi)+Pr(B7)). Combining all of these values by the law of total probability gives Pr(W) = 244/495 ≈ 0.493, so the game is nearly fair but slightly favors the house.
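The calculation above can be carried out exactly with exact rational arithmetic. The sketch below follows the partition by the first roll described in the text: the number of ways each sum occurs determines Pr(Bi), and for a point i the win probability is Pr(Bi)/(Pr(Bi)+Pr(B7)).

```python
from fractions import Fraction

# Number of ways each sum 2..12 can occur with two fair dice.
ways = {s: sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == s)
        for s in range(2, 13)}

pr_win = Fraction(0)
for i in range(2, 13):
    pr_Bi = Fraction(ways[i], 36)          # Pr(first roll is i)
    if i in (7, 11):                        # immediate win
        pr_W_given_Bi = Fraction(1)
    elif i in (2, 3, 12):                   # immediate loss
        pr_W_given_Bi = Fraction(0)
    else:                                   # must roll i again before a 7
        pr_W_given_Bi = Fraction(ways[i], ways[i] + ways[7])
    pr_win += pr_Bi * pr_W_given_Bi        # law of total probability term

print(pr_win, float(pr_win))  # 244/495 ≈ 0.4929
```

Using Fraction keeps every Pr(Bi)Pr(W∣Bi) term exact, so the final answer 244/495 is obtained without rounding error.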
The revised probability of an event A after learning that event B (with Pr(B)>0) has occurred is the conditional probability of A given B, denoted by Pr(A∣B) and computed as Pr(A∩B)/Pr(B). Often it is easy to assess a conditional probability, such as Pr(A∣B), directly. In such a case, we can use the multiplication rule for conditional probabilities to compute Pr(A∩B)=Pr(B)Pr(A∣B).

All probability results have versions conditional on an event B with Pr(B)>0: just change all probabilities so that they are conditional on B in addition to anything else they were already conditional on. For example, the multiplication rule for conditional probabilities becomes Pr(A1∩A2∣B)=Pr(A1∣B)Pr(A2∣A1∩B).

A partition is a collection of disjoint events whose union is the whole sample space. To be most useful, a partition is chosen so that an important source of uncertainty is reduced if we learn which one of the partition events occurs. If the conditional probability of an event A is available given each event in a partition, the law of total probability tells how to combine these conditional probabilities to get Pr(A).
Each time a shopper purchases a tube of toothpaste, he chooses either brand A or brand B. Suppose that for each purchase after the first, the probability is 1/3 that he will choose the same brand that he chose on his preceding purchase and the probability is 2/3 that he will switch brands. If he is equally likely to choose either brand A or brand B on his first purchase, what is the probability that both his first and second purchases will be brand A and both his third and fourth purchases will be brand B?
A box contains r red balls and b blue balls. One ball is selected at random and its color is observed. The ball is then returned to the box and k additional balls of the same color are also put into the box. A second ball is then selected at random, its color is observed, and it is returned to the box together with k additional balls of the same color. Each time another ball is selected, the process is repeated. If four balls are selected, what is the probability that the first three balls will be red and the fourth ball will be blue?
A box contains three cards. One card is red on both sides, one card is green on both sides, and one card is red on one side and green on the other. One card is selected from the box at random, and the color on one side is observed. If this side is green, what is the probability that the other side of the card is also green?
Consider again the conditions of Exercise 2 in 1.10 The Probability of a Union of Events. If a family selected at random from the city subscribes to newspaper A, what is the probability that the family also subscribes to newspaper B?
Consider again the conditions of Exercise 2 in 1.10 The Probability of a Union of Events. If a family selected at random from the city subscribes to at least one of the three newspapers A, B, and C, what is the probability that the family subscribes to newspaper A?
Suppose that a box contains one blue card and four red cards, which are labeled A, B, C, and D. Suppose also that two of these five cards are selected at random, without replacement.
(a) If it is known that card A has been selected, what is the probability that both cards are red?
(b) If it is known that at least one red card has been selected, what is the probability that both cards are red?
Consider the following version of the game of craps: The player rolls two dice. If the sum on the first roll is 7 or 11, the player wins the game immediately. If the sum on the first roll is 2, 3, or 12, the player loses the game immediately. However, if the sum on the first roll is 4, 5, 6, 8, 9, or 10, then the two dice are rolled again and again until the sum is either 7 or 11 or the original value. If the original value is obtained a second time before either 7 or 11 is obtained, then the player wins. If either 7 or 11 is obtained before the original value is obtained a second time, then the player loses. Determine the probability that the player will win this game.
A box contains three coins with a head on each side, four coins with a tail on each side, and two fair coins. If one of these nine coins is selected at random and tossed once, what is the probability that a head will be obtained?
A machine produces defective parts with three different probabilities depending on its state of repair. If the machine is in good working order, it produces defective parts with probability 0.02. If it is wearing down, it produces defective parts with probability 0.1. If it needs maintenance, it produces defective parts with probability 0.3. The probability that the machine is in good working order is 0.8, the probability that it is wearing down is 0.1, and the probability that it needs maintenance is 0.1. Compute the probability that a randomly selected part will be defective.
The percentages of voters classed as Liberals in three different election districts are divided as follows: in the first district, 21 percent; in the second district, 45 percent; and in the third district, 75 percent. If a district is selected at random and a voter is selected at random from that district, what is the probability that she will be a Liberal?
Consider again the toothpaste shopper described earlier in these exercises. On each purchase, the probability that he will choose the same brand of toothpaste that he chose on his preceding purchase is 1/3, and the probability that he will switch brands is 2/3. Suppose that on his first purchase the probability that he will choose brand A is 1/4 and the probability that he will choose brand B is 3/4. What is the probability that his second purchase will be brand B?