
2.1 The Definition of Conditional Probability

Introduction

A major use of probability in statistical inference is the updating of probabilities when certain events are observed. The updated probability of event $A$ after we learn that event $B$ has occurred is the conditional probability of $A$ given $B$.

Example 2.1.1 is typical of the following situation. An experiment is performed for which the sample space $S$ is given (or can be constructed easily) and the probabilities are available for all of the events of interest. We then learn that some event $B$ has occurred, and we want to know how the probability of another event $A$ changes after we learn that $B$ has occurred. In Example 2.1.1, the event that we have learned is $B = \{\text{one of the numbers drawn is } 15\}$. We are certainly interested in the probability of

$$A = \{\text{the numbers 1, 14, 15, 20, 23, and 27 are drawn}\},$$

and possibly other events.

If we know that the event $B$ has occurred, then we know that the outcome of the experiment is one of those included in $B$. Hence, to evaluate the probability that $A$ will occur, we must consider the set of those outcomes in $B$ that also result in the occurrence of $A$. As sketched in Figure 2.1, this set is precisely the set $A \cap B$. It is therefore natural to calculate the revised probability of $A$ according to the following definition.


Figure 2.1: The outcomes in the event $B$ that also belong to the event $A$

For convenience, the notation in Definition 2.1.1 is read simply as the conditional probability of $A$ given $B$. Eq. (2.1.1) indicates that $\Pr(A \mid B)$ is computed as the proportion of the total probability $\Pr(B)$ that is represented by $\Pr(A \cap B)$, intuitively the proportion of $B$ that is also part of $A$.

Definition 2.1.1 for the conditional probability $\Pr(A \mid B)$ is worded in terms of the subjective interpretation of probability in Sec. 1.2 (Interpretations of Probability). Eq. (2.1.1) also has a simple meaning in terms of the frequency interpretation of probability. According to the frequency interpretation, if an experimental process is repeated a large number of times, then the proportion of repetitions in which the event $B$ will occur is approximately $\Pr(B)$, and the proportion of repetitions in which both the event $A$ and the event $B$ will occur is approximately $\Pr(A \cap B)$. Therefore, among those repetitions in which the event $B$ occurs, the proportion of repetitions in which the event $A$ will also occur is approximately equal to

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}.$$
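For example, the following minimal simulation sketch (with arbitrarily chosen events $A$ and $B$ on two dice, not part of the original text) illustrates this frequency interpretation: among simulated repetitions in which $B$ occurs, the fraction in which $A$ also occurs approaches $\Pr(A \cap B)/\Pr(B)$.

```python
import random

random.seed(0)
n = 1_000_000

count_b = 0        # repetitions in which B occurs
count_a_and_b = 0  # repetitions in which both A and B occur

for _ in range(n):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    b = (d1 + d2 >= 10)  # event B: the sum is at least 10
    a = (d1 == d2)       # event A: the two dice show the same number
    if b:
        count_b += 1
        if a:
            count_a_and_b += 1

# Among repetitions where B occurs, the fraction where A also occurs
# should approximate Pr(A ∩ B) / Pr(B) = (2/36) / (6/36) = 1/3.
print(count_a_and_b / count_b)
```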
Table 2.1: Relapse counts by treatment group in the study of depressed patients (Prien et al., 1984)

| Response   | Imipramine | Lithium | Combination | Placebo | Total |
|------------|------------|---------|-------------|---------|-------|
| Relapse    | 18         | 13      | 22          | 24      | 77    |
| No relapse | 22         | 25      | 16          | 10      | 73    |
| Total      | 40         | 38      | 38          | 34      | 150   |
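To illustrate Definition 2.1.1 with the table above, suppose that one of the 150 patients in the study is selected at random. Let $B$ be the event that the patient received imipramine and $A$ the event that the patient relapsed. Then

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{18/150}{40/150} = \frac{18}{40} = 0.45.$$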

The Multiplication Rule for Conditional Probabilities

In some experiments, certain conditional probabilities are relatively easy to assign directly. In these experiments, it is then possible to compute the probability that both of two events occur by applying the next result, which follows directly from Eq. (2.1.1) and the analogous definition of $\Pr(B \mid A)$.
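For example (with illustrative numbers that are not part of the original discussion), suppose a box contains five red balls and five blue balls, and two balls are drawn at random without replacement. Let $B$ be the event that the first ball is red and $A$ the event that the second ball is red. Then $\Pr(B) = 5/10$ and $\Pr(A \mid B) = 4/9$ are easy to assign directly, and the multiplication rule gives

$$\Pr(A \cap B) = \Pr(B)\Pr(A \mid B) = \frac{5}{10} \cdot \frac{4}{9} = \frac{2}{9}.$$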

Note: Conditional Probabilities Behave Just Like Probabilities

In all of the situations that we shall encounter in this text, every result that we can prove has a conditional version given an event $B$ with $\Pr(B) > 0$. Just replace all probabilities by conditional probabilities given $B$ and replace all conditional probabilities given other events $C$ by conditional probabilities given $C \cap B$. For example, Theorem 1.5.3 says that $\Pr(A^c) = 1 - \Pr(A)$. It is easy to prove that $\Pr(A^c \mid B) = 1 - \Pr(A \mid B)$ if $\Pr(B) > 0$. (See Exercises 2.1.11 and 2.1.12 in this section.) Another example is Theorem 2.1.3, which is a conditional version of the multiplication rule Theorem 2.1.2. Although a proof is given for Theorem 2.1.3, we shall not provide proofs of all such conditional theorems, because their proofs are generally very similar to the proofs of the unconditional versions.
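A sketch of the first of these claims (the details are Exercise 2.1.11): the events $A \cap B$ and $A^c \cap B$ are disjoint and their union is $B$, so

$$\Pr(A^c \mid B) = \frac{\Pr(A^c \cap B)}{\Pr(B)} = \frac{\Pr(B) - \Pr(A \cap B)}{\Pr(B)} = 1 - \Pr(A \mid B).$$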

Conditional Probability and Partitions

We have already seen how to calculate the probability of an event by partitioning the sample space into two events $B$ and $B^c$. This result easily generalizes to larger partitions, and when combined with Theorem 2.1.1 it leads to a very powerful tool for calculating probabilities.

Typically, the events that make up a partition are chosen so that an important source of uncertainty in the problem is reduced if we learn which event has occurred.

Partitions can facilitate the calculation of the probabilities of certain events.


Figure 2.2: The intersections of $A$ with events $B_1, \ldots, B_5$ of a partition in the proof of Theorem 2.1.4
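For reference, Theorem 2.1.4 (the law of total probability), which is used in the craps calculation at the end of this section, states that if the events $B_1, \ldots, B_k$ form a partition of the space $S$ and $\Pr(B_j) > 0$ for $j = 1, \ldots, k$, then, for every event $A$ in $S$,

$$\Pr(A) = \sum_{j=1}^k \Pr(B_j)\Pr(A \mid B_j).$$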

Note: Conditional Version of Law of Total Probability

The law of total probability has an analog conditional on another event $C$, namely,

$$\Pr(A \mid C) = \sum_{j=1}^k \Pr(B_j \mid C)\Pr(A \mid B_j \cap C). \tag{2.1.5}$$

The reader can prove this in Exercise 2.1.17.

Augmented Experiment: In some experiments, it may not be clear from the initial description of the experiment that a partition exists that will facilitate the calculation of probabilities. However, there are many such experiments in which such a partition exists if we imagine that the experiment has some additional structure. Consider the following modification of Examples 2.1.8 and 2.1.9.

In Example 2.1.11, there is only one box of bolts, but we believe that it has one of two possible compositions. We let the events $B_1$ and $B_2$ determine the possible compositions. This type of situation is very common in experiments.

The events $B_1, B_2, \ldots, B_{11}$ in Example 2.1.12 can be thought of in much the same way as the two events $B_1$ and $B_2$ that determine the mixture of long and short bolts in Example 2.1.11. There is only one box of bolts, but there is uncertainty about its composition. Similarly, in Example 2.1.12, there is only one group of patients, but we believe that it has one of 11 possible compositions determined by the events $B_1, B_2, \ldots, B_{11}$. For these to be events, they must be subsets of the sample space for the experiment in question. That will be the case in Example 2.1.12 if we imagine that the experiment consists not only of observing the numbers of successes and failures among the patients but also of potentially observing enough additional patients to be able to compute $p$, possibly at some time very far in the future. Similarly, in Example 2.1.11, the two events $B_1$ and $B_2$ are subsets of the sample space if we imagine that the experiment consists not only of observing one sample bolt but also of potentially observing the entire composition of the box.

Throughout the remainder of this text, we shall implicitly assume that experiments are augmented to include outcomes that determine the values of quantities such as $p$. We shall not require that we ever get to observe the complete outcome of the experiment so as to tell us precisely what $p$ is, but merely that there is an experiment that includes all of the events of interest to us, including those that determine quantities like $p$.

Definition 2.1.3 is worded somewhat vaguely because it is intended to cover a wide variety of cases. Here is an explicit application to Example 2.1.12.

In the remainder of the text, there will be many experiments that we assume are augmented. In such cases, we will mention which quantities (such as $p$ in Example 2.1.13) would be determined by the augmented part of the experiment even if we do not explicitly mention that the experiment is augmented.

The Game of Craps

We shall conclude this section by discussing a popular gambling game called craps. One version of this game is played as follows: A player rolls two dice, and the sum of the two numbers that appear is observed. If the sum on the first roll is 7 or 11, the player wins the game immediately. If the sum on the first roll is 2, 3, or 12, the player loses the game immediately. If the sum on the first roll is 4, 5, 6, 8, 9, or 10, then the two dice are rolled again and again until the sum is either 7 or the original value. If the original value is obtained a second time before 7 is obtained, then the player wins. If the sum 7 is obtained before the original value is obtained a second time, then the player loses.

We shall now compute the probability $\Pr(W)$, where $W$ is the event that the player will win. Let the sample space $S$ consist of all possible sequences of sums from the rolls of dice that might occur in a game. For example, some of the elements of $S$ are $(4, 7)$, $(11)$, $(4, 3, 4)$, $(12)$, and $(10, 8, 2, 12, 6, 7)$. We see that $(11) \in W$ but $(4, 7) \in W^c$. We begin by noticing that whether or not an outcome is in $W$ depends in a crucial way on the first roll. For this reason, it makes sense to partition $W$ according to the sum on the first roll. Let $B_i$ be the event that the first roll is $i$, for $i = 2, \ldots, 12$.

Theorem 2.1.4 tells us that $\Pr(W) = \sum_{i=2}^{12}\Pr(B_i)\Pr(W \mid B_i)$. Since $\Pr(B_i)$ for each $i$ was computed in Example 1.6.5, we need to determine $\Pr(W \mid B_i)$ for each $i$. We begin with $i = 2$. Because the player loses if the first roll is 2, we have $\Pr(W \mid B_2) = 0$. Similarly, $\Pr(W \mid B_3) = 0 = \Pr(W \mid B_{12})$. Also, $\Pr(W \mid B_7) = 1$ because the player wins if the first roll is 7. Similarly, $\Pr(W \mid B_{11}) = 1$.

For each first roll $i \in \{4, 5, 6, 8, 9, 10\}$, $\Pr(W \mid B_i)$ is the probability that, in a sequence of dice rolls, the sum $i$ will be obtained before the sum 7 is obtained. As described in Example 2.1.5, this probability is the same as the probability of obtaining the sum $i$ when the sum must be either $i$ or 7. Hence,

$$\Pr(W \mid B_i) = \frac{\Pr(B_i)}{\Pr(B_i \cup B_7)}.$$

We compute the necessary values here:

$$\begin{align*}
\Pr(W \mid B_4) &= \frac{3/36}{3/36 + 6/36} = \frac{1}{3}, &
\Pr(W \mid B_5) &= \frac{4/36}{4/36 + 6/36} = \frac{2}{5}, \\
\Pr(W \mid B_6) &= \frac{5/36}{5/36 + 6/36} = \frac{5}{11}, &
\Pr(W \mid B_8) &= \frac{5/36}{5/36 + 6/36} = \frac{5}{11}, \\
\Pr(W \mid B_9) &= \frac{4/36}{4/36 + 6/36} = \frac{2}{5}, &
\Pr(W \mid B_{10}) &= \frac{3/36}{3/36 + 6/36} = \frac{1}{3}.
\end{align*}$$

Finally, we compute the sum $\sum_{i=2}^{12}\Pr(B_i)\Pr(W \mid B_i)$:

$$\begin{align*}
\Pr(W) &= \sum_{i=2}^{12}\Pr(B_i)\Pr(W \mid B_i) \\
&= 0 + 0 + \frac{3}{36}\cdot\frac{1}{3} + \frac{4}{36}\cdot\frac{2}{5} + \frac{5}{36}\cdot\frac{5}{11} + \frac{6}{36} + \frac{5}{36}\cdot\frac{5}{11} + \frac{4}{36}\cdot\frac{2}{5} + \frac{3}{36}\cdot\frac{1}{3} + \frac{2}{36} + 0 \\
&= \frac{2928}{5940} = 0.493.
\end{align*}$$

Thus, the probability of winning in the game of craps is slightly less than $1/2$.
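The value $0.493$ can also be checked informally by simulation. The following minimal sketch (an illustration, not part of the original text) plays the game many times and reports the observed winning frequency:

```python
import random

def play_craps(rng: random.Random) -> bool:
    """Play one game of craps; return True if the player wins."""
    roll = rng.randint(1, 6) + rng.randint(1, 6)
    if roll in (7, 11):      # win immediately
        return True
    if roll in (2, 3, 12):   # lose immediately
        return False
    point = roll             # the "original value"
    while True:              # roll until the sum is 7 or the point
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        if roll == point:
            return True
        if roll == 7:
            return False

rng = random.Random(0)  # fixed seed for reproducibility
n = 1_000_000
wins = sum(play_craps(rng) for _ in range(n))
print(wins / n)  # should be close to 2928/5940 ≈ 0.4929
```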

Summary

The revised probability of an event $A$ after learning that event $B$ (with $\Pr(B) > 0$) has occurred is the conditional probability of $A$ given $B$, denoted by $\Pr(A \mid B)$ and computed as $\Pr(A \cap B) / \Pr(B)$. Often it is easy to assess a conditional probability, such as $\Pr(A \mid B)$, directly. In such a case, we can use the multiplication rule for conditional probabilities to compute $\Pr(A \cap B) = \Pr(B)\Pr(A \mid B)$. All probability results have versions conditional on an event $B$ with $\Pr(B) > 0$: Just change all probabilities so that they are conditional on $B$ in addition to anything else they were already conditional on. For example, the multiplication rule for conditional probabilities becomes $\Pr(A_1 \cap A_2 \mid B) = \Pr(A_1 \mid B)\Pr(A_2 \mid A_1 \cap B)$. A partition is a collection of disjoint events whose union is the whole sample space. To be most useful, a partition is chosen so that an important source of uncertainty is reduced if we learn which one of the partition events occurs. If the conditional probability of an event $A$ is available given each event in a partition, the law of total probability tells how to combine these conditional probabilities to get $\Pr(A)$.

Exercises

Exercise 2.1.3

If $S$ is the sample space of an experiment and $A$ is any event in that space, what is the value of $\Pr(A \mid S)$?

Exercise 2.1.4

Each time a shopper purchases a tube of toothpaste, he chooses either brand $A$ or brand $B$. Suppose that for each purchase after the first, the probability is $1/3$ that he will choose the same brand that he chose on his preceding purchase and the probability is $2/3$ that he will switch brands. If he is equally likely to choose either brand $A$ or brand $B$ on his first purchase, what is the probability that both his first and second purchases will be brand $A$ and both his third and fourth purchases will be brand $B$?

Exercise 2.1.5

A box contains $r$ red balls and $b$ blue balls. One ball is selected at random and its color is observed. The ball is then returned to the box, and $k$ additional balls of the same color are also put into the box. A second ball is then selected at random, its color is observed, and it is returned to the box together with $k$ additional balls of the same color. Each time another ball is selected, the process is repeated. If four balls are selected, what is the probability that the first three balls will be red and the fourth ball will be blue?

Exercise 2.1.6

A box contains three cards. One card is red on both sides, one card is green on both sides, and one card is red on one side and green on the other. One card is selected from the box at random, and the color on one side is observed. If this side is green, what is the probability that the other side of the card is also green?

Exercise 2.1.7

Consider again the conditions of Exercise 2 in Sec. 1.10 (The Probability of a Union of Events). If a family selected at random from the city subscribes to newspaper $A$, what is the probability that the family also subscribes to newspaper $B$?

Exercise 2.1.8

Consider again the conditions of Exercise 2 in Sec. 1.10 (The Probability of a Union of Events). If a family selected at random from the city subscribes to at least one of the three newspapers $A$, $B$, and $C$, what is the probability that the family subscribes to newspaper $A$?

Exercise 2.1.9

Suppose that a box contains one blue card and four red cards, which are labeled $A$, $B$, $C$, and $D$. Suppose also that two of these five cards are selected at random, without replacement.

(a) If it is known that card $A$ has been selected, what is the probability that both cards are red?

(b) If it is known that at least one red card has been selected, what is the probability that both cards are red?

Exercise 2.1.10

Consider the following version of the game of craps: The player rolls two dice. If the sum on the first roll is 7 or 11, the player wins the game immediately. If the sum on the first roll is 2, 3, or 12, the player loses the game immediately. However, if the sum on the first roll is 4, 5, 6, 8, 9, or 10, then the two dice are rolled again and again until the sum is either 7 or 11 or the original value. If the original value is obtained a second time before either 7 or 11 is obtained, then the player wins. If either 7 or 11 is obtained before the original value is obtained a second time, then the player loses. Determine the probability that the player will win this game.

Exercise 2.1.11

For any two events $A$ and $B$ with $\Pr(B) > 0$, prove that $\Pr(A^c \mid B) = 1 - \Pr(A \mid B)$.

Exercise 2.1.12

For any three events $A$, $B$, and $D$ such that $\Pr(D) > 0$, prove that $\Pr(A \cup B \mid D) = \Pr(A \mid D) + \Pr(B \mid D) - \Pr(A \cap B \mid D)$.

Exercise 2.1.13

A box contains three coins with a head on each side, four coins with a tail on each side, and two fair coins. If one of these nine coins is selected at random and tossed once, what is the probability that a head will be obtained?

Exercise 2.1.14

A machine produces defective parts with three different probabilities depending on its state of repair. If the machine is in good working order, it produces defective parts with probability 0.02. If it is wearing down, it produces defective parts with probability 0.1. If it needs maintenance, it produces defective parts with probability 0.3. The probability that the machine is in good working order is 0.8, the probability that it is wearing down is 0.1, and the probability that it needs maintenance is 0.1. Compute the probability that a randomly selected part will be defective.

Exercise 2.1.15

The percentages of voters classed as Liberals in three different election districts are as follows: in the first district, 21 percent; in the second district, 45 percent; and in the third district, 75 percent. If a district is selected at random and a voter is selected at random from that district, what is the probability that she will be a Liberal?

Exercise 2.1.16

Consider again the shopper described in Exercise 2.1.4. On each purchase, the probability that he will choose the same brand of toothpaste that he chose on his preceding purchase is $1/3$, and the probability that he will switch brands is $2/3$. Suppose that on his first purchase the probability that he will choose brand $A$ is $1/4$ and the probability that he will choose brand $B$ is $3/4$. What is the probability that his second purchase will be brand $B$?

Exercise 2.1.17

Prove the conditional version of the law of total probability, Eq. (2.1.5).

References
  1. Prien, R. F., Kupfer, D. J., Mansky, P. A., & Small, J. G. (1984). Drug therapy in the prevention of recurrences in unipolar and bipolar affective disorders. Archives of General Psychiatry, 41, 1096–1104.