Introduction¶
Suppose that we are interested in which of several disjoint events $B_1, \ldots, B_k$ will occur and that we will get to observe some other event $A$. If $\Pr(A \mid B_i)$ is available for each $i$, then Bayes' theorem is a useful formula for computing the conditional probabilities of the events $B_i$ given $A$.
We begin with a typical example.
Example 2.3.1: Test for a Disease¶
Suppose that you are walking down the street and notice that the Department of Public Health is giving a free medical test for a certain disease. The test is 90 percent reliable in the following sense: If a person has the disease, there is a probability of 0.9 that the test will give a positive response; whereas, if a person does not have the disease, there is a probability of only 0.1 that the test will give a positive response.
Data indicate that your chances of having the disease are only 1 in 10,000. However, since the test costs you nothing, and is fast and harmless, you decide to stop and take the test. A few days later you learn that you had a positive response to the test. Now, what is the probability that you have the disease?
The last question in Example 2.3.1 is a prototype of the question for which Bayes' theorem was designed. We have at least two disjoint events ("you have the disease" and "you do not have the disease") about which we are uncertain, and we learn a piece of information (the result of the test) that tells us something about the uncertain events. Then we need to know how to revise the probabilities of the events in light of the information we learned.
We now present the general structure in which Bayes’ theorem operates before returning to the example.
2.3.1 Statement, Proof, and Examples of Bayes’ Theorem¶
Example 2.3.2: Selecting Bolts¶
Consider again the situation in Example 2.1.8, in which a bolt is selected at random from one of two boxes. Suppose that we cannot tell, without making a further effort, from which of the two boxes the one bolt is being selected. For example, the boxes may be identical in appearance, or somebody else may actually select the box while we only get to see the bolt. Prior to selecting the bolt, it was equally likely that each of the two boxes would be selected. However, if we learn that event $A$ has occurred, that is, that a long bolt was selected, we can compute the conditional probabilities of the two boxes given $A$. To remind the reader, $B_1$ is the event that the box containing 60 long bolts and 40 short bolts is selected, while $B_2$ is the event that the box containing 10 long bolts and 20 short bolts is selected. In Example 2.1.9, we computed $\Pr(A \mid B_1) = 0.6$, $\Pr(A \mid B_2) = 1/3$, $\Pr(B_1) = 1/2$, and $\Pr(B_2) = 1/2$. So, for example,
$$
\Pr(B_1 \mid A) = \frac{\Pr(B_1)\Pr(A \mid B_1)}{\Pr(A)} = \frac{(1/2)(0.6)}{(1/2)(0.6) + (1/2)(1/3)} = 0.6429.
$$
Since the first box has a higher proportion of long bolts than the second box, it seems reasonable that the probability of $B_1$ should rise after we learn that a long bolt was selected. It must also be that $\Pr(B_2 \mid A) = 1 - 0.6429 = 0.3571$, since one or the other box had to be selected.
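To see the arithmetic concretely, here is a minimal Python sketch (not part of the original text; the box labels and numbers are those of Example 2.3.2) that multiplies prior by likelihood and normalizes:

```python
# A minimal sketch (not from the text): prior times likelihood, then normalize,
# using the box-of-bolts numbers from Example 2.3.2.
prior = {"B1": 0.5, "B2": 0.5}             # each box equally likely to be chosen
likelihood = {"B1": 0.6, "B2": 1.0 / 3.0}  # Pr(long bolt | box)

pr_a = sum(prior[b] * likelihood[b] for b in prior)   # Pr(A), law of total probability
posterior = {b: prior[b] * likelihood[b] / pr_a for b in prior}

print(round(posterior["B1"], 4))   # 0.6429
print(round(posterior["B2"], 4))   # 0.3571
```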
In Example 2.3.2, we started with uncertainty about which of two boxes would be chosen, and then we observed a long bolt drawn from the chosen box. Because the two boxes have different chances of yielding a long bolt, the observation of a long bolt changed the probabilities of each of the two boxes having been chosen. The precise calculation of how the probabilities change is the purpose of Bayes' theorem.
Theorem 2.3.1: Bayes’ Theorem¶
Let the events $B_1, \ldots, B_k$ form a partition of the space $S$ such that $\Pr(B_j) > 0$ for $j = 1, \ldots, k$, and let $A$ be an event such that $\Pr(A) > 0$. Then, for $i = 1, \ldots, k$,
$$
\Pr(B_i \mid A) = \frac{\Pr(B_i)\Pr(A \mid B_i)}{\sum_{j=1}^{k} \Pr(B_j)\Pr(A \mid B_j)}.
$$ {#eq-2-3-1}
Proof. By the definition of conditional probability,
$$
\Pr(B_i \mid A) = \frac{\Pr(B_i \cap A)}{\Pr(A)}.
$$
The numerator on the right side of eq-2-3-1 is equal to $\Pr(B_i \cap A)$ by Theorem 2.1.1. The denominator is equal to $\Pr(A)$ according to Theorem 2.1.4.
Example 2.3.3: Test for a Disease¶
Let us return to the example with which we began this section. We have just received word that we have tested positive for a disease. The test was 90 percent reliable in the sense that we described in Example 2.3.1. We want to know the probability that we have the disease after we learn that the result of the test is positive. Some readers may feel that this probability should be about 0.9. However, this feeling completely ignores the small probability of 0.0001 that you had the disease before taking the test. We shall let $B_1$ denote the event that you have the disease, and let $B_2$ denote the event that you do not have the disease. The events $B_1$ and $B_2$ form a partition. Also, let $A$ denote the event that the response to the test is positive. The event $A$ is information we will learn that tells us something about the partition elements. Then, by Bayes' theorem,
$$
\begin{align*}
\Pr(B_1 \mid A) &= \frac{\Pr(A \mid B_1)\Pr(B_1)}{\Pr(A \mid B_1)\Pr(B_1) + \Pr(A \mid B_2)\Pr(B_2)} \\
&= \frac{(0.9)(0.0001)}{(0.9)(0.0001) + (0.1)(0.9999)} = 0.00090.
\end{align*}
$$
Thus, the conditional probability that you have the disease given the test result is approximately only 1 in 1000. Of course, this conditional probability is approximately 9 times as great as the probability was before you were tested, but even the conditional probability is quite small.
Another way to explain this result is as follows: Only one person in every 10,000 actually has the disease, but the test gives a positive response for approximately one person in every 10. Hence, the number of positive responses is approximately 1000 times the number of persons who actually have the disease. In other words, out of every 1000 persons for whom the test gives a positive response, only one person actually has the disease. This example illustrates not only the use of Bayes' theorem but also the importance of taking into account all of the information available in a problem.
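For readers who want to check the arithmetic, here is a minimal Python sketch (not part of the original text) of the same calculation:

```python
# A minimal sketch (not from the text) of the disease-test calculation in Example 2.3.3.
p_disease = 0.0001            # prior Pr(B1): 1 person in 10,000 has the disease
p_pos_given_disease = 0.9     # Pr(positive | disease)
p_pos_given_no_disease = 0.1  # Pr(positive | no disease)

p_pos = p_pos_given_disease * p_disease + p_pos_given_no_disease * (1 - p_disease)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(p_disease_given_pos, 5))   # 0.0009, i.e., roughly 1 in 1000
```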
Example 2.3.4: Identifying the Source of a Defective Item¶
Three different machines $M_1$, $M_2$, and $M_3$ were used for producing a large batch of similar manufactured items. Suppose that 20 percent of the items were produced by machine $M_1$, 30 percent by machine $M_2$, and 50 percent by machine $M_3$. Suppose further that 1 percent of the items produced by machine $M_1$ are defective, that 2 percent of the items produced by machine $M_2$ are defective, and that 3 percent of the items produced by machine $M_3$ are defective. Finally, suppose that one item is selected at random from the entire batch and it is found to be defective. We shall determine the probability that this item was produced by machine $M_2$.
Let $B_i$ be the event that the selected item was produced by machine $M_i$ ($i = 1, 2, 3$), and let $A$ be the event that the selected item is defective. We must evaluate the conditional probability $\Pr(B_2 \mid A)$.
The probability $\Pr(B_i)$ that an item selected at random from the entire batch was produced by machine $M_i$ is as follows, for $i = 1, 2, 3$:
$$
\Pr(B_1) = 0.2, \quad \Pr(B_2) = 0.3, \quad \Pr(B_3) = 0.5.
$$
Furthermore, the probability $\Pr(A \mid B_i)$ that an item produced by machine $M_i$ will be defective is
$$
\Pr(A \mid B_1) = 0.01, \quad \Pr(A \mid B_2) = 0.02, \quad \Pr(A \mid B_3) = 0.03.
$$
It now follows from Bayes' theorem that
$$
\Pr(B_2 \mid A) = \frac{\Pr(B_2)\Pr(A \mid B_2)}{\sum_{j=1}^{3} \Pr(B_j)\Pr(A \mid B_j)} = \frac{(0.3)(0.02)}{(0.2)(0.01) + (0.3)(0.02) + (0.5)(0.03)} = 0.26.
$$
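The same pattern applies to any finite partition. Below is a small Python sketch (not part of the original text; the helper function `bayes` is our own name, not the book's) applied to the numbers of Example 2.3.4:

```python
# A minimal sketch (not from the text): a reusable Bayes'-theorem helper applied to the
# three machines of Example 2.3.4.
def bayes(priors, likelihoods):
    """Posterior probabilities for a finite partition, given Pr(B_i) and Pr(A | B_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    pr_a = sum(joint)                    # Pr(A), the denominator of Bayes' theorem
    return [j / pr_a for j in joint]

priors = [0.2, 0.3, 0.5]                 # Pr(B1), Pr(B2), Pr(B3)
likelihoods = [0.01, 0.02, 0.03]         # Pr(defective | machine i)

posterior = bayes(priors, likelihoods)
print([round(p, 3) for p in posterior])  # [0.087, 0.261, 0.652]; Pr(B2 | A) is about 0.26
```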
Example 2.3.5: Identifying Genotypes¶
Consider a gene that has two alleles, $A$ and $a$ (see Example 1.6.4). Suppose that the gene exhibits itself through a trait (such as hair color or blood type) with two versions. We call $A$ dominant and $a$ recessive if individuals with genotypes $AA$ and $Aa$ have the same version of the trait and individuals with genotype $aa$ have the other version. The two versions of the trait are called phenotypes. We shall call the phenotype exhibited by individuals with genotypes $AA$ and $Aa$ the dominant trait, and the other trait will be called the recessive trait. In population genetics studies, it is common to have information on the phenotypes of individuals, but it is rather difficult to determine genotypes. However, some information about genotypes can be obtained by observing phenotypes of parents and children.
Assume that the allele $A$ is dominant, that individuals mate independently of genotype, and that the genotypes $AA$, $Aa$, and $aa$ occur in the population with probabilities $1/4$, $1/2$, and $1/4$, respectively. We are going to observe an individual whose parents are not available, and we shall observe the phenotype of this individual. Let $E$ be the event that the observed individual has the dominant trait. We would like to revise our opinion of the possible genotypes of the parents. There are six possible genotype combinations, $B_1, \ldots, B_6$, for the parents prior to making any observations, and these are listed in tbl-2-2.
The probabilities of the $B_i$ were computed using the assumption that the parents mated independently of genotype. For example, $B_2$ occurs if the father is $AA$ and the mother is $Aa$ (probability $1/4 \times 1/2 = 1/8$) or if the father is $Aa$ and the mother is $AA$ (probability $1/8$). The values of $\Pr(E \mid B_i)$ were computed assuming that the two available alleles are passed from parents to children with probability $1/2$ each and independently for the two parents. For example, given $B_4$, the event $E$ occurs if and only if the child does not get two $a$'s. The probability of getting $a$ from both parents given $B_4$ is $1/4$, so $\Pr(E \mid B_4) = 3/4$.
Now we shall compute $\Pr(B_1 \mid E)$ and $\Pr(B_5 \mid E)$. We leave the other calculations to the reader. The denominator of Bayes' theorem is the same for both calculations, namely,
$$
\Pr(E) = \sum_{i=1}^{6} \Pr(B_i)\Pr(E \mid B_i) = \frac{1}{16}(1) + \frac{1}{4}(1) + \frac{1}{8}(1) + \frac{1}{4}\left(\frac{3}{4}\right) + \frac{1}{4}\left(\frac{1}{2}\right) + \frac{1}{16}(0) = \frac{3}{4}.
$$
Applying Bayes' theorem, we get
$$
\Pr(B_1 \mid E) = \frac{\frac{1}{16}(1)}{\frac{3}{4}} = \frac{1}{12}, \qquad \Pr(B_5 \mid E) = \frac{\frac{1}{4}\left(\frac{1}{2}\right)}{\frac{3}{4}} = \frac{1}{6}.
$$
| Name of event | $B_1$ | $B_2$ | $B_3$ | $B_4$ | $B_5$ | $B_6$ |
|---|---|---|---|---|---|---|
| Genotypes of parents | $(AA, AA)$ | $(AA, Aa)$ | $(AA, aa)$ | $(Aa, Aa)$ | $(Aa, aa)$ | $(aa, aa)$ |
| Probability of $B_i$ | $1/16$ | $1/4$ | $1/8$ | $1/4$ | $1/4$ | $1/16$ |
| $\Pr(E \mid B_i)$ | $1$ | $1$ | $1$ | $3/4$ | $1/2$ | $0$ |
: Parental genotypes for Example 2.3.5 {#tbl-2-2}
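As a check on the hand computation, here is a minimal Python sketch (not part of the original text) that reproduces $\Pr(E)$, $\Pr(B_1 \mid E)$, and $\Pr(B_5 \mid E)$ using exact fractions:

```python
# A minimal sketch (not from the text) of the genotype calculation in Example 2.3.5,
# using exact fractions so the answers match the hand computation.
from fractions import Fraction as F

prior = [F(1, 16), F(1, 4), F(1, 8), F(1, 4), F(1, 4), F(1, 16)]  # Pr(B1), ..., Pr(B6)
like = [F(1), F(1), F(1), F(3, 4), F(1, 2), F(0)]                 # Pr(E | Bi)

pr_e = sum(p * l for p, l in zip(prior, like))                    # Pr(E) = 3/4
posterior = [p * l / pr_e for p, l in zip(prior, like)]

print(pr_e)          # 3/4
print(posterior[0])  # Pr(B1 | E) = 1/12
print(posterior[4])  # Pr(B5 | E) = 1/6
```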
Note: Conditional Version of Bayes' Theorem. There is also a version of Bayes' theorem conditional on an event $C$:
$$
\Pr(B_i \mid A \cap C) = \frac{\Pr(B_i \mid C)\Pr(A \mid B_i \cap C)}{\sum_{j=1}^{k} \Pr(B_j \mid C)\Pr(A \mid B_j \cap C)}.
$$ {#eq-2-3-2}
2.3.2 Prior and Posterior Probabilities¶
In Example 2.3.4, a probability like $\Pr(B_2)$ is often called the prior probability that the selected item will have been produced by machine $M_2$, because $\Pr(B_2)$ is the probability of this event before the item is selected and before it is known whether the selected item is defective or nondefective. A probability like $\Pr(B_2 \mid A)$ is then called the posterior probability that the selected item was produced by machine $M_2$, because it is the probability of this event after it is known that the selected item is defective.
Thus, in Example 2.3.4, the prior probability that the selected item will have been produced by machine $M_2$ is 0.3. After an item has been selected and has been found to be defective, the posterior probability that the item was produced by machine $M_2$ is 0.26. Since this posterior probability is smaller than the prior probability that the item was produced by machine $M_2$, the posterior probability that the item was produced by one of the other machines must be larger than the prior probability that it was produced by one of those machines (see Exercises 2.3.1 and 2.3.2 at the end of this section).
Computation of Posterior Probabilities in More Than One Stage¶
Suppose that a box contains one fair coin and one coin with a head on each side. Suppose also that one coin is selected at random and that when it is tossed, a head is obtained. We shall determine the probability that the coin is the fair coin.
Let $B_1$ be the event that the coin is fair, let $B_2$ be the event that the coin has two heads, and let $H_1$ be the event that a head is obtained when the coin is tossed. Then, by Bayes' theorem,
$$
\begin{align*}
\Pr(B_1 \mid H_1) &= \frac{\Pr(B_1)\Pr(H_1 \mid B_1)}{\Pr(B_1)\Pr(H_1 \mid B_1) + \Pr(B_2)\Pr(H_1 \mid B_2)} \\
&= \frac{(1/2)(1/2)}{(1/2)(1/2) + (1/2)(1)} = \frac{1}{3}.
\end{align*}
$$ {#eq-2-3-3}
Thus, after the first toss, the posterior probability that the coin is fair is $1/3$.
Now suppose that the same coin is tossed again, and assume that the two tosses are conditionally independent given both $B_1$ and $B_2$. Suppose that another head is obtained. There are two ways of determining the new value of the posterior probability that the coin is fair.
The first way is to return to the beginning of the experiment and assume again that the prior probabilities are $\Pr(B_1) = \Pr(B_2) = 1/2$. We shall let $H_1 \cap H_2$ denote the event in which heads are obtained on both tosses of the coin, and we shall calculate the posterior probability $\Pr(B_1 \mid H_1 \cap H_2)$ that the coin is fair after we have observed the event $H_1 \cap H_2$. The assumption that the tosses are conditionally independent given $B_1$ means that $\Pr(H_1 \cap H_2 \mid B_1) = (1/2)(1/2) = 1/4$. By Bayes' theorem,
$$
\begin{align*}
\Pr(B_1 \mid H_1 \cap H_2) &= \frac{\Pr(B_1)\Pr(H_1 \cap H_2 \mid B_1)}{\Pr(B_1)\Pr(H_1 \cap H_2 \mid B_1) + \Pr(B_2)\Pr(H_1 \cap H_2 \mid B_2)} \\
&= \frac{(1/2)(1/4)}{(1/2)(1/4) + (1/2)(1)} = \frac{1}{5}.
\end{align*}
$$ {#eq-2-3-4}
The second way of determining this same posterior probability is to use the conditional version of Bayes' theorem eq-2-3-2 given the event $H_1$. Given $H_1$, the conditional probability of $B_1$ is $1/3$, and the conditional probability of $B_2$ is therefore $2/3$. These conditional probabilities can now serve as the prior probabilities for the next stage of the experiment, in which the coin is tossed a second time. Thus, we can apply eq-2-3-2 with $C = H_1$ and $A = H_2$. We can then compute the posterior probability $\Pr(B_1 \mid H_1 \cap H_2)$ that the coin is fair after we have observed a head on the second toss and a head on the first toss. We shall need $\Pr(H_2 \mid B_1 \cap H_1)$, which equals $\Pr(H_2 \mid B_1) = 1/2$ by thm-2-2-4, since $H_1$ and $H_2$ are conditionally independent given $B_1$. Since the coin is two-headed when $B_2$ occurs, $\Pr(H_2 \mid B_2 \cap H_1) = 1$. So we obtain
$$
\begin{align*}
\Pr(B_1 \mid H_1 \cap H_2) &= \frac{\Pr(B_1 \mid H_1)\Pr(H_2 \mid B_1 \cap H_1)}{\Pr(B_1 \mid H_1)\Pr(H_2 \mid B_1 \cap H_1) + \Pr(B_2 \mid H_1)\Pr(H_2 \mid B_2 \cap H_1)} \\
&= \frac{(1/3)(1/2)}{(1/3)(1/2) + (2/3)(1)} = \frac{1}{5}.
\end{align*}
$$ {#eq-2-3-5}
The posterior probability of the event $B_1$ obtained in the second way is the same as that obtained in the first way. We can make the following general statement: If an experiment is carried out in more than one stage, then the posterior probability of every event can also be calculated in more than one stage. After each stage has been carried out, the posterior probability calculated for the event after that stage serves as the prior probability for the next stage. The reader should look back at eq-2-3-2 to see that this interpretation is precisely what the conditional version of Bayes' theorem says. The example we have been doing with coin tossing is typical of many applications of Bayes' theorem and its conditional version, because we are assuming that the observable events are conditionally independent given each element of the partition (in this case, $B_1$ and $B_2$). The conditional independence makes the probability of $H_i$ (head on the $i$th toss) given $B_1$ (or given $B_2$) the same whether or not we also condition on earlier tosses (see thm-2-2-4).
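The agreement between the one-stage and two-stage calculations can be verified numerically. The following Python sketch (not part of the original text; the dictionary keys are our own labels) performs both updates:

```python
# A minimal sketch (not from the text): the posterior that the coin is fair after two heads,
# computed in a single stage and in two stages, as described above.
prior = {"fair": 0.5, "two_headed": 0.5}
p_head = {"fair": 0.5, "two_headed": 1.0}   # Pr(head | coin) on any one toss

def update(pri, like):
    """One application of Bayes' theorem: posterior is proportional to prior times likelihood."""
    joint = {k: pri[k] * like[k] for k in pri}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}

# Single stage: condition on both heads at once (conditional independence gives Pr(HH | coin)).
one_stage = update(prior, {k: p_head[k] ** 2 for k in prior})

# Two stages: the posterior after the first head serves as the prior for the second head.
two_stage = update(update(prior, p_head), p_head)

print(one_stage["fair"], two_stage["fair"])   # both 0.2, i.e., 1/5
```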
Conditionally Independent Events¶
The calculations that led to eq-2-3-3 and eq-2-3-5, together with exm-2-2-10, illustrate simple cases of a very powerful statistical model for observable events. It is very common to encounter a sequence of events that we believe are similar in that they all have the same probability of occurring. It is also common that the order in which the events are labeled does not affect the probabilities that we assign. However, we often believe that these events are not independent, because, if we were to observe some of them, we would change our minds about the probability of the ones we had not observed depending on how many of the observed events occur. For example, in the coin-tossing calculation leading up to eq-2-3-3, before any tosses occur, the probability of $H_2$ is the same as the probability of $H_1$, namely, the denominator of eq-2-3-3, $3/4$, as Theorem 2.1.4 says. However, after observing that the event $H_1$ occurs, the probability of $H_2$ is $\Pr(H_2 \mid H_1) = 5/6$, which is the denominator of eq-2-3-5, as computed by the conditional version of the law of total probability eq-2-1-5. Even though we might treat the coin tosses as independent conditional on the coin being fair, and we might treat them as independent conditional on the coin being two-headed (in which case we know what will happen every time anyway), we cannot treat them as independent without the conditioning information. The conditioning information removes an important source of uncertainty from the problem, so we partition the sample space accordingly. Now we can use the conditional independence of the tosses to calculate joint probabilities of various combinations of events conditionally on the partition events. Finally, we can combine these probabilities using Theorem 2.1.4 and eq-2-1-5. Two more examples will help to illustrate these ideas.
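A short numerical check of the dependence described above, as a Python sketch (not part of the original text):

```python
# A minimal sketch (not from the text): with the coin from the example above, the tosses are
# conditionally independent given the coin but are not independent unconditionally.
pr_fair, pr_two_headed = 0.5, 0.5

pr_h1 = pr_fair * 0.5 + pr_two_headed * 1.0              # Pr(H1) = 3/4 (law of total probability)
pr_h1_and_h2 = pr_fair * 0.5 ** 2 + pr_two_headed * 1.0  # Pr(H1 and H2) = 5/8
pr_h2_given_h1 = pr_h1_and_h2 / pr_h1                    # 5/6, larger than Pr(H2) = 3/4

print(pr_h1, round(pr_h2_given_h1, 4))   # 0.75 0.8333
```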
Example 2.3.6: Learning about a Proportion¶
In exm-2-2-10, a machine produced defective parts in one of two proportions, $p = 0.01$ or $p = 0.4$. Suppose that the prior probability that $p = 0.01$ is 0.9. After sampling six parts at random, suppose that we observe two defectives. What is the posterior probability that $p = 0.01$?
Let $B_1$ be the event that $p = 0.01$ and $B_2$ be the event that $p = 0.4$, as in exm-2-2-10. Let $A$ be the event that two defectives occur in a random sample of size six. The prior probability of $B_1$ is 0.9, and the prior probability of $B_2$ is 0.1. We already computed $\Pr(A \mid B_1) = 1.44 \times 10^{-3}$ and $\Pr(A \mid B_2) = 0.311$ in exm-2-2-10. Bayes' theorem tells us that
$$
\Pr(B_1 \mid A) = \frac{(0.9)(1.44 \times 10^{-3})}{(0.9)(1.44 \times 10^{-3}) + (0.1)(0.311)} = 0.04.
$$
Even though we thought originally that $B_1$ had probability as high as 0.9, after we learned that there were two defective items in a sample as small as six, we changed our minds dramatically and now we believe that $B_1$ has probability as small as 0.04. The reason for this major change is that the event $A$ that occurred has much higher probability if $B_2$ is true than if $B_1$ is true.
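Here is a minimal Python sketch (not part of the original text) of the same calculation; the binomial probabilities play the role of $\Pr(A \mid B_1)$ and $\Pr(A \mid B_2)$:

```python
# A minimal sketch (not from the text) of Example 2.3.6, using the binomial probability of
# seeing 2 defectives in a sample of 6 for each candidate proportion.
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

prior = {0.01: 0.9, 0.4: 0.1}                      # prior probabilities of the two proportions
like = {p: binom_pmf(2, 6, p) for p in prior}      # Pr(2 defectives in 6 | p)

pr_a = sum(prior[p] * like[p] for p in prior)
posterior_small_p = prior[0.01] * like[0.01] / pr_a

print(round(like[0.01], 5), round(like[0.4], 3))   # 0.00144 0.311
print(round(posterior_small_p, 2))                 # 0.04
```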
Example 2.3.7: A Clinical Trial¶
Consider the same clinical trial described in Examples 2.1.12 and 2.1.13. Let $E_i$ be the event that the $i$th patient has success as her outcome. Recall that $B_j$ is the event that $p = (j-1)/10$ for $j = 1, \ldots, 11$, where $p$ is the proportion of successes among all possible patients. If we knew which $B_j$ occurred, we would say that $E_1, E_2, \ldots$ were independent. That is, we are willing to model the patients as conditionally independent given each event $B_j$, and we set $\Pr(E_i \mid B_j) = (j-1)/10$ for all $i$ and $j$. We shall still assume that $\Pr(B_j) = 1/11$ for all $j$ prior to the start of the trial. We are now in a position to express what we learn about $p$ by computing posterior probabilities for the events $B_j$ after each patient finishes the trial.
For example, consider the first patient. We calculated $\Pr(E_1) = 1/2$ in eq-2-1-6. If $E_1$ occurs, we apply Bayes' theorem to get
$$
\Pr(B_j \mid E_1) = \frac{\Pr(E_1 \mid B_j)\Pr(B_j)}{\Pr(E_1)} = \frac{\frac{j-1}{10} \cdot \frac{1}{11}}{1/2} = \frac{j-1}{55}.
$$ {#eq-2-3-6}
After observing one success, the posterior probabilities of large values of $p$ are higher than their prior probabilities, and the posterior probabilities of small values of $p$ are lower than their prior probabilities, as we would expect. For example, $\Pr(B_1 \mid E_1) = 0$, because $p = 0$ is ruled out after one success. Also, $\Pr(B_2 \mid E_1) = 1/55 = 0.0182$, which is much smaller than its prior value 0.0909, and $\Pr(B_{11} \mid E_1) = 10/55 = 0.1818$, which is larger than its prior value 0.0909.
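A short Python sketch (not part of the original text) that reproduces these posterior probabilities exactly:

```python
# A minimal sketch (not from the text): the posterior probabilities in eq-2-3-6 after one
# success, computed with exact fractions.
from fractions import Fraction as F

prior = [F(1, 11)] * 11
like = [F(j - 1, 10) for j in range(1, 12)]        # Pr(E1 | Bj) = (j - 1)/10

pr_e1 = sum(p * l for p, l in zip(prior, like))    # 1/2
posterior = [p * l / pr_e1 for p, l in zip(prior, like)]

print(pr_e1)                         # 1/2
print(posterior[1], posterior[10])   # 1/55 (about 0.0182) and 2/11 = 10/55 (about 0.1818)
```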
We could check how the posterior probabilities behave after each patient is observed. However, we shall skip ahead to the point at which all 40 patients in the imipramine column of tbl-2-1 have been observed. Let $A$ stand for the observed event that 22 of them are successes and 18 are failures. We can use the same reasoning as in exm-2-2-5 to compute $\Pr(A \mid B_j)$. There are $\binom{40}{22}$ possible sequences of 40 patients with 22 successes, and, conditional on $B_j$, the probability of each such sequence is $([j-1]/10)^{22}(1 - [j-1]/10)^{18}$.
So,
$$
\Pr(A \mid B_j) = \binom{40}{22}\left(\frac{j-1}{10}\right)^{22}\left(1 - \frac{j-1}{10}\right)^{18}
$$ {#eq-2-3-7}
for each $j$. Then Bayes' theorem tells us that
$$
\Pr(B_j \mid A) = \frac{\Pr(B_j)\Pr(A \mid B_j)}{\sum_{i=1}^{11} \Pr(B_i)\Pr(A \mid B_i)} = \frac{\left(\frac{j-1}{10}\right)^{22}\left(1 - \frac{j-1}{10}\right)^{18}}{\sum_{i=1}^{11}\left(\frac{i-1}{10}\right)^{22}\left(1 - \frac{i-1}{10}\right)^{18}}.
$$
Figure 2.3 shows the posterior probabilities of the 11 partition elements after observing $A$. Notice that the probabilities of $B_6$ and $B_7$ are the highest, both approximately 0.42. This corresponds to the fact that the proportion of successes in the observed sample is $22/40 = 0.55$, halfway between $(6-1)/10 = 0.5$ and $(7-1)/10 = 0.6$.
We can also compute the probability that the next patient will be a success, both before the trial and after the 40 patients. Before the trial, $\Pr(E_{41}) = \Pr(E_1)$, which equals $1/2$, as computed in eq-2-1-6. After observing the 40 patients, we can compute $\Pr(E_{41} \mid A)$ using the conditional version of the law of total probability, eq-2-1-5:
$$
\Pr(E_{41} \mid A) = \sum_{j=1}^{11} \Pr(E_{41} \mid B_j \cap A)\Pr(B_j \mid A).
$$ {#eq-2-3-8}
Using the values of $\Pr(B_j \mid A)$ in Figure 2.3 and the fact that $\Pr(E_{41} \mid B_j \cap A) = \Pr(E_{41} \mid B_j) = (j-1)/10$ (by the conditional independence of the $E_i$ given the $B_j$), we compute eq-2-3-8 to be 0.5476. This is also very close to the observed frequency of success, $22/40 = 0.55$.
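The posterior probabilities in Figure 2.3 and the predictive probability 0.5476 can be reproduced with a few lines of Python. This sketch is not part of the original text; it simply evaluates eq-2-3-7 and eq-2-3-8, dropping factors that cancel:

```python
# A minimal sketch (not from the text) of the 40-patient calculation in Example 2.3.7.
# The binomial coefficient and the uniform prior cancel in Bayes' theorem, so only the
# factors that depend on j are kept.
ps = [j / 10 for j in range(11)]                  # the 11 candidate values of p
prior = [1 / 11] * 11

like = [p ** 22 * (1 - p) ** 18 for p in ps]      # proportional to Pr(A | Bj)
total = sum(pr * l for pr, l in zip(prior, like))
posterior = [pr * l / total for pr, l in zip(prior, like)]

print(round(posterior[5], 2), round(posterior[6], 2))   # 0.42 0.42 (p = 0.5 and p = 0.6)

pred = sum(p * w for p, w in zip(ps, posterior))        # Pr(E41 | A), eq-2-3-8
print(round(pred, 4))                                   # 0.5476
```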
The calculation at the end of Example 2.3.7 is typical of what happens after observing many conditionally independent events with the same conditional probability of occurrence. The conditional probability of the next event given those that were observed tends to be close to the observed frequency of occurrence among the observed events. Indeed, when there is substantial data, the choice of prior probabilities becomes far less important.
Figure 2.3: The posterior probabilities of the partition elements after observing 40 patients in Example 2.3.7.
Figure 2.4: The posterior probabilities of the partition elements after observing 40 patients in Example 2.3.8. The X characters mark the values of the posterior probabilities calculated in Example 2.3.7.
Example 2.3.8: The Effect of Prior Probabilities¶
Consider the same clinical trial as in Example 2.3.7. This time, suppose that a different researcher has a different prior opinion about the value of $p$, the probability of success. This researcher believes the following prior probabilities:
| Event | $B_1$ | $B_2$ | $B_3$ | $B_4$ | $B_5$ | $B_6$ | $B_7$ | $B_8$ | $B_9$ | $B_{10}$ | $B_{11}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $p$ | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
| Prior prob. | 0.00 | 0.19 | 0.19 | 0.17 | 0.14 | 0.11 | 0.09 | 0.06 | 0.04 | 0.01 | 0.00 |
We can recalculate the posterior probabilities using Bayes' theorem, and we get the values pictured in Figure 2.4. To aid comparison, the posterior probabilities from Example 2.3.7 are also plotted in Figure 2.4 using the symbol X. One can see how close the two sets of posterior probabilities are, despite the large differences between the prior probabilities. If there had been fewer patients observed, there would have been larger differences between the two sets of posterior probabilities because the observed events would have provided less information. (See Exercise 2.3.12 in this section.)
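A minimal Python sketch (not part of the original text) that computes both sets of posterior probabilities for comparison:

```python
# A minimal sketch (not from the text) comparing the two priors of Examples 2.3.7 and 2.3.8.
# After 22 successes in 40 patients, the two posteriors differ far less than the priors do.
ps = [j / 10 for j in range(11)]
uniform_prior = [1 / 11] * 11
other_prior = [0.00, 0.19, 0.19, 0.17, 0.14, 0.11, 0.09, 0.06, 0.04, 0.01, 0.00]

like = [p ** 22 * (1 - p) ** 18 for p in ps]      # proportional to Pr(A | Bj)

def posterior(prior):
    joint = [pr * l for pr, l in zip(prior, like)]
    total = sum(joint)
    return [j / total for j in joint]

print([round(x, 2) for x in posterior(uniform_prior)])  # peaks near p = 0.5 and 0.6
print([round(x, 2) for x in posterior(other_prior)])    # much closer to the line above than the priors are
```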
2.3.3 Summary¶
Bayes' theorem tells us how to compute the conditional probability of each event $B_i$ in a partition given an observed event $A$. A major use of partitions is to divide the sample space into small enough pieces so that a collection of events of interest become conditionally independent given each event in the partition.
2.3.4 Exercises¶
Exercise 2.3.1¶
Suppose that the events $B_1, \ldots, B_k$ form a partition of the sample space $S$. For $i = 1, \ldots, k$, let $\Pr(B_i)$ denote the prior probability of $B_i$. Also, for each event $A$ such that $\Pr(A) > 0$, let $\Pr(B_i \mid A)$ denote the posterior probability of $B_i$ given that the event $A$ has occurred. Prove that if $\Pr(B_1 \mid A) < \Pr(B_1)$, then $\Pr(B_i \mid A) > \Pr(B_i)$ for at least one value of $i$ ($i = 2, \ldots, k$).
Exercise 2.3.2¶
Consider again the conditions of Example 2.3.4 in this section, in which an item was selected at random from a batch of manufactured items and was found to be defective. For which values of $i$ ($i = 1, 2, 3$) is the posterior probability that the item was produced by machine $M_i$ larger than the prior probability that the item was produced by machine $M_i$?
Exercise 2.3.3¶
Suppose that in Example 2.3.4 in this section, the item selected at random from the entire lot is found to be nondefective. Determine the posterior probability that it was produced by machine $M_2$.
Exercise 2.3.4¶
A new test has been devised for detecting a particular type of cancer. If the test is applied to a person who has this type of cancer, the probability that the person will have a positive reaction is 0.95 and the probability that the person will have a negative reaction is 0.05. If the test is applied to a person who does not have this type of cancer, the probability that the person will have a positive reaction is 0.05 and the probability that the person will have a negative reaction is 0.95. Suppose that in the general population, one person out of every 100,000 people has this type of cancer. If a person selected at random has a positive reaction to the test, what is the probability that he has this type of cancer?
Exercise 2.3.5¶
In a certain city, 30 percent of the people are Conservatives, 50 percent are Liberals, and 20 percent are Independents. Records show that in a particular election, 65 percent of the Conservatives voted, 82 percent of the Liberals voted, and 50 percent of the Independents voted. If a person in the city is selected at random and it is learned that she did not vote in the last election, what is the probability that she is a Liberal?
Exercise 2.3.6¶
Suppose that when a machine is adjusted properly, 50 percent of the items produced by it are of high quality and the other 50 percent are of medium quality. Suppose, however, that the machine is improperly adjusted during 10 percent of the time and that, under these conditions, 25 percent of the items produced by it are of high quality and 75 percent are of medium quality.
a. Suppose that five items produced by the machine at a certain time are selected at random and inspected. If four of these items are of high quality and one item is of medium quality, what is the probability that the machine was adjusted properly at that time?
b. Suppose that one additional item, which was produced by the machine at the same time as the other five items, is selected and found to be of medium quality. What is the new posterior probability that the machine was adjusted properly?
Exercise 2.3.7¶
Suppose that a box contains five coins and that for each coin there is a different probability that a head will be obtained when the coin is tossed. Let $p_i$ denote the probability of a head when the $i$th coin is tossed ($i = 1, \ldots, 5$), and suppose that $p_1 = 0$, $p_2 = 1/4$, $p_3 = 1/2$, $p_4 = 3/4$, and $p_5 = 1$.
a. Suppose that one coin is selected at random from the box and when it is tossed once, a head is obtained. What is the posterior probability that the $i$th coin was selected ($i = 1, \ldots, 5$)?
b. If the same coin were tossed again, what would be the probability of obtaining another head?
c. If a tail had been obtained on the first toss of the selected coin and the same coin were tossed again, what would be the probability of obtaining a head on the second toss?
Exercise 2.3.8¶
Consider again the box containing the five different coins described in Exercise 2.3.7. Suppose that one coin is selected at random from the box and is tossed repeatedly until a head is obtained.
a. If the first head is obtained on the fourth toss, what is the posterior probability that the $i$th coin was selected ($i = 1, \ldots, 5$)?
b. If we continue to toss the same coin until another head is obtained, what is the probability that exactly three additional tosses will be required?
Exercise 2.3.9¶
Consider again the conditions of the machine-repair exercise in 2.1 The Definition of Conditional Probability, in which a machine can be in one of three states of repair. Suppose that several parts will be observed and that the different parts are conditionally independent given each of the three states of repair of the machine. If seven parts are observed and exactly one is defective, compute the posterior probabilities of the three states of repair.
Exercise 2.3.10¶
Consider again the conditions of Example 2.3.5, in which the phenotype of an individual was observed and found to be the dominant trait. For which values of $i$ ($i = 1, \ldots, 6$) is the posterior probability that the parents have the genotypes of event $B_i$ smaller than the prior probability that the parents have the genotypes of event $B_i$?
Exercise 2.3.11¶
Suppose that in Example 2.3.5 the observed individual has the recessive trait. Determine the posterior probability that the parents have the genotypes of event $B_4$.
Exercise 2.3.12¶
In the clinical trial in Examples 2.3.7 and 2.3.8, suppose that we have observed only the first five patients, and three of the five had been successes. Use the two different sets of prior probabilities from Examples 2.3.7 and 2.3.8 to calculate two sets of posterior probabilities. Are these two sets of posterior probabilities as close to each other as were the two in Examples 2.3.7 and 2.3.8? Why or why not?
Exercise 2.3.13¶
Suppose that a box contains one fair coin and one coin with a head on each side. Suppose that a coin is drawn at random from this box and that we begin to flip the coin. In eq-2-3-4 and eq-2-3-5, we computed the conditional probability that the coin was fair given that the first two flips both produce heads.
a. Suppose that the coin is flipped a third time and another head is obtained. Compute the probability that the coin is fair given that all three flips produced heads.
b. Suppose that the coin is flipped a fourth time and the result is tails. Compute the posterior probability that the coin is fair.
Exercise 2.3.14¶
Consider again the conditions of the program-compilation exercise in 2.2 Independent Events. Assume that $\Pr(B) = 0.4$. Let $A$ be the event that exactly 8 out of the 11 programs compiled. Compute the conditional probability of $B$ given $A$.
Exercise 2.3.15¶
Use the prior probabilities in Example 2.3.8 for the events $B_1, \ldots, B_{11}$. Let $E_1$ be the event that the first patient is a success. Compute the probability of $E_1$ and explain why it is so much less than the value computed in Example 2.3.7.
Exercise 2.3.16¶
Consider a machine that produces items in sequence. Under normal operating conditions, the items are independent with probability 0.01 of being defective. However, it is possible for the machine to develop a "memory" in the following sense: After each defective item, and independently of anything that happened earlier, the probability that the next item is defective is $2/5$. After each nondefective item, and independently of anything that happened earlier, the probability that the next item is defective is $1/165$.
Assume that the machine is either operating normally for the whole time we observe or has a memory for the whole time that we observe. Let $B$ be the event that the machine is operating normally, and assume that $\Pr(B) = 2/3$. Let $D_i$ be the event that the $i$th item inspected is defective. Assume that $D_1$ is independent of $B$.
a. Prove that $\Pr(D_i) = 0.01$ for all $i$. Hint: Use induction.
b. Assume that we observe the first six items and the event that occurs is $E = D_1^c \cap D_2^c \cap D_3 \cap D_4 \cap D_5^c \cap D_6^c$. That is, the third and fourth items are defective, but the other four are not. Compute $\Pr(B \mid E)$.