1.5 The Definition of Probability - Probability and Statistics

Overview

We begin with the mathematical definition of probability and then present some useful results that follow easily from the definition.

1.5.1 Axioms and Basic Theorems

In this section, we shall present the mathematical, or axiomatic, definition of probability. In a given experiment, it is necessary to assign to each event $A$ in the sample space $S$ a number $\Pr(A)$ that indicates the probability that $A$ will occur. In order to satisfy the mathematical definition of probability, the number $\Pr(A)$ that is assigned must satisfy three specific axioms. These axioms ensure that the number $\Pr(A)$ will have certain properties that we intuitively expect a probability to have under each of the various interpretations described in Section 1.2 (Interpretations of Probability).

The first axiom states that the probability of every event must be nonnegative.

Axiom 1

For every event $A$, $\Pr(A) \geq 0$.

The second axiom states that if an event is certain to occur, then the probability of that event is 1.

Axiom 2

$\Pr(S) = 1$.

Before stating Axiom 3, we shall discuss the probabilities of disjoint events. If two events are disjoint, it is natural to assume that the probability that one or the other will occur is the sum of their individual probabilities. In fact, it will be assumed that this additive property of probability is also true for every finite collection of disjoint events and even for every infinite sequence of disjoint events. If we assume that this additive property is true only for a finite number of disjoint events, we cannot then be certain that the property will be true for an infinite sequence of disjoint events as well. However, if we assume that the additive property is true for every infinite sequence of disjoint events, then (as we shall prove) the property must also be true for every finite number of disjoint events. These considerations lead to the third axiom.

Axiom 3

For every infinite sequence of disjoint events $A_1, A_2, \ldots$,

\Pr\left(\bigcup_{i=1}^{\infty}A_i\right) = \sum_{i=1}^{\infty}\Pr(A_i).

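We are now prepared to give the mathematical definition of probability.

Definition 1.5.1 (Probability)

A probability measure, or simply a probability, on a sample space $S$ is a specification of numbers $\Pr(A)$ for all events $A$ that satisfy Axioms 1, 2, and 3.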
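To make the definition concrete, here is a minimal Python sketch, assuming a finite sample space whose outcomes carry hand-picked weights. The outcome labels `"a"`, `"b"`, `"c"` and the helpers `pr`, `all_events`, and `satisfies_axioms` are hypothetical illustrations, not part of the text. The sketch enumerates every event and checks the axioms directly. Because a finite space has only finitely many distinct events, it assumes that checking additivity over pairs of disjoint events is enough to cover Axiom 3; that shortcut does not carry over to infinite sample spaces.

```python
from fractions import Fraction
from itertools import chain, combinations

def pr(weights, event):
    """Pr(event) on a finite space: add up the weights of the outcomes in the event."""
    return sum((weights[s] for s in event), Fraction(0))

def all_events(space):
    """Every subset of a finite sample space (its power set)."""
    items = sorted(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def satisfies_axioms(weights):
    S = frozenset(weights)
    events = all_events(S)
    axiom1 = all(pr(weights, A) >= 0 for A in events)
    axiom2 = pr(weights, S) == 1
    # With S finite, Axiom 3 comes down to additivity over disjoint pairs
    # (extended to any finite collection by induction).
    additive = all(pr(weights, A | B) == pr(weights, A) + pr(weights, B)
                   for A in events for B in events if not A & B)
    return axiom1 and axiom2 and additive

good = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
bad = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(1, 4)}  # weights total 5/4
print(satisfies_axioms(good))  # True
print(satisfies_axioms(bad))   # False: Pr(S) = 5/4 violates Axiom 2
```

Using exact `Fraction` values rather than floats keeps the equality checks for Axiom 2 and for additivity free of rounding error.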

We shall now derive two important consequences of Axiom 3. First, we shall show that if an event is impossible, its probability must be 0.

Theorem 1.5.1

\Pr(\varnothing) = 0. \tag{1}

Consider the infinite sequence of events $A_1, A_2, \ldots$ such that $A_i = \varnothing$ for $i = 1, 2, \ldots$. In other words, each of the events in the sequence is just the empty set $\varnothing$. Then this sequence is a sequence of disjoint events, since $\varnothing \cap \varnothing = \varnothing$. Furthermore, $\bigcup_{i=1}^{\infty}A_i = \varnothing$. Therefore, it follows from Axiom 3 that

\Pr(\varnothing) = \Pr\left(\bigcup_{i=1}^{\infty}A_i\right) = \sum_{i=1}^{\infty}\Pr(A_i) = \sum_{i=1}^{\infty}\Pr(\varnothing).

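This equation states that when the number $\Pr(\varnothing)$ is added repeatedly in an infinite series, the sum of that series is simply the number $\Pr(\varnothing)$. The only real number with this property is zero: if $\Pr(\varnothing)$ were a positive number $c$, the partial sums $nc$ would grow without bound, and $\Pr(\varnothing)$ cannot be negative by Axiom 1.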

We can now show that the additive property assumed in Axiom 3 for an infinite sequence of disjoint events is also true for every finite number of disjoint events.

Theorem 1.5.2

For every finite sequence of $n$ disjoint events $A_1, \ldots, A_n$,

\Pr\left(\bigcup_{i=1}^n A_i\right) = \sum_{i=1}^n \Pr(A_i).

Consider the infinite sequence of events $A_1, A_2, \ldots$, in which $A_1, \ldots, A_n$ are the $n$ given disjoint events and $A_i = \varnothing$ for $i > n$. Then the events in this infinite sequence are disjoint and $\bigcup_{i=1}^{\infty}A_i = \bigcup_{i=1}^n A_i$. Therefore, by Axiom 3 and Theorem 1.5.1 (which gives $\Pr(A_i) = \Pr(\varnothing) = 0$ for every $i > n$),

\begin{align*} \Pr\left(\bigcup_{i=1}^nA_i\right) &= \Pr\left(\bigcup_{i=1}^{\infty}A_i\right) = \sum_{i=1}^{\infty}\Pr(A_i) \\ &= \sum_{i=1}^n\Pr(A_i) + \sum_{i=n+1}^{\infty}\Pr(A_i) \\ &= \sum_{i=1}^n\Pr(A_i) + 0 \\ &= \sum_{i=1}^n\Pr(A_i). \end{align*}
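The padding step in the proof of Theorem 1.5.2 can be mimicked numerically. The sketch below assumes a fair die as a hypothetical sample space and appends many empty events to a finite list of disjoint events; since a program cannot store an infinite sequence, a long finite tail of empty sets stands in for it. Neither the union nor the sum of probabilities changes, because every added term is $\Pr(\varnothing) = 0$.

```python
from fractions import Fraction

# Hypothetical fair-die sample space: each outcome has weight 1/6.
weights = {k: Fraction(1, 6) for k in range(1, 7)}

def pr(event):
    """Sum of outcome weights; the empty event gets Fraction(0)."""
    return sum((weights[s] for s in event), Fraction(0))

finite = [frozenset({1, 2}), frozenset({3}), frozenset({5, 6})]  # n = 3 disjoint events
# Mimic the proof: A_i = empty set for i > n (a long finite stand-in for the infinite tail).
padded = finite + [frozenset()] * 1000

union_finite = frozenset().union(*finite)
union_padded = frozenset().union(*padded)
print(union_finite == union_padded)                  # True: empty sets leave the union unchanged
print(sum(pr(A) for A in padded) == sum(pr(A) for A in finite))  # True: each extra term is 0
print(pr(union_finite), sum(pr(A) for A in finite))  # 5/6 5/6
```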

1.5.2 Further Properties of Probability

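From the axioms and theorems just given, we shall now derive four other general properties of probability measures. Because of the fundamental nature of these four properties, they will be presented in the form of four theorems, each of which is easily proved.

Theorem 1.5.3

For every event $A$,

\Pr(A^c) = 1 - \Pr(A).

Since $A$ and $A^c$ are disjoint events and $A \cup A^c = S$, it follows from Theorem 1.5.2 that $\Pr(S) = \Pr(A) + \Pr(A^c)$. Since $\Pr(S) = 1$ by Axiom 2, we have $\Pr(A^c) = 1 - \Pr(A)$.

Theorem 1.5.4

If $A \subset B$, then $\Pr(A) \leq \Pr(B)$.

As illustrated in Figure 1.8, the event $B$ may be written as the union of the two disjoint events $A$ and $B \cap A^c$. Therefore, $\Pr(B) = \Pr(A) + \Pr(B \cap A^c)$. Since $\Pr(B \cap A^c) \geq 0$ by Axiom 1, it follows that $\Pr(B) \geq \Pr(A)$.

Figure 1.8: $B = A \cup (B \cap A^c)$ in the proof of Theorem 1.5.4.

Theorem 1.5.5

For every event $A$, $0 \leq \Pr(A) \leq 1$.

By Axiom 1, $\Pr(A) \geq 0$. Since $A \subset S$ for every event $A$, Theorem 1.5.4 and Axiom 2 give $\Pr(A) \leq \Pr(S) = 1$.

Theorem 1.5.6

For every two events $A$ and $B$,

\Pr(A \cap B^c) = \Pr(A) - \Pr(A \cap B).

The events $A \cap B$ and $A \cap B^c$ are disjoint and their union is $A$, so by Theorem 1.5.2, $\Pr(A) = \Pr(A \cap B) + \Pr(A \cap B^c)$. Subtracting $\Pr(A \cap B)$ from both sides gives the result.

Theorem 1.5.7

For every two events $A$ and $B$,

\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B).

The event $A \cup B$ is the union of the disjoint events $B$ and $A \cap B^c$, so $\Pr(A \cup B) = \Pr(B) + \Pr(A \cap B^c)$. Substituting $\Pr(A \cap B^c) = \Pr(A) - \Pr(A \cap B)$ from Theorem 1.5.6 gives the result.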
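The complement rule $\Pr(A^c) = 1 - \Pr(A)$, the subset property of Theorem 1.5.4, and the addition rule of Theorem 1.5.7 can be checked exhaustively on a small sample space. The following sketch assumes a fair six-sided die with equally likely outcomes (a hypothetical example, not one from this section) and tests the three relations on all $2^6 = 64$ events. Passing such a check is not a proof, but a failure would expose an error in a hand calculation or in a coded probability model.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical example: a fair six-sided die, each outcome with probability 1/6.
S = frozenset(range(1, 7))

def pr(event):
    return Fraction(len(event), len(S))  # equally likely outcomes

def all_events(space):
    """Every subset of a finite sample space (its power set)."""
    items = sorted(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

events = all_events(S)  # all 2**6 = 64 events
for A in events:
    assert pr(S - A) == 1 - pr(A)                      # complement rule
    for B in events:
        if A <= B:                                     # A is a subset of B
            assert pr(A) <= pr(B)                      # Theorem 1.5.4
        assert pr(A | B) == pr(A) + pr(B) - pr(A & B)  # Theorem 1.5.7
print("all checks passed on", len(events), "events")
```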

Example 1.5.3 (Diagnosing Diseases)

A patient arrives at a doctor’s office with a sore throat and low-grade fever. After an exam, the doctor decides that the patient has either a bacterial infection or a viral infection or both. The doctor decides that there is a probability of 0.7 that the patient has a bacterial infection and a probability of 0.4 that the patient has a viral infection. What is the probability that the patient has both infections?

Let $B$ be the event that the patient has a bacterial infection, and let $V$ be the event that the patient has a viral infection. We are told that $\Pr(B) = 0.7$, that $\Pr(V) = 0.4$, and that $S = B \cup V$. We are asked to find $\Pr(B \cap V)$. We will use Theorem 1.5.7, which says that

\Pr(B \cup V) = \Pr(B) + \Pr(V) - \Pr(B \cap V). \tag{7}

Since $S = B \cup V$, the left-hand side of Equation (7) is $\Pr(S) = 1$ by Axiom 2, while the first two terms on the right-hand side are 0.7 and 0.4. The result is

1 = 0.7 + 0.4 - \Pr(B \cap V), \tag{8}

which leads to $\Pr(B \cap V) = 0.1$, the probability that the patient has both infections.
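The arithmetic in Example 1.5.3 amounts to solving Theorem 1.5.7 for the intersection term. A short sketch with exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

# Values from Example 1.5.3, as exact fractions to avoid floating-point noise.
pr_B = Fraction(7, 10)    # bacterial infection
pr_V = Fraction(4, 10)    # viral infection
pr_B_or_V = Fraction(1)   # S = B ∪ V, so Pr(B ∪ V) = Pr(S) = 1

# Rearranging Pr(B ∪ V) = Pr(B) + Pr(V) - Pr(B ∩ V):
pr_B_and_V = pr_B + pr_V - pr_B_or_V
print(pr_B_and_V)  # 1/10
```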

Example 1.5.4 (Demands for Utilities)

Consider, once again, the contractor who needs to plan for water and electricity demands in Example 1.4.5. There are many possible choices for how to spread the probability around the sample space (pictured in Figure 1.5). One simple choice is to make the probability of an event $E$ proportional to the area of $E$. The area of $S$ (the sample space) is $(150 - 1)\cdot(200 - 4) = 29{,}204$, so $\Pr(E)$ equals the area of $E$ divided by 29,204. For example, suppose that the contractor is interested in high demand. Let $A$ be the event that water demand is at least 100, let $B$ be the event that electric demand is at least 115, and suppose that these values are considered high demand. These events are shaded with different patterns in Figure 1.9. The area of $A$ is $(150 - 1)\cdot(200 - 100) = 14{,}900$, and the area of $B$ is $(150 - 115)\cdot(200 - 4) = 6{,}860$. So,

\Pr(A) = \frac{14{,}900}{29{,}204} \approx 0.5102, \quad \Pr(B) = \frac{6{,}860}{29{,}204} \approx 0.2349. \tag{9}

The two events intersect in the region denoted by $A \cap B$. The area of this region is $(150 - 115)\cdot(200 - 100) = 3{,}500$, so $\Pr(A \cap B) = 3{,}500/29{,}204 \approx 0.1198$. If the contractor wishes to compute the probability that at least one of the two demands will be high, that probability is

\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) \approx 0.5102 + 0.2349 - 0.1198 = 0.6253, \tag{10}

according to Theorem 1.5.7.

Figure 1.9: The two events of interest in the utility demand sample space for Example 1.5.4.
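The area computations of Example 1.5.4 translate directly into code. This sketch hard-codes the rectangle $4 \leq \text{water} \leq 200$, $1 \leq \text{electric} \leq 150$ implied by the area $(150 - 1)\cdot(200 - 4)$, then adds an optional Monte Carlo check that samples points uniformly from $S$. The variable names, the seed, and the sample size are illustrative choices, not part of the example.

```python
import random

# Example 1.5.4: water demand x in [4, 200], electric demand y in [1, 150];
# Pr(E) = area(E) / area(S), the area-proportional choice made in the text.
W_LO, W_HI = 4, 200
E_LO, E_HI = 1, 150
area_S = (E_HI - E_LO) * (W_HI - W_LO)   # 29,204

area_A = (E_HI - E_LO) * (W_HI - 100)    # water demand >= 100
area_B = (E_HI - 115) * (W_HI - W_LO)    # electric demand >= 115
area_AB = (E_HI - 115) * (W_HI - 100)    # both demands high

pr_A, pr_B, pr_AB = area_A / area_S, area_B / area_S, area_AB / area_S
pr_A_or_B = pr_A + pr_B - pr_AB          # Theorem 1.5.7
print(round(pr_A, 4), round(pr_B, 4), round(pr_AB, 4), round(pr_A_or_B, 4))
# 0.5102 0.2349 0.1198 0.6253

# Optional Monte Carlo check: sample points uniformly from the rectangle S.
rng = random.Random(0)
n = 200_000
hits = 0
for _ in range(n):
    x = rng.uniform(W_LO, W_HI)
    y = rng.uniform(E_LO, E_HI)
    hits += (x >= 100 or y >= 115)       # at least one demand is high
print(hits / n)  # close to 0.6253; the exact digits depend on the seed
```

The simulated frequency should agree with the area-based value to within a few thousandths, which is a useful guard against mistakes in setting up the areas.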

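The proof of the following useful result is left to Exercise 1.5.13.

Theorem 1.5.8 (Bonferroni Inequality)

For all events $A_1, \ldots, A_n$,

\Pr\left(\bigcup_{i=1}^n A_i\right) \leq \sum_{i=1}^n \Pr(A_i) \quad\text{and}\quad \Pr\left(\bigcap_{i=1}^n A_i\right) \geq 1 - \sum_{i=1}^n \Pr(A_i^c).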

Note: Probability Zero Does Not Mean Impossible. When an event has probability 0, it does not mean that the event is impossible. In Example 1.5.4, there are many events with 0 probability, but they are not all impossible. For example, for every $x$, the event that water demand equals $x$ corresponds to a line segment in Figure 1.5. Since line segments have 0 area, the probability of every such line segment is 0, but the events are not all impossible. Indeed, if every event of the form $\{\text{water demand equals }x\}$ were impossible, then water demand could not take any value at all. If $\epsilon > 0$, the event

\{\text{water demand is between }x - \epsilon\text{ and }x + \epsilon\}

will have positive probability, but that probability will go to 0 as $\epsilon$ goes to 0.
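Under the same area-proportional model, the probability in the Note can be computed explicitly: for $x$ between 4 and 200, the event that water demand lies between $x - \epsilon$ and $x + \epsilon$ is a strip of the rectangle whose area is at most $2\epsilon \cdot (150 - 1)$. The hypothetical helper `pr_water_near` below clips the strip to the sample space and shows the probability shrinking toward 0 with $\epsilon$, while $\epsilon = 0$ (the single line segment where water demand equals $x$) gives exactly 0.

```python
# Hypothetical helper for the Note, under the area-proportional model of
# Example 1.5.4: water demand in [4, 200], electric demand in [1, 150].
AREA_S = (150 - 1) * (200 - 4)   # 29,204

def pr_water_near(x, eps):
    """Pr(x - eps <= water demand <= x + eps): area of a strip of S over area of S."""
    lo, hi = max(4, x - eps), min(200, x + eps)   # clip the strip to the sample space
    width = max(0, hi - lo)
    return width * (150 - 1) / AREA_S

for eps in (10, 1, 0.1, 0.001):
    print(eps, pr_water_near(100, eps))           # positive, shrinking with eps
print(pr_water_near(100, 0))  # 0.0: the line segment {water demand = 100}
```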

1.5.3 Summary

We have presented the mathematical definition of probability through the three axioms. The axioms require that every event have nonnegative probability, that the whole sample space have probability 1, and that the union of an infinite sequence of disjoint events have probability equal to the sum of their probabilities. Some important results to remember include the following:

- If $A_1, \ldots, A_k$ are disjoint, then $\Pr\left(\bigcup_{i=1}^kA_i\right) = \sum_{i=1}^k\Pr(A_i)$.
- $\Pr(A^c) = 1 - \Pr(A)$.
- $A \subset B$ implies that $\Pr(A) \leq \Pr(B)$.
- $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$.

It does not matter how the probabilities were determined. As long as they satisfy the three axioms, they must also satisfy the relations above, as well as all of the results that we prove later in the text.

1.5.4 Exercises

Exercise 1.5.12


Let $A_1, A_2, \ldots$ be an arbitrary infinite sequence of events, and let $B_1, B_2, \ldots$ be another infinite sequence of events defined as follows: $B_1 = A_1$, $B_2 = A_1^c \cap A_2$, $B_3 = A_1^c \cap A_2^c \cap A_3$, $B_4 = A_1^c \cap A_2^c \cap A_3^c \cap A_4$, $\ldots$. Prove that

\Pr\left(\bigcup_{i=1}^nA_i\right) = \sum_{i=1}^n\Pr(B_i) \quad\text{for } n = 1, 2, \ldots, \tag{12}

and that

\Pr\left(\bigcup_{i=1}^{\infty}A_i\right) = \sum_{i=1}^{\infty}\Pr(B_i). \tag{13}
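Before attempting the proof, it can help to see the construction of the $B_i$ on a small example. By De Morgan's laws, $A_1^c \cap \cdots \cap A_{i-1}^c \cap A_i$ is the set of outcomes of $A_i$ that lie in none of $A_1, \ldots, A_{i-1}$, which the hypothetical function `disjointify` computes as a set difference. The sketch assumes a fair die and four overlapping events chosen for illustration, and checks Equation (12) for each $n$. It is a numerical sanity check on one finite example, not a proof, and it says nothing about the infinite case in Equation (13).

```python
from fractions import Fraction

def disjointify(events):
    """B_1 = A_1, B_i = A_1^c ∩ ... ∩ A_{i-1}^c ∩ A_i (outcomes first covered by A_i)."""
    seen = frozenset()
    out = []
    for A_i in events:
        out.append(A_i - seen)   # A_i minus everything in A_1 ∪ ... ∪ A_{i-1}
        seen = seen | A_i
    return out

# Hypothetical example: a fair die and four overlapping events.
S = frozenset(range(1, 7))

def pr(event):
    return Fraction(len(event), len(S))

A = [frozenset({1, 2, 3}), frozenset({2, 4}), frozenset({3, 4, 5}), frozenset({1, 6})]
B = disjointify(A)

# The B_i are pairwise disjoint ...
assert all(not (B[i] & B[j]) for i in range(len(B)) for j in range(i + 1, len(B)))
# ... and Equation (12) holds for every n up to len(A).
for n in range(1, len(A) + 1):
    union_A = frozenset().union(*A[:n])
    assert union_A == frozenset().union(*B[:n])
    assert pr(union_A) == sum(pr(b) for b in B[:n])
print([sorted(b) for b in B])  # [[1, 2, 3], [4], [5], [6]]
```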
