
3.3 The Cumulative Distribution Function

Overview

Although a discrete distribution is characterized by its PMF and a continuous distribution is characterized by its PDF, every distribution has a common characterization through its (cumulative) distribution function (CDF). The inverse of the CDF is called the quantile function, and it is useful for indicating where the probability is located in a distribution.

3.3.1 Definition and Basic Properties

It should be emphasized that the cumulative distribution function is defined as above for every random variable $X$, regardless of whether the distribution of $X$ is discrete, continuous, or mixed. For the continuous random variable in Example 3.3.1, the CDF was calculated in Eq. (3.3.1). Here is a discrete example:

We shall soon see (Theorem 3.3.2) that the CDF allows calculation of all interval probabilities; hence, it characterizes the distribution of a random variable. It follows from Eq. (3.3.2) that the CDF of each random variable $X$ is a function $F$ defined on the real line. The value of $F$ at every point $x$ must be a number $F(x)$ in the interval $[0, 1]$ because $F(x)$ is the probability of the event $\{X \leq x\}$. Furthermore, it follows from Eq. (3.3.2) that the CDF of every random variable $X$ must have the following three properties.

An example of a CDF is sketched in Figure 3.6. It is shown in that figure that $0 \leq F(x) \leq 1$ over the entire real line. Also, $F(x)$ is always nondecreasing as $x$ increases, although $F(x)$ is constant over the interval $x_1 \leq x \leq x_2$ and for $x \geq x_4$.

Figure 3.6: An example of a CDF

The limiting values specified in Property 3.3.2 are indicated in Figure 3.6. In this figure, the value of $F(x)$ actually becomes 1 at $x = x_4$ and then remains 1 for $x > x_4$. Hence, it may be concluded that $\Pr(X \leq x_4) = 1$ and $\Pr(X > x_4) = 0$. On the other hand, according to the sketch in Figure 3.6, the value of $F(x)$ approaches 0 as $x \rightarrow -\infty$, but does not actually become 0 at any finite point $x$. Therefore, for every finite value of $x$, no matter how small, $\Pr(X \leq x) > 0$.

A CDF need not be continuous. In fact, the value of $F(x)$ may jump at any finite or countable number of points. In Figure 3.6, for instance, such jumps or points of discontinuity occur where $x = x_1$ and $x = x_3$. For each fixed value $x$, we shall let $F(x^-)$ denote the limit of the values of $F(y)$ as $y$ approaches $x$ from the left, that is, as $y$ approaches $x$ through values smaller than $x$. In symbols,

$$F(x^{-}) = \lim_{y \rightarrow x,~y < x}F(y).$$

Similarly, we shall define $F(x^+)$ as the limit of the values of $F(y)$ as $y$ approaches $x$ from the right. Thus,

$$F(x^+) = \lim_{y \rightarrow x,~y > x}F(y).$$

If the CDF is continuous at a given point $x$, then $F(x^{-}) = F(x^+) = F(x)$ at that point.

It follows from Property 3.3.3 that at every point $x$ at which a jump occurs,

$$F(x^+) = F(x) \; \text{ and } \; F(x^-) < F(x).$$

In Figure 3.6 this property is illustrated by the fact that, at the points of discontinuity $x = x_1$ and $x = x_3$, the value of $F(x_1)$ is taken as $z_1$ and the value of $F(x_3)$ is taken as $z_3$.
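The one-sided limits can also be checked numerically. The following is a minimal sketch, not part of the original text: it assumes a hypothetical discrete distribution on the values 1, 2, and 4 (with probabilities chosen only for illustration) and approximates $F(x^-)$ and $F(x^+)$ by evaluating $F$ slightly to the left and right of $x$. The helper names `left_limit` and `right_limit` are illustrative.

```python
def cdf(x):
    """CDF of a hypothetical discrete distribution on {1, 2, 4}."""
    pmf = {1: 0.2, 2: 0.5, 4: 0.3}  # illustrative probabilities, not from the text
    return sum(p for value, p in pmf.items() if value <= x)


def left_limit(F, x, eps=1e-9):
    """Approximate F(x^-) by evaluating F just below x."""
    return F(x - eps)


def right_limit(F, x, eps=1e-9):
    """Approximate F(x^+) by evaluating F just above x."""
    return F(x + eps)


# At the jump point x = 2: F(x^-) < F(x) = F(x^+).
print(left_limit(cdf, 2), cdf(2), right_limit(cdf, 2))

# At x = 3, where there is no jump, all three values coincide.
print(left_limit(cdf, 3), cdf(3), right_limit(cdf, 3))
```

The approximation is only reliable when `eps` is smaller than the gap between neighboring jump points, which holds for this example.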

Determining Probabilities from the Distribution Function

The type of reasoning used in Example 3.3.3 can be extended to find, from the CDF, the probability that an arbitrary random variable $X$ will lie in any specified interval of the real line. We shall derive this probability for four different types of intervals.

For example, if the CDF of $X$ is as sketched in Figure 3.6, then it follows from Theorems 3.3.1 and 3.3.2 that $\Pr(X > x_2) = 1 - z_1$ and $\Pr(x_2 < X \leq x_3) = z_3 - z_1$. Also, since $F(x)$ is constant over the interval $x_1 \leq x \leq x_2$, then $\Pr(x_1 < X \leq x_2) = 0$.

It is important to distinguish carefully between the strict inequalities and the weak inequalities that appear in all of the preceding relations and also in the next theorem. If there is a jump in $F(x)$ at a given value $x$, then the values of $\Pr(X \leq x)$ and $\Pr(X < x)$ will be different.

For example, for the CDF sketched in Figure 3.6, $\Pr(X < x_3) = z_2$ and $\Pr(X < x_4) = 1$.

Finally, we shall show that for every value $x$, $\Pr(X = x)$ is equal to the amount of the jump that occurs in $F$ at the point $x$. If $F$ is continuous at the point $x$, that is, if there is no jump in $F$ at $x$, then $\Pr(X = x) = 0$.

In Figure 3.6, for example, $\Pr(X = x_1) = z_1 - z_0$, $\Pr(X = x_3) = z_3 - z_2$, and the probability of every other individual value of $X$ is 0.
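The relations used above ($\Pr(X > x) = 1 - F(x)$, $\Pr(a < X \leq b) = F(b) - F(a)$, $\Pr(X < x) = F(x^-)$, and $\Pr(X = x) = F(x) - F(x^-)$) can be collected in a short sketch. The mixed CDF below is a hypothetical example invented for illustration: it has jumps of 0.25 at $x = 0$ and $x = 1$ and rises linearly in between. Its left-hand limit is written out by hand rather than approximated.

```python
def F(x):
    """Hypothetical mixed CDF: jumps of 0.25 at x = 0 and x = 1, linear in between."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.25 + 0.5 * x
    return 1.0


def F_left(x):
    """Left-hand limit F(x^-) of the CDF above, written out by hand."""
    if x <= 0:
        return 0.0
    if x <= 1:
        return 0.25 + 0.5 * x
    return 1.0


def pr_greater(x):
    """Pr(X > x) = 1 - F(x)."""
    return 1.0 - F(x)


def pr_half_open(a, b):
    """Pr(a < X <= b) = F(b) - F(a)."""
    return F(b) - F(a)


def pr_less(x):
    """Pr(X < x) = F(x^-)."""
    return F_left(x)


def pr_equal(x):
    """Pr(X = x) = F(x) - F(x^-), the size of the jump at x."""
    return F(x) - F_left(x)


print(pr_equal(0), pr_equal(0.5), pr_equal(1))  # jump, no jump, jump
print(pr_less(1), F(1))                         # strict vs. weak inequality at a jump
```

Keeping `F` and `F_left` as separate functions makes the difference between $\Pr(X \leq x)$ and $\Pr(X < x)$ explicit at the jump points.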

The CDF of a Discrete Distribution

From the definition and properties of a CDF $F(x)$, it follows that if $a < b$ and if $\Pr(a < X < b) = 0$, then $F(x)$ will be constant and horizontal over the interval $a < x < b$. Furthermore, as we have just seen, at every point $x$ such that $\Pr(X = x) > 0$, the CDF will jump by the amount $\Pr(X = x)$.

Suppose that $X$ has a discrete distribution with the pmf $f(x)$. Together, the properties of a CDF imply that $F(x)$ must have the following form: $F(x)$ will have a jump of magnitude $f(x_i)$ at each possible value $x_i$ of $X$, and $F(x)$ will be constant between every pair of successive jumps. The distribution of a discrete random variable $X$ can be represented equally well by either the pmf or the CDF of $X$.
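As a rough illustration of this equivalence, the sketch below builds a step-function CDF from a pmf by cumulative summation and then recovers the pmf from the jump sizes. The pmf and the function names `make_cdf` and `jumps` are hypothetical, introduced here only for the example.

```python
from bisect import bisect_right
from itertools import accumulate


def make_cdf(pmf):
    """Return the step-function CDF for a pmf given as {value: probability}."""
    values = sorted(pmf)
    cumulative = list(accumulate(pmf[v] for v in values))

    def cdf(x):
        k = bisect_right(values, x)  # number of possible values <= x
        return cumulative[k - 1] if k > 0 else 0.0

    return cdf


def jumps(cdf, values):
    """Recover the pmf as the jump of the CDF at each possible value."""
    result = {}
    previous = 0.0
    for v in sorted(values):
        result[v] = cdf(v) - previous
        previous = cdf(v)
    return result


pmf = {0: 0.5, 3: 0.25, 7: 0.25}  # illustrative pmf, not from the text
F = make_cdf(pmf)

print(F(-1), F(0), F(2.9), F(3), F(100))
print(jumps(F, pmf))
```

Using `bisect_right` rather than `bisect_left` is what makes the resulting function continuous from the right: a point $x$ that equals a possible value is counted in $F(x)$.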

The CDF of a Continuous Distribution

Thus, the CDF of a continuous random variable $X$ can be obtained from the pdf and vice versa. Eq. (3.3.7) is how we found the CDF in Example 3.3.1. Notice that the derivative of the $F$ in Example 3.3.1 is

$$F'(x) = \begin{cases} 0 &\text{for }x < 0, \\ \frac{1}{(1+x)^2} &\text{for }x > 0, \end{cases}$$

and $F'$ does not exist at $x = 0$. This verifies Eq. (3.3.8) for Example 3.3.1. Here, we have used the popular shorthand notation $F'(x)$ for the derivative of $F$ at the point $x$.
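The relationship between the CDF and the pdf can be checked numerically. Example 3.3.1 is not reproduced in this section, so the sketch below assumes that its CDF is $F(x) = 1 - 1/(1+x)$ for $x > 0$ and $F(x) = 0$ for $x \leq 0$. That form is inferred from the derivative displayed above together with the requirements that $F$ be continuous and tend to 1. The code compares a central-difference estimate of $F'$ with the pdf, and shows that the one-sided slopes at $x = 0$ disagree.

```python
def F(x):
    """Assumed CDF for Example 3.3.1 (inferred from the stated derivative)."""
    return 1.0 - 1.0 / (1.0 + x) if x > 0 else 0.0


def f(x):
    """pdf corresponding to the derivative displayed in the text."""
    return 1.0 / (1.0 + x) ** 2 if x > 0 else 0.0


def central_difference(g, x, h=1e-6):
    """Numerical estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)


for x in (-1.0, 0.5, 2.0, 10.0):
    print(x, central_difference(F, x), f(x))

# At x = 0 the left and right slopes differ, so F'(0) does not exist.
h = 1e-6
left_slope = (F(0.0) - F(-h)) / h
right_slope = (F(h) - F(0.0)) / h
print(left_slope, right_slope)
```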

The Quantile Function

The value $x_0$ that we seek in Example 3.3.5 is called the 0.5 quantile of $X$ or the 50th percentile of $X$ because 50% of the distribution of $X$ is at or below $x_0$.

The notation $F^{-1}(p)$ in Definition 3.3.2 deserves some justification. Suppose first that the CDF $F$ of $X$ is continuous and one-to-one over the whole set of possible values of $X$. Then the inverse $F^{-1}$ of $F$ exists, and for each $0 < p < 1$, there is one and only one $x$ such that $F(x) = p$. That $x$ is $F^{-1}(p)$. Definition 3.3.2 extends the concept of inverse function to nondecreasing functions (such as CDFs) that may be neither one-to-one nor continuous.

Quantiles of Continuous Distributions: When the CDF of a random variable $X$ is continuous and one-to-one over the whole set of possible values of $X$, the inverse $F^{-1}$ of $F$ exists and equals the quantile function of $X$.
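When no closed form for $F^{-1}$ is at hand, the equation $F(x) = p$ can be solved numerically. The sketch below uses bisection, which is one possible approach and not a method taken from the text. It is tested on a uniform distribution over the hypothetical interval $[2, 6]$, where the answer $2 + 4p$ is known, and it assumes the caller supplies a bracket `[lo, hi]` on which $F$ is continuous and strictly increasing.

```python
def uniform_cdf(x, a=2.0, b=6.0):
    """CDF of the uniform distribution on [a, b] (interval chosen for illustration)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def quantile_by_bisection(F, p, lo, hi, tol=1e-10):
    """Solve F(x) = p on [lo, hi], assuming F is continuous and strictly increasing there."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must satisfy 0 < p < 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < p:
            lo = mid   # the quantile lies to the right of mid
        else:
            hi = mid   # F(mid) >= p, so the quantile is at or left of mid
    return hi


for p in (0.25, 0.5, 0.75):
    print(p, quantile_by_bisection(uniform_cdf, p, 2.0, 6.0))
```

Because the loop keeps the invariant $F(\texttt{lo}) < p \leq F(\texttt{hi})$, the returned value approximates the smallest $x$ with $F(x) \geq p$, which matches the general definition of the quantile function.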

Figure 3.7: The pdf of the change in value of a portfolio with lower 1% indicated

Figure 3.8: The CDF of a uniform distribution indicating how to solve for a quantile

Note: Quantiles, Like CDFs, Depend on the Distribution Only: Any two random variables with the same distribution have the same quantile function. When we refer to a quantile of $X$, we mean a quantile of the distribution of $X$.

Quantiles of Discrete Distributions: It is convenient to be able to calculate quantiles for discrete distributions as well. The quantile function of Definition 3.3.2 exists for all distributions whether discrete, continuous, or otherwise. For example, in Figure 3.6, let $z_0 \leq p \leq z_1$. Then the smallest $x$ such that $F(x) \geq p$ is $x_1$. For every value of $x < x_1$, we have $F(x) < z_0 \leq p$ and $F(x_1) = z_1$. Notice that $F(x) = z_1$ for all $x$ between $x_1$ and $x_2$, but since $x_1$ is the smallest of all those numbers, $x_1$ is the $p$ quantile. Because distribution functions are continuous from the right, the smallest $x$ such that $F(x) \geq p$ exists for all $0 < p < 1$. For $p = 1$, there is no guarantee that such an $x$ will exist. For example, in Figure 3.6, $F(x_4) = 1$, but in Example 3.3.1, $F(x) < 1$ for all $x$. For $p = 0$, there is never a smallest $x$ such that $F(x) = 0$ because $\lim_{x \rightarrow -\infty}F(x) = 0$. That is, if $F(x_0) = 0$, then $F(x) = 0$ for all $x < x_0$. For these reasons, we never talk about the 0 or 1 quantiles.

| $p$ | $F^{-1}(p)$ |
|:---|:---|
| $(0, 0.1681]$ | 0 |
| $(0.1681, 0.5283]$ | 1 |
| $(0.5283, 0.8370]$ | 2 |
| $(0.8370, 0.9693]$ | 3 |
| $(0.9693, 0.9977]$ | 4 |
| $(0.9977, 1)$ | 5 |

: Table 3.1: Quantile function for Example 3.3.9. {#tbl-3-1}
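A table of this kind can be generated directly from the definition of the quantile function as the smallest $x$ with $F(x) \geq p$. Example 3.3.9 is not reproduced here, so the sketch below assumes a binomial distribution with parameters $n = 5$ and $p = 0.3$. This is an inference from the breakpoints in Table 3.1, which agree with that distribution's cumulative probabilities up to rounding in the fourth decimal place. The function names are illustrative.

```python
from math import comb


def binomial_cdf_values(n, p):
    """Return [F(0), F(1), ..., F(n)] for the binomial distribution with parameters n and p."""
    total = 0.0
    values = []
    for k in range(n + 1):
        total += comb(n, k) * p**k * (1 - p) ** (n - k)
        values.append(total)
    return values


def quantile(cdf_values, p):
    """Smallest k with F(k) >= p, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("quantiles are defined only for 0 < p < 1")
    for k, F_k in enumerate(cdf_values):
        if F_k >= p:
            return k
    return len(cdf_values) - 1  # guards against round-off in the final cumulative sum


# Assumed distribution: binomial with n = 5 and success probability 0.3.
cdf_values = binomial_cdf_values(5, 0.3)
print([round(v, 4) for v in cdf_values])
print([quantile(cdf_values, p) for p in (0.1, 0.5, 0.6, 0.9, 0.98, 0.999)])
```

The computed breakpoints may differ from the tabulated ones by about $10^{-4}$, which is what one would expect if the table was built by summing pmf values that had already been rounded to four decimal places.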

Certain quantiles have special names.

Note: The Median Is Special

The median of a distribution is one of several special features that people like to use when summarizing the distribution of a random variable. We shall discuss summaries of distributions in more detail in Chapter 4: Expectation. Because the median is such a popular summary, we need to note that there are several different but similar “definitions” of median. Recall that the $1/2$ quantile is the smallest number $x$ such that $F(x) \geq 1/2$. For some distributions, usually discrete distributions, there will be an interval of numbers $[x_1, x_2)$ such that for all $x \in [x_1, x_2)$, $F(x) = 1/2$. In such cases, it is common to refer to all such $x$ (including $x_2$) as medians of the distribution. (See Definition 4.5.1.) Another popular convention is to call $(x_1 + x_2)/2$ the median. This last is probably the most common convention. Readers should be aware that, whenever they encounter a median, it might be any one of the things that we just discussed. Fortunately, they all mean nearly the same thing, namely that the number divides the distribution in half as closely as is possible.
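The three conventions can be compared on a small case. The sketch below uses a fair six-sided die as an illustrative distribution (it is not an example from the text), for which $F(x) = 1/2$ on the interval $[3, 4)$. Exact fractions are used so that the comparison with $1/2$ is not affected by floating-point round-off.

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5, 6]
pmf = {v: Fraction(1, 6) for v in values}  # fair die, chosen for illustration


def cdf(x):
    """F(x) = Pr(X <= x), computed exactly."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))


def half_quantile():
    """The 1/2 quantile: smallest possible value x with F(x) >= 1/2."""
    return min(v for v in values if cdf(v) >= Fraction(1, 2))


def median_interval():
    """Endpoints (x1, x2) of the set of medians; x1 == x2 when the median is unique."""
    x1 = half_quantile()
    if cdf(x1) != Fraction(1, 2):
        return (x1, x1)  # F jumps past 1/2 at x1, so there is no flat stretch at 1/2
    x2 = min(v for v in values if v > x1)  # F stays at 1/2 until the next possible value
    return (x1, x2)


x1, x2 = median_interval()
midpoint = Fraction(x1 + x2, 2)
print(half_quantile(), (x1, x2), midpoint)
```

For the die, the $1/2$ quantile is 3, every number in $[3, 4]$ qualifies as a median under the second convention, and the midpoint convention gives $7/2$.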

One advantage to describing a distribution by the quantile function rather than by the CDF is that quantile functions are easier to display in tabular form for multiple distributions. The reason is that the domain of the quantile function is always the interval $(0, 1)$ no matter what the possible values of $X$ are. Quantiles are also useful for summarizing distributions in terms of where the probability is. For example, if one wishes to say where the middle half of a distribution is, one can say that it lies between the 0.25 quantile and the 0.75 quantile. In Section 8.5, we shall see how to use quantiles to help provide estimates of unknown quantities after observing data.

In Div, you can show how to recover the CDF from the quantile function. Hence, the quantile function is an alternative way to characterize a distribution.

Summary

The CDF $F$ of a random variable $X$ is $F(x) = \Pr(X \leq x)$ for all real $x$. This function is continuous from the right. If we let $F(x^-)$ equal the limit of $F(y)$ as $y$ approaches $x$ from below, then $F(x) - F(x^-) = \Pr(X = x)$. A continuous distribution has a continuous CDF and $F'(x) = f(x)$, the pdf of the distribution, for all $x$ at which $F$ is differentiable. A discrete distribution has a CDF that is constant between the possible values and jumps by $f(x)$ at each possible value $x$. The quantile function $F^{-1}(p)$ is equal to the smallest $x$ such that $F(x) \geq p$ for $0 < p < 1$.

Exercises

Exercise 3.3.1

Suppose that a random variable $X$ has the Bernoulli distribution with parameter $p = 0.7$. (See Definition 3.1.5.) Sketch the CDF of $X$.

Exercise 3.3.2

Suppose that a random variable $X$ can take only the values $-2$, 0, 1, and 4, and that the probabilities of these values are as follows: $\Pr(X = -2) = 0.4$, $\Pr(X = 0) = 0.1$, $\Pr(X = 1) = 0.3$, and $\Pr(X = 4) = 0.2$. Sketch the CDF of $X$.

Exercise 3.3.3

Suppose that a coin is tossed repeatedly until a head is obtained for the first time, and let $X$ denote the number of tosses that are required. Sketch the CDF of $X$.

Exercise 3.3.4

Suppose that the CDF $F$ of a random variable $X$ is as sketched in Figure 3.9. Find each of the following probabilities:

  • a. $\Pr(X = -1)$

  • b. $\Pr(X < 0)$

  • c. $\Pr(X \leq 0)$

  • d. $\Pr(X = 1)$

  • e. $\Pr(0 < X \leq 3)$

  • f. $\Pr(0 < X < 3)$

  • g. $\Pr(0 \leq X \leq 3)$

  • h. $\Pr(1 < X \leq 2)$

  • i. $\Pr(1 \leq X \leq 2)$

  • j. $\Pr(X > 5)$

  • k. $\Pr(X \geq 5)$

  • l. $\Pr(3 \leq X \leq 4)$

Figure 3.9: The CDF for Exercise 4

Exercise 3.3.5

Suppose that the CDF of a random variable $X$ is as follows:

$$F(x) = \begin{cases} 0 &\text{for }x \leq 0, \\ \frac{1}{9}x^2 &\text{for }0 < x \leq 3, \\ 1 &\text{for }x > 3. \end{cases}$$

Find and sketch the pdf of $X$.

Exercise 3.3.6

Suppose that the CDF of a random variable $X$ is as follows:

$$F(x) = \begin{cases} e^{x - 3} &\text{for }x \leq 3, \\ 1 &\text{for }x > 3. \end{cases}$$

Find and sketch the pdf of $X$.

Exercise 3.3.7

Suppose, as in Div, that a random variable $X$ has the uniform distribution on the interval $[-2, 8]$. Find and sketch the CDF of $X$.

Exercise 3.3.8

Suppose that a point in the $xy$-plane is chosen at random from the interior of a circle for which the equation is $x^2 + y^2 = 1$; and suppose that the probability that the point will belong to each region inside the circle is proportional to the area of that region. Let $Z$ denote a random variable representing the distance from the center of the circle to the point. Find and sketch the CDF of $Z$.

Exercise 3.3.9

Suppose that $X$ has the uniform distribution on the interval $[0, 5]$ and that the random variable $Y$ is defined by $Y = 0$ if $X \leq 1$, $Y = 5$ if $X \geq 3$, and $Y = X$ otherwise. Sketch the CDF of $Y$.

Exercise 3.3.10

For the CDF in Example 3.3.4, find the quantile function.

Exercise 3.3.11

For the CDF in Div, find the quantile function.

Exercise 3.3.12

For the CDF in Div, find the quantile function.

Exercise 3.3.13

Suppose that a broker believes that the change in value $X$ of a particular investment over the next two months has the uniform distribution on the interval $[-12, 24]$. Find the value at risk (VaR) for two months at probability level 0.95.

Exercise 3.3.14

Find the quartiles and the median of the binomial distribution with parameters $n = 10$ and $p = 0.2$.

Exercise 3.3.15

Suppose that $X$ has the pdf

$$f(x) = \begin{cases} 2x &\text{if }0 < x < 1, \\ 0 &\text{otherwise.} \end{cases}$$

Find and sketch the CDF of $X$.

Exercise 3.3.16

Find the quantile function for the distribution in Example 3.3.1.

Exercise 3.3.17

Prove that the quantile function $F^{-1}$ of a general random variable $X$ has the following three properties that are analogous to properties of the CDF:

  • a. $F^{-1}$ is a nondecreasing function of $p$ for $0 < p < 1$.

  • b. Let $x_0 = \lim_{p \rightarrow 0,~p > 0}F^{-1}(p)$ and $x_1 = \lim_{p \rightarrow 1,~p < 1}F^{-1}(p)$. Then $x_0$ equals the greatest lower bound on the set of numbers $c$ such that $\Pr(X \leq c) > 0$, and $x_1$ equals the least upper bound on the set of numbers $d$ such that $\Pr(X \geq d) > 0$.

  • c. $F^{-1}$ is continuous from the left; that is, $F^{-1}(p) = F^{-1}(p^-)$ for all $0 < p < 1$.

Exercise 3.3.18

Let $X$ be a random variable with quantile function $F^{-1}$. Assume the following three conditions: (i) $F^{-1}(p) = c$ for all $p$ in the interval $(p_0, p_1)$, (ii) either $p_0 = 0$ or $F^{-1}(p_0) < c$, and (iii) either $p_1 = 1$ or $F^{-1}(p) > c$ for $p > p_1$. Prove that $\Pr(X = c) = p_1 - p_0$.

Exercise 3.3.19

Let $X$ be a random variable with CDF $F$ and quantile function $F^{-1}$. Let $x_0$ and $x_1$ be as defined in Div. (Note that $x_0 = -\infty$ and/or $x_1 = \infty$ are possible.) Prove that for all $x$ in the open interval $(x_0, x_1)$, $F(x)$ is the largest $p$ such that $F^{-1}(p) \leq x$.

Exercise 3.3.20

In Div, draw a sketch of the CDF $F$ of $X$ and find $F(10)$.