
3.10 Markov Chains

Overview

A popular model for systems that change over time in a random manner is the Markov chain model. A Markov chain is a sequence of random variables, one for each time. At each time, the corresponding random variable gives the state of the system. Also, the conditional distribution of each future state given the past states and the present state depends only on the present state.

3.10.1 Stochastic Processes

Definition 3.10.1: Stochastic Process

A sequence of random variables $X_1, X_2, \ldots$ is called a stochastic process or random process with discrete time parameter. The first random variable $X_1$ is called the initial state of the process; and for $n = 2, 3, \ldots$, the random variable $X_n$ is called the state of the process at time $n$.

In Example 3.10.1 (the numbers of occupied telephone lines), the state of the process at any time is the number of lines being used at that time. Therefore, each state must be an integer between 0 and 5. Each of the random variables in a stochastic process has a marginal distribution, and the entire process has a joint distribution. For convenience, in this text, we will discuss only joint distributions for finitely many of $X_1, X_2, \ldots$ at a time. The meaning of the phrase “discrete time parameter” is that the process, such as the numbers of occupied phone lines, is observed only at discrete or separated points in time, rather than continuously in time. In 5.4 The Poisson Distributions, we will introduce a different stochastic process (called the Poisson process) with a continuous time parameter.

In a stochastic process with a discrete time parameter, the state of the process varies in a random manner from time to time. To describe a complete probability model for a particular process, it is necessary to specify the distribution for the initial state $X_1$ and also to specify for each $n = 1, 2, \ldots$ the conditional distribution of the subsequent state $X_{n+1}$ given $X_1, \ldots, X_n$. These conditional distributions are equivalent to the collection of conditional CDFs of the following form:

$$
\Pr(X_{n+1} \leq b \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n).
$$

Markov Chains

A Markov chain is a special type of stochastic process, defined in terms of the conditional distributions of future states given the present and past states.

Definition 3.10.2: Markov Chain

A stochastic process with discrete time parameter is a Markov chain if, for each time $n$, the conditional distributions of all $X_{n+j}$ for $j \geq 1$ given $X_1, \ldots, X_n$ depend only on $X_n$ and not on the earlier states $X_1, \ldots, X_{n-1}$. In symbols, for $n = 1, 2, \ldots$ and for each $b$ and each possible sequence of states $x_1, x_2, \ldots, x_n$,

$$
\Pr(X_{n+1} \leq b \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = \Pr(X_{n+1} \leq b \mid X_n = x_n).
$$

A Markov chain is called finite if there are only finitely many possible states.

In the remainder of this section, we shall consider only finite Markov chains. This assumption could be relaxed at the cost of more complicated theory and calculation. For convenience, we shall reserve the symbol $k$ to stand for the number of possible states of a general finite Markov chain for the remainder of the section. It will also be convenient, when discussing a general finite Markov chain, to name the $k$ states using the integers $1, \ldots, k$. That is, for each $n$ and $j$, $X_n = j$ will mean that the chain is in state $j$ at time $n$. In specific examples, it may prove more convenient to label the states in a more informative fashion. For example, if the states are the numbers of phone lines in use at given times (as in the example that introduced this section), we would label the states $0, \ldots, 5$ even though $k = 6$.

The following result follows from the multiplication rule for conditional probabilities, Theorem 2.1.2.

Theorem 3.10.1

For a finite Markov chain, the joint pmf for the first $n$ states equals

$$
\begin{align*}
&\Pr(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) \\
&\quad = \Pr(X_1 = x_1)\Pr(X_2 = x_2 \mid X_1 = x_1)\Pr(X_3 = x_3 \mid X_2 = x_2) \cdots \Pr(X_n = x_n \mid X_{n-1} = x_{n-1}).
\end{align*}
$$ {#eq-3-10-1}

Also, for each $n$ and each $m > 0$,

$$
\begin{align*}
&\Pr(X_{n+1} = x_{n+1}, X_{n+2} = x_{n+2}, \ldots, X_{n+m} = x_{n+m} \mid X_n = x_n) \\
&\quad = \Pr(X_{n+1} = x_{n+1} \mid X_n = x_n)\Pr(X_{n+2} = x_{n+2} \mid X_{n+1} = x_{n+1}) \cdots \Pr(X_{n+m} = x_{n+m} \mid X_{n+m-1} = x_{n+m-1}).
\end{align*}
$$ {#eq-3-10-2}

Equation eq-3-10-1 is a discrete version of a generalization of conditioning in sequence that was illustrated in Example 3.7.18 with continuous random variables. Equation eq-3-10-2 is a conditional version of eq-3-10-1 shifted forward in time.

Example 3.10.2: Shopping for Toothpaste

In an earlier example, we considered a shopper who chooses between two brands of toothpaste on several occasions. Let $X_i = 1$ if the shopper chooses brand $A$ on the $i$th purchase, and let $X_i = 2$ if the shopper chooses brand $B$ on the $i$th purchase. Then the sequence of states $X_1, X_2, \ldots$ is a stochastic process with two possible states at each time. The probabilities of purchase were specified by saying that the shopper will choose the same brand as on the previous purchase with probability $1/3$ and will switch with probability $2/3$. Since this happens regardless of purchases that are older than the previous one, we see that this stochastic process is a Markov chain with

$$
\begin{align*}
\Pr(X_{n+1} = 1 \mid X_n = 1) &= \frac{1}{3}, \quad \Pr(X_{n+1} = 2 \mid X_n = 1) = \frac{2}{3}, \\
\Pr(X_{n+1} = 1 \mid X_n = 2) &= \frac{2}{3}, \quad \Pr(X_{n+1} = 2 \mid X_n = 2) = \frac{1}{3}.
\end{align*}
$$

The Markov chain in Example 3.10.2 has an additional feature that puts it in a special class of Markov chains: the probability of moving from one state at time $n$ to another state at time $n + 1$ does not depend on $n$.
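To make the Markov property and these transition probabilities concrete, here is a minimal simulation sketch in Python (NumPy assumed; the function name, seed, and sample size are our illustrative choices, not part of the text). Whatever the earlier purchases were, the empirical frequency of switching brands should be close to $2/3$.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_toothpaste(n_steps: int) -> np.ndarray:
    """Simulate the brand chosen (1 or 2) on successive purchases."""
    states = np.empty(n_steps, dtype=int)
    states[0] = rng.choice([1, 2])           # first purchase: either brand
    for t in range(1, n_steps):
        stay = rng.random() < 1 / 3          # keep the same brand with probability 1/3
        states[t] = states[t - 1] if stay else 3 - states[t - 1]
    return states

x = simulate_toothpaste(100_000)
switches = np.mean(x[1:] != x[:-1])
print(f"empirical switching frequency: {switches:.3f}")   # close to 2/3
```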

Definition 3.10.3: Transition Distributions/Stationary Transition Distributions

Consider a finite Markov chain with $k$ possible states. The conditional distributions of the state at time $n + 1$ given the state at time $n$, that is, $\Pr(X_{n+1} = j \mid X_n = i)$ for $i, j = 1, \ldots, k$ and $n = 1, 2, \ldots$, are called the transition distributions of the Markov chain. If the transition distribution is the same for every time $n$ ($n = 1, 2, \ldots$), then the Markov chain has stationary transition distributions.

When a Markov chain with $k$ possible states has stationary transition distributions, there exist probabilities $p_{ij}$ for $i, j = 1, \ldots, k$ such that, for all $n$,

$$
\Pr(X_{n+1} = j \mid X_n = i) = p_{ij} \quad \text{for } n = 1, 2, \ldots
$$ {#eq-3-10-3}

The Markov chain in Example 3.10.2 has stationary transition distributions. For example, $p_{11} = 1/3$.

In the language of multivariate distributions, when a Markov chain has stationary transition distributions, specified by eq-3-10-3, we can write the conditional pmf of $X_{n+1}$ given $X_n$ as

$$
g(j \mid i) = p_{ij},
$$ {#eq-3-10-4}

for all $n$, $i$, $j$.

Example 3.10.3: Occupied Telephone Lines

To illustrate the application of these concepts, we shall consider again the example involving the office with five telephone lines. In order for this stochastic process to be a Markov chain, the specified distribution for the number of lines that may be in use at each time must depend only on the number of lines that were in use when the process was observed most recently 2 minutes earlier and must not depend on any other observed values previously obtained. For example, if three lines were in use at time $n$, then the distribution for time $n + 1$ must be the same regardless of whether 0, 1, 2, 3, 4, or 5 lines were in use at time $n - 1$.

In reality, however, the observation at time $n - 1$ might provide some information in regard to the length of time for which each of the three lines in use at time $n$ had been occupied, and this information might be helpful in determining the distribution for time $n + 1$. Nevertheless, we shall suppose now that this process is a Markov chain.

If this Markov chain is to have stationary transition distributions, it must be true that the rates at which incoming and outgoing telephone calls are made and the average duration of these telephone calls do not change during the entire period covered by the process. This requirement means that the overall period cannot include busy times when more calls are expected or quiet times when fewer calls are expected. For example, if only one line is in use at a particular observation time, regardless of when this time occurs during the entire period covered by the process, then there must be a specific probability $p_{1j}$ that exactly $j$ lines will be in use 2 minutes later.

The Transition Matrix

Example 3.10.4: Shopping for Toothpaste

The notation for stationary transition distributions, $p_{ij}$, suggests that they could be arranged in a matrix. The transition probabilities for Example 3.10.2 can be arranged into the following matrix:

$$
\mathbf{P} = \begin{bmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{1}{3} \end{bmatrix}.
$$

Every finite Markov chain with stationary transition distributions has a matrix like the one constructed in Example 3.10.4.

Definition 3.10.4: Transition Matrix

Consider a finite Markov chain with stationary transition distributions given by $p_{ij} = \Pr(X_{n+1} = j \mid X_n = i)$ for all $n$, $i$, $j$. The transition matrix of the Markov chain is defined to be the $k \times k$ matrix $\mathbf{P}$ with elements $p_{ij}$. That is,

$$
\mathbf{P} = \begin{bmatrix} p_{11} & \cdots & p_{1k} \\ p_{21} & \cdots & p_{2k} \\ \vdots & \ddots & \vdots \\ p_{k1} & \cdots & p_{kk} \end{bmatrix}.
$$ {#eq-3-10-5}

A transition matrix has several properties that are apparent from its definition. For example, each element is nonnegative because all elements are probabilities. Since each row of a transition matrix is a conditional pmf for the next state given some value of the current state, we have $\sum_{j=1}^{k} p_{ij} = 1$ for $i = 1, \ldots, k$. Indeed, row $i$ of the transition matrix specifies the conditional pmf $g(\cdot \mid i)$ defined in eq-3-10-4.

Definition 3.10.5: Stochastic Matrix

A square matrix for which all elements are nonnegative and the sum of the elements in each row is 1 is called a stochastic matrix.

It is clear that the transition matrix $\mathbf{P}$ for every finite Markov chain with stationary transition probabilities must be a stochastic matrix. Conversely, every $k \times k$ stochastic matrix can serve as the transition matrix of a finite Markov chain with $k$ possible states and stationary transition distributions.
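These two properties are easy to check numerically. The following sketch (Python with NumPy assumed; the helper name is ours) tests whether a given matrix is stochastic and applies the test to the toothpaste matrix of Example 3.10.4.

```python
import numpy as np

def is_stochastic(P: np.ndarray, tol: float = 1e-12) -> bool:
    """Check that P is square, has nonnegative entries, and rows that sum to 1."""
    P = np.asarray(P, dtype=float)
    return (
        P.ndim == 2
        and P.shape[0] == P.shape[1]
        and np.all(P >= 0)
        and np.allclose(P.sum(axis=1), 1.0, atol=tol)
    )

# The toothpaste transition matrix from Example 3.10.4
P_toothpaste = np.array([[1/3, 2/3],
                         [2/3, 1/3]])
print(is_stochastic(P_toothpaste))   # True
```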

Example 3.10.5: A Transition Matrix for the Number of Occupied Telephone Lines

Suppose that in the example involving the office with five telephone lines, the numbers of lines being used at times $1, 2, \ldots$ form a Markov chain with stationary transition distributions. This chain has six possible states $0, 1, \ldots, 5$, where $i$ is the state in which exactly $i$ lines are being used at a given time ($i = 0, 1, \ldots, 5$). Suppose that the transition matrix $\mathbf{P}$ is as follows:

$$
\mathbf{P} = \begin{bmatrix} 0.1 & 0.4 & 0.2 & 0.1 & 0.1 & 0.1 \\ 0.2 & 0.3 & 0.2 & 0.1 & 0.1 & 0.1 \\ 0.1 & 0.2 & 0.3 & 0.2 & 0.1 & 0.1 \\ 0.1 & 0.1 & 0.2 & 0.3 & 0.2 & 0.1 \\ 0.1 & 0.1 & 0.1 & 0.2 & 0.3 & 0.2 \\ 0.1 & 0.1 & 0.1 & 0.1 & 0.4 & 0.2 \end{bmatrix}
$$ {#eq-3-10-6}

(a) Assuming that all five lines are in use at a certain observation time, we shall determine the probability that exactly four lines will be in use at the next observation time. (b) Assuming that no lines are in use at a certain time, we shall determine the probability that at least one line will be in use at the next observation time.

(a) This probability is the element in the matrix $\mathbf{P}$ in the row corresponding to the state 5 and the column corresponding to the state 4. Its value is seen to be 0.4. (b) If no lines are in use at a certain time, then the element in the upper left corner of the matrix $\mathbf{P}$ gives the probability that no lines will be in use at the next observation time. Its value is seen to be 0.1. Therefore, the probability that at least one line will be in use at the next observation time is $1 - 0.1 = 0.9$.
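Parts (a) and (b) amount to reading entries out of $\mathbf{P}$. A quick check in Python with NumPy (assumed available); because the states are labeled 0 through 5, state $i$ is simply row $i$ of the array:

```python
import numpy as np

# Transition matrix from eq-3-10-6; row i gives the distribution of the
# next state when i lines are currently in use (states 0, 1, ..., 5).
P = np.array([
    [0.1, 0.4, 0.2, 0.1, 0.1, 0.1],
    [0.2, 0.3, 0.2, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.3, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.3, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.2, 0.3, 0.2],
    [0.1, 0.1, 0.1, 0.1, 0.4, 0.2],
])

print(P[5, 4])        # (a) Pr(4 lines next | 5 lines now) = 0.4
print(1 - P[0, 0])    # (b) Pr(at least 1 line next | 0 lines now) = 0.9
```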

Figure 3.28: The generation following $\{Aa, Aa\}$

Example 3.10.6: Plant Breeding Experiment

A botanist is studying a certain variety of plant that is monoecious (has male and female organs in separate flowers on a single plant). She begins with two plants I and II and cross-pollinates them by crossing male I with female II and female I with male II to produce two offspring for the next generation. The original plants are destroyed and the process is repeated as soon as the new generation of two plants is mature. Several replications of the study are run simultaneously. The botanist might be interested in the proportion of plants in any generation that have each of several possible genotypes for a particular gene. (See Example 1.6.4.)

Suppose that the gene has two alleles, $A$ and $a$. The genotype of an individual will be one of the three combinations $AA$, $Aa$, or $aa$. When a new individual is born, it gets one of the two alleles (with probability $1/2$ each) from one of the parents, and it independently gets one of the two alleles from the other parent. The two offspring get their genotypes independently of each other.

For example, if the parents have genotypes $AA$ and $Aa$, then an offspring will get $A$ for sure from the first parent and will get either $A$ or $a$ from the second parent with probability $1/2$ each. Let the states of this population be the set of genotypes of the two members of the current population. We will not distinguish the set $\{AA, Aa\}$ from $\{Aa, AA\}$. There are then six states: $\{AA, AA\}$, $\{AA, Aa\}$, $\{AA, aa\}$, $\{Aa, Aa\}$, $\{Aa, aa\}$, and $\{aa, aa\}$. For each state, we can calculate the probability that the next generation will be in each of the six states. For example, if the state is either $\{AA, AA\}$ or $\{aa, aa\}$, the next generation will be in the same state with probability 1. If the state is $\{AA, aa\}$, the next generation will be in state $\{Aa, Aa\}$ with probability 1. The other three states have more complicated transitions.

If the current state is $\{Aa, Aa\}$, then all six states are possible for the next generation. In order to compute the transition distribution, it helps to first compute the probability that a given offspring will have each of the three genotypes. Figure 3.28 illustrates the possible offspring in this state. Each arrow going down in Figure 3.28 is a possible inheritance of an allele, and each combination of arrows terminating in a genotype has probability $1/4$.

It follows that the probabilities of $AA$ and $aa$ are both $1/4$, while the probability of $Aa$ is $1/2$, because two different combinations of arrows lead to this offspring. In order for the next state to be $\{AA, AA\}$, both offspring must be $AA$ independently, so the probability of this transition is $1/16$. The same argument implies that the probability of a transition to $\{aa, aa\}$ is $1/16$. A transition to $\{AA, Aa\}$ requires one offspring to be $AA$ (probability $1/4$) and the other to be $Aa$ (probability $1/2$). But the two different genotypes could occur in either order, so the whole probability of such a transition is $2 \times (1/4) \times (1/2) = 1/4$. A similar argument shows that a transition to $\{Aa, aa\}$ also has probability $1/4$. A transition to $\{AA, aa\}$ requires one offspring to be $AA$ (probability $1/4$) and the other to be $aa$ (probability $1/4$). Once again, these can occur in two orders, so the whole probability is $2 \times (1/4) \times (1/4) = 1/8$. By subtraction, the probability of a transition to $\{Aa, Aa\}$ must be $1 - 1/16 - 1/16 - 1/4 - 1/4 - 1/8 = 1/4$. Here is the entire transition matrix, which can be verified in a manner similar to what has just been done:

$$
\begin{bmatrix} 1.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\ 0.2500 & 0.5000 & 0.0000 & 0.2500 & 0.0000 & 0.0000 \\ 0.0000 & 0.0000 & 0.0000 & 1.0000 & 0.0000 & 0.0000 \\ 0.0625 & 0.2500 & 0.1250 & 0.2500 & 0.2500 & 0.0625 \\ 0.0000 & 0.0000 & 0.0000 & 0.2500 & 0.5000 & 0.2500 \\ 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 1.0000 \end{bmatrix}
$$
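The row for state $\{Aa, Aa\}$ can be verified by enumerating the equally likely allele inheritances, as in Figure 3.28. A small sketch in Python (standard library only; the helper name and the state encoding are ours):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_dist(parent1: str, parent2: str) -> Counter:
    """Genotype distribution of one offspring: one allele from each parent,
    each of the parent's two alleles passed on with probability 1/2."""
    dist = Counter()
    for a1, a2 in product(parent1, parent2):
        dist["".join(sorted((a1, a2)))] += Fraction(1, 4)   # 'aA' recorded as 'Aa'
    return dist

child = offspring_dist("Aa", "Aa")
print(dict(child))   # AA: 1/4, Aa: 1/2, aa: 1/4

# Transition distribution out of state {Aa, Aa}: two independent offspring,
# recorded as an unordered pair of genotypes.
row = Counter()
for (g1, p1), (g2, p2) in product(child.items(), repeat=2):
    row[tuple(sorted((g1, g2)))] += p1 * p2
print(dict(row))     # matches the fourth row of the transition matrix above
```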

The Transition Matrix for Several Steps

Example 3.10.7: Single Server Queue

A manager usually checks the server at her store every 5 minutes to see whether the server is busy or not. She models the state of the server (1 = busy or 2 = not busy) as a Markov chain with two possible states and stationary transition distributions given by the following matrix:

$$
\mathbf{P} = \begin{bmatrix} 0.9 & 0.1 \\ 0.6 & 0.4 \end{bmatrix}.
$$

The manager realizes that, later in the day, she will have to be away for 10 minutes and will miss one server check. She wants to compute the conditional distribution of the state two time periods in the future given each of the possible states. She reasons as follows: If $X_n = 1$, for example, then the state will have to be either 1 or 2 at time $n + 1$ even though she does not care now about the state at time $n + 1$. But, if she computes the joint conditional distribution of $X_{n+1}$ and $X_{n+2}$ given $X_n = 1$, she can sum over the possible values of $X_{n+1}$ to get the conditional distribution of $X_{n+2}$ given $X_n = 1$. In symbols,

$$
\begin{align*}
\Pr(X_{n+2} = 1 \mid X_n = 1) = &\Pr(X_{n+1} = 1, X_{n+2} = 1 \mid X_n = 1) \\
&+ \Pr(X_{n+1} = 2, X_{n+2} = 1 \mid X_n = 1).
\end{align*}
$$

By the second part of Theorem 3.10.1,

$$
\begin{align*}
\Pr(X_{n+1} = 1, X_{n+2} = 1 \mid X_n = 1) &= \Pr(X_{n+1} = 1 \mid X_n = 1)\Pr(X_{n+2} = 1 \mid X_{n+1} = 1) \\
&= 0.9 \times 0.9 = 0.81.
\end{align*}
$$

Similarly,

$$
\begin{align*}
\Pr(X_{n+1} = 2, X_{n+2} = 1 \mid X_n = 1) &= \Pr(X_{n+1} = 2 \mid X_n = 1)\Pr(X_{n+2} = 1 \mid X_{n+1} = 2) \\
&= 0.1 \times 0.6 = 0.06.
\end{align*}
$$

It follows that $\Pr(X_{n+2} = 1 \mid X_n = 1) = 0.81 + 0.06 = 0.87$, and hence $\Pr(X_{n+2} = 2 \mid X_n = 1) = 1 - 0.87 = 0.13$. By similar reasoning, if $X_n = 2$,

$$
\Pr(X_{n+2} = 1 \mid X_n = 2) = 0.6 \times 0.9 + 0.4 \times 0.6 = 0.78,
$$

and $\Pr(X_{n+2} = 2 \mid X_n = 2) = 1 - 0.78 = 0.22$.
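The manager’s reasoning is a sum over the intermediate state, and it is easy to reproduce numerically. A sketch in Python with NumPy (assumed available) computes each two-step probability by that sum and compares it with the matrix product $\mathbf{P}\mathbf{P}$:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.6, 0.4]])   # server chain from Example 3.10.7 (1 = busy, 2 = not busy)

# Two-step probabilities by summing over the intermediate state:
# Pr(X_{n+2} = j | X_n = i) = sum_r p_{ir} p_{rj}.
two_step = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        two_step[i, j] = sum(P[i, r] * P[r, j] for r in range(2))

print(two_step)        # [[0.87 0.13]
                       #  [0.78 0.22]]
print(P @ P)           # the same matrix, computed as P times P
```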

Generalizing the calculations in Example 3.10.7 to three or more transitions might seem tedious. However, if one examines the calculations carefully, one sees a pattern that will allow a compact calculation of transition distributions for several steps.

Consider a general Markov chain with $k$ possible states $1, \ldots, k$ and the transition matrix $\mathbf{P}$ given by eq-3-10-5. Assuming that the chain is in state $i$ at a given time $n$, we shall now determine the probability that the chain will be in state $j$ at time $n + 2$. In other words, we shall determine the conditional probability of $X_{n+2} = j$ given $X_n = i$. The notation for this probability is $p^{(2)}_{ij}$.

We argue as the manager did in Example 3.10.7. Let $r$ denote the value of $X_{n+1}$ that is not of primary interest but is helpful to the calculation. Then

$$
\begin{align*}
p^{(2)}_{ij} &= \Pr(X_{n+2} = j \mid X_n = i) \\
&= \sum_{r=1}^{k}\Pr(X_{n+1} = r, X_{n+2} = j \mid X_n = i) \\
&= \sum_{r=1}^{k}\Pr(X_{n+1} = r \mid X_n = i)\Pr(X_{n+2} = j \mid X_{n+1} = r, X_n = i) \\
&= \sum_{r=1}^{k}\Pr(X_{n+1} = r \mid X_n = i)\Pr(X_{n+2} = j \mid X_{n+1} = r) \\
&= \sum_{r=1}^{k}p_{ir}p_{rj},
\end{align*}
$$

where the third equality follows from Theorem 2.1.3 and the fourth equality follows from the definition of a Markov chain.

The value of $p^{(2)}_{ij}$ can be determined in the following manner: If the transition matrix $\mathbf{P}$ is squared, that is, if the matrix $\mathbf{P}^2 = \mathbf{P}\mathbf{P}$ is constructed, then the element in the $i$th row and the $j$th column of the matrix $\mathbf{P}^2$ will be $\sum_{r=1}^{k}p_{ir}p_{rj}$. Therefore, $p^{(2)}_{ij}$ will be the element in the $i$th row and the $j$th column of $\mathbf{P}^2$.

By a similar argument, the probability that the chain will move from state $i$ to state $j$ in three steps, or $p^{(3)}_{ij} = \Pr(X_{n+3} = j \mid X_n = i)$, can be found by constructing the matrix $\mathbf{P}^3 = \mathbf{P}^2\mathbf{P}$. Then the probability $p^{(3)}_{ij}$ will be the element in the $i$th row and the $j$th column of the matrix $\mathbf{P}^3$.

In general, we have the following result.

Theorem 3.10.2: Multiple Step Transitions

Let $\mathbf{P}$ be the transition matrix of a finite Markov chain with stationary transition distributions. For each $m = 2, 3, \ldots$, the $m$th power $\mathbf{P}^m$ of the matrix $\mathbf{P}$ has in row $i$ and column $j$ the probability $p^{(m)}_{ij}$ that the chain will move from state $i$ to state $j$ in $m$ steps.

Definition 3.10.6: Multiple Step Transition Matrix

Under the conditions of Theorem 3.10.2, the matrix $\mathbf{P}^m$ is called the $m$-step transition matrix of the Markov chain.

In summary, the $i$th row of the $m$-step transition matrix gives the conditional distribution of $X_{n+m}$ given $X_n = i$ for all $i = 1, \ldots, k$ and all $n, m = 1, 2, \ldots$.
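In code, an $m$-step transition matrix is just the $m$th matrix power of $\mathbf{P}$. A minimal sketch in Python with NumPy (assumed), using the server-queue chain of Example 3.10.7:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.6, 0.4]])   # single server queue chain from Example 3.10.7

# m-step transition matrices are matrix powers of P.
for m in (2, 3):
    print(f"P^{m} =\n{np.linalg.matrix_power(P, m)}")

# P^2 reproduces the manager's hand calculation:
# [[0.87 0.13]
#  [0.78 0.22]]
```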

Example 3.10.8: The Two-Step and Three-Step Transition Matrices for the Number of Occupied Telephone Lines

Consider again the transition matrix $\mathbf{P}$ given by eq-3-10-6 for the Markov chain based on five telephone lines. We shall assume first that $i$ lines are in use at a certain time, and we shall determine the probability that exactly $j$ lines will be in use two time periods later.

If we multiply the matrix $\mathbf{P}$ by itself, we obtain the following two-step transition matrix:

$$
\mathbf{P}^2 = \begin{bmatrix} 0.14 & 0.23 & 0.20 & 0.15 & 0.16 & 0.12 \\ 0.13 & 0.24 & 0.20 & 0.15 & 0.16 & 0.12 \\ 0.12 & 0.20 & 0.21 & 0.18 & 0.17 & 0.12 \\ 0.11 & 0.17 & 0.19 & 0.20 & 0.20 & 0.13 \\ 0.11 & 0.16 & 0.16 & 0.18 & 0.24 & 0.15 \\ 0.11 & 0.16 & 0.15 & 0.17 & 0.25 & 0.16 \end{bmatrix}
$$ {#eq-3-10-7}

From this matrix we can find any two-step transition probability for the chain, such as the following:

i. If two lines are in use at a certain time, then the probability that four lines will be in use two time periods later is 0.17.
ii. If three lines are in use at a certain time, then the probability that three lines will again be in use two time periods later is 0.20.

We shall now assume that $i$ lines are in use at a certain time, and we shall determine the probability that exactly $j$ lines will be in use three time periods later.

If we construct the matrix $\mathbf{P}^3 = \mathbf{P}^2\mathbf{P}$, we obtain the following three-step transition matrix:

$$
\mathbf{P}^3 = \begin{bmatrix} 0.123 & 0.208 & 0.192 & 0.166 & 0.183 & 0.128 \\ 0.124 & 0.207 & 0.192 & 0.166 & 0.183 & 0.128 \\ 0.120 & 0.197 & 0.192 & 0.174 & 0.188 & 0.129 \\ 0.117 & 0.186 & 0.186 & 0.179 & 0.199 & 0.133 \\ 0.116 & 0.181 & 0.177 & 0.176 & 0.211 & 0.139 \\ 0.116 & 0.180 & 0.174 & 0.174 & 0.215 & 0.141 \end{bmatrix}
$$ {#eq-3-10-8}

From this matrix we can find any three-step transition probability for the chain, such as the following:

i. If all five lines are in use at a certain time, then the probability that no lines will be in use three time periods later is 0.116.
ii. If one line is in use at a certain time, then the probability that exactly one line will again be in use three time periods later is 0.207.

Example 3.10.9: Plant Breeding Experiment

In Example 3.10.6, the transition matrix has many zeros, since many of the transitions will not occur. However, if we are willing to wait two steps, we will find that the only transitions that cannot occur in two steps are those from the first state to anything else and those from the last state to anything else.

Here is the two-step transition matrix:

$$
\begin{bmatrix} 1.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\ 0.3906 & 0.3125 & 0.0313 & 0.1875 & 0.0625 & 0.0156 \\ 0.0625 & 0.2500 & 0.1250 & 0.2500 & 0.2500 & 0.0625 \\ 0.1406 & 0.1875 & 0.0313 & 0.3125 & 0.1875 & 0.1406 \\ 0.0156 & 0.0625 & 0.0313 & 0.1875 & 0.3125 & 0.3906 \\ 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 1.0000 \end{bmatrix}
$$

Indeed, if we look at the three-step or the four-step or the general $m$-step transition matrix, the first and last rows will always be the same.
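The two-step matrix above can be reproduced directly from the one-step matrix of Example 3.10.6. A sketch in Python with NumPy (assumed):

```python
import numpy as np

# One-step transition matrix from Example 3.10.6; the states are, in order,
# {AA,AA}, {AA,Aa}, {AA,aa}, {Aa,Aa}, {Aa,aa}, {aa,aa}.
P = np.array([
    [1.0,    0.0,  0.0,   0.0,  0.0,  0.0],
    [0.25,   0.5,  0.0,   0.25, 0.0,  0.0],
    [0.0,    0.0,  0.0,   1.0,  0.0,  0.0],
    [0.0625, 0.25, 0.125, 0.25, 0.25, 0.0625],
    [0.0,    0.0,  0.0,   0.25, 0.5,  0.25],
    [0.0,    0.0,  0.0,   0.0,  0.0,  1.0],
])

P2 = np.linalg.matrix_power(P, 2)
print(np.round(P2, 4))   # compare with the two-step matrix displayed above
```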

The first and last states in Example 3.10.6 have the property that, once the chain gets into one of those states, it can’t get out. Such states occur in many Markov chains and have a special name.

Definition 3.10.7: Absorbing State

In a Markov chain, if $p_{ii} = 1$ for some state $i$, then that state is called an absorbing state.

In Example 3.10.9, from every non-absorbing state there is positive probability of reaching each absorbing state in two steps. Hence, the probability is 1 that the chain will eventually be absorbed into one of the absorbing states if it is allowed to run long enough.

The Initial Distribution

Example 3.10.10: Single Server Queue

The manager in Example 3.10.7 enters the store thinking that the probability is 0.3 that the server will be busy the first time that she checks. Hence, the probability is 0.7 that the server will be not busy. These values specify the marginal distribution of the state at time 1, $X_1$. We can represent this distribution by the vector $\mathbf{v} = (0.3, 0.7)$ that gives the probabilities of the two states at time 1 in the same order that they appear in the transition matrix.

The vector giving the marginal distribution of $X_1$ in Example 3.10.10 has a special name.

Definition 3.10.8: Probability Vector/Initial Distribution

A vector consisting of nonnegative numbers that add to 1 is called a probability vector. A probability vector whose coordinates specify the probabilities that a Markov chain will be in each of its states at time 1 is called the initial distribution of the chain or the initial probability vector.

For the toothpaste-shopping chain of Example 3.10.2, the initial distribution is $\mathbf{v} = (0.5, 0.5)$: the shopper is equally likely to choose either brand on the first purchase.

The initial distribution and the transition matrix together determine the entire joint distribution of the Markov chain. Indeed, Theorem 3.10.1 shows how to construct the joint distribution of the chain from the initial probability vector and the transition matrix. Letting $\mathbf{v} = (v_1, \ldots, v_k)$ denote the initial distribution, eq-3-10-1 can be rewritten as

$$
\Pr(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = v_{x_1}p_{x_1 x_2}\cdots p_{x_{n-1} x_n}.
$$ {#eq-3-10-9}
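As a small worked instance of eq-3-10-9 (a Python sketch; the helper name and the particular path are ours for illustration), the probability that the toothpaste shopper of Example 3.10.2 chooses brand $A$, then $B$, then $B$ on her first three purchases is $0.5 \times (2/3) \times (1/3) = 1/9$:

```python
import numpy as np

v = np.array([0.5, 0.5])              # initial distribution
P = np.array([[1/3, 2/3],
              [2/3, 1/3]])            # transition matrix from Example 3.10.4

def path_probability(path, v, P):
    """Pr(X_1 = path[0], ..., X_n = path[-1]) via eq-3-10-9 (states are 1-based)."""
    prob = v[path[0] - 1]
    for current, nxt in zip(path, path[1:]):
        prob *= P[current - 1, nxt - 1]
    return prob

print(path_probability([1, 2, 2], v, P))   # 0.5 * (2/3) * (1/3) = 1/9, about 0.1111
```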

The marginal distributions of states at times later than 1 can be found from the joint distribution.

Theorem 3.10.3: Marginal Distributions at Times Other Than 1

Consider a finite Markov chain with stationary transition distributions having initial distribution $\mathbf{v}$ and transition matrix $\mathbf{P}$. The marginal distribution of $X_n$, the state at time $n$, is given by the probability vector $\mathbf{v}\mathbf{P}^{n-1}$.

The marginal distribution of $X_n$ can be found from eq-3-10-9 by summing over the possible values of $x_1, \ldots, x_{n-1}$. That is,

$$
\Pr(X_n = x_n) = \sum_{x_{n-1}=1}^{k}\cdots \sum_{x_2=1}^{k}\sum_{x_1=1}^{k}v_{x_1}p_{x_1 x_2}p_{x_2 x_3}\cdots p_{x_{n-1} x_n}.
$$ {#eq-3-10-10}

The innermost sum in eq-3-10-10 for $x_1 = 1, \ldots, k$ involves only the first two factors $v_{x_1}p_{x_1 x_2}$ and produces the $x_2$ coordinate of $\mathbf{v}\mathbf{P}$. Similarly, the next innermost sum over $x_2 = 1, \ldots, k$ involves only the $x_2$ coordinate of $\mathbf{v}\mathbf{P}$ and $p_{x_2 x_3}$ and produces the $x_3$ coordinate of $\mathbf{v}\mathbf{P}\mathbf{P} = \mathbf{v}\mathbf{P}^2$. Proceeding in this way through all $n - 1$ summations produces the $x_n$ coordinate of $\mathbf{v}\mathbf{P}^{n-1}$.


Example 3.10.11: Probabilities for the Number of Occupied Telephone Lines

Consider again the office with five telephone lines and the Markov chain for which the transition matrix $\mathbf{P}$ is given by @eq-3-10-6. Suppose that at the beginning of the observation process at time $n = 1$, the probability that no lines will be in use is 0.5, the probability that one line will be in use is 0.3, and the probability that two lines will be in use is 0.2. Then the initial probability vector is $\mathbf{v} = (0.5, 0.3, 0.2, 0, 0, 0)$. We shall first determine the distribution of the number of lines in use at time 2, one period later.

By an elementary computation it will be found that

$$
\mathbf{v}\mathbf{P} = (0.13, 0.33, 0.22, 0.12, 0.10, 0.10).
$$

Since the first component of this probability vector is 0.13, the probability that no lines will be in use at time 2 is 0.13; since the second component is 0.33, the probability that exactly one line will be in use at time 2 is 0.33; and so on.

Next, we shall determine the distribution of the number of lines that will be in use at time 3.

By use of @eq-3-10-7, it can be found that

$$
\mathbf{v}\mathbf{P}^2 = (0.133, 0.227, 0.202, 0.156, 0.162, 0.120).
$$

Since the first component of this probability vector is 0.133, the probability that no lines will be in use at time 3 is 0.133; since the second component is 0.227, the probability that exactly one line will be in use at time 3 is 0.227; and so on.
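Both computations are single vector-matrix products. A sketch in Python with NumPy (assumed), entering $\mathbf{P}$ from eq-3-10-6:

```python
import numpy as np

P = np.array([
    [0.1, 0.4, 0.2, 0.1, 0.1, 0.1],
    [0.2, 0.3, 0.2, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.3, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.3, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.2, 0.3, 0.2],
    [0.1, 0.1, 0.1, 0.1, 0.4, 0.2],
])
v = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0])   # initial distribution at time 1

print(v @ P)                              # distribution at time 2
print(v @ np.linalg.matrix_power(P, 2))   # distribution at time 3
```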

Stationary Distributions

Example 3.10.12: A Special Initial Distribution for Telephone Lines

Suppose that the initial distribution for the number of occupied telephone lines is

$$
\mathbf{v} = (0.119, 0.193, 0.186, 0.173, 0.196, 0.133).
$$

It can be shown, by matrix multiplication, that $\mathbf{v}\mathbf{P} = \mathbf{v}$. This means that if $\mathbf{v}$ is the initial distribution, then it is also the distribution after one transition. Hence, it will be the distribution after two or more transitions as well.
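Because the published $\mathbf{v}$ is rounded to three decimals, a numerical check of $\mathbf{v}\mathbf{P} = \mathbf{v}$ should allow a small tolerance. A sketch in Python with NumPy (assumed):

```python
import numpy as np

P = np.array([
    [0.1, 0.4, 0.2, 0.1, 0.1, 0.1],
    [0.2, 0.3, 0.2, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.3, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.3, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.2, 0.3, 0.2],
    [0.1, 0.1, 0.1, 0.1, 0.4, 0.2],
])
v = np.array([0.119, 0.193, 0.186, 0.173, 0.196, 0.133])

print(v @ P)                                  # agrees with v to within rounding
print(np.allclose(v @ P, v, atol=1e-3))       # True
```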

Definition 3.10.9: Stationary Distribution

Let $\mathbf{P}$ be the transition matrix for a Markov chain. A probability vector $\mathbf{v}$ that satisfies $\mathbf{v}\mathbf{P} = \mathbf{v}$ is called a stationary distribution for the Markov chain.

The initial distribution in Example 3.10.12 is a stationary distribution for the telephone lines Markov chain. If the chain starts in this distribution, the distribution stays the same at all times. Every finite Markov chain with stationary transition distributions has at least one stationary distribution. Some chains have a unique stationary distribution.

Note: A Stationary Distribution Does Not Mean That the Chain is Not Moving. It is important to note that $\mathbf{v}\mathbf{P}^n$ gives the probabilities that the chain is in each of its states after $n$ transitions, calculated before the initial state of the chain or any transitions are observed. These are different from the probabilities of being in the various states after observing the initial state or after observing any of the intervening transitions. In addition, a stationary distribution does not imply that the Markov chain is staying put. If a Markov chain starts in a stationary distribution, then for each state $i$, the probability that the chain is in state $i$ after $n$ transitions is the same as the probability that it is in state $i$ at the start. But the Markov chain can still move around from one state to the next at each transition. The one case in which a Markov chain does stay put is after it moves into an absorbing state. A distribution that is concentrated solely on absorbing states will necessarily be stationary because the Markov chain will never move if it starts in such a distribution. In such cases, all of the uncertainty surrounds the initial state, which will also be the state after every transition.

Example 3.10.13: Stationary Distributions for the Plant Breeding Experiment

Consider again the experiment described in Example 3.10.6. The first and sixth states, $\{AA, AA\}$ and $\{aa, aa\}$, respectively, are absorbing states. It is easy to see that every initial distribution of the form $\mathbf{v} = (p, 0, 0, 0, 0, 1 - p)$ for $0 \leq p \leq 1$ has the property that $\mathbf{v}\mathbf{P} = \mathbf{v}$. Suppose that the chain is in state 1 with probability $p$ and in state 6 with probability $1 - p$ at time 1. Because these two states are absorbing states, the chain will never move, and the event $X_1 = 1$ is the same as the event that $X_n = 1$ for all $n$. Similarly, $X_1 = 6$ is the same as $X_n = 6$ for all $n$. So, thinking ahead to where the chain is likely to be after $n$ transitions, we would also say that it will be in state 1 with probability $p$ and in state 6 with probability $1 - p$.

Method for Finding Stationary Distributions: We can rewrite the equation $\mathbf{v}\mathbf{P} = \mathbf{v}$ that defines stationary distributions as $\mathbf{v}[\mathbf{P} - \mathbf{I}] = \mathbf{0}$, where $\mathbf{I}$ is the $k \times k$ identity matrix and $\mathbf{0}$ is a $k$-dimensional vector of all zeros. Unfortunately, this system of equations has many solutions even if there is a unique stationary distribution. The reason is that whenever $\mathbf{v}$ solves the system, so does $c\mathbf{v}$ for all real $c$ (including $c = 0$). Even though the system has $k$ equations for $k$ variables, there is at least one redundant equation. However, there is also one missing equation: we need to require that the solution vector $\mathbf{v}$ have coordinates that sum to 1. We can fix both of these problems by replacing one of the equations in the original system by the equation that says that the coordinates of $\mathbf{v}$ sum to 1.

To be specific, define the matrix $\mathbf{G}$ to be $\mathbf{P} - \mathbf{I}$ with its last column replaced by a column of all ones. Then, solve the equation

$$
\mathbf{v}\mathbf{G} = (0, \ldots, 0, 1).
$$ {#eq-3-10-11}

If there is a unique stationary distribution, we will find it by solving eq-3-10-11. In this case, the matrix $\mathbf{G}$ will have an inverse $\mathbf{G}^{-1}$ that satisfies

$$
\mathbf{G}\mathbf{G}^{-1} = \mathbf{G}^{-1}\mathbf{G} = \mathbf{I}.
$$

The solution of eq-3-10-11 will then be

$$
\mathbf{v} = (0, \ldots, 0, 1)\mathbf{G}^{-1},
$$

which is easily seen to be the bottom row of the matrix $\mathbf{G}^{-1}$. This was the method used to find the stationary distribution in Example 3.10.12. If the Markov chain has multiple stationary distributions, then the matrix $\mathbf{G}$ will be singular, and this method will not find any of the stationary distributions. That is what would happen in Example 3.10.13 if one were to apply the method.
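A sketch of this method in Python with NumPy (assumed; the function name is ours, and np.linalg.solve replaces the explicit inverse, which is equivalent when $\mathbf{G}$ is nonsingular):

```python
import numpy as np

def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Solve v G = (0, ..., 0, 1), where G is P - I with its last column
    replaced by ones; returns the unique stationary distribution if one exists."""
    k = P.shape[0]
    G = P - np.eye(k)
    G[:, -1] = 1.0
    rhs = np.zeros(k)
    rhs[-1] = 1.0
    # v G = rhs is the same linear system as G^T v^T = rhs^T.
    return np.linalg.solve(G.T, rhs)

P_phone = np.array([
    [0.1, 0.4, 0.2, 0.1, 0.1, 0.1],
    [0.2, 0.3, 0.2, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.3, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.3, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.2, 0.3, 0.2],
    [0.1, 0.1, 0.1, 0.1, 0.4, 0.2],
])
print(np.round(stationary_distribution(P_phone), 3))
# approximately [0.119 0.193 0.186 0.173 0.196 0.133], the vector in Example 3.10.12
```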

Example 3.10.14: Stationary Distribution for Toothpaste Shopping

Consider the transition matrix $\mathbf{P}$ given in Example 3.10.4 for the toothpaste-shopping chain. We can construct the matrix $\mathbf{G}$ as follows:

$$
\mathbf{P} - \mathbf{I} = \begin{bmatrix} -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & -\frac{2}{3} \end{bmatrix}; \quad \text{hence } \mathbf{G} = \begin{bmatrix} -\frac{2}{3} & 1 \\ \frac{2}{3} & 1 \end{bmatrix}.
$$

The inverse of $\mathbf{G}$ is

$$
\mathbf{G}^{-1} = \begin{bmatrix} -\frac{3}{4} & \frac{3}{4} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}.
$$

We now see that the stationary distribution is the bottom row of $\mathbf{G}^{-1}$, $\mathbf{v} = (1/2, 1/2)$.

There is a special case in which it is known that a unique stationary distribution exists and it has special properties.

Theorem 3.10.4

If there exists $m$ such that every element of $\mathbf{P}^m$ is strictly positive, then

- The Markov chain has a unique stationary distribution $\mathbf{v}$,
- $\lim_{n \rightarrow \infty}\mathbf{P}^n$ is a matrix with all rows equal to $\mathbf{v}$, and
- No matter with what distribution the Markov chain starts, its distribution after $n$ steps converges to $\mathbf{v}$ as $n \rightarrow \infty$.

We shall not prove Theorem 3.10.4, although some evidence for the second claim can be seen in eq-3-10-8, where the six rows of $\mathbf{P}^3$ are much more alike than the rows of $\mathbf{P}$ and they are already very similar to the stationary distribution given in Example 3.10.12. The third claim in Theorem 3.10.4 actually follows easily from the second claim. In Section 12.5, we shall introduce a method that makes use of the third claim in Theorem 3.10.4 in order to approximate distributions of random variables when those distributions are difficult to calculate exactly.
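As a quick numerical illustration of the second claim (Python with NumPy assumed; the power 50 is an arbitrary large value), raising the telephone-line matrix to a high power makes every row collapse onto the stationary distribution of Example 3.10.12:

```python
import numpy as np

P = np.array([
    [0.1, 0.4, 0.2, 0.1, 0.1, 0.1],
    [0.2, 0.3, 0.2, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.3, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.3, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.2, 0.3, 0.2],
    [0.1, 0.1, 0.1, 0.1, 0.4, 0.2],
])

P50 = np.linalg.matrix_power(P, 50)
print(np.round(P50, 3))
# every row is approximately (0.119, 0.193, 0.186, 0.173, 0.196, 0.133)
```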

The transition matrices in Examples 3.10.2, 3.10.5, and 3.10.7 satisfy the conditions of Theorem 3.10.4. The following example has a unique stationary distribution but does not satisfy the conditions of Theorem 3.10.4.

Example 3.10.15: Alternating Chain

Let the transition matrix for a two-state Markov chain be

$$
\mathbf{P} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
$$

The matrix $\mathbf{G}$ is easy to construct and invert, and we find that the unique stationary distribution is $\mathbf{v} = (0.5, 0.5)$. However, as $m$ increases, $\mathbf{P}^m$ alternates between $\mathbf{P}$ and the $2 \times 2$ identity matrix. It does not converge, and no power of $\mathbf{P}$ has all elements strictly positive. If the initial distribution is $(v_1, v_2)$, the distribution after $n$ steps alternates between $(v_1, v_2)$ and $(v_2, v_1)$.
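The alternation is easy to see numerically; a short sketch in Python with NumPy (the initial distribution below is an arbitrary illustrative choice):

```python
import numpy as np

P = np.array([[0, 1],
              [1, 0]])

for m in range(1, 5):
    print(f"P^{m} =\n{np.linalg.matrix_power(P, m)}")
# P^1 and P^3 equal P; P^2 and P^4 equal the identity matrix.

v = np.array([0.3, 0.7])                          # arbitrary initial distribution
print(v @ P)                                      # [0.7 0.3] after one step
print(v @ np.linalg.matrix_power(P, 2))           # back to [0.3 0.7] after two steps
```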

Another example that fails to satisfy the conditions of Theorem 3.10.4 is the gambler’s ruin problem from 2.4 The Gambler’s Ruin Problem.

Example 3.10.16: Gambler’s Ruin

In 2.4 The Gambler’s Ruin Problem, we described the gambler’s ruin problem, in which a gambler wins one dollar with probability $p$ and loses one dollar with probability $1 - p$ on each play of a game. The sequence of amounts held by the gambler through the course of those plays forms a Markov chain with two absorbing states, namely, 0 and $k$. There are $k - 1$ other states, namely, $1, \ldots, k - 1$. (This notation violates our use of $k$ to stand for the number of states, which is $k + 1$ in this example. We felt this was less confusing than switching from the original notation of 2.4 The Gambler’s Ruin Problem.)

The transition matrix has first and last rows $(1, 0, \ldots, 0)$ and $(0, \ldots, 0, 1)$, respectively. For $i = 1, \ldots, k - 1$, the $i$th row has 0 everywhere except in coordinate $i - 1$, where it has $1 - p$, and in coordinate $i + 1$, where it has $p$. Unlike Example 3.10.15, this time the sequence of matrices $\mathbf{P}^m$ converges, but there is no unique stationary distribution. The limit of $\mathbf{P}^m$ has as its last column the numbers $a_0, \ldots, a_k$, where $a_i$ is the probability that the fortune of a gambler who starts with $i$ dollars reaches $k$ dollars before it reaches 0 dollars.

The first column of the limit has the numbers $1 - a_0, \ldots, 1 - a_k$, and the rest of the limit matrix is all zeros. The stationary distributions have the same form as those in Example 3.10.13, namely, all probability is in the absorbing states.
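A numerical sketch of this limit in Python with NumPy (the values of $p$ and $k$ are arbitrary illustrative choices; the closed form used for comparison is the standard gambler’s-ruin result for $p \neq 1/2$):

```python
import numpy as np

p, k = 0.4, 5                       # win probability and target fortune (illustrative)
states = k + 1                      # fortunes 0, 1, ..., k

P = np.zeros((states, states))
P[0, 0] = P[k, k] = 1.0             # absorbing states 0 and k
for i in range(1, k):
    P[i, i + 1] = p                 # win a dollar
    P[i, i - 1] = 1 - p             # lose a dollar

limit = np.linalg.matrix_power(P, 10_000)     # effectively the limit of P^m
print(np.round(limit[:, -1], 4))              # last column: a_0, ..., a_k

# Closed form for p != 1/2: a_i = (1 - r**i) / (1 - r**k), with r = (1 - p) / p.
r = (1 - p) / p
print(np.round([(1 - r**i) / (1 - r**k) for i in range(states)], 4))   # the two agree
```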

Summary

A Markov chain is a stochastic process, a sequence of random variables giving the states of the process, in which the conditional distribution of the state at the next time given all of the past states depends on the past states only through the most recent state. For Markov chains with finitely many states and stationary transition distributions, the transitions over time can be described by a matrix giving the probabilities of transition from the state indexing the row to the state indexing the column (the transition matrix $\mathbf{P}$). The initial probability vector $\mathbf{v}$ gives the distribution of the state at time 1. The transition matrix and initial probability vector together allow calculation of all probabilities associated with the Markov chain. In particular, $\mathbf{P}^n$ gives the probabilities of transitions over $n$ time periods, and $\mathbf{v}\mathbf{P}^n$ gives the distribution of the state at time $n + 1$. A stationary distribution is a probability vector $\mathbf{v}$ such that $\mathbf{v}\mathbf{P} = \mathbf{v}$. Every finite Markov chain with stationary transition distributions has at least one stationary distribution. For many Markov chains, there is a unique stationary distribution, and the distribution of the chain after $n$ transitions converges to the stationary distribution as $n$ goes to $\infty$.

Exercises

Exercise 3.10.1

Consider the Markov chain in Example 3.10.2 with initial probability vector $\mathbf{v} = (1/2, 1/2)$.

a. Find the probability vector specifying the probabilities of the states at time $n = 2$.
b. Find the two-step transition matrix.

Exercise 3.10.4

Consider again the conditions of Exercises 3.10.2 and 3.10.3.

a. If it is sunny on a certain Wednesday, what is the probability that it will be sunny on both the following Saturday and Sunday?
b. If it is cloudy on a certain Wednesday, what is the probability that it will be sunny on both the following Saturday and Sunday?

Exercise 3.10.5

Consider again the Markov chain described in Exercise 3.10.2. Suppose that the probability that it will be sunny on a certain Wednesday is 0.2 and the probability that it will be cloudy is 0.8.

a. Determine the probability that it will be cloudy on the next day, Thursday.
b. Determine the probability that it will be cloudy on Friday.
c. Determine the probability that it will be cloudy on Saturday.

Exercise 3.10.6

Suppose that a student will be either on time or late for a particular class and that the events that he is on time or late for the class on successive days form a Markov chain with stationary transition probabilities. Suppose also that if he is late on a given day, then the probability that he will be on time the next day is 0.8. Furthermore, if he is on time on a given day, then the probability that he will be late the next day is 0.5.

a. If the student is late on a certain day, what is the probability that he will be on time on each of the next three days?
b. If the student is on time on a given day, what is the probability that he will be late on each of the next three days?

Exercise 3.10.7

Consider again the Markov chain described in Exercise 3.10.6.

a. If the student is late on the first day of class, what is the probability that he will be on time on the fourth day of class?
b. If the student is on time on the first day of class, what is the probability that he will be on time on the fourth day of class?

Exercise 3.10.8

Consider again the conditions of Exercises 3.10.6 and 3.10.7. Suppose that the probability that the student will be late on the first day of class is 0.7 and that the probability that he will be on time is 0.3.

a. Determine the probability that he will be late on the second day of class.
b. Determine the probability that he will be on time on the fourth day of class.

Exercise 3.10.9

Suppose that a Markov chain has four states 1, 2, 3, 4 and stationary transition probabilities as specified by the following transition matrix:

$$
\begin{bmatrix} 1/4 & 1/4 & 0 & 1/2 \\ 0 & 1 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \end{bmatrix}
$$

a. If the chain is in state 3 at a given time $n$, what is the probability that it will be in state 2 at time $n + 2$?
b. If the chain is in state 1 at a given time $n$, what is the probability that it will be in state 3 at time $n + 3$?

Exercise 3.10.10

Let $X_1$ denote the initial state at time 1 of the Markov chain for which the transition matrix is as specified in Exercise 3.10.9, and suppose that the initial probabilities are as follows:

$$
\begin{align*}
\Pr(X_1 = 1) &= 1/8, \quad \Pr(X_1 = 2) = 1/4, \\
\Pr(X_1 = 3) &= 3/8, \quad \Pr(X_1 = 4) = 1/4.
\end{align*}
$$

Determine the probabilities that the chain will be in states 1, 2, 3, and 4 at time $n$ for each of the following values of $n$:

(a) $n = 2$; (b) $n = 3$; (c) $n = 4$.

Exercise 3.10.11

Each time that a shopper purchases a tube of toothpaste, she chooses either brand $A$ or brand $B$. Suppose that the probability is 1/3 that she will choose the same brand chosen on her previous purchase, and the probability is 2/3 that she will switch brands.

a. If her first purchase is brand $A$, what is the probability that her fifth purchase will be brand $B$?
b. If her first purchase is brand $B$, what is the probability that her fifth purchase will be brand $B$?

Exercise 3.10.12

Suppose that three boys $A$, $B$, and $C$ are throwing a ball from one to another. Whenever $A$ has the ball, he throws it to $B$ with a probability of 0.2 and to $C$ with a probability of 0.8. Whenever $B$ has the ball, he throws it to $A$ with a probability of 0.6 and to $C$ with a probability of 0.4. Whenever $C$ has the ball, he is equally likely to throw it to either $A$ or $B$.

a. Consider this process to be a Markov chain and construct the transition matrix.
b. If each of the three boys is equally likely to have the ball at a certain time $n$, which boy is most likely to have the ball at time $n + 2$?

Exercise 3.10.13

Suppose that a coin is tossed repeatedly in such a way that heads and tails are equally likely to appear on any given toss and that all tosses are independent, with the following exception: Whenever either three heads or three tails have been obtained on three successive tosses, then the outcome of the next toss is always of the opposite type.

At time $n$ ($n \geq 3$), let the state of this process be specified by the outcomes on tosses $n - 2$, $n - 1$, and $n$. Show that this process is a Markov chain with stationary transition probabilities and construct the transition matrix.

Exercise 3.10.14

There are two boxes $A$ and $B$, each containing red and green balls. Suppose that box $A$ contains one red ball and two green balls and box $B$ contains eight red balls and two green balls. Consider the following process:

One ball is selected at random from box $A$, and one ball is selected at random from box $B$. The ball selected from box $A$ is then placed in box $B$, and the ball selected from box $B$ is placed in box $A$. These operations are then repeated indefinitely.

Show that the numbers of red balls in box $A$ form a Markov chain with stationary transition probabilities, and construct the transition matrix of the Markov chain.

Exercise 3.10.15

Verify the rows of the transition matrix in Example 3.10.6 that correspond to current states $\{AA, Aa\}$ and $\{Aa, aa\}$.

Exercise 3.10.16

Let the initial probability vector in Example 3.10.6 be $\mathbf{v} = (1/16, 1/4, 1/8, 1/4, 1/4, 1/16)$. Find the probabilities of the six states after one generation.

Exercise 3.10.17

Return to Example 3.10.6. Assume that the state at time $n - 1$ is $\{Aa, aa\}$.

a. Suppose that we learn that $X_{n+1}$ is $\{AA, aa\}$. Find the conditional distribution of $X_n$. (That is, find all the probabilities for the possible states at time $n$ given that the state at time $n + 1$ is $\{AA, aa\}$.)
b. Suppose that we learn that $X_{n+1}$ is $\{aa, aa\}$. Find the conditional distribution of $X_n$.

Exercise 3.10.18

Return to Example 3.10.13. Prove that the stationary distributions described there are the only stationary distributions for that Markov chain.

Exercise 3.10.19

Find the unique stationary distribution for the Markov chain in Exercise 3.10.2.

Exercise 3.10.20

The unique stationary distribution in Exercise 3.10.9 is $\mathbf{v} = (0, 1, 0, 0)$. This is an instance of the following general result:

Suppose that a Markov chain has exactly one absorbing state. Suppose further that, for each non-absorbing state $k$, there is $n$ such that the probability is positive of moving from state $k$ to the absorbing state in $n$ steps. Then the unique stationary distribution has probability 1 in the absorbing state. Prove this result.