Markov Chains

When successive observations are not independent of each other, we need a way of describing the dependence. We start with the simplest case, in which there are two possible states; if we were studying the weather, these could be Wet and Dry, for instance. We can model the weather with a two-state Markov chain: knowing the weather today tells us the probabilities of the different states tomorrow, and knowing yesterday's weather would add nothing to that knowledge. We call the conditional probability \(P(\mbox{Wet tomorrow} \mid \mbox{Wet today})\) a transition probability of the Markov chain, and we put it in the \((W,W)\) entry of the transition matrix. The entry in the first row and first column is the probability that tomorrow is Dry given that today is Dry; here it is \(3/5\).

\[\begin{split}\begin{array}{ll} Dry\\ Wet\\ \end{array} \left[ \begin{array}{ll} \frac{3}{5} & \frac{2}{5} \\ \frac{1}{5} & \frac{4}{5} \end{array} \right]\end{split}\]

The product of the transition matrix of the Markov chain with itself:

\[\begin{split}{\bf P}{\bf P}= \left[\begin{array}{ll} \frac{3}{5} & \frac{2}{5} \\ \frac{1}{5} & \frac{4}{5} \end{array} \right]\left[\begin{array}{ll} \frac{3}{5} & \frac{2}{5} \\ \frac{1}{5} & \frac{4}{5} \end{array} \right] =\left[\begin{array}{ll} 0.44 & 0.56 \\ 0.28 & 0.72 \end{array} \right]\end{split}\]
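As a quick numerical check, sketched here in Python (the `matmul` helper is an ad hoc illustration, not a library function), multiplying the transition matrix by itself recovers the entries above:

```python
# Transition matrix of the weather chain; rows and columns ordered (Dry, Wet).
P = [[3/5, 2/5],
     [1/5, 4/5]]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

P2 = matmul(P, P)
print([[round(x, 2) for x in row] for row in P2])  # [[0.44, 0.56], [0.28, 0.72]]
```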

Notice that the probability \(P(X_2=W\mid X_0=W)\) is exactly the formula for the \((W,W)\) entry of \({\bf P}{\bf P}\).

In more general cases we increase the number of possible states, but as long as this number stays finite we call the chain a finite state space Markov chain. We will see that in molecular biology it is common to have 4 states (the four nucleotides) or 20 states (the amino acids).

A Markov chain is a special case of what mathematicians call a random process in discrete time: we consider day 1, day 2, day 3, … as the times, and look at a sequence of dependent random variables \(X_0, X_1, X_2, \ldots, X_n\) at these times; each random variable can take on any of the possible states with some probability.

Example: let \(X_0=1\) and take \(X_{t+1}=X_{t}+Y_t\), where the \(Y_t\) are independent binomial random variables with distribution \(B(10, p=0.5)\). This is a Markov chain: \(X_{t+1}\) depends on the past only through the current value \(X_t\).
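A minimal simulation of this example, sketched in Python (the `binomial` helper and the random seed are illustrative choices):

```python
import random

random.seed(1)

def binomial(n=10, p=0.5):
    """Draw one Binomial(n, p) variate as a sum of n coin flips."""
    return sum(random.random() < p for _ in range(n))

# X_0 = 1, X_{t+1} = X_t + Y_t: the next state depends on the past
# only through the current state, so this is a Markov chain.
x = 1
path = [x]
for t in range(5):
    x = x + binomial()
    path.append(x)
print(path)  # a nondecreasing path starting at 1
```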

Thus we can summarize several steps of the Markov chain by using powers of the transition matrix; in general we write:

\[P^{(n)}_{i,j} =P(X_n=j\mid X_0=i)\]


\[P(X_{m+n}=j \vert X_m=i, X_{m-1}=i_{m-1},\ldots) = P(X_{m+n}=j\vert X_m=i) =P(X_n=j\vert X_0=i) = P^{(n)}_{i,j}\]

Questions:

1. Suppose that \(P(Y=y \vert X=x,U=u,V=v)=P(Y=y\vert X=x)\) for all \(x, y, u, v\); that is, knowing both \(U\) and \(V\) does not change the conditional probability given \(X\).

2. Prove that then knowing \(U\) alone does not change it either:

\[P(Y=y \vert X=x,U=u) = P(Y=y\vert X=x) \, .\]

Answer: Write \(A=\{X=x,U=u\}\). Then

\[\begin{split}\begin{aligned} P(Y=y \vert X=x,U=u) &= \frac{P(Y=y,A)}{P(A)}\\ &= \frac{\sum_v P(Y=y,A,V=v)}{P(A)}\\ &= \frac{\sum_v P(Y=y\vert A,V=v)P(A,V=v)}{P(A)}\\ &= \frac{\sum_v P(Y=y\vert X=x)P(A,V=v)}{P(A)}\\ &= \frac{P(Y=y\vert X=x)\sum_v P(A,V=v)}{P(A)}\\ &= \frac{P(Y=y\vert X=x)P(A)}{P(A)}\\ &= P(Y=y\vert X=x)\end{aligned}\end{split}\]
Applying the Markov property to two steps of the chain:

\[\begin{split}\begin{aligned} P(X_{n+2}=k \vert X_n=i) &= \sum_j P(X_{n+2}=k ,X_{n+1}=j \vert X_n=i)\\ &= \sum_j P(X_{n+2}=k \vert X_{n+1}=j , X_n=i)\, P(X_{n+1}=j\vert X_n=i) \\ &= \sum_j {\bf P}_{i,j}{\bf P}_{j,k}\end{aligned}\end{split}\]

This shows that \(P(X_{n+2}=k\mid X_n=i) = ({\bf P}^2)_{i,k}\), where \({\bf P}^2\) means the matrix product \({\bf P}{\bf P}\). Notice both that the quantity does not depend on \(n\) and that we can compute it by taking a power of \({\bf P}\). The more general version is \(P(X_{n+m}=k\mid X_n=j) = ({\bf P}^m)_{j,k}\). Since \({\bf P}^n{\bf P}^m={\bf P}^{n+m}\), we get the Chapman-Kolmogorov equations:

\[P(X_{n+m}=k\vert X_0=i) = \sum_j P(X_{n+m}=k\vert X_n=j)P(X_n=j\vert X_0=i)\]
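These equations can be checked numerically for the weather chain. A Python sketch (the `matmul` and `matpow` helpers are ad hoc, not library calls):

```python
P = [[3/5, 2/5],
     [1/5, 4/5]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matpow(A, n):
    out = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 identity
    for _ in range(n):
        out = matmul(out, A)
    return out

# Chapman-Kolmogorov with n = 2, m = 3: P^(5) = P^(2) P^(3), i.e.
# P(X_5 = k | X_0 = i) = sum_j P(X_2 = j | X_0 = i) P(X_3 = k | X_0 = j).
lhs = matpow(P, 5)
rhs = matmul(matpow(P, 2), matpow(P, 3))
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```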


A Markov chain has stationary \(n\)-step transition probabilities, which are given by the \(n\)-th power of the one-step transition matrix.

Here is an R session computing the 2, 4, 8 and 16 step transition matrices for our weather example:

> P = matrix(c(3/5, 2/5, 1/5, 4/5), nrow = 2, byrow = TRUE)
> P %*% P
     [,1] [,2]
[1,] 0.44 0.56
[2,] 0.28 0.72
> P2 = P %*% P
> P4 = P2 %*% P2
> P8 = P4 %*% P4
> P16 = P8 %*% P8
> P16
          [,1]      [,2]
[1,] 0.3333336 0.6666664
[2,] 0.3333332 0.6666668

Repeated squaring computes the powers of \({\bf P}\). The rows of \({\bf P}^{16}\) are already nearly identical, and in the limit


\[\begin{split}\lim_{n\to\infty} {\bf P}^n = \left[ \begin{array}{cc} \frac{1}{3} & \frac{2}{3} \\ \\ \frac{1}{3} & \frac{2}{3} \end{array}\right] \, .\end{split}\]
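This limit can be approximated by repeated squaring, mirroring the R session above. A Python sketch (`matmul` is an ad hoc helper):

```python
P = [[3/5, 2/5],
     [1/5, 4/5]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Square four times: P^2, P^4, P^8, P^16.
M = P
for _ in range(4):
    M = matmul(M, M)

# Both rows of P^16 are already within 5e-7 of (1/3, 2/3).
assert all(abs(M[i][0] - 1/3) < 5e-7 and abs(M[i][1] - 2/3) < 5e-7
           for i in range(2))
```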

Suppose we toss a coin with a biased probability of heads \(P(H)=\alpha_D\) and start the chain with Dry if we get heads and Wet if we get tails. Then

\[\begin{split}P(X_0=x) = \left\{ \begin{array}{lr} \alpha_D & x=\mbox{Dry} \\ \alpha_W=1-\alpha_D & x=\mbox{Wet} \end{array} \right.\end{split}\]


\[P(X_1=x) = \sum_y P(X_1=x\vert X_0=y)P(X_0=y) = \sum_y \alpha_y P_{y,x} \, .\]

Notice the last line is the multiplication of the row vector \(\alpha\) by the matrix \({\bf P}\). A special \(\alpha\): if we put \(\alpha_D=1/3\) and \(\alpha_W=2/3\), then

\[\begin{split}\left[ \frac{1}{3} \quad \frac{2}{3} \right] \left[\begin{array}{cc} \frac{3}{5} & \frac{2}{5}\\ \\ \frac{1}{5} & \frac{4}{5} \end{array} \right] = \left[ \frac{1}{3} \quad \frac{2}{3} \right] \, .\end{split}\]

If \(P(X_0=D)=1/3\) then \(P(X_1=D)=1/3\), and analogously for \(W\). This means that \(X_0\) and \(X_1\) have the same distribution. A probability vector \(\alpha\) is the initial distribution for the chain if \(P(X_0=i) = \alpha_i\).

A Markov Chain is stationary if \(P(X_1=i) = P(X_0=i)\) for all i .

An initial distribution is called stationary if the chain is stationary. We find that \(\alpha\) is a stationary initial distribution if \(\alpha {\bf P}=\alpha \, .\)
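We can verify numerically that \(\alpha = (1/3, 2/3)\) satisfies \(\alpha {\bf P} = \alpha\), and for a two-state chain even solve for the stationary \(\alpha\) in closed form, since balancing \(\alpha_D P_{D,W} = \alpha_W P_{W,D}\) gives \(\alpha_D = P_{W,D}/(P_{D,W}+P_{W,D})\). A Python sketch:

```python
P = [[3/5, 2/5],
     [1/5, 4/5]]
alpha = [1/3, 2/3]

# Row vector times matrix: (alpha P)_x = sum_y alpha_y P_{y,x}.
alphaP = [sum(alpha[y] * P[y][x] for y in range(2)) for x in range(2)]
assert all(abs(alphaP[x] - alpha[x]) < 1e-12 for x in range(2))

# Closed form for a two-state chain: alpha_D = P_{W,D} / (P_{D,W} + P_{W,D}).
a_D = P[1][0] / (P[0][1] + P[1][0])
print(a_D, 1 - a_D)  # approximately 1/3 and 2/3
```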

Suppose \({\bf P}^n\) converges to some matrix \({\bf P}^\infty\) . Notice that \(\lim_{n\to\infty} {\bf P}^{n-1} = {\bf P}^\infty\) and

\[{\bf P}^\infty = \lim {\bf P}^n = \left[\lim {\bf P}^{n-1}\right] {\bf P} = {\bf P}^\infty {\bf P}\, .\]

This proves that each row \(\alpha\) of \({\bf P}^\infty\) satisfies \(\alpha = \alpha {\bf P}\, .\)

Definition: A row vector x is a left eigenvector of A with eigenvalue \(\lambda\) if \(xA=\lambda x \, .\) So each row of \({\bf P}^\infty\) is a left eigenvector of \({\bf P}\) with eigenvalue 1.
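Power iteration illustrates this: repeatedly multiplying any initial probability row vector by \({\bf P}\) converges to that left eigenvector, which is the stationary distribution. A Python sketch:

```python
P = [[3/5, 2/5],
     [1/5, 4/5]]

# Start the chain surely Dry and iterate x <- x P.
x = [1.0, 0.0]
for _ in range(100):
    x = [sum(x[y] * P[y][j] for y in range(2)) for j in range(2)]

print([round(v, 6) for v in x])  # [0.333333, 0.666667]

# The limit is a left eigenvector of P with eigenvalue 1: x P = x.
xP = [sum(x[y] * P[y][j] for y in range(2)) for j in range(2)]
assert all(abs(xP[j] - x[j]) < 1e-12 for j in range(2))
```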