Classical iterative schemes

GUIDING QUESTION: How can I compute a solution to a linear system?

A new kind of algorithm

We'll develop a different class of solvers for linear systems, based on fixed point iteration.

Great for large, sparse systems! They can be very efficient, depending on matrix structure.

Motivation

When $A$ is sparse, Gaussian Elimination might introduce significant fill-in: nonzero entries in $L$ and $U$ where $A$ has zeroes, destroying matrix sparsity.

In some applications we have a good idea of what the solution to the system looks like. We should be able to use this as a “warm start” for our numerical methods.
In some cases (where the matrix is VERY large) we can't access the matrix directly, and we may only query an oracle evaluating matrix-vector products.

Consider a linear system $Ax = b$ with a square coefficient matrix.

Guiding philosophy: Instead of solving all the equations once simultaneously, let's solve the $i$th equation for the $i$th variable over and over.

One at a time

When $A$ is $n\times n$, the $\color{var(--emphColor)}{i}$th component of the system $Ax = b$ reads $$a_{\color{var(--emphColor)}{i}1} x_1 + \cdots + a_{\color{var(--emphColor)}{ii}} x_{\color{var(--emphColor)}{i}} + \cdots + a_{\color{var(--emphColor)}{i}n} x_n = b_{\color{var(--emphColor)}{i}}.$$

Assuming $a_{\color{var(--emphColor)}{ii}} \neq 0$, we can solve for $x_{\color{var(--emphColor)}{i}}$ to obtain $$x_{\color{var(--emphColor)}{i}} = \frac{1}{a_{\color{var(--emphColor)}{ii}}} \bigg(b_{\color{var(--emphColor)}{i}} - \sum _{j \neq \color{var(--emphColor)}{i}} a_{\color{var(--emphColor)}{i}j} x_j \bigg).$$
This is the basis of our iterative method!

Jacobi iteration

Given a starting vector $x^{(0)}$,

we update the components $x_1^{(\color{var(--emphColor)}{k})}, \ldots, x_n^{(\color{var(--emphColor)}{k})}$ of our $\color{var(--emphColor)}{k}$th approximation $x^{(\color{var(--emphColor)}{k})}$ one by one: $$x_i^{(\color{var(--emphColor)}{k+1})} = \frac{1}{a_{ii}} (b_i- \sum _{j\not= i} a_{ij} x^{(\color{var(--emphColor)}{k})}_j).$$
The update $x_i^{(\color{var(--emphColor)}{k+1})}$ uses almost every component of the current approximation $x^{(\color{var(--emphColor)}{k})}$, so in practice we cannot simply overwrite our current approximation vector and must maintain two storage arrays.
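The two-array bookkeeping above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the example matrix, right-hand side, tolerance, and iteration cap are assumptions chosen for the demonstration.

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-10, max_iter=500):
    """Jacobi iteration sketch, assuming A is square with nonzero diagonal."""
    n = len(b)
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        x_new = np.empty(n)  # second array: every update reads only x^(k)
        for i in range(n):
            s = sum(A[i, j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Example on a strictly diagonally dominant system.
A = np.array([[5.0, 1.0, 2.0], [-3.0, 9.0, 4.0], [1.0, 2.0, -7.0]])
b = np.array([1.0, 2.0, 3.0])
x = jacobi(A, b, np.zeros(3))
```

Note the separate `x_new` array: overwriting `x` in place would change the method (see Gauss-Seidel later).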

The components of the $(\color{var(--emphColor)}{k+1})$th approximation $x^{(\color{var(--emphColor)}{k+1})}$ depend only on the $\color{var(--emphColor)}{k}$th approximation $x^{(\color{var(--emphColor)}{k})}$, and not on each other, so they can in fact be computed in parallel.
The Jacobi iteration is easily parallelizable: each component of the approximation is assigned a computing node and the nodes perform updates in parallel independently of one another.

Why should this work?

The intuition is so simple, it is perhaps surprising that the Jacobi iteration works well in practice.

The Jacobi iteration is a prime example of a splitting method, which is based on fixed point iteration.
If $A$ is an $n \times n$ matrix, then a pair of matrices $M, N$ such that $M$ is invertible and $A= M -N$ is called a splitting of $A$.

Splittings and fixed points

What does a splitting of $A$ have to do with fixed point iteration?
If $(M, N)$ is a splitting of $A$, then $Ax = b$ implies
$(M - N) x = b$ and $Mx = Nx + b$.
Since $M$ is invertible, we have $$\color{var(--emphColor)}{x} = M ^{-1} N\color{var(--emphColor)}{x} + M^{-1} b.$$

Splittings and fixed points

If $(M, N)$ is a splitting of $A$, then $Ax = b$ implies
$$\color{var(--emphColor)}{x} = M ^{-1} N\color{var(--emphColor)}{x} + M^{-1} b.$$
So the solution to $Ax = b$ is a fixed point of the function $g(\color{var(--emphColor)}{x}) = T\color{var(--emphColor)}{x} + c$, with $T = M^{-1}N$ and $c = M^{-1}b$.

Fixed point iteration

Inspired by our earlier discussion of fixed point iteration schemes, we approximate the fixed point satisfying $\color{var(--emphColor)}{x} = T\color{var(--emphColor)}{x} + c$ using the iterates defined by $$x^{(\color{var(--emphColor)}{k+1})} = T x^{(\color{var(--emphColor)}{k})} + c.$$

We start with an initial guess $x^{(0)}$ and we stop when the distance between consecutive iterates is smaller than a specified tolerance.
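A generic sketch of this scheme, with the stopping rule above, might look like the following; as an illustration we instantiate the splitting with $M$ taken to be the diagonal of $A$ (the example matrix, tolerance, and iteration cap are assumptions).

```python
import numpy as np

def fixed_point_solve(T, c, x0, tol=1e-10, max_iter=1000):
    """Iterate x <- T x + c until consecutive iterates are within tol."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        x_new = T @ x + c
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# A splitting with M = diag(A): T = M^{-1} N, c = M^{-1} b.
A = np.array([[5.0, 1.0, 2.0], [-3.0, 9.0, 4.0], [1.0, 2.0, -7.0]])
b = np.array([1.0, 2.0, 3.0])
M = np.diag(np.diag(A))
N = M - A                    # so that A = M - N
T = np.linalg.inv(M) @ N
c = np.linalg.inv(M) @ b
x = fixed_point_solve(T, c, np.zeros(3))
```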

Convergence

Suppose $x^*$ solves the fixed point problem and denote the error in the $k$th approximation by $e^{(k)} = x^{(k)} - x^*$.

Since $\color{var(--emphColor)}{x^*}$ is a fixed point of $g(x) = Tx + c$, the error at the $(k+1)$th approximation is given by \begin{align} e^{(k+1)} = x^{(k+1)} - \color{var(--emphColor)}{x^*} &= (Tx^{(k)} + c) - (T\color{var(--emphColor)}{x^*} + c)\\ & = T (x^{(k)} - x^*)\\ & = T e^{(k)}. \end{align}

Convergence

The relation $e^{(k+1)} = T e^{(k)}$ holds for any $k$, so \begin{align} e^{(k+1)} &= T e^{(k)} = T^2 e^{(k-1)} = \cdots = T^{k+1} e^{(0)}. \end{align}
If $T$ were a scalar, we would need $|T| < 1$ to guarantee the convergence $e^{(k+1)} \to 0$.
In general, it is enough to guarantee that $|\lambda(T)| < 1$, with $\lambda(T)$ denoting the largest eigenvalue (in magnitude) of $T$.
As I mentioned before, we'll define eigenvalues precisely in HW4. They will not play a crucial role in our course, so if you haven't studied them, don't stress over this fine point.

The following remarks are highly technical and are considered slightly beyond the scope of this course.

They are provided for the benefit of the curious reader!
(Technical) The quantity $|\lambda(T)|$ is called the spectral radius of $T$. The spectral radius is not itself a norm in general, but when $T$ is symmetric it coincides with the spectral norm $||T||$, a submultiplicative matrix norm with the property that $||Tx||_2 \leq ||T||\,||x||_2$ for any vector $x$, where $||x||_2 = \sqrt{\sum_{j=1}^n x_j^2}$ denotes the usual Euclidean norm.

Since the spectral norm is submultiplicative and $e^{(k+1)} = T^{k+1}e^{(0)}$, the norm of the error at the $(\color{var(--emphColor)}{k+1})$th step is bounded by $$||e^{(\color{var(--emphColor)}{k+1})}||_2 = ||T^{k+1}e^{(0)}||_2 \leq ||T||^{\color{var(--emphColor)}{k+1}}||e^{(0)}||_2.$$

Therefore $||e^{(\color{var(--emphColor)}{k+1})}||_2 \to 0$ whenever $||T|| < 1$; for symmetric $T$ this is exactly the condition $|\lambda(T)| < 1$. In fact, for any $T$, the fixed point iteration converges for every starting point precisely when the spectral radius of $T$ is strictly less than $1$.
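The convergence condition is easy to check numerically. As a small illustration (the matrix below is an assumption chosen for the example), we build the Jacobi iteration matrix $T = -D^{-1}(L+U)$ and compute its spectral radius with numpy:

```python
import numpy as np

# Jacobi iteration matrix T = -D^{-1}(L + U) for a small example matrix,
# and a numerical check of the convergence condition (spectral radius < 1).
A = np.array([[5.0, 1.0, 2.0], [-3.0, 9.0, 4.0], [1.0, 2.0, -7.0]])
D = np.diag(np.diag(A))
T = -np.linalg.inv(D) @ (A - D)

# Spectral radius: largest eigenvalue magnitude.
spectral_radius = max(abs(np.linalg.eigvals(T)))
```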

Jacobi iteration as a splitting method

For any square $A$, set $A= D \color{var(--emphColor)}{+} L \color{var(--emphColor)}{+} U$, with $D$ the diagonal part, $\color{var(--emphColor)}{+}L$ the strictly lower triangular part, $\color{var(--emphColor)}{+}U$ the strictly upper triangular part.

I've highlighted the signs because my convention here differs from Bradie's, which I find slightly confusing.

Jacobi iteration as a splitting method

If $A= \begin{bmatrix} \bbox[3pt, border: 3pt solid var(--emphColor)]{5} & 1 & 2\\ -3 & \bbox[3pt, border: 3pt solid var(--emphColor)]{9} & 4\\ 1 & 2 & \bbox[3pt, border: 3pt solid var(--emphColor)]{-7} \end{bmatrix}$, then
$L= \begin{bmatrix} 0\\ -3 & 0\\ 1 & 2 & 0 \end{bmatrix}, \quad D= \begin{bmatrix} \color{var(--emphColor)}{5}\\ && \color{var(--emphColor)}{9}\\ &&& \color{var(--emphColor)}{-7} \end{bmatrix}, \text{ and } U= \begin{bmatrix} 0 & 1 & 2\\ & 0 & 4\\ && 0 \end{bmatrix}.$
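This three-way split is immediate with numpy's triangle and diagonal helpers; here we reuse the matrix from the slide above:

```python
import numpy as np

# The example matrix, split as A = D + L + U.
A = np.array([[5.0, 1.0, 2.0], [-3.0, 9.0, 4.0], [1.0, 2.0, -7.0]])
D = np.diag(np.diag(A))   # diagonal part
L = np.tril(A, k=-1)      # strictly lower triangular part
U = np.triu(A, k=1)       # strictly upper triangular part
```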

Jacobi iteration as a splitting method

Suppose $A$ has nonzero diagonal entries and set $M=D$ and $N=-(L+U)$.

This splitting suggests the update $x^{(\color{var(--emphColor)}{k+1})} = - (D^{-1}(L+U)) x^{(\color{var(--emphColor)}{k})} + D ^{-1} b.$

If $a_{ii} = d_{ii} = 0$ for some $i$, it may be possible to proceed by first swapping rows of $A$.
What does this update look like for each component?

$x^{(\color{var(--emphColor)}{k+1})} = - (D^{-1}(L+U)) x^{(\color{var(--emphColor)}{k})} + D ^{-1} b$

What does this update look like for each component?
Since $D^{-1} = \mathrm{diag}(d^{-1}_{11}, \ldots, d^{-1}_{nn})$,
$x^{(\color{var(--emphColor)}{k+1})} = - \begin{bmatrix} d^{-1}_{11}\\ & d^{-1}_{22}\\ &&\ddots\\ &&&d^{-1}_{nn} \end{bmatrix} \begin{bmatrix} 0 & a_{12} & \cdots & a_{1n}\\ a_{21} & 0 & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & 0 \end{bmatrix} \begin{bmatrix} x^{(\color{var(--emphColor)}{k})}_1\\ x^{(\color{var(--emphColor)}{k})}_2\\ \vdots\\ x^{(\color{var(--emphColor)}{k})}_n \end{bmatrix} + \begin{bmatrix} d^{-1}_{11}\\ & d^{-1}_{22}\\ &&\ddots\\ &&&d^{-1}_{nn} \end{bmatrix} \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{bmatrix}$

$x^{(\color{var(--emphColor)}{k+1})} = - (D^{-1}(L+U)) x^{(\color{var(--emphColor)}{k})} + D ^{-1} b$

What does this update look like for each component?
Since $D^{-1} = \mathrm{diag}(d^{-1}_{11}, \ldots, d^{-1}_{nn})$,
$x^{(\color{var(--emphColor)}{k+1})} = - \begin{bmatrix} d^{-1}_{11} \sum_{j \not=1} a_{1j} x^{(\color{var(--emphColor)}{k})}_j\\ d^{-1}_{22} \sum_{j \not=2} a_{2j} x^{(\color{var(--emphColor)}{k})}_j\\ \vdots\\ d^{-1}_{nn} \sum_{j \not=n} a_{nj} x^{(\color{var(--emphColor)}{k})}_j \end{bmatrix} + \begin{bmatrix} d^{-1}_{11} b_1\\ d^{-1}_{22} b_2\\ \vdots\\ d^{-1}_{nn} b_n \end{bmatrix}$

$x^{(\color{var(--emphColor)}{k+1})} = - (D^{-1}(L+U)) x^{(\color{var(--emphColor)}{k})} + D ^{-1} b$

What does this update look like for each component?
Since $D^{-1} = \mathrm{diag}(d^{-1}_{11}, \ldots, d^{-1}_{nn})$,
$x^{(\color{var(--emphColor)}{k+1})} = \begin{bmatrix} \vert \\ \frac{1}{a_{ii}} (b_i- \sum _{j\not= i} a_{ij} x^{(\color{var(--emphColor)}{k})}_j)\\ \vert \end{bmatrix}.$
This is the Jacobi update we wrote down earlier!
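A quick numerical sanity check confirms the derivation: one matrix-form Jacobi step agrees with the componentwise formula. The matrix, right-hand side, and the current iterate below are assumptions chosen for the example.

```python
import numpy as np

A = np.array([[5.0, 1.0, 2.0], [-3.0, 9.0, 4.0], [1.0, 2.0, -7.0]])
b = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 1.0, 1.0])   # an arbitrary current iterate x^(k)

# Matrix form of one Jacobi step: x^(k+1) = -D^{-1}(L+U) x^(k) + D^{-1} b.
D_inv = np.diag(1.0 / np.diag(A))
off_diag = A - np.diag(np.diag(A))           # L + U
matrix_step = -D_inv @ off_diag @ x + D_inv @ b

# Componentwise form: x_i^(k+1) = (b_i - sum_{j != i} a_ij x_j^(k)) / a_ii.
component_step = np.array([
    (b[i] - sum(A[i, j] * x[j] for j in range(3) if j != i)) / A[i, i]
    for i in range(3)
])
```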

Jacobi iteration

So the Jacobi update, which we obtained by solving the $i$th equation for the $i$th variable, approximates a solution to $Ax = b$ by translating the linear system into a fixed point problem using the splitting $(M, N) = (D, -(L+U))$ of $A$.

Can we do better?

When we perform the Jacobi update $$x_i^{(\color{var(--emphColor)}{k+1})} = \frac{1}{a_{ii}} (b_i- \sum _{j\not= i} a_{ij} x^{(\color{var(--emphColor)}{k})}_j),$$ we loop through the components of $x^{(\color{var(--emphColor)}{k+1})}$ in order.


Can we do better?

If we loop through the components of $x^{(\color{var(--emphColor)}{k+1})}$ in order when performing the Jacobi update, $$x_i^{(\color{var(--emphColor)}{k+1})} = \frac{1}{a_{ii}} (b_i- \sum _{j = \color{var(--emphColor)}{1}}^{\color{var(--emphColor)}{i-1}} a_{ij} x^{(\color{var(--emphColor)}{k})}_{\color{var(--emphColor)}{j}} - \sum _{j= i + 1}^n a_{ij} x^{(\color{var(--emphColor)}{k})}_j),$$

we'll know $x_{\color{var(--emphColor)}{1}}^{(\color{var(--emphColor)}{k+1})}, x_{\color{var(--emphColor)}{2}}^{(\color{var(--emphColor)}{k+1})}, \ldots, x_{\color{var(--emphColor)}{i-1}}^{(\color{var(--emphColor)}{k+1})}$, by the time we need to update $x_i^{(\color{var(--emphColor)}{k+1})}$.

Gauss-Seidel

We hope that $x_i^{(\color{var(--emphColor)}{k+1})}$ is a better approximation than $x_i^{(\color{var(--emphColor)}{k})}$, so we might be better off using the most up-to-date information available when computing $x_i^{(\color{var(--emphColor)}{k+1})}$.

The Gauss-Seidel method suggests we update $$x_i^{(\color{var(--emphColor)}{k+1})} = \frac{1}{a_{ii}} (b_i- \sum _{j = 1}^{i-1} a_{ij} x^{(\color{var(--emphColor)}{k+1})}_{j} - \sum _{j= i + 1}^n a_{ij} x^{(\color{var(--emphColor)}{k})}_j).$$
It's almost exactly the same as the Jacobi update, except that we always use the latest available values.

In practice, we can achieve this by looping through the components of our current update vector in order of increasing index and overwriting the components one at a time.
The Gauss-Seidel method requires that we update $x_{\color{var(--emphColor)}{1}}^{(k+1)}, x_{\color{var(--emphColor)}{2}}^{(k+1)}, \ldots, x_{\color{var(--emphColor)}{i-1}}^{(k+1)}$ before updating $x_{\color{var(--emphColor)}{i}}^{(k+1)}$.
The updates must be computed sequentially and therefore this method cannot be parallelized so easily.
In some cases when the matrix is sparse, we may parallelize some updates by efficiently coloring a dependency graph.
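An in-place Gauss-Seidel sketch makes the contrast with Jacobi concrete: there is a single solution array, and components are overwritten one at a time. The matrix, right-hand side, tolerance, and iteration cap are assumptions for the example.

```python
import numpy as np

def gauss_seidel(A, b, x0, tol=1e-10, max_iter=500):
    """Gauss-Seidel sketch: overwrite components in place, using fresh values."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        x_old = x.copy()   # kept only for the stopping test
        for i in range(len(b)):
            # x[j] for j < i has already been updated this sweep.
            s = sum(A[i, j] * x[j] for j in range(len(b)) if j != i)
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:
            return x
    return x

A = np.array([[5.0, 1.0, 2.0], [-3.0, 9.0, 4.0], [1.0, 2.0, -7.0]])
b = np.array([1.0, 2.0, 3.0])
x = gauss_seidel(A, b, np.zeros(3))
```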

As a splitting method

The avid reader will verify that the Gauss-Seidel iteration is based on the splitting $M = L + D$ and $N = - U$.

Recall that splitting methods are based on the update $Mx^{(\color{var(--emphColor)}{k+1})} = N x^{(\color{var(--emphColor)}{k})} + b$.
In this case $M$ is lower triangular, so it's better to solve for $x^{(\color{var(--emphColor)}{k+1})}$ by forward substitution than to explicitly invert $M$.

For the Gauss-Seidel method we also require that $A$ has nonzero diagonal entries.
Again, it may be possible to proceed by first swapping some rows of $A$.
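The splitting form of Gauss-Seidel then costs one forward substitution per iteration. A sketch, where the example matrix and the fixed iteration count are assumptions chosen for illustration:

```python
import numpy as np

def forward_substitution(M, rhs):
    """Solve M y = rhs for lower-triangular M by forward substitution."""
    n = len(rhs)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (rhs[i] - M[i, :i] @ y[:i]) / M[i, i]
    return y

# Gauss-Seidel via the splitting M x^(k+1) = N x^(k) + b, M = D + L, N = -U.
A = np.array([[5.0, 1.0, 2.0], [-3.0, 9.0, 4.0], [1.0, 2.0, -7.0]])
b = np.array([1.0, 2.0, 3.0])
M = np.tril(A)         # D + L (lower triangle, including the diagonal)
N = -np.triu(A, k=1)   # -U (strictly upper triangle, negated)

x = np.zeros(3)
for _ in range(50):    # fixed sweep count, for simplicity
    x = forward_substitution(M, N @ x + b)
```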

Jacobi and GS convergence

It's hard to characterize exactly when these methods converge, but we can make some guarantees.

If $A$ is SDD (strictly diagonally dominant), both the Jacobi and the Gauss-Seidel iteration schemes converge to the unique solution of $Ax = b$, for any starting point $x^{(0)}$.

Jacobi and GS convergence

It's hard to characterize exactly when these methods converge, but we can make some guarantees.

If $A$ is SDD, both the Jacobi and the Gauss-Seidel iteration schemes converge to the unique solution of $Ax = b$, for any starting point $x^{(0)}$.
When $A$ is SDD, $A$ is invertible.

Jacobi and GS convergence

It's hard to characterize exactly when these methods converge, but we can make some guarantees.

If $A$ is SDD, both the Jacobi and the Gauss-Seidel iteration schemes converge to the unique solution of $Ax = b$, for any starting point $x^{(0)}$.
When $A$ is SDD, any initial guess works!
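Strict diagonal dominance is easy to test programmatically. A minimal check, where the two example matrices are assumptions (the first is SDD, the second is not):

```python
import numpy as np

def is_sdd(A):
    """Check strict diagonal dominance: |a_ii| > sum_{j != i} |a_ij| per row."""
    A = np.asarray(A, dtype=float)
    off_diag = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    return bool(np.all(np.abs(np.diag(A)) > off_diag))

A = np.array([[5.0, 1.0, 2.0], [-3.0, 9.0, 4.0], [1.0, 2.0, -7.0]])  # SDD
B = np.array([[1.0, 2.0], [3.0, 1.0]])                               # not SDD
```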

GS convergence

We can make a similar guarantee for Gauss-Seidel iteration.

If $A$ is SPD (symmetric positive definite), the Gauss-Seidel iteration scheme converges to the unique solution of $Ax = b$, for any starting point $x^{(0)}$.

GS convergence

We can make a similar guarantee for Gauss-Seidel iteration.

If $A$ is SPD, the Gauss-Seidel iteration scheme converges to the unique solution of $Ax = b$, for any starting point $x^{(0)}$.
When $A$ is SPD, $A$ is invertible.

GS convergence

We can make a similar guarantee for Gauss-Seidel iteration.

If $A$ is SPD, the Gauss-Seidel iteration scheme converges to the unique solution of $Ax = b$, for any starting point $x^{(0)}$.
When $A$ is SPD, any initial guess works!

Jacobi vs Gauss-Seidel

Typically, Gauss-Seidel will be faster in practice (with respect to the number of iterations required for convergence, not accounting for gains due to parallelizing the Jacobi iteration).
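As an illustration (not a proof), we can count iterations for both splittings on a small SDD example; the matrix, tolerance, and starting point below are assumptions.

```python
import numpy as np

def iterations_to_converge(T, c, x0, tol=1e-10, max_iter=10000):
    """Count iterations of x <- T x + c until consecutive iterates agree to tol."""
    x = np.array(x0, dtype=float)
    for k in range(1, max_iter + 1):
        x_new = T @ x + c
        if np.linalg.norm(x_new - x) < tol:
            return k
        x = x_new
    return max_iter

A = np.array([[5.0, 1.0, 2.0], [-3.0, 9.0, 4.0], [1.0, 2.0, -7.0]])
b = np.array([1.0, 2.0, 3.0])
D = np.diag(np.diag(A))
M_gs = np.tril(A)   # D + L

# Iteration matrices and constants for the two splittings.
T_jac = -np.linalg.inv(D) @ (A - D)
c_jac = np.linalg.inv(D) @ b
T_gs = np.linalg.inv(M_gs) @ (-np.triu(A, k=1))
c_gs = np.linalg.inv(M_gs) @ b

k_jac = iterations_to_converge(T_jac, c_jac, np.zeros(3))
k_gs = iterations_to_converge(T_gs, c_gs, np.zeros(3))
```

On this example Gauss-Seidel needs fewer sweeps, consistent with the rule of thumb above, though the gap depends on the matrix.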

Congratulations! You reached the end of this lecture.