Taylor approximation

Compactly, the first-order approximation of $f: \mathbb{R}^n \to \mathbb{R}^n$ is given by $$f(a + h) \approx f(a) + \color{var(--emphColor)}{J_f}(a) h.$$

There are analogues for higher-order derivatives, but these involve higher-rank tensors.
For instance, each row $f_{\color{var(--emphColor)}{i}}$ of $f$ has $n^2$ second-order partials $\frac{\partial^2}{\partial{x_{\color{var(--emphColor)}{j}}} \partial {x_\color{var(--emphColor)}{k}}} f_{\color{var(--emphColor)}{i}}$.
There are three ($\color{var(--emphColor)}{i}, \color{var(--emphColor)}{j}, \color{var(--emphColor)}{k}$) indices here. What does this array look like?
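One way to picture this three-index array: for each fixed $\color{var(--emphColor)}{i}$, the second-order partials of $f_{\color{var(--emphColor)}{i}}$ form an $n \times n$ Hessian matrix, so the full array is a stack of $n$ Hessians, $$\left(H_{f_i}(x)\right)_{jk} = \frac{\partial^2 f_i}{\partial x_j \partial x_k}(x), \qquad 1 \leq i, j, k \leq n.$$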

Newton's method

Our method is based on this approximation.

Given a root estimate $x^{(\color{var(--emphColor)}{k})}$, we first solve the linear system $$0 = f(x^{(\color{var(--emphColor)}{k})}) + J_f(x^{(\color{var(--emphColor)}{k})})h$$ and then set $$x^{(\color{var(--emphColor)}{k+1})} = x^{(\color{var(--emphColor)}{k})} + h.$$

Pseudocode

							\begin{algorithm}
							\caption{Newton's method for systems}
							\begin{algorithmic}
							\REQUIRE{Initial guess $x^{(0)}$}
							\FOR{$k \gets 1$ to $\texttt{max_iter}$}
							 \STATE{solve $J_f(x^{(k)})h = -f(x^{(k)})$ for $h$}
							 \STATE{$x^{(k+1)} \gets x^{(k)} + h$}
							 \IF{stopping criterion is met}
							 \STATE{break}
							 \ENDIF
							\ENDFOR
							\end{algorithmic}
							\end{algorithm}
						

Pseudocode

							\begin{algorithm}
							\caption{Newton's method for systems}
							\begin{algorithmic}
							\REQUIRE{$\color{var(--emphColor)}{\text{Initial guess } x^{(0)}}$}
							\FOR{$k \gets 1$ to $\texttt{max_iter}$}
							 \STATE{solve $J_f(x^{(k)})h = -f(x^{(k)})$ for $h$}
							 \STATE{$x^{(k+1)} \gets x^{(k)} + h$}
							 \IF{stopping criterion is met}
							 \STATE{break}
							 \ENDIF
							\ENDFOR
							\end{algorithmic}
							\end{algorithm}
						
As before, we must choose a starting point, and the convergence of our algorithm depends on this choice.

Pseudocode

							\begin{algorithm}
							\caption{Newton's method for systems}
							\begin{algorithmic}
							\REQUIRE{Initial guess $x^{(0)}$}
							\FOR{$\color{var(--emphColor)}{k \gets 1 \text{ to } \texttt{max_iter}}$}
							 \STATE{solve $J_f(x^{(k)})h = -f(x^{(k)})$ for $h$}
							 \STATE{$x^{(k+1)} \gets x^{(k)} + h$}
							 \IF{stopping criterion is met}
							 \STATE{break}
							 \ENDIF
							\ENDFOR
							\end{algorithmic}
							\end{algorithm}
						
Again, we prefer a $\texttt{for}$ loop because it guarantees that the loop terminates.

Pseudocode

							\begin{algorithm}
							\caption{Newton's method for systems}
							\begin{algorithmic}
							\REQUIRE{Initial guess $x^{(0)}$}
							\FOR{$k \gets 1$ to $\texttt{max_iter}$}
							 \STATE{solve $\color{var(--emphColor)}{J_f(x^{(k)})h = -f(x^{(k)})}$ for $h$}
							 \STATE{$x^{(k+1)} \gets x^{(k)} + h$}
							 \IF{stopping criterion is met}
							 \STATE{break}
							 \ENDIF
							\ENDFOR
							\end{algorithmic}
							\end{algorithm}
						
This is the key. As promised, we reduced the non-linear root finding problem to a sequence of linear systems.

Pseudocode

							\begin{algorithm}
							\caption{Newton's method for systems}
							\begin{algorithmic}
							\REQUIRE{Initial guess $x^{(0)}$}
							\FOR{$k \gets 1$ to $\texttt{max_iter}$}
							 \STATE{solve $J_f(x^{(k)})h = -f(x^{(k)})$ for $h$}
							 \STATE{$x^{(k+1)} \gets x^{(k)} + h$}
							 \IF{$\color{var(--emphColor)}{\text{stopping criterion is met}}$}
							 \STATE{break}
							 \ENDIF
							\ENDFOR
							\end{algorithmic}
							\end{algorithm}
						
As in the one-variable setting, there are a few criteria we could use here.

Stopping criteria

There are a few options here. If $\epsilon_{\mathrm{tol}}$ denotes your specified tolerance, we could try:

  • $\max_{1\leq i \leq n} |f_i(x)| < \epsilon_{\mathrm{tol}},$
  • $\|f(x)\| < \epsilon_{\mathrm{tol}},$
  • $\max_{1\leq i \leq n} |h_i| < \epsilon_{\mathrm{tol}},$
  • $\|h\| < \epsilon_{\mathrm{tol}}.$

Recall our discussion of stopping criteria. Similar comments apply here.
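As a sketch, the four criteria above could be implemented in MATLAB as follows, assuming the current residual $\texttt{f\_val}$ and step $\texttt{h}$ are in scope and $\texttt{eps\_tol}$ holds the tolerance (these names are illustrative):

```matlab
% Four candidate stopping tests; any one of them may be used.
stop_1 = max(abs(f_val)) < eps_tol;  % largest residual component
stop_2 = norm(f_val) < eps_tol;      % residual norm
stop_3 = max(abs(h)) < eps_tol;      % largest step component
stop_4 = norm(h) < eps_tol;          % step norm
```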

New vs. old

When the Jacobian $J_f(x^{(k)})$ is invertible, the update is $h=-J_f(x^{(k)})^{-1} f(x^{(k)})$, so $$x^{(k+1)} = x^{(k)} - J_f(x^{(k)})^{-1} f(x^{(k)}).$$

If $f$ is a function of a single variable, its Jacobian is simply $f'(x)$, so we obtain exactly the same update formula we wrote down earlier!
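Concretely, when $n = 1$ the Jacobian is the $1 \times 1$ matrix $\begin{bmatrix} f'(x^{(k)}) \end{bmatrix}$, and the update becomes $$x^{(k+1)} = x^{(k)} - \frac{f(x^{(k)})}{f'(x^{(k)})},$$ the familiar one-variable Newton iteration.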

Let's find $x_1, x_2, x_3$ satisfying \begin{align} x^3_1 -2x_2 -2 & = 0,\\ x^3_1 -5x^2_3 + 7 & = 0,\\ x_2x_3^2 -1 & = 0. \end{align}
We cast this as a root finding problem for a function $f: \mathbb{R}^3 \to \mathbb{R}^3$.
First we define \begin{align} f(x_1,x_2,x_3)= \begin{bmatrix} f_1(x_1,x_2,x_3)\\ f_2(x_1,x_2,x_3)\\ f_3(x_1,x_2,x_3) \end{bmatrix} = \begin{bmatrix} x_1^3 -2x_2 -2\\ x^3_1 -5x_3^2 + 7\\ x_2x_3^2 -1 \end{bmatrix}. \end{align}
Next, we compute the Jacobian \begin{align} J_f(x)= \begin{bmatrix} \nabla{f_1}(x)\\ \nabla{f_2}(x)\\ \nabla{f_3}(x) \end{bmatrix} = \begin{bmatrix} 3x^2_1 &-2 &0\\ 3x^2_1 &0 &-10x_3\\ 0 &x^2_3 & 2x_2x_3 \end{bmatrix}. \end{align}
With $f$ and $J_f$ in mind, we may proceed to the loop...
This task is perhaps best suited for a computer.

MATLAB code snippet


				f = @(x) [x(1)^3 - 2 * x(2) - 2; ...
					x(1)^3 - 5 * x(3)^2 + 7; ...
					x(2) * x(3)^2 - 1];
				J = @(x) [3 * x(1)^2,	-2,  0; ...
					3 * x(1)^2, 0, -10 * x(3); ...
					0, x(3)^2, 2 * x(3) * x(2)];
				for k = 1:N
					f_val = f(x);
					J_val = J(x);
					h = -J_val \ f_val;
					x = x + h
					if norm(h) < eps_tol
						break
					end
				end
	      	

MATLAB code snippet


				f = @(x) [x(1)^3 - 2 * x(2) - 2; ...
					x(1)^3 - 5 * x(3)^2 + 7; ...
					x(2) * x(3)^2 - 1];
				J = @(x) [3 * x(1)^2,	-2,  0; ...
					3 * x(1)^2, 0, -10 * x(3); ...
					0, x(3)^2, 2 * x(3) * x(2)];
				for k = 1:N
					f_val = f(x);
					J_val = J(x);
					h = -J_val \ f_val;
					x = x + h
					if norm(h) < eps_tol
						break
					end
				end
					
We use “lambda” or “anonymous” functions to implement our function and its Jacobian.

MATLAB code snippet


			f = @(x) [x(1)^3 - 2 * x(2) - 2; ...
				x(1)^3 - 5 * x(3)^2 + 7; ...
				x(2) * x(3)^2 - 1];
			J = @(x) [3 * x(1)^2,	-2,  0; ...
				3 * x(1)^2, 0, -10 * x(3); ...
				0, x(3)^2, 2 * x(3) * x(2)];
			for k = 1:N
				f_val = f(x);
				J_val = J(x);
				h = -J_val \ f_val;
				x = x + h
				if norm(h) < eps_tol
					break
				end
			end
				
In MATLAB, we use the backslash operator to solve linear systems.
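As a minimal illustration (with a hypothetical system $Ah = b$), backslash solves the system directly without ever forming an inverse:

```matlab
A = [2, 1; 1, 3];
b = [3; 5];
h = A \ b;        % solves A*h = b; preferred
% h = inv(A) * b; % avoid: explicitly forming the inverse is slower and less accurate
```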

MATLAB code snippet


				f = @(x) [x(1)^3 - 2 * x(2) - 2; ...
					x(1)^3 - 5 * x(3)^2 + 7; ...
					x(2) * x(3)^2 - 1];
				J = @(x) [3 * x(1)^2,	-2,  0; ...
					3 * x(1)^2, 0, -10 * x(3); ...
					0, x(3)^2, 2 * x(3) * x(2)];
				for k = 1:N
					f_val = f(x);
					J_val = J(x);
					h = -J_val \ f_val;
					x = x + h
					if norm(h) < eps_tol
						break
					end
				end
				
We use the MATLAB function $\texttt{norm}$ to implement the stopping criterion $\|h\| < \epsilon_{\mathrm{tol}}$.

Conic intersections

Congratulations! You reached the end of this lecture.