Taylor approximation

The gradient captures all the information about how $g$ changes in every direction, so we can use it to obtain a first-order approximation:

\begin{align} g(a +h) &\approx g(a) + \color{var(--emphColor)}{\nabla g(a)} h \\ &= g(a) + \color{var(--emphColor)}{\frac{\partial {g}}{\partial{x_1}} (a)} h_1 + \color{var(--emphColor)}{\frac{\partial {g}}{\partial{x_2}} (a)} h_2 +\cdots + \color{var(--emphColor)}{\frac{\partial{g}}{\partial{x_n}} (a)} h_n. \end{align}

The gradient $\nabla g$ is a row vector and $\color{var(--emphColor)}{h}$ is a column vector.
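
For example, with the particular choice $g(x_1, x_2) = x_1 x_2$ (an illustrative function, nothing about it is special), the gradient is $\nabla g(x) = \begin{bmatrix} x_2 & x_1 \end{bmatrix}$. Taking $a = (1, 1)$ and $h = (0.1,\ 0.2)^\top$,

$$g(a + h) \approx g(a) + \nabla g(a)\, h = 1 + (1)(0.1) + (1)(0.2) = 1.3,$$

which is close to the exact value $g(1.1,\ 1.2) = 1.32$.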

Vector-valued functions

Now we're ready to consider vector-valued functions, or mappings $f: \mathbb{R}^{\color{var(--emphColor)}{n}} \to \mathbb{R}^{\color{var(--emphColor)}{n}}$.

We can think of $f$ as a “stack” of $\color{var(--emphColor)}{n}$ scalar-valued functions $f_1, f_2, \ldots, f_{\color{var(--emphColor)}{n}}$ taking $\mathbb{R}^{\color{var(--emphColor)}{n}} \to \mathbb{R}$.
\begin{align} f(x)= \begin{bmatrix} f_1 (x)\\ \vdots\\ f_{\color{var(--emphColor)}{n}} (x) \end{bmatrix}. \end{align}
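
For instance, an illustrative choice with $n = 2$ is the map $f(x_1, x_2) = \begin{bmatrix} x_1 x_2 \\ x_1 + x_2^2 \end{bmatrix}$, which stacks the scalar-valued functions $f_1(x) = x_1 x_2$ and $f_2(x) = x_1 + x_2^2$, each taking $\mathbb{R}^2 \to \mathbb{R}$. We will reuse this $f$ in the examples below.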

Jacobian

Since $f$ is a “stack” of the scalar-valued functions $f_1, f_2, \ldots, f_n$, and we know how to approximate these, approximating $f$ amounts to approximating each $f_i$:

$$f (a + h) = \begin{bmatrix} f_1 (a + h)\\ \vdots\\ f_n (a + h) \end{bmatrix} \approx \begin{bmatrix} f_1 (a)\\ \vdots\\ f_n (a) \end{bmatrix} + \begin{bmatrix} \nabla{f_1} (a)\, h\\ \vdots\\ \nabla{f_n} (a)\, h \end{bmatrix} = \begin{bmatrix} f_1 (a)\\ \vdots\\ f_n (a) \end{bmatrix} + \begin{bmatrix} \nabla{f_1} (a) \\ \vdots\\ \nabla{f_n} (a) \end{bmatrix} \color{var(--emphColor)}{h}.$$

Each $f_i$ has a gradient, and we need all these derivatives in order to quantify how each coordinate of $f$ changes with respect to each variable!
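
Continuing the illustrative $f(x_1, x_2) = (x_1 x_2,\ x_1 + x_2^2)$ from above, the gradients are $\nabla f_1(x) = \begin{bmatrix} x_2 & x_1 \end{bmatrix}$ and $\nabla f_2(x) = \begin{bmatrix} 1 & 2 x_2 \end{bmatrix}$. At $a = (1, 1)$ with $h = (0.1,\ 0.2)^\top$,

$$f(a + h) \approx \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} \nabla f_1(a)\, h \\ \nabla f_2(a)\, h \end{bmatrix} = \begin{bmatrix} 1 + 0.1 + 0.2 \\ 2 + 0.1 + 0.4 \end{bmatrix} = \begin{bmatrix} 1.3 \\ 2.5 \end{bmatrix},$$

compared with the exact value $f(1.1,\ 1.2) = (1.32,\ 2.54)$.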

Jacobian

When $f: \mathbb{R}^n \to \mathbb{R}^n$, the Jacobian $J_f$ is the appropriate generalization of the derivative.

$\color{var(--emphColor)}{J_f} (x) = \begin{bmatrix} \nabla{f_1} (x) \\ \vdots\\ \nabla{f_n} (x) \end{bmatrix} = \begin{bmatrix} \frac{\partial{f_1}(x)}{\partial{x_1}} & \frac{\partial{f_1}(x)}{\partial{x_2}} & \cdots & \frac{\partial{f_1}(x)}{\partial{x_n}}\\ \frac{\partial{f_2}(x)}{\partial{x_1}} & \frac{\partial{f_2}(x)}{\partial{x_2}} & \cdots & \frac{\partial{f_2}(x)}{\partial{x_n}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial{f_n}(x)}{\partial{x_1}} & \frac{\partial{f_n}(x)}{\partial{x_2}} & \cdots & \frac{\partial{f_n}(x)}{\partial{x_n}} \end{bmatrix}$

The Jacobian incorporates each gradient $\nabla{f_i}$ into a single matrix, thereby capturing how each coordinate of $f$ changes in every direction.
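
For the illustrative $f(x_1, x_2) = (x_1 x_2,\ x_1 + x_2^2)$ used above, stacking the two gradients gives

$$J_f(x) = \begin{bmatrix} x_2 & x_1 \\ 1 & 2 x_2 \end{bmatrix}, \qquad J_f(1, 1) = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix},$$

so the first-order approximation can be written compactly as $f(a + h) \approx f(a) + J_f(a)\, h$, matching the coordinate-by-coordinate computation above.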