Taylor approximation
The gradient captures all the information about how $g$ changes in
every direction, so we can use it to obtain a first-order approximation:
\begin{align}
g(a +h) &\approx g(a) + \color{var(--emphColor)}{\nabla g(a)} h \\
&= g(a) + \color{var(--emphColor)}{\frac{\partial {g}}{\partial{x_1}} (a)} h_1 +
\color{var(--emphColor)}{\frac{\partial {g}}{\partial{x_2}} (a)} h_2 +\cdots
+ \color{var(--emphColor)}{\frac{\partial{g}}{\partial{x_n}} (a)} h_n.
\end{align}
The gradient $\nabla g$ is a row vector
and $\color{var(--emphColor)}{h}$ is a column vector.
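As a quick numerical sanity check, the first-order approximation can be tried on a concrete scalar-valued function. The function $g$, the point $a$, and the displacement $h$ below are illustrative choices, not taken from the slides:

```python
import numpy as np

# Illustrative g: R^2 -> R (an assumed example function)
def g(x):
    return x[0] ** 2 + np.sin(x[1])

# Its gradient, written as a row vector: [dg/dx1, dg/dx2]
def grad_g(x):
    return np.array([2 * x[0], np.cos(x[1])])

a = np.array([1.0, 0.5])
h = np.array([1e-3, -2e-3])  # a small displacement

# First-order Taylor approximation: g(a + h) ≈ g(a) + ∇g(a) h
approx = g(a) + grad_g(a) @ h
exact = g(a + h)
print(abs(exact - approx))  # error is O(||h||^2), much smaller than ||h||
```

Because the error of the first-order approximation is quadratic in $\|h\|$, halving $h$ should roughly quarter the discrepancy printed above.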
Vector-valued functions
Now we're ready to consider vector-valued
functions, or mappings $f: \mathbb{R}^{\color{var(--emphColor)}{n}} \to \mathbb{R}^{\color{var(--emphColor)}{n}}$.
We can think of $f$ as a “stack” of
$\color{var(--emphColor)}{n}$ scalar-valued functions
$f_1, f_2, \ldots, f_{\color{var(--emphColor)}{n}}$ taking
$\mathbb{R}^{\color{var(--emphColor)}{n}} \to \mathbb{R}$.
\begin{align}
f(x)=
\begin{bmatrix}
f_1 (x)\\
\vdots\\
f_{\color{var(--emphColor)}{n}} (x)
\end{bmatrix}.
\end{align}
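In code, the "stack" picture is literal: evaluate each scalar-valued component and collect the results into one vector. The component functions below are made-up examples, not from the slides:

```python
import numpy as np

# Two assumed scalar-valued component functions, each R^2 -> R
def f1(x):
    return x[0] ** 2 + x[1]

def f2(x):
    return np.sin(x[0] * x[1])

# The vector-valued f: R^2 -> R^2 is just the stack of f1 and f2
def f(x):
    return np.array([f1(x), f2(x)])

x = np.array([1.0, 2.0])
print(f(x))  # a vector whose entries are f1(x) and f2(x)
```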
Jacobian
Since $f$ is a “stack” of the scalar-valued functions
$f_1, f_2, \ldots, f_n$, and we know how to approximate each of these,
approximating $f$ amounts to approximating each $f_i$.
$$f (a + h) =
\begin{bmatrix}
f_1 (a + h)\\
\vdots\\
f_n (a + h)
\end{bmatrix}
\approx
\begin{bmatrix}
f_1 (a)\\
\vdots\\
f_n (a)
\end{bmatrix}
+
\begin{bmatrix}
\nabla{f_1} (a) \\
\vdots\\
\nabla{f_n} (a)
\end{bmatrix} \color{var(--emphColor)}{h}.$$
Each $f_i$ has a gradient, and we need all these derivatives
in order to quantify how each coordinate of $f$ changes with
respect to each variable!
Jacobian
When $f: \mathbb{R}^n \to \mathbb{R}^n$, the
Jacobian $J_f$ is the appropriate generalization
of the derivative.
$J_f(x) =
\begin{bmatrix}
\nabla{f_1} (x) \\
\vdots\\
\nabla{f_n} (x)
\end{bmatrix}$
Written out entry by entry, the Jacobian is
$\color{var(--emphColor)}{J_f} (x) =
\begin{bmatrix}
\frac{\partial{f_1}(x)}{\partial{x_1}} & \frac{\partial{f_1}(x)}{\partial{x_2}} & \cdots & \frac{\partial{f_1}(x)}{\partial{x_n}}\\
\frac{\partial{f_2}(x)}{\partial{x_1}} & \frac{\partial{f_2}(x)}{\partial{x_2}} & \cdots & \frac{\partial{f_2}(x)}{\partial{x_n}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial{f_n}(x)}{\partial{x_1}} & \frac{\partial{f_n}(x)}{\partial{x_2}} & \cdots & \frac{\partial{f_n}(x)}{\partial{x_n}}
\end{bmatrix}$
The Jacobian incorporates each gradient $\nabla{f_i}$ into a single matrix,
thereby capturing how each coordinate of
$f$ changes in every direction.
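The row-of-gradients picture translates directly into code: row $i$ of $J_f$ is $\nabla f_i$, and entry $(i, j)$ is $\partial f_i / \partial x_j$. The sketch below checks the first-order approximation $f(a + h) \approx f(a) + J_f(a)\,h$ on an assumed example function (the choice of $f$, $a$, and $h$ is illustrative, not from the slides):

```python
import numpy as np

# Assumed example f: R^2 -> R^2, a stack of f1(x) = x1^2 + x2
# and f2(x) = sin(x1 * x2)
def f(x):
    return np.array([x[0] ** 2 + x[1], np.sin(x[0] * x[1])])

# Jacobian of f: row i is the gradient of f_i,
# so entry (i, j) is the partial derivative of f_i with respect to x_j
def jacobian(x):
    c = np.cos(x[0] * x[1])
    return np.array([
        [2 * x[0], 1.0],
        [x[1] * c, x[0] * c],
    ])

a = np.array([1.0, 0.5])
h = np.array([1e-3, 1e-3])

# First-order approximation: f(a + h) ≈ f(a) + J_f(a) h
approx = f(a) + jacobian(a) @ h
exact = f(a + h)
print(np.max(np.abs(exact - approx)))  # error shrinks like ||h||^2
```

The matrix-vector product $J_f(a)\,h$ applies all $n$ gradients to $h$ at once, which is exactly the stacked approximation derived above.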