In the one-variable case, linearization is based on the Taylor approximation.

The same is true more generally, so we must first extend the Taylor approximation to functions $f: \mathbb{R}^n \to \mathbb{R}^n$.
To do so, we'll need partial derivatives.

For full generality, we could take $f: \mathbb{R}^n \to \mathbb{R}^{\color{var(--emphColor)}{m}}$ with $n \neq \color{var(--emphColor)}{m}$, but this would lead to rectangular systems, and we haven't developed a theory for solving them.

Scalar-valued functions

We begin by considering scalar-valued functions of multiple variables.

That is, we first consider $$g: \mathbb{R}^n \to \mathbb{R}.$$

Partial derivatives

The partial derivative with respect to $x_i$ measures the rate of change of the function in the direction of the $i$th coordinate.

In practice, differentiating $g: \mathbb{R}^n \to \mathbb{R}$ with respect to $x_i$ amounts to treating every other variable as a constant and differentiating $g$ as if it were a function of the single variable $x_i$.
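As a numerical illustration, here is a sketch that is not part of the original notes. A partial derivative can be approximated by a central difference that perturbs only the $i$th coordinate and holds every other coordinate fixed. The helper `partial_derivative`, the step size `h`, and the test function are hypothetical choices. Python indexes from 0, so `i=0` corresponds to $x_1$.

```python
import numpy as np


def partial_derivative(g, x, i, h=1e-6):
    """Approximate dg/dx_i at the point x with a central difference.

    Only coordinate i is perturbed; all other coordinates stay fixed,
    which mirrors treating the other variables as constants.
    """
    x = np.asarray(x, dtype=float)
    e = np.zeros_like(x)
    e[i] = h  # step along the i-th coordinate direction only
    return (g(x + e) - g(x - e)) / (2 * h)


def g(x):
    # Hypothetical test function g(x1, x2) = x1^2 * x2,
    # so dg/dx1 = 2*x1*x2 and dg/dx2 = x1^2.
    return x[0] ** 2 * x[1]


print(partial_derivative(g, [3.0, 2.0], 0))  # close to 2*3*2 = 12
print(partial_derivative(g, [3.0, 2.0], 1))  # close to 3**2 = 9
```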

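Collecting all the partial derivatives of $g$ in a row vector gives the gradient $\nabla g$. For example, if $g(x_1,x_2,x_3)= x_1^3 - 2x_2 - 2$,
then $$\nabla g(x_1, x_2, x_3) = \begin{bmatrix} \frac{\partial g}{\partial x_{\color{var(--emphColor)}{1}}} & \frac{\partial g}{\partial x_{\color{var(--emphColor)}{2}}} & \frac{\partial g}{\partial x_{\color{var(--emphColor)}{3}}} \end{bmatrix} = \begin{bmatrix} 3x_1^2 & -2 & 0 \end{bmatrix}.$$
The last entry is zero because $g$ does not depend on $x_3$.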
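A quick numerical check of this gradient follows. It is a minimal sketch that assumes NumPy. `grad_g` hard-codes the analytic gradient of the example. The hypothetical helper `fd_gradient` estimates each partial derivative with a central difference at an arbitrarily chosen point `x0`.

```python
import numpy as np


def g(x):
    # Example function g(x1, x2, x3) = x1^3 - 2*x2 - 2 (x3 does not appear).
    return x[0] ** 3 - 2 * x[1] - 2


def grad_g(x):
    # Analytic gradient as a row of partial derivatives: [3*x1^2, -2, 0].
    return np.array([3 * x[0] ** 2, -2.0, 0.0])


def fd_gradient(f, x, h=1e-6):
    # Central-difference estimate of each partial derivative,
    # perturbing one coordinate at a time.
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad


x0 = np.array([1.5, -0.7, 4.0])  # arbitrary test point
print(grad_g(x0))          # analytic values 6.75, -2, 0
print(fd_gradient(g, x0))  # should agree with grad_g(x0) to within about 1e-6
```

The finite-difference estimate is only a consistency check. The analytic gradient is what the linearization actually uses.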