Least-squares approximate solution of overdetermined equations
Suppose $A\in {\mathbf R}^{m \times n}$ is skinny and full rank, $y \in {\mathbf R}^m$, and $x_\mathrm{ls} = (A^TA)^{-1}A^Ty$.
$Ax_\mathrm{ls}$ is the point in $\mathcal R(A)$ closest (in terms of norm) to $y$.
If $y \in \mathcal R (A)$, then $Ax_\mathrm{ls} =y$.
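These two facts can be checked numerically. The sketch below (a hypothetical example; the matrix sizes and random seed are arbitrary) computes $x_\mathrm{ls}$ from the normal equations, verifies that the residual $y - Ax_\mathrm{ls}$ is orthogonal to $\mathcal R(A)$, and confirms that $Ax_\mathrm{ls} = y$ when $y \in \mathcal R(A)$:

```python
import numpy as np

# Hypothetical example: skinny, full-rank A and a generic y.
rng = np.random.default_rng(0)
m, n = 6, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Least-squares solution x_ls = (A^T A)^{-1} A^T y via the normal equations.
x_ls = np.linalg.solve(A.T @ A, A.T @ y)

# A @ x_ls is the orthogonal projection of y onto range(A):
# the residual y - A @ x_ls is orthogonal to every column of A.
residual = y - A @ x_ls
assert np.allclose(A.T @ residual, 0)

# If y is already in range(A), the residual vanishes and A @ x_ls == y.
y_in_range = A @ np.array([1.0, -2.0, 0.5])
x2 = np.linalg.solve(A.T @ A, A.T @ y_in_range)
assert np.allclose(A @ x2, y_in_range)
```

In practice one would call `np.linalg.lstsq(A, y)` rather than forming $A^TA$, which squares the condition number; the normal equations are used here only to mirror the formula above.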
Suppose $y=Ax+v$, where $x\in {\mathbf R}^n$ is a set of parameters we wish to estimate, $y\in {\mathbf R}^m$ is a set of measurements, and $v$ represents measurement noise. We assume $m>n$ and $A$ is full rank. Consider a linear estimator of the form $\hat x=By$.
The choice $B = A^\dagger = (A^TA)^{-1}A^T$ yields $\hat x = x$ when $v = 0$, since $A^\dagger A = I$; for nonzero noise the estimation error is $\hat x - x = A^\dagger v$, which is small when $v$ is small.
For any left inverse $B$ of $A$, the estimation error is $\hat x - x = Bv$; the choice $B = A^\dagger$ does not, in general, yield the $\hat x$ closest to $x$, since the error depends on the particular noise $v$.
The choice $B = A^\dagger$ yields the estimate $\hat x$ that minimizes $\|A\hat x - y\|$ over all $\hat x \in {\mathbf R}^n$.
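A short sketch of the estimator properties above (all sizes and the noise level are hypothetical): $B = A^\dagger$ is a left inverse of $A$, recovers $x$ exactly in the noise-free case, and has error $\hat x - x = A^\dagger v$ otherwise:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)

# Pseudo-inverse B = A^dagger = (A^T A)^{-1} A^T; solve (A^T A) B = A^T.
B = np.linalg.solve(A.T @ A, A.T)
assert np.allclose(B @ A, np.eye(n))  # B is a left inverse of A

# Noise-free measurements: y = A x gives exact recovery, x_hat = B y = x.
y_clean = A @ x_true
assert np.allclose(B @ y_clean, x_true)

# With noise v, the estimation error is exactly B v, small when v is small.
v = 1e-6 * rng.standard_normal(m)
x_hat = B @ (y_clean + v)
assert np.allclose(x_hat - x_true, B @ v)
```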
If $B$ is any left inverse of $A$, then $\sum_{i,j} B_{ij}^2 \geq \sum_{i,j} (A^\dagger)_{ij}^2$, i.e., $A^\dagger$ is the smallest left inverse of $A$ in the Frobenius norm. (The entry-wise inequality $|B_{ij}| \geq |A^\dagger_{ij}|$ does not hold in general.)
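The Frobenius-norm minimality can be checked by constructing another left inverse. Any left inverse of $A$ has the form $A^\dagger + C$ with $CA = 0$, i.e., the rows of $C$ lie in the left null space of $A$; the sketch below builds one such $C$ (a hypothetical construction via the SVD) and compares norms:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 3
A = rng.standard_normal((m, n))
A_pinv = np.linalg.pinv(A)  # equals (A^T A)^{-1} A^T for full-rank skinny A

# Build C with C @ A = 0: rows of C drawn from the left null space of A.
U, s, Vt = np.linalg.svd(A)
N = U[:, n:]                                  # columns span null(A^T)
C = rng.standard_normal((n, m - n)) @ N.T     # so C @ A = 0
B = A_pinv + C
assert np.allclose(B @ A, np.eye(n))          # B is also a left inverse

# A^dagger is the smallest left inverse in Frobenius norm:
# ||B||_F^2 = ||A^dagger||_F^2 + ||C||_F^2, since trace(A^dagger C^T) = 0.
assert np.linalg.norm(B, "fro") >= np.linalg.norm(A_pinv, "fro")
```

The norms split as shown because $A^TC^T = (CA)^T = 0$, so the cross term $\mathop{\mathrm{tr}}(A^\dagger C^T)$ vanishes.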