Splines#

Cubic splines#

  • Define a set of knots \(\xi_1< \xi_2 < \dots<\xi_K\).

  • We want the function \(f\) in \(Y= f(X) + \epsilon\) to:

    1. Be a cubic polynomial on each interval between consecutive knots \(\xi_i,\xi_{i+1}\).

    2. Be continuous at each knot.

    3. Have continuous first and second derivatives at each knot.

  • It turns out that we can write \(f\) in terms of an intercept and \(K+3\) basis functions:

\[f(X) = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \beta_4 h(X,\xi_1) + \dots + \beta_{K+3} h(X,\xi_K)\]
  • Above, \(h\) is the truncated-power basis function

\[\begin{split} h(x,\xi) = \begin{cases} (x-\xi)^3 & \text{if }x>\xi \\ 0 & \text{otherwise} \end{cases} \end{split}\]
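
For concreteness, here is a minimal sketch of fitting this basis by ordinary least squares; the data, the knot locations, and the `truncated_power_basis` helper are illustrative assumptions, not part of the text above.

```python
import numpy as np

# A sketch of fitting a cubic spline with the truncated power basis.
# The data and knot locations are made up for illustration.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
knots = np.quantile(x, [0.25, 0.50, 0.75])  # K = 3 knots

def truncated_power_basis(x, knots):
    """Columns: 1, x, x^2, x^3, h(x, xi_1), ..., h(x, xi_K)."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.where(x > xi, (x - xi) ** 3, 0.0) for xi in knots]
    return np.column_stack(cols)

X = truncated_power_basis(x, knots)           # shape (n, K + 4)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # beta_0, ..., beta_{K+3}
y_hat = X @ beta
```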

Natural cubic splines#

Fig 7.4
  • A spline that is constrained to be linear, rather than cubic, for \(X<\xi_1\) and \(X>\xi_K\).

  • The predictions are more stable for extreme values of \(X\).
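
One way to get such a basis in Python (an assumption of this sketch, not something the text above prescribes) is patsy's `cr()`, a natural cubic regression spline basis in the style of R's mgcv; the data and the choice `df=5` are made up.

```python
import numpy as np
from patsy import dmatrix

# Illustrative data.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Natural cubic regression spline basis with 5 degrees of freedom;
# "- 1" drops the intercept, since the cr() basis already spans constants.
basis = dmatrix("cr(x, df=5) - 1", {"x": x})
beta, *_ = np.linalg.lstsq(basis, y, rcond=None)
y_hat = np.asarray(basis) @ beta
```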


Choosing the number and locations of knots#

Fig 7.6
  • The locations of the knots are typically quantiles of \(X\).

  • The number of knots, \(K\), is chosen by cross validation.
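
A sketch of this recipe with scikit-learn, assuming its `SplineTransformer` (a B-spline basis that spans the same space of cubic splines) with `knots="quantile"`, plus `GridSearchCV` to pick \(K\); the data and the candidate grid are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

# Illustrative data.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200)).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=200)

pipe = make_pipeline(
    SplineTransformer(degree=3, knots="quantile"),  # knots at quantiles of X
    LinearRegression(),
)
# Choose the number of knots K by 5-fold cross validation.
search = GridSearchCV(pipe, {"splinetransformer__n_knots": range(3, 12)}, cv=5)
search.fit(x, y)
print(search.best_params_)
```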


Natural cubic splines vs. polynomial regression#

Fig 7.7
  • Splines can fit complex functions with few parameters.

  • Polynomials require high-degree terms to be flexible.

  • High-degree polynomials can be unstable at the edges.


Smoothing splines#

Find the function \(f\) which minimizes

\[\color{Blue}{\sum_{i=1}^n (y_i - f(x_i))^2} + \color{Red}{ \lambda \int f''(x)^2 dx}\]
  • The blue term is the RSS when using \(f\) to predict the training data.

  • The red term penalizes the roughness of \(f\): larger \(\lambda\) forces the fit to be smoother.

Facts#

  • The minimizer \(\hat f\) is a natural cubic spline, with knots at each sample point \(x_1,\dots,x_n\).

  • Obtaining \(\hat f\) is similar to a Ridge regression.
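
Recent SciPy versions (1.10 and later, an assumption of this sketch) implement exactly this penalized problem; the data and the value of \(\lambda\) below are made up.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

# Illustrative data.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# lam plays the role of lambda in the objective above; with lam=None,
# SciPy chooses it automatically by generalized cross validation.
spl = make_smoothing_spline(x, y, lam=0.01)
y_hat = spl(x)
```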


Advanced: deriving a smoothing spline#

  • Show that if you fix the values \(f(x_1),\dots,f(x_n)\), the roughness

\[\int f''(x)^2 dx\]

is minimized by a natural cubic spline.

  • Deduce that the solution to the smoothing spline problem is a natural cubic spline with knots at \(x_1,\dots,x_n\), which can be written in terms of \(n\) basis functions:

\[f(x) = \beta_1 N_1(x) + \dots + \beta_n N_n(x)\]

  • Letting \(\mathbf N\) be the \(n\times n\) matrix with \(\mathbf N(i,j) = N_j(x_i)\), we can write the objective function as:

\[ (y - \mathbf N\beta)^T(y - \mathbf N\beta) + \lambda \beta^T \Omega_{\mathbf N}\beta, \]

where \(\Omega_{\mathbf N}(i,j) = \int N_i''(t) N_j''(t) dt\).


  • By simple calculus, the coefficients \(\hat \beta\) which minimize this objective are

\[\hat \beta = (\mathbf N^T \mathbf N + \lambda \Omega_{\mathbf N})^{-1} \mathbf N^T y.\]

  • Note that the predicted values are a linear function of the observed values:

\[\hat y = \underbrace{\mathbf N (\mathbf N^T \mathbf N + \lambda \Omega_{\mathbf N})^{-1} \mathbf N^T}_{\mathbf S_\lambda} y\]

Degrees of freedom#

  • The degrees of freedom for a smoothing spline are:

\[\text{Trace}(\mathbf S_\lambda)= \mathbf S_\lambda(1,1) + \mathbf S_\lambda(2,2) + \cdots + \mathbf S_\lambda(n,n) \]
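
A schematic NumPy version of these formulas; to stay self-contained it uses a made-up quadratic basis in place of the natural cubic spline basis \(N_j\), so only the linear algebra (not the basis) matches the derivation above.

```python
import numpy as np

# Illustrative data.
rng = np.random.default_rng(0)
n = 50
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

N = np.column_stack([np.ones(n), x, x**2])  # stand-in basis matrix
# Omega(i, j) = integral of N_i'' N_j'' over [0, 1]; here only the x^2
# column has a nonzero second derivative (namely 2), so Omega[2, 2] = 4.
Omega = np.zeros((3, 3))
Omega[2, 2] = 4.0

lam = 0.1
beta_hat = np.linalg.solve(N.T @ N + lam * Omega, N.T @ y)
S_lam = N @ np.linalg.solve(N.T @ N + lam * Omega, N.T)  # smoother matrix
y_hat = S_lam @ y                                        # equals N @ beta_hat
df = np.trace(S_lam)                                     # degrees of freedom
print(df)
```

Larger \(\lambda\) inflates the penalty, shrinking \(\text{Trace}(\mathbf S_\lambda)\), so the degrees of freedom fall as the fit becomes smoother.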

Natural cubic splines vs. Smoothing splines#

| Natural cubic splines | Smoothing splines |
| --- | --- |
| Fix \(K<n\) knots, typically at quantiles of \(X\). | Put \(n\) knots at \(x_1,\dots,x_n\). |
| Find the natural cubic spline \(\hat f\) with these knots which minimizes the RSS \(\sum_{i=1}^n (y_i - f(x_i))^2\). | Find the fitted values \(\hat f(x_1),\dots,\hat f(x_n)\) through an algorithm similar to Ridge regression. |
| Choose \(K\) by cross validation. | Choose the smoothing parameter \(\lambda\) by cross validation. |


Choosing the regularization parameter \(\lambda\)#

  • We typically choose \(\lambda\) through cross validation.

  • Fortunately, after a single diagonalization of an \(n\times n\) matrix, we can obtain the solution for any value of \(\lambda\) cheaply.

  • There is a shortcut for LOOCV:

\[\begin{aligned} RSS_\text{loocv}(\lambda) &= \sum_{i=1}^n (y_i - \hat f_\lambda^{(-i)}(x_i))^2 \\ &= \sum_{i=1}^n \left[\frac{y_i-\hat f_\lambda(x_i)}{1-\mathbf S_\lambda(i,i)}\right]^2, \end{aligned}\]

where \(\hat f_\lambda^{(-i)}\) is the spline fitted with the \(i\)th observation held out.
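
This identity holds for linear smoothers of this penalized least-squares form, so it can be checked numerically; the sketch below uses a small ridge-style smoother as a stand-in for \(\mathbf S_\lambda\), with made-up data, and compares the shortcut against brute-force LOOCV.

```python
import numpy as np

# Illustrative data and a simple penalized least-squares smoother.
rng = np.random.default_rng(1)
n = 40
x = np.sort(rng.uniform(0, 1, n))
X = np.column_stack([x**k for k in range(4)])  # stand-in basis matrix
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
lam, p = 0.1, X.shape[1]

# Smoother matrix of the penalized fit (stands in for S_lambda).
S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
resid = y - S @ y

# Shortcut: one fit, no refitting.
rss_shortcut = np.sum((resid / (1 - np.diag(S))) ** 2)

# Brute force: refit with each observation held out.
rss_brute = 0.0
for i in range(n):
    mask = np.arange(n) != i
    beta = np.linalg.solve(X[mask].T @ X[mask] + lam * np.eye(p),
                           X[mask].T @ y[mask])
    rss_brute += (y[i] - X[i] @ beta) ** 2

print(rss_shortcut, rss_brute)  # agree up to floating-point error
```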


Fig 7.8