
Stats 203
Three quantities that measure variability in the data:
\[ \begin{align} \text{SST} &= \sum_{i=1}^n (Y_i - \bar{Y})^2 & &\text{(Total Sum of Squares)} \\[6pt] \text{SSM} &= \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 & &\text{(Model Sum of Squares)} \\[6pt] \text{SSE} &= \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 & &\text{(Error Sum of Squares)} \end{align} \]
We can decompose the total variability into explained and unexplained variability:
\[ \text{SST} = \text{SSM} + \text{SSE}. \]
This decomposition is critical to the Analysis of Variance (ANOVA).
The proof is by the Pythagorean Theorem.
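As a quick numerical check (a minimal NumPy sketch on simulated data; the variable names are ours, not part of the course), we can fit the least-squares line and verify the decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)  # simulated simple linear model

# Least-squares fit of y on [1, x]
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
ssm = np.sum((y_hat - y.mean()) ** 2)   # model sum of squares
sse = np.sum((y - y_hat) ** 2)          # error sum of squares

print(sst, ssm + sse)  # equal up to floating-point error
```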

Recall: \(\hat{\bf Y} = P_{\mathrm{span}({\bf 1},\,{\bf x})}\,{\bf Y}\).
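In other words, the fitted values are the orthogonal projection of \({\bf Y}\) onto the column space of the design matrix \([\mathbf{1}, {\bf x}]\). A minimal sketch of this projection via the hat matrix (illustrative names, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])   # design matrix [1, x]
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat (projection) matrix
y_hat = H @ y                          # projection of Y onto span(1, x)

# Agrees with the least-squares fitted values
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(y_hat, X @ beta_hat))  # True
```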

We can also project \({\bf Y}\) onto \({\bf 1}\):
\[ P_{\text{span}({\bf 1})} {\bf Y} = \bar Y {\bf 1}. \]
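A quick check of this identity (again a NumPy sketch, with made-up numbers):

```python
import numpy as np

y = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
ones = np.ones_like(y)

# P_{span(1)} Y = 1 (1^T 1)^{-1} 1^T Y = Ybar * 1
proj = ones * (ones @ y) / (ones @ ones)
print(np.allclose(proj, y.mean() * ones))  # True
```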

Let's replace the vectors by points for simplicity.

Now we have another right triangle!
\[ \begin{align} \underbrace{||{\bf Y} - \bar{Y}\mathbf{1}||^2}_{\text{SST}} &= \underbrace{|| \hat{\mathbf{Y}} - \bar{Y} \mathbf{1} ||^2}_{\text{SSM}} + \underbrace{|| {\bf Y} - \hat{\mathbf{Y}} ||^2}_{\text{SSE}}. \end{align} \]
The only thing left to prove is that \(\hat{\mathbf{Y}} - \bar{Y} \mathbf{1}\) is orthogonal to \({\bf Y} - \hat{\mathbf{Y}}\) so that the Pythagorean Theorem applies.
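Indeed, the residual \({\bf Y} - \hat{\mathbf{Y}}\) is orthogonal to every vector in \(\mathrm{span}({\bf 1}, {\bf x})\), and \(\hat{\mathbf{Y}} - \bar{Y}\mathbf{1}\) lies in that span, so
\[ \langle \hat{\mathbf{Y}} - \bar{Y}\mathbf{1},\; {\bf Y} - \hat{\mathbf{Y}} \rangle = 0. \]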
How do we measure how well a model fits?
The coefficient of determination is \[ \frac{\text{SSM}}{\text{SST}}, \] the fraction of total variability in \(Y\) explained by the model.
By the ANOVA decomposition, we can also write this as \[ 1 - \frac{\text{SSE}}{\text{SST}}. \]
Theorem. \(\displaystyle\frac{\text{SSM}}{\text{SST}} = r^2\), where \(r\) is the sample correlation between \(x\) and \(Y\).
Because it is equivalent to \(r^2\) in simple linear regression, the coefficient of determination is known as \[ R^2 \overset{\text{def}}{=} 1 - \frac{\text{SSE}}{\text{SST}}. \]
It can be used to measure the fit of any model that predicts \(Y\) (e.g., \(k\)-nearest neighbors, decision trees).
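For instance (a minimal NumPy sketch on simulated data; names are illustrative), \(R^2\) computed from the sums of squares matches the squared sample correlation:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(size=n)
y = 1.0 + 3.0 * x + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r_squared = 1 - sse / sst

r = np.corrcoef(x, y)[0, 1]  # sample correlation between x and y
print(np.isclose(r_squared, r ** 2))  # True in simple linear regression
```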
In the Analysis of Variance (ANOVA), we test the hypothesis \[ H_0: \beta = 0 \] by comparing variances: \(\text{SSM}\) vs. \(\text{SSE}\).
If their ratio \[ F = \frac{\text{SSM} / 1}{\text{SSE} / (n - 2)} = \frac{\text{MSM}}{\text{MSE}} \] is large, then we reject \(H_0\). This ratio is called the \(F\)-statistic.
Under \(H_0\), the \(F\)-statistic follows the \(F_{1, n-2}\) distribution.
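As an illustration (a sketch assuming SciPy is available; the data are simulated), the \(F\)-statistic and its p-value under \(H_0\) can be computed directly:

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(3)
n = 30
x = rng.uniform(0, 1, n)
y = 0.5 + 1.5 * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

ssm = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)
msm = ssm / 1          # model mean square (1 df)
mse = sse / (n - 2)    # error mean square (n - 2 df)

F = msm / mse
p_value = f.sf(F, 1, n - 2)  # upper tail of the F(1, n-2) distribution
print(F, p_value)
```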
The \(\text{SSM}\), \(\text{SSE}\), \(\text{MSM}\), \(\text{MSE}\), and \(F\)-statistic are typically organized in an ANOVA table:
\[ \begin{array}{l|c|c|c|c} \text{Source} & \text{df} & \text{SS} & \text{MS} & F \\ \hline \text{Model} & 1 & \text{SSM} & \text{MSM} = \text{SSM}/1 & \text{MSM}/\text{MSE} \\ \text{Error} & n-2 & \text{SSE} & \text{MSE} = \text{SSE}/(n-2) & \\ \hline \text{Total} & n-1 & \text{SST} & & \end{array} \]
We now have two ways of testing \(H_0: \beta = 0\): the \(t\)-test for the slope and the \(F\)-test based on the ANOVA table.
How do the two tests compare?
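One fact worth noting: in simple linear regression the \(F\)-statistic equals the square of the \(t\)-statistic for the slope, so the two tests give identical p-values. A quick numerical check (a sketch with illustrative names):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
x = rng.uniform(0, 5, n)
y = 1.0 + 0.8 * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

mse = np.sum((y - y_hat) ** 2) / (n - 2)
se_slope = np.sqrt(mse / np.sum((x - x.mean()) ** 2))  # SE of the slope
t = beta_hat[1] / se_slope                             # t-statistic for H0: beta = 0

msm = np.sum((y_hat - y.mean()) ** 2) / 1
F = msm / mse
print(np.isclose(F, t ** 2))  # True
```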