2024-05-06
Here is a data set of house sales in Ames, IA.
Let’s use this data to build models to predict house prices!
How well does square footage predict house price?
How well does number of bedrooms predict house price?
How well does number of bathrooms predict house price?
Let’s build a model that incorporates all of the predictors!
Why is the coefficient of Bed negative?
What can we learn from inferential statistics?
Inference for linear regression depends on the following assumptions:
Assumption 2 is actually three assumptions in one:
How do we check these assumptions?
To check assumptions about the errors, we can examine the residuals.
Example: To test the normality of the errors, make a normal Q-Q plot of the residuals.
To check assumptions about the errors, we can examine the residuals.
Example: To check correctness of the linear model and constant variance, plot the residuals against \(\hat Y\).
If you call plot on a model fit using lm, then R will make many of these residual plots for you.
What have we learned from the residual analysis?
To fix these problems, we could consider transformations of \(Y\):
but we will not pursue that here.
A partial residual plot helps visualize the functional form of one predictor variable in a multiple regression model.
For example, the partial residuals of predictor \(X_k\) are defined by \[ \begin{aligned} \text{(partial residual for $X_k$)}_i &= Y_i - \hat\beta_0 - \sum_{j\neq k} \hat\beta_j X_{ij} = \text{residual}_i + \hat\beta_k X_{ik} \end{aligned} \]
A partial residual plot plots these partial residuals against \(X_k\).
What if we fit a (simple) linear regression model to the partial residuals?
Where have we seen \(-30054\) before?
Bed in the multiple regression model!A partial regression plot (also called an added variable plot) shows \(Y\) and \(X_k\), adjusting for the effect of the other predictors.
A partial regression plot is harder to interpret than a partial residual plot (because both the \(x\) and \(y\) axes display residuals).
However, if we fit a (simple) linear regression model to the partial regression plot, the slope and inferences match the coefficient and inferences for Bed in the multiple regression exactly.