A new document on what changes and what remains the same in regressions, when you change the inputs

Draft, Feb 19, 2010

Given a model

Y = Const + B1X1 + B2X2 + ... + BnXn + Residuals

1) Change of units of one variable, X1
   Effect on coefficients (Bs): Changes the units of B1.
   Effect on T-statistic of that coefficient: No change to the T-statistic; T-statistics are unit-free.
   Effect on sample size of the model: None.
   Effect on goodness of fit of the model: None.

2) Inclusion of a new, formerly excluded category of variable X1
   Effect on coefficients (Bs): May or may not change B1. If B1 was a comparison between nurses and lawyers, and the newly added group is sociologists, B1 won’t change, provided there are no other predictor variables. If there are other predictor variables, all coefficients will change.
   Effect on T-statistic of that coefficient: The T-statistic will change, if for no other reason than that the joint variance of the dependent variable Y is now different.
   Effect on sample size of the model: Including new cases changes the N of the model.
   Effect on goodness of fit of the model: Yes.

3) Inclusion of a new predictor variable, Xm
   Effect on coefficients (Bs): All the coefficients are jointly estimated, so in general every new variable changes all the other coefficients already in the model. This is one reason we do multiple regression: to estimate coefficient B1 net of the effect of variable Xm.
   Effect on T-statistic of that coefficient: Yes.
   Effect on sample size of the model: Usually no change. The inclusion of a new predictor variable will change the sample size of the model only if the new predictor variable has missing values. Any cases with missing values on any predictor variable are dropped automatically.
   Effect on goodness of fit of the model: Yes. For the R-square, any new term with a nonzero coefficient must improve the fit. Adjusted R-square will get better if the new terms improve the fit enough to offset the lost degrees of freedom, and will get worse if the new terms make no difference.

4) Changing the excluded category of some variable already entered
   Effect on coefficients (Bs): No. The initial output reported by the software will be different, but all of the same comparisons as before can be recovered by combining the reported Bs, and when recovered they are the same.
   Effect on T-statistic of that coefficient: No change, when looking at the same comparisons.
   Effect on sample size of the model: No change.
   Effect on goodness of fit of the model: No change.

5) Weighting with analytic weights
   Effect on coefficients (Bs): Unless the weights are uniform, the weights will change the coefficients.
   Effect on T-statistic of that coefficient: Yes.
   Effect on sample size of the model: No change, because analytic weights are rescaled to leave the sample size unchanged.
   Effect on goodness of fit of the model: Yes.

6) Weighting with frequency weights
   Effect on coefficients (Bs): Coefficients will behave the same as with analytic weights.
   Effect on T-statistic of that coefficient: Dramatic changes here, because the changed N will change the standard errors, and therefore also the T-statistics.
   Effect on sample size of the model: Dramatic changes.
   Effect on goodness of fit of the model: Yes.

7) Changing the sample size, N, of the dataset
   Effect on coefficients (Bs): In theory, the expected value of B is not affected by changes in sample size. In practice, if you have a different sample (larger or smaller), B will be different because of sampling variation. You can easily take a random subset of any dataset and you will find that the Bs in the random subset are different from the overall Bs.
   Effect on T-statistic of that coefficient: The expected values of T-statistics are roughly proportional to the square root of N. If you quadruple the sample size, you would expect T-statistics to double, giving you greater power to reject null hypotheses. Of course, in a different sample, the actual T-statistics will not change by exactly the square root of N, because sampling variation comes into play.
   Effect on sample size of the model: Yes (duh).
   Effect on goodness of fit of the model: Yes, largely because of the sampling variation.
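
A few of these claims (items 1, 3, and 6) are easy to check numerically. The sketch below is only an illustration under stated assumptions: the data are simulated, the variable names and the small ols() helper are made up for this example and are not taken from any statistics package, and a uniform frequency weight of 4 is imitated by entering every case four times.

```python
import math
import random

def ols(X, y):
    """Least squares via the normal equations (illustrative helper).
    X: list of predictor rows without the constant; y: list of outcomes.
    Returns (Bs, T-statistics, R-square); index 0 is the constant."""
    n = len(y)
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bi * ri for bi, ri in zip(b, r)) for r, yi in zip(rows, y)]
    sse = sum(e * e for e in resid)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    s2 = sse / (n - k)  # residual variance
    t = [b[i] / math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return b, t, 1.0 - sse / sst

# Simulated data (assumption): x1 measured in metres, x2 correlated with x1.
random.seed(0)
n = 50
x1 = [random.uniform(0.0, 2.0) for _ in range(n)]
x2 = [a + random.gauss(0.0, 1.0) for a in x1]
y = [1.0 + 2.0 * a + 0.5 * c + random.gauss(0.0, 1.0) for a, c in zip(x1, x2)]

# Item 1: change of units (metres to centimetres) rescales B1, not its T-statistic.
b_m, t_m, r2_one = ols([[a] for a in x1], y)
b_cm, t_cm, _ = ols([[100.0 * a] for a in x1], y)

# Item 3: adding a predictor changes B1 and cannot lower the R-square.
b_both, t_both, r2_both = ols([[a, c] for a, c in zip(x1, x2)], y)

# Item 6: a uniform frequency weight of 4, imitated by repeating each case.
b_rep, t_rep, _ = ols([[a] for a in x1] * 4, y * 4)

print("B1 in metres, B1 in centimetres:", b_m[1], b_cm[1])
print("T in metres, T in centimetres:  ", t_m[1], t_cm[1])
print("B1 alone, B1 net of x2:         ", b_m[1], b_both[1])
print("R-square with x1, with x1 + x2: ", r2_one, r2_both)
print("T ratio after fourfold weights: ", t_rep[1] / t_m[1])
```

With uniform weights the coefficient itself is unchanged, which matches the "unless the weights are uniform" caveat in item 5. The T-statistic ratio comes out slightly above 2 rather than exactly 2, because the residual degrees of freedom go from N - 2 to 4N - 2 instead of simply quadrupling.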