A new document on what changes and what remains the same in regressions, when you change the inputs


Draft, Feb 19, 2010


Given a model


Y = Const + B1X1 + B2X2 + ... + BnXn + Residuals



Type of Change

Effect on Coefficients (Bs)

Effect on T-statistic of that coefficient

Effect on sample size of the model

Effect on goodness of fit of the model

1) Change of units of one variable, X1

Changes units of B1

No change to the T-Statistic; T-statistics are unit-free
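The unit-change row can be checked numerically. The following is a minimal sketch on made-up data, not taken from any real dataset: the variable names, the simulated heights, and the small `ols` helper are all assumptions for illustration. It refits the same model with X1 measured in centimeters and then in meters.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, standard errors, and T-statistics (X includes the constant)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)              # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se, beta / se

rng = np.random.default_rng(0)
n = 200
height_cm = rng.normal(170, 10, n)                # X1 in centimeters (simulated)
x2 = rng.normal(0, 1, n)
y = 5.0 + 0.4 * height_cm + 2.0 * x2 + rng.normal(0, 5, n)

const = np.ones(n)
b_cm, se_cm, t_cm = ols(np.column_stack([const, height_cm, x2]), y)
b_m, se_m, t_m = ols(np.column_stack([const, height_cm / 100, x2]), y)  # X1 in meters

print("B1 per cm:", b_cm[1], "  B1 per m:", b_m[1])
print("T for B1 (cm):", t_cm[1], "  T for B1 (m):", t_m[1])
```

B1 in meters should be 100 times B1 in centimeters, while the T-statistics agree up to rounding error, because the standard error is rescaled by the same factor as the coefficient.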



2) Inclusion of a new, formerly excluded category of variable X1

May or may not change B1. If B1 is a comparison between nurses and lawyers, and the newly added group is sociologists, B1 won’t change.

The T-statistic will change, if for no other reason than that the residual variance of the dependent variable Y, which is pooled across all cases in the model, is now different.

Including new cases changes the N of the model
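Here is a small sketch of the nurses, lawyers, and sociologists example, using simulated incomes (the group sizes, means, and spreads are invented for illustration). It assumes the model contains only the occupation dummies, with nurses as the excluded category, and that the sociologists get their own dummy variable when they are added.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, standard errors, and T-statistics (X includes the constant)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)              # pooled residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se, beta / se

rng = np.random.default_rng(1)
nurses = rng.normal(60, 5, 40)                    # simulated incomes
lawyers = rng.normal(90, 15, 30)
sociologists = rng.normal(70, 30, 50)

# Before: nurses (excluded category) and lawyers only.
y_a = np.concatenate([nurses, lawyers])
lawyer_a = np.concatenate([np.zeros(40), np.ones(30)])
b_a, se_a, t_a = ols(np.column_stack([np.ones(70), lawyer_a]), y_a)

# After: sociologists are added as cases, with their own dummy.
y_b = np.concatenate([nurses, lawyers, sociologists])
lawyer_b = np.concatenate([np.zeros(40), np.ones(30), np.zeros(50)])
socio_b = np.concatenate([np.zeros(70), np.ones(50)])
b_b, se_b, t_b = ols(np.column_stack([np.ones(120), lawyer_b, socio_b]), y_b)

print("B1 (lawyers vs nurses):", b_a[1], "->", b_b[1])
print("T for B1:", t_a[1], "->", t_b[1])
print("N:", len(y_a), "->", len(y_b))
```

B1 stays equal to the difference between the lawyers' mean and the nurses' mean, but its T-statistic moves because the residual variance is now pooled over the sociologists as well. If the model also contained other predictors, their coefficients could shift with the new cases and B1 could shift with them.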


3) Inclusion of a new predictor variable, Xm

All the coefficients are jointly estimated, so a new variable generally changes all the other coefficients already in the model (the exception is a new variable that is exactly uncorrelated with the others). This is one reason we do multiple regression: to estimate coefficient B1 net of the effect of variable Xm.


Usually no change. That is, the inclusion of a new predictor variable will only change the sample size of the model if the new predictor variable has missing values. Any cases with missing values on any predictor variable are dropped automatically.

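Yes. R-square can never go down when a term is added, and any new term with a nonzero coefficient must improve it. Adjusted R-square will get better only if the new term improves the fit by enough to pay for the lost degree of freedom (for a single new term, when its T-statistic is larger than 1 in absolute value), and will get worse if the new term makes little or no difference.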
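The effect of a new predictor can be illustrated with a short simulation. This is a sketch under invented assumptions: Xm is constructed to be correlated with X1, a pure-noise variable is added as a second comparison, and the `fit` helper is hypothetical, written only for this example.

```python
import numpy as np

def fit(X, y):
    """Return OLS coefficients, R-square, and adjusted R-square (X includes the constant)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)     # k counts the constant
    return beta, r2, adj_r2

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(0, 1, n)
xm = 0.7 * x1 + rng.normal(0, 1, n)               # new predictor, correlated with X1
noise = rng.normal(0, 1, n)                       # predictor unrelated to Y
y = 1.0 + 1.0 * x1 + 2.0 * xm + rng.normal(0, 1, n)

const = np.ones(n)
b_short, r2_short, adj_short = fit(np.column_stack([const, x1]), y)
b_long, r2_long, adj_long = fit(np.column_stack([const, x1, xm]), y)
b_noise, r2_noise, adj_noise = fit(np.column_stack([const, x1, xm, noise]), y)

print("B1 without Xm:", b_short[1], "  B1 with Xm:", b_long[1])
print("R-square:", r2_short, "->", r2_long, "->", r2_noise)
print("Adjusted R-square:", adj_short, "->", adj_long, "->", adj_noise)
```

Without Xm, B1 absorbs part of Xm's effect; with Xm in the model, B1 is estimated net of Xm. R-square rises (or at worst stays put) at each step, while adjusted R-square is free to fall when the added variable is only noise.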

4) Changing the excluded category of some variable already entered

No. The initial output reported by the software will be different, but all of the same comparisons as before can be recovered by combining the reported Bs, and when recovered they are the same.

No changes, when looking at the same comparisons

No change

No change
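To see how the same comparisons are recovered after switching the excluded category, here is a minimal sketch with three simulated occupation groups (the data and group labels are invented for illustration). The model is fit once with nurses excluded and once with lawyers excluded.

```python
import numpy as np

rng = np.random.default_rng(3)
group = np.repeat(["nurse", "lawyer", "sociologist"], [40, 30, 50])
y = np.where(group == "nurse", 60.0, np.where(group == "lawyer", 90.0, 70.0))
y = y + rng.normal(0, 10, len(group))

def dummy(name):
    """1.0 for cases in the named group, 0.0 otherwise."""
    return (group == name).astype(float)

const = np.ones(len(group))

# Nurses excluded: coefficients are [const, lawyer, sociologist].
X_a = np.column_stack([const, dummy("lawyer"), dummy("sociologist")])
b_a, *_ = np.linalg.lstsq(X_a, y, rcond=None)

# Lawyers excluded: coefficients are [const, nurse, sociologist].
X_b = np.column_stack([const, dummy("nurse"), dummy("sociologist")])
b_b, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# Same comparisons, recovered by combining the reported Bs.
lawyer_vs_nurse_a = b_a[1]
lawyer_vs_nurse_b = -b_b[1]
socio_vs_lawyer_a = b_a[2] - b_a[1]
socio_vs_lawyer_b = b_b[2]

print("lawyer vs nurse:", lawyer_vs_nurse_a, lawyer_vs_nurse_b)
print("sociologist vs lawyer:", socio_vs_lawyer_a, socio_vs_lawyer_b)
```

The two fits also produce identical predicted values for every case, which is why the sample size and the goodness of fit do not change.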

5) Weighting with analytic weights

Unless the weights are uniform, the weights will change the coefficients


No change to sample size using analytic weights, because analytic weights are weights rescaled to leave the sample size unchanged
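A quick sketch of how weights move the coefficients, on simulated data. The `wls_coefs` helper is an assumption for illustration: it implements weighted least squares by scaling each case by the square root of its weight, which is the standard textbook approach and not necessarily what any particular package does internally.

```python
import numpy as np

def wls_coefs(X, y, w):
    """Weighted least squares coefficients: minimizes sum(w * residual**2)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(0, 1, n)
y = 1.0 + 0.5 * x1 + rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x1])

w = rng.uniform(1, 5, n)                          # made-up, non-uniform weights
w_rescaled = w * n / w.sum()                      # rescaled to sum to N

b_unweighted = wls_coefs(X, y, np.ones(n))
b_uniform = wls_coefs(X, y, np.full(n, 3.0))
b_weighted = wls_coefs(X, y, w)
b_rescaled = wls_coefs(X, y, w_rescaled)

print("unweighted:        ", b_unweighted)
print("uniform weights:   ", b_uniform)
print("non-uniform:       ", b_weighted)
print("non-uniform, sum=N:", b_rescaled)
```

Uniform weights reproduce the unweighted coefficients, non-uniform weights change them, and rescaling the weights so that they sum to N leaves the weighted coefficients where they were.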


6) Weighting with frequency weights

Coefficients will behave the same as with analytic weights

Dramatic changes here, because changed N will change the standard errors, and therefore also the T-statistics

Dramatic changes: with frequency weights, the N of the model is the sum of the weights
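The contrast between frequency weights and analytic weights can be sketched as follows, on simulated data with made-up integer weights. The `weighted_fit` helper and its `n_eff` argument are assumptions for this example: frequency weights are treated as case counts (N equals the sum of the weights), and analytic weights are the same weights rescaled to sum to the original N.

```python
import numpy as np

def weighted_fit(X, y, w, n_eff):
    """Weighted least squares with standard errors based on an N of n_eff."""
    sw = np.sqrt(w)
    Xw, yw = X * sw[:, None], y * sw
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta
    sigma2 = resid @ resid / (n_eff - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xw.T @ Xw)))
    return beta, se, beta / se

rng = np.random.default_rng(5)
n = 50
x1 = rng.normal(0, 1, n)
y = 1.0 + 0.5 * x1 + rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x1])
w = rng.integers(1, 10, n)                        # each case stands for 1 to 9 cases

# Frequency weights: N is the sum of the weights.
b_f, se_f, t_f = weighted_fit(X, y, w, n_eff=w.sum())

# Analytic weights: same weights rescaled to sum to n, so N stays at n.
b_a, se_a, t_a = weighted_fit(X, y, w * n / w.sum(), n_eff=n)

# Frequency weights are equivalent to literally duplicating the cases.
X_exp, y_exp = np.repeat(X, w, axis=0), np.repeat(y, w)
b_e, se_e, t_e = weighted_fit(X_exp, y_exp, np.ones(len(y_exp)), n_eff=len(y_exp))

print("N analytic:", n, "  N frequency:", w.sum())
print("B1:", b_a[1], b_f[1], b_e[1])
print("T for B1, analytic vs frequency:", t_a[1], t_f[1])
```

The coefficients match across all three fits. The frequency-weighted standard errors are smaller and the T-statistics larger, because the software is told that it has many more cases than the analytic-weight version assumes.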


7) Changing the sample size, N, of the dataset

In theory, the expected value of B is not affected by changes in sample size. In practice, if you have a different sample (larger or smaller), B will be different because of sampling variation. You can easily take a random subset of any dataset and you will find that the Bs in the random subset are different from the overall Bs

The expected values of T-statistics are approximately proportional to the square root of N. If you quadruple the sample size, you would expect T-statistics to double, giving you greater power to reject null hypotheses. Of course, in a different sample, the actual T-statistics will not change by exactly the square root of the change in N, because sampling variation comes into play.

Yes (duh).

Yes, largely because of sampling variation.
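The square-root-of-N behavior of T-statistics can be illustrated with a small simulation. This is a sketch under invented assumptions: a single predictor with a true slope of 0.3, normal errors, and 500 replications at each of two sample sizes, N = 100 and N = 400.

```python
import numpy as np

def slope_t(n, rng):
    """Draw one sample of size n and return the T-statistic of the slope."""
    x = rng.normal(0, 1, n)
    y = 1.0 + 0.3 * x + rng.normal(0, 1, n)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.default_rng(7)
reps = 500
t_small = np.array([slope_t(100, rng) for _ in range(reps)])
t_large = np.array([slope_t(400, rng) for _ in range(reps)])
ratio = t_large.mean() / t_small.mean()

print("mean T at N=100:", t_small.mean())
print("mean T at N=400:", t_large.mean())
print("ratio of means:", ratio)
print("spread of T at N=100 (sampling variation):", t_small.std())
```

On average the T-statistic should roughly double when N is quadrupled, while the spread across replications shows why any single pair of samples will not give a ratio of exactly 2.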