-----------------------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\
> 2011_180B_logs\class5.log
log type: text
opened on: 8 Feb 2011, 14:00:48
* In class 5 we mostly talked about the class Excel file, regression, and best-fit lines. Below are the 3 models that correspond to the worksheet in my Excel file named "regression graphs and fits".
. regress incwage age if age>24 & age<65 [aweight=perwt_rounded]
(sum of wgt is 1.4261e+08)
      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  1, 69303) =    0.37
       Model |   384750491     1   384750491           Prob > F      =  0.5454
    Residual |  7.2919e+13 69303  1.0522e+09           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0000
       Total |  7.2919e+13 69304  1.0522e+09           Root MSE      =   32437
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -7.044124   11.64878    -0.60   0.545    -29.87572    15.78747
       _cons |   27764.45   511.3078    54.30   0.000     26762.29    28766.61
------------------------------------------------------------------------------
* Age by itself is not a good predictor of incwage. The t statistic is smaller than 1 in absolute value because the standard error of the age coefficient (11.65) is larger than the absolute value of the coefficient itself (7.04). We cannot reject the null hypothesis that there is no linear relationship between age and incwage (P = 0.545, and the 95% confidence interval includes zero). The R-squared is zero to four decimal places because this model explains essentially none of the variance in incwage.
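* As a side check on the numbers discussed above, here is a minimal Python sketch (not part of the Stata session) that recomputes the t statistic and R-squared from the printed output. The variable names are my own, the sums of squares are the rounded values shown in the log, and the 1.96 multiplier is the usual large-sample approximation rather than the exact critical value Stata uses.

```python
# Values copied from the regression of incwage on age printed above.
coef_age = -7.044124
se_age = 11.64878
model_ss = 384750491
total_ss = 7.2919e13  # rounded, as printed in the log

# t statistic: the coefficient divided by its standard error.
t_age = coef_age / se_age

# Approximate 95% confidence interval (1.96 is a large-sample approximation).
ci_low = coef_age - 1.96 * se_age
ci_high = coef_age + 1.96 * se_age

# R-squared: share of the total sum of squares explained by the model.
r_squared = model_ss / total_ss

print(f"t = {t_age:.2f}")
print(f"approx. 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
print(f"R-squared = {r_squared:.6f}")
```

* The interval straddles zero and R-squared is far below 0.0001, which is why Stata prints 0.0000.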
* Notice also that in this model and in the following models, the constant term is not particularly meaningful or helpful. It is the predicted income of someone who is zero years old, which is far outside the age range of 25 to 64 that we are fitting, so it is not a meaningful number.
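* To make the point about the constant concrete, here is a small Python sketch (again outside the Stata session) of the fitted line from the first model. The function name predict_incwage is hypothetical, and age 40 is just an illustrative value inside the estimation range.

```python
# Coefficients copied from the first model (incwage on age).
CONS = 27764.45
B_AGE = -7.044124


def predict_incwage(age):
    """Predicted incwage from the fitted line: _cons + b_age * age."""
    return CONS + B_AGE * age


# At age 0 the prediction is just the constant, an extrapolation far
# outside the 25 to 64 age range used to fit the model.
at_zero = predict_incwage(0)

# At an age inside the fitted range the prediction is interpretable.
at_forty = predict_incwage(40)

print(f"predicted incwage at age 0:  {at_zero:.2f}")
print(f"predicted incwage at age 40: {at_forty:.2f}")
```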
. gen age_sq=age^2
* One thing we noticed when we graphed age versus income is that the relationship looks like an upside-down U, or a parabola. We need a second-order age term (age squared) to fit a parabolic shape.
. regress incwage age age_sq if age>24 & age<65 [aweight=perwt_rounded]
(sum of wgt is 1.4261e+08)
      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  2, 69302) =  589.92
       Model |  1.2206e+12     2  6.1032e+11           Prob > F      =  0.0000
    Residual |  7.1698e+13 69302  1.0346e+09           R-squared     =  0.0167
-------------+------------------------------           Adj R-squared =  0.0167
       Total |  7.2919e+13 69304  1.0522e+09           Root MSE      =   32165
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   3252.025   95.59684    34.02   0.000     3064.655    3439.395
      age_sq |  -37.33998   1.087252   -34.34   0.000    -39.47099   -35.20897
       _cons |  -39131.12   2012.747   -19.44   0.000    -43076.11   -35186.14
------------------------------------------------------------------------------
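* Once age_sq is included as a predictor, both age and age_sq are highly significant (|t| > 34). The positive age coefficient and the negative age_sq coefficient together produce the upside-down U shape. R-squared is still only 1.67%, but that is a clear improvement over zero.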
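* A fitted parabola b0 + b1*age + b2*age^2 has its turning point at age = -b1 / (2*b2). We did not compute this in the Stata session; the Python sketch below derives it from the printed coefficients, with hypothetical names of my own choosing.

```python
# Coefficients copied from the second model (incwage on age and age_sq).
CONS = -39131.12
B_AGE = 3252.025
B_AGE_SQ = -37.33998


def predict_incwage(age):
    """Predicted incwage from the quadratic fit."""
    return CONS + B_AGE * age + B_AGE_SQ * age ** 2


# Turning point of the parabola: where the derivative b1 + 2*b2*age is zero.
peak_age = -B_AGE / (2 * B_AGE_SQ)

print(f"predicted incwage peaks at about age {peak_age:.1f}")
for age in (25, peak_age, 64):
    print(f"age {age:5.1f}: predicted incwage {predict_incwage(age):10.2f}")
```

* Because the age_sq coefficient is negative, the turning point is a maximum, and the predictions at both ends of the 25 to 64 range fall below the prediction at the peak.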
. regress incwage age age_sq yrsed if age>24 & age<65 [aweight=perwt_rounded]
(sum of wgt is 1.4261e+08)
      Source |       SS       df       MS              Number of obs =   69305
-------------+------------------------------           F(  3, 69301) = 3083.86
       Model |  8.5881e+12     3  2.8627e+12           Prob > F      =  0.0000
    Residual |  6.4331e+13 69301   928282733           R-squared     =  0.1178
-------------+------------------------------           Adj R-squared =  0.1177
       Total |  7.2919e+13 69304  1.0522e+09           Root MSE      =   30468
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   2891.918   90.64298    31.90   0.000     2714.258    3069.578
      age_sq |  -32.46307   1.031339   -31.48   0.000     -34.4845   -30.44165
       yrsed |   3561.537   39.97782    89.09   0.000     3483.181    3639.894
       _cons |  -81235.26   1964.252   -41.36   0.000    -85085.19   -77385.33
------------------------------------------------------------------------------
* The way we would interpret this yrsed coefficient is that each additional year of education is associated with about $3,562 more in predicted annual wage income (incwage) for 1999, net of the effects of age. That is, each coefficient is estimated net of the other predictors in the model.
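* To illustrate what "net of the effects of age" means, the Python sketch below (not part of the Stata session) compares two predictions from the third model that differ only by one year of education. The function name and the example values (age 40, 12 versus 13 years of education) are hypothetical choices for illustration.

```python
# Coefficients copied from the third model (incwage on age, age_sq, yrsed).
CONS = -81235.26
B_AGE = 2891.918
B_AGE_SQ = -32.46307
B_YRSED = 3561.537


def predict_incwage(age, yrsed):
    """Predicted incwage from the model with age, age squared, and yrsed."""
    return CONS + B_AGE * age + B_AGE_SQ * age ** 2 + B_YRSED * yrsed


# Hold age fixed and change education by one year.
pred_12 = predict_incwage(age=40, yrsed=12)
pred_13 = predict_incwage(age=40, yrsed=13)
difference = pred_13 - pred_12

print(f"age 40, 12 years of education: {pred_12:.2f}")
print(f"age 40, 13 years of education: {pred_13:.2f}")
print(f"difference: {difference:.2f}")
```

* With age held fixed, the age and age_sq terms cancel, so the difference is the yrsed coefficient whatever age is chosen.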
. log close
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\s
> oc_meth_proj3\2011_180B_logs\class5.log
log type: text
closed on: 8 Feb 2011, 15:36:35
---------------------------------------------------------------------------------------