-----------------------------------------------------------------------------------
log: C:\AAA Miker Files\newer web pages\soc_meth_proj3\class8_2009.log
log type: text
opened on: 19 Feb 2009, 11:27:09
. set mem 200m
Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory          200M    max. data space                200.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                               203.163M
. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear
. gen byte nurses=0
. replace nurses=1 if occ1990==95
(966 real changes made)
. gen byte sociologists=0
. replace sociologists=1 if occ1990==125
(6 real changes made)
. regress incwage lawyers if occ1990==178| occ1990==95
Source | SS df MS Number of obs = 1407
-------------+------------------------------ F( 1, 1405) = 221.72
Model | 4.0354e+11 1 4.0354e+11 Prob > F = 0.0000
Residual | 2.5571e+12 1405 1.8200e+09 R-squared = 0.1363
-------------+------------------------------ Adj R-squared = 0.1357
Total | 2.9607e+12 1406 2.1057e+09 Root MSE = 42662
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 36507.47 2451.758 14.89 0.000 31697.97 41316.97
_cons | 37536.85 1372.618 27.35 0.000 34844.25 40229.45
------------------------------------------------------------------------------
. ttest incwage if occ1990==178 | occ1990==95, by(occ1990)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Register | 966 37536.85 702.6892 21839.96 36157.88 38915.83
Lawyers | 441 74044.33 3287.284 69032.96 67583.6 80505.06
---------+--------------------------------------------------------------------
combined | 1407 48979.49 1223.363 45888.34 46579.68 51379.31
---------+--------------------------------------------------------------------
diff | -36507.47 2451.758 -41316.97 -31697.97
------------------------------------------------------------------------------
diff = mean(Register) - mean(Lawyers) t = -14.8903
Ho: diff = 0 degrees of freedom = 1405
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
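*Note that a two-sample t test with the equal variance assumption and a regression on the same two samples yield the same coefficient, standard error, and t-statistic. Only the sign differs (-14.89 vs. 14.89), because ttest reports nurses minus lawyers while the lawyers dummy measures lawyers minus nurses.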
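*The equivalence can be checked from the summary statistics alone. The lines below are a minimal sketch, not part of the original session: they feed the group sizes, means, and standard deviations printed by ttest above into ttesti, then rebuild the regression standard error by hand. The scalar names sp and se are made up for illustration, and the unequal option on the last line is shown only as the usual way to drop the equal variance assumption.

```stata
* Pooled (equal variance) t test from the printed summary statistics:
* n, mean, sd for nurses, then n, mean, sd for lawyers.
ttesti 966 37536.85 21839.96 441 74044.33 69032.96

* Pooled SD: weight each group variance by its degrees of freedom.
scalar sp = sqrt((965*21839.96^2 + 440*69032.96^2)/1405)

* Standard error of the difference in means under equal variances.
scalar se = scalar(sp)*sqrt(1/966 + 1/441)

display "pooled SD = " %9.1f scalar(sp)
display "SE        = " %9.1f scalar(se)
display "t         = " %9.3f (37536.85 - 74044.33)/scalar(se)

* Assumption for illustration: the same test without pooling the variances.
ttesti 966 37536.85 21839.96 441 74044.33 69032.96, unequal
```

*The pooled SD should land close to the Root MSE of the regression (42662) and the SE close to 2451.758, which is why the two procedures agree.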
. regress incwage lawyers sociologists if occ1990==178| occ1990==95 | occ1990==125
Source | SS df MS Number of obs = 1413
-------------+------------------------------ F( 2, 1410) = 111.34
Model | 4.0387e+11 2 2.0194e+11 Prob > F = 0.0000
Residual | 2.5574e+12 1410 1.8137e+09 R-squared = 0.1364
-------------+------------------------------ Adj R-squared = 0.1352
Total | 2.9612e+12 1412 2.0972e+09 Root MSE = 42588
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 36507.47 2447.523 14.92 0.000 31706.3 41308.65
sociologists | 3971.481 17440.4 0.23 0.820 -30240.45 38183.41
_cons | 37536.85 1370.247 27.39 0.000 34848.91 40224.79
------------------------------------------------------------------------------
. regress incwage sociologists if occ1990==178| occ1990==125
Source | SS df MS Number of obs = 447
-------------+------------------------------ F( 1, 445) = 1.33
Model | 6.2663e+09 1 6.2663e+09 Prob > F = 0.2495
Residual | 2.0971e+12 445 4.7125e+09 R-squared = 0.0030
-------------+------------------------------ Adj R-squared = 0.0007
Total | 2.1034e+12 446 4.7160e+09 Root MSE = 68648
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sociologists | -32535.99 28215.44 -1.15 0.249 -87988.05 22916.07
_cons | 74044.33 3268.953 22.65 0.000 67619.82 80468.83
------------------------------------------------------------------------------
. regress incwage nurses sociologists if occ1990==178| occ1990==125| occ1990==95
Source | SS df MS Number of obs = 1413
-------------+------------------------------ F( 2, 1410) = 111.34
Model | 4.0387e+11 2 2.0194e+11 Prob > F = 0.0000
Residual | 2.5574e+12 1410 1.8137e+09 R-squared = 0.1364
-------------+------------------------------ Adj R-squared = 0.1352
Total | 2.9612e+12 1412 2.0972e+09 Root MSE = 42588
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nurses | -36507.47 2447.523 -14.92 0.000 -41308.65 -31706.3
sociologists | -32535.99 17504.37 -1.86 0.063 -66873.4 1801.409
_cons | 74044.33 2028.001 36.51 0.000 70066.1 78022.55
------------------------------------------------------------------------------
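*And note that the t-statistic (but not the coefficient) for the lawyer-sociologist comparison depends on the presence of the nurses in the model, because regression pools the residual variance of all the subsamples in the model. The coefficient is -32535.99 either way, but its standard error falls from 28215.44 (lawyers and sociologists only) to 17504.37 once the 966 nurses are included.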
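*A rough check of where the two standard errors come from, using only numbers printed above. With dummy variables for separate groups, the standard error of a difference between two group means is Root MSE * sqrt(1/n1 + 1/n2). The group sizes (441 lawyers, 6 sociologists) are read off the output above; the scalar names are hypothetical.

```stata
* Same group sizes in both models, so the same sqrt(1/n1 + 1/n2) factor.
scalar k = sqrt(1/441 + 1/6)

* Lawyers and sociologists only: Root MSE = 68648.
scalar se_two = 68648*scalar(k)

* Nurses added to the sample: Root MSE = 42588.
scalar se_three = 42588*scalar(k)

display "SE, two groups   = " %9.2f scalar(se_two)
display "SE, three groups = " %9.2f scalar(se_three)
display "t,  two groups   = " %9.2f -32535.99/scalar(se_two)
display "t,  three groups = " %9.2f -32535.99/scalar(se_three)
```

*The only thing that changes between the two models is the Root MSE: the nurses have a much smaller income spread than the lawyers, so pooling them in pulls the residual variance down.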
. regress incwage lawyers sociologists if occ1990==178| occ1990==95 | occ1990==125
Source | SS df MS Number of obs = 1413
-------------+------------------------------ F( 2, 1410) = 111.34
Model | 4.0387e+11 2 2.0194e+11 Prob > F = 0.0000
Residual | 2.5574e+12 1410 1.8137e+09 R-squared = 0.1364
-------------+------------------------------ Adj R-squared = 0.1352
Total | 2.9612e+12 1412 2.0972e+09 Root MSE = 42588
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 36507.47 2447.523 14.92 0.000 31706.3 41308.65
sociologists | 3971.481 17440.4 0.23 0.820 -30240.45 38183.41
_cons | 37536.85 1370.247 27.39 0.000 34848.91 40224.79
------------------------------------------------------------------------------
. *What about a change of scale?
. gen incwage2=incwage*2
(30484 missing values generated)
. regress incwage2 lawyers sociologists if occ1990==178| occ1990==95 | occ1990==125
Source | SS df MS Number of obs = 1413
-------------+------------------------------ F( 2, 1410) = 111.34
Model | 1.6155e+12 2 8.0774e+11 Prob > F = 0.0000
Residual | 1.0229e+13 1410 7.2550e+09 R-squared = 0.1364
-------------+------------------------------ Adj R-squared = 0.1352
Total | 1.1845e+13 1412 8.3888e+09 Root MSE = 85176
------------------------------------------------------------------------------
incwage2 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 73014.95 4895.045 14.92 0.000 63412.59 82617.3
sociologists | 7942.963 34880.8 0.23 0.820 -60480.89 76366.82
_cons | 75073.7 2740.495 27.39 0.000 69697.82 80449.59
------------------------------------------------------------------------------
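. *The t-statistic is nicely unit free: doubling the dependent variable doubles every coefficient and every standard error, so their ratio (and the F, R-squared, and P-values) is unchanged.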
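*A quick arithmetic check, using the lawyers row from the incwage and incwage2 regressions above. This is only a sketch; the scalar names are invented for the example.

```stata
* t = coefficient / standard error, before and after doubling incwage.
scalar t_original = 36507.47/2447.523
scalar t_doubled  = 73014.95/4895.045

display "t, incwage  = " %9.4f scalar(t_original)
display "t, incwage2 = " %9.4f scalar(t_doubled)
```

*Both ratios agree with the 14.92 reported in the tables, up to rounding in the printed coefficients.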
. regress incwage lawyers sociologists if occ1990==178| occ1990==95 | occ1990==125
Source | SS df MS Number of obs = 1413
-------------+------------------------------ F( 2, 1410) = 111.34
Model | 4.0387e+11 2 2.0194e+11 Prob > F = 0.0000
Residual | 2.5574e+12 1410 1.8137e+09 R-squared = 0.1364
-------------+------------------------------ Adj R-squared = 0.1352
Total | 2.9612e+12 1412 2.0972e+09 Root MSE = 42588
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 36507.47 2447.523 14.92 0.000 31706.3 41308.65
sociologists | 3971.481 17440.4 0.23 0.820 -30240.45 38183.41
_cons | 37536.85 1370.247 27.39 0.000 34848.91 40224.79
------------------------------------------------------------------------------
. lincom lawyers-sociologists
( 1) lawyers - sociologists = 0
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | 32535.99 17504.37 1.86 0.063 -1801.409 66873.4
------------------------------------------------------------------------------
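*After an arbitrary change of the excluded occupational category among these 3, we can still recover the same coefficients and t-statistics with a simple lincom. Compare the lincom result above with the lawyers row of the regression below, where sociologists are the excluded category.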
. regress incwage lawyers nurses if occ1990==178| occ1990==95 | occ1990==125
Source | SS df MS Number of obs = 1413
-------------+------------------------------ F( 2, 1410) = 111.34
Model | 4.0387e+11 2 2.0194e+11 Prob > F = 0.0000
Residual | 2.5574e+12 1410 1.8137e+09 R-squared = 0.1364
-------------+------------------------------ Adj R-squared = 0.1352
Total | 2.9612e+12 1412 2.0972e+09 Root MSE = 42588
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 32535.99 17504.37 1.86 0.063 -1801.409 66873.4
nurses | -3971.481 17440.4 -0.23 0.820 -38183.41 30240.45
_cons | 41508.33 17386.49 2.39 0.017 7402.162 75614.5
------------------------------------------------------------------------------
. *lincom is a post-estimation command: it works from the stored results of the most recent regression, so you can't run lincom without running the regression first. lincom just gives you what the regression would have given you if you had made a different choice of excluded category.
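*What lincom computes can be reproduced by hand from the model with nurses excluded. The sketch below assumes the usual rule Var(b1 - b2) = Var(b1) + Var(b2) - 2*Cov(b1, b2). With one dummy per group, the covariance of the two dummy coefficients equals the variance of the constant, because both coefficients are measured against the same excluded group. The scalar names gap and se_gap are hypothetical.

```stata
* Point estimate: lawyers coefficient minus sociologists coefficient.
scalar gap = 36507.47 - 3971.481

* Standard error: SEs of lawyers, sociologists, and _cons from that model.
scalar se_gap = sqrt(2447.523^2 + 17440.4^2 - 2*1370.247^2)

display "lawyers - sociologists = " %9.2f scalar(gap)
display "standard error         = " %9.2f scalar(se_gap)
display "t                      = " %9.2f scalar(gap)/scalar(se_gap)
```

*These should match the lincom line (32535.99, 17504.37, 1.86) and the lawyers row of the regression that excludes sociologists, up to rounding in the printed standard errors.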
. regress incwage lawyers sociologists if occ1990==178| occ1990==95 | occ1990==125
Source | SS df MS Number of obs = 1413
-------------+------------------------------ F( 2, 1410) = 111.34
Model | 4.0387e+11 2 2.0194e+11 Prob > F = 0.0000
Residual | 2.5574e+12 1410 1.8137e+09 R-squared = 0.1364
-------------+------------------------------ Adj R-squared = 0.1352
Total | 2.9612e+12 1412 2.0972e+09 Root MSE = 42588
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 36507.47 2447.523 14.92 0.000 31706.3 41308.65
sociologists | 3971.481 17440.4 0.23 0.820 -30240.45 38183.41
_cons | 37536.85 1370.247 27.39 0.000 34848.91 40224.79
------------------------------------------------------------------------------
. regress incwage lawyers sociologists female if occ1990==178| occ1990==95 | occ1990==125
Source | SS df MS Number of obs = 1413
-------------+------------------------------ F( 3, 1409) = 83.70
Model | 4.4789e+11 3 1.4930e+11 Prob > F = 0.0000
Residual | 2.5134e+12 1409 1.7838e+09 R-squared = 0.1512
-------------+------------------------------ Adj R-squared = 0.1494
Total | 2.9612e+12 1412 2.0972e+09 Root MSE = 42235
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 25723.53 3256.452 7.90 0.000 19335.51 32111.55
sociologists | -604.9475 17320.32 -0.03 0.972 -34581.34 33371.45
female | -17003.19 3422.971 -4.97 0.000 -23717.86 -10288.53
_cons | 53448.74 3479.591 15.36 0.000 46623.01 60274.48
------------------------------------------------------------------------------
. *Adding gender means the other differences are estimated net of gender. Nurses are mostly women, and the percentage of lawyers who are men is much higher than the percentage of nurses who are men, so accounting for gender reduces the lawyer-nurse difference in the regression from about $36.5K to about $25.7K.
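*The size of that reduction can be worked out by hand. A sketch, assuming the standard omitted-variable arithmetic: the raw lawyer-nurse gap equals the net gap plus the female coefficient times the lawyer-nurse difference in the share female. The cell counts (133 of 441 lawyers and 904 of 966 nurses are women) come from the table command later in this log; the scalar names are made up.

```stata
* Share female in each occupation.
scalar f_law = 133/441
scalar f_nur = 904/966

* Part of the raw gap that reflects gender composition.
scalar comp = -17003.19*(scalar(f_law) - scalar(f_nur))

display "share female, lawyers = " %6.3f scalar(f_law)
display "share female, nurses  = " %6.3f scalar(f_nur)
display "composition part      = " %9.2f scalar(comp)
display "net gap + composition = " %9.2f 25723.53 + scalar(comp)
```

*The last line should come back to the 36507.47 lawyer coefficient from the model without gender.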
. gen byte male=0
. replace male=1 if sex==1
(64791 real changes made)
. regress incwage lawyers sociologists male if occ1990==178| occ1990==95 | occ1990==125
Source | SS df MS Number of obs = 1413
-------------+------------------------------ F( 3, 1409) = 83.70
Model | 4.4789e+11 3 1.4930e+11 Prob > F = 0.0000
Residual | 2.5134e+12 1409 1.7838e+09 R-squared = 0.1512
-------------+------------------------------ Adj R-squared = 0.1494
Total | 2.9612e+12 1412 2.0972e+09 Root MSE = 42235
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 25723.53 3256.452 7.90 0.000 19335.51 32111.55
sociologists | -604.9475 17320.32 -0.03 0.972 -34581.34 33371.45
male | 17003.19 3422.971 4.97 0.000 10288.53 23717.86
_cons | 36445.55 1376.531 26.48 0.000 33745.28 39145.82
------------------------------------------------------------------------------
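. *Here the constant refers to the income of female nurses. More precisely, it is the mean PREDICTED value of income for female nurses, the group with every dummy in the model equal to zero. Also note that it makes no difference to the lawyer-nurse comparison whether the dummy for gender is male or female: only the sign of the gender coefficient and the constant change.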
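*The two constants are tied together by the gender coefficient. A small check using the printed coefficients (scalar names are illustrative only):

```stata
* Female-dummy model: _cons is male nurses, so add the female coefficient.
scalar female_nurses = 53448.74 + (-17003.19)

* Male-dummy model: _cons is female nurses, so add the male coefficient.
scalar male_nurses = 36445.55 + 17003.19

display "female nurses = " %9.2f scalar(female_nurses)
display "male nurses   = " %9.2f scalar(male_nurses)
```

*Each line reproduces the constant of the other model, which is why the choice of dummy is purely a matter of labeling.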
. predict Model_class8
(option xb assumed; fitted values)
. table occ1990 sex if occ1990==178|occ1990==95 | occ1990==125, contents(freq mean incwage mean Model_class8)
------------------------------------------------
Occupation, 1990      |           Sex
basis                 |        Male       Female
----------------------+-------------------------
    Registered nurses |          62          904
                      | 48602.45161   36777.9281
                      |    53448.74     36445.55
                      |
Sociology instructors |           2            4
                      |       39200      42662.5
                      |     52843.8      35840.6
                      |
              Lawyers |         308          133
                      | 80236.42208  59704.73684
                      |    79172.27     62169.08
------------------------------------------------
. *A few things to note here. First, 36445 is the constant in the model; it is also the mean predicted value for the category excluded by all the dummy variables in the model, namely female nurses. Second, the predicted and actual mean incomes are not the same here, and in general they will not be. Third, the $17K gender income gap is the same across all 3 occupational categories in the predicted values. This is a feature (or a problem) of regression called linearity: the model as specified has no occupation-by-gender interaction. The real data are not linear in this way (i.e., the real gender gap varies a lot by occupation).
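*The predicted cell means in the table can be rebuilt from the four coefficients of the male-dummy model, which makes the constant gender gap easy to see. This is only a sketch with invented scalar names; the actual gaps on the last three lines are computed from the mean incwage rows of the table.

```stata
* Coefficients from: regress incwage lawyers sociologists male
scalar b0     = 36445.55
scalar b_law  = 25723.53
scalar b_soc  = -604.9475
scalar b_male = 17003.19

* Predicted means: constant plus whichever dummies are switched on.
display "nurses:       female " %9.2f scalar(b0) "  male " %9.2f scalar(b0) + scalar(b_male)
display "sociologists: female " %9.2f scalar(b0) + scalar(b_soc) "  male " %9.2f scalar(b0) + scalar(b_soc) + scalar(b_male)
display "lawyers:      female " %9.2f scalar(b0) + scalar(b_law) "  male " %9.2f scalar(b0) + scalar(b_law) + scalar(b_male)

* Actual male-female gaps from the observed means, for comparison.
display "actual gap, nurses       = " %9.2f 48602.45161 - 36777.9281
display "actual gap, sociologists = " %9.2f 39200 - 42662.5
display "actual gap, lawyers      = " %9.2f 80236.42208 - 59704.73684
```

*Every predicted male-female gap is exactly the male coefficient, while the observed gaps differ by occupation and even change sign for the 6 sociologists.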
. save "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", replace
file C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta saved
. exit, clear