--------------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2014_logs\class8
> .log
log type: text
opened on: 15 Oct 2014, 10:49:13
. use "C:\Users\Michael\Documents\current class files\intro soc methods\cps_mar_2000_new with additional vars.dta", clear
. summarize incwage if lawyers==1, detail
                   Wage and salary income
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                 441
25%        17000              0       Sum of Wgt.         441

50%        61000                      Mean           74044.33
                        Largest       Std. Dev.      69032.96
75%       100960         279376
90%       197387         279376       Variance       4.77e+09
95%       229339         279376       Skewness       1.132374
99%       257525         364302       Kurtosis       3.973892
. summarize incwage if nurses==1, detail
                   Wage and salary income
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%         6500              0
10%        12000              0       Obs                 966
25%        25000              0       Sum of Wgt.         966

50%        37000                      Mean           37536.85
                        Largest       Std. Dev.      21839.96
75%        48000         100000
90%        61000         132000       Variance       4.77e+08
95%        70000         229339       Skewness       3.506697
99%        89468         333564       Kurtosis       43.18005
* Note that there is more to these distributions than the 25th and 75th percentiles. Lawyers are also more likely to report zero earnings: their 10th percentile is 0, whereas the nurses' 5th percentile is already 6500.
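* One way to put a number on the zero-earnings point is to count the zeros directly. This is only a sketch: it assumes the CPS extract loaded at the top of this log is still in memory, and it takes the group sizes (441 and 966) from the summarize output above.

```stata
* Share of each occupation reporting exactly zero wage and salary income
quietly count if lawyers==1 & incwage==0
display "Share of lawyers with zero wage income: " %5.3f r(N)/441
quietly count if nurses==1 & incwage==0
display "Share of nurses with zero wage income:  " %5.3f r(N)/966
```

* Judging from the percentiles above, the lawyers' share should be at least 0.10 and the nurses' share below 0.05.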
* Back to our equal-variance t-tests from HW2
. ttest incwage if lawyers==1 | sociologists ==1, by(occ1990)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Sociolog |       6    41508.33    2842.722    6963.219    34200.88    48815.78
 Lawyers |     441    74044.33    3287.284    69032.96     67583.6    80505.06
---------+--------------------------------------------------------------------
combined |     447     73607.6    3248.139    68673.38    67224.04    79991.16
---------+--------------------------------------------------------------------
    diff |           -32535.99    28215.44               -87988.05    22916.07
------------------------------------------------------------------------------
    diff = mean(Sociolog) - mean(Lawyers)                         t =  -1.1531
Ho: diff = 0                                     degrees of freedom =      445

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.1247         Pr(|T| > |t|) = 0.2495          Pr(T > t) = 0.8753
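* As a check on where the standard error of the difference comes from, here is a minimal sketch that rebuilds the test from the printed group sizes, means, and standard deviations alone. It needs no data in memory. The scalar name sp is our own choice, and ttesti is the immediate form of ttest.

```stata
* Pooled standard deviation under the equal-variance assumption
scalar sp = sqrt(((6-1)*6963.219^2 + (441-1)*69032.96^2) / (6 + 441 - 2))
* Standard error of the difference in means
display %10.2f scalar(sp) * sqrt(1/6 + 1/441)
* Same test from summary statistics: n1 mean1 sd1 n2 mean2 sd2
ttesti 6 41508.33 6963.219 441 74044.33 69032.96
```

* Up to rounding in the printed inputs, the displayed standard error and the ttesti output should agree with the 28215.44 and t = -1.1531 reported above.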
. regress incwage lawyers if lawyers==1 | sociologists==1
Source | SS df MS Number of obs = 447
-------------+------------------------------ F( 1, 445) = 1.33
Model | 6.2663e+09 1 6.2663e+09 Prob > F = 0.2495
Residual | 2.0971e+12 445 4.7125e+09 R-squared = 0.0030
-------------+------------------------------ Adj R-squared = 0.0007
Total | 2.1034e+12 446 4.7160e+09 Root MSE = 68648
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 32535.99 28215.44 1.15 0.249 -22916.07 87988.05
_cons | 41508.33 28025.43 1.48 0.139 -13570.31 96586.97
------------------------------------------------------------------------------
* The regression appears to give us the same t-test, but how can we be sure when the regression output reports the t-statistic to only 3 digits (1.15)? One way is to compare the two quantities that make up the t-statistic, the coefficient and its standard error: they match the t-test's difference and standard error to all 7 printed digits (only the sign differs, because ttest computes sociologists minus lawyers). Another way is to recover the t-statistic in full from the regression, which requires pulling out the coefficient vector and the variance-covariance matrix.
* First, we ask Stata to store the coefficients and the variance-covariance matrix in two matrices that we name betas and VCM.
. matrix betas=e(b)
. matrix VCM=e(V)
. matrix list betas
betas[1,2]
lawyers _cons
y1 32535.993 41508.333
. matrix list VCM
symmetric VCM[2,2]
lawyers _cons
lawyers 7.961e+08
_cons -7.854e+08 7.854e+08
. display 32535.993/((7.961e+08)^.5)
1.1531353
* If we want more accuracy, we can rely not on the printed version of the VCM but on the stored version, asking Stata to use the [1,1] element of the VCM matrix:
. display 32535.993/((VCM[1,1])^.5)
1.1531274
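* A possible shortcut, sketched here on the assumption that the CPS extract from the top of this log is in memory: after regress, Stata keeps each coefficient and standard error at full precision in _b[] and _se[], so the t-statistic and its two-sided p-value can be rebuilt without copying any matrices.

```stata
* Refit the two-group model quietly so the estimation results are active
quietly regress incwage lawyers if lawyers==1 | sociologists==1
* Full-precision t-statistic: coefficient divided by its standard error
display %10.7f _b[lawyers]/_se[lawyers]
* Two-sided p-value from the t distribution with the residual degrees of freedom
display %6.4f 2*ttail(e(df_r), abs(_b[lawyers]/_se[lawyers]))
```

* The first number should agree with the 1.1531274 above, and the second with the Pr(|T| > |t|) = 0.2495 from the t-test.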
. regress incwage lawyers if lawyers==1 | sociologists==1
Source | SS df MS Number of obs = 447
-------------+------------------------------ F( 1, 445) = 1.33
Model | 6.2663e+09 1 6.2663e+09 Prob > F = 0.2495
Residual | 2.0971e+12 445 4.7125e+09 R-squared = 0.0030
-------------+------------------------------ Adj R-squared = 0.0007
Total | 2.1034e+12 446 4.7160e+09 Root MSE = 68648
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 32535.99 28215.44 1.15 0.249 -22916.07 87988.05
_cons | 41508.33 28025.43 1.48 0.139 -13570.31 96586.97
------------------------------------------------------------------------------
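* In this next case, the model still contains only the lawyers dummy, but the sample now includes nurses. The comparison category therefore becomes sociologists and nurses pooled together, so the coefficient and its standard error will be different.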
. regress incwage lawyers if lawyers==1 | sociologists==1 |nurses==1
Source | SS df MS Number of obs = 1413
-------------+------------------------------ F( 1, 1411) = 222.77
Model | 4.0378e+11 1 4.0378e+11 Prob > F = 0.0000
Residual | 2.5575e+12 1411 1.8125e+09 R-squared = 0.1364
-------------+------------------------------ Adj R-squared = 0.1357
Total | 2.9612e+12 1412 2.0972e+09 Root MSE = 42574
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 36482.96 2444.332 14.93 0.000 31688.04 41277.88
_cons | 37561.37 1365.553 27.51 0.000 34882.64 40240.1
------------------------------------------------------------------------------
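* To see why the coefficient moved, note that the constant is now the mean income of the pooled comparison group. A small sketch using only the group sizes and means printed earlier in this log; the scalar name pooled_mean is our own.

```stata
* Weighted mean of the pooled comparison group (6 sociologists, 966 nurses)
scalar pooled_mean = (6*41508.33 + 966*37536.85) / (6 + 966)
display %9.2f scalar(pooled_mean)
* Lawyers' mean minus the pooled comparison mean
display %9.2f 74044.33 - scalar(pooled_mean)
```

* Up to rounding, these two numbers should reproduce _cons (37561.37) and the lawyers coefficient (36482.96) in the table above.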
* In this next case, we compare the income of lawyers to the incomes of everyone else in the sample with a recorded value of incwage (zeros included), so again the coefficient and its standard error change.
. regress incwage lawyers
Source | SS df MS Number of obs = 103226
-------------+------------------------------ F( 1,103224) = 1610.72
Model | 1.3194e+12 1 1.3194e+12 Prob > F = 0.0000
Residual | 8.4558e+13 103224 819166021 R-squared = 0.0154
-------------+------------------------------ Adj R-squared = 0.0154
Total | 8.5877e+13 103225 831940347 Root MSE = 28621
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 54815.92 1365.829 40.13 0.000 52138.91 57492.92
_cons | 19228.41 89.2732 215.39 0.000 19053.43 19403.38
------------------------------------------------------------------------------
* Now back to our first example, the two-group regression that reproduces our two-sample t-test exactly:
. regress incwage lawyers if lawyers==1 | sociologists==1
Source | SS df MS Number of obs = 447
-------------+------------------------------ F( 1, 445) = 1.33
Model | 6.2663e+09 1 6.2663e+09 Prob > F = 0.2495
Residual | 2.0971e+12 445 4.7125e+09 R-squared = 0.0030
-------------+------------------------------ Adj R-squared = 0.0007
Total | 2.1034e+12 446 4.7160e+09 Root MSE = 68648
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 32535.99 28215.44 1.15 0.249 -22916.07 87988.05
_cons | 41508.33 28025.43 1.48 0.139 -13570.31 96586.97
------------------------------------------------------------------------------
* What if we add a third group to the model, yet keep sociologists as the comparison category?
. regress incwage lawyers nurses if lawyers==1 | sociologists==1 |nurses==1
Source | SS df MS Number of obs = 1413
-------------+------------------------------ F( 2, 1410) = 111.34
Model | 4.0387e+11 2 2.0194e+11 Prob > F = 0.0000
Residual | 2.5574e+12 1410 1.8137e+09 R-squared = 0.1364
-------------+------------------------------ Adj R-squared = 0.1352
Total | 2.9612e+12 1412 2.0972e+09 Root MSE = 42588
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lawyers | 32535.99 17504.37 1.86 0.063 -1801.409 66873.4
nurses | -3971.481 17440.4 -0.23 0.820 -38183.41 30240.45
_cons | 41508.33 17386.49 2.39 0.017 7402.162 75614.5
------------------------------------------------------------------------------
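* We get the same coefficient as before (for the lawyer-sociologist comparison), but the standard error changes because the residual variance is now pooled across everyone in the sample, nurses included. Nurses' incomes are far less dispersed than lawyers', so Root MSE falls from 68648 to 42588 and the standard error on lawyers falls from 28215.44 to 17504.37.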
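* The arithmetic behind this can be sketched with display alone, assuming the usual formula for a dummy-variable contrast against the omitted group: SE = Root MSE * sqrt(1/n_group + 1/n_omitted). The inputs are the printed Root MSE values and group sizes, so expect agreement only up to rounding.

```stata
* Two-group model: lawyers (441) vs. sociologists (6), Root MSE 68648
display %9.2f 68648 * sqrt(1/441 + 1/6)
* Three-group model: same contrast, but Root MSE is now 42588
display %9.2f 42588 * sqrt(1/441 + 1/6)
* Three-group model: nurses (966) vs. sociologists (6)
display %9.2f 42588 * sqrt(1/966 + 1/6)
```

* These should come out close to the reported standard errors of 28215.44, 17504.37, and 17440.4. Only Root MSE differs between the first two lines, which is the whole reason the lawyers' standard error shrinks.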
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2014_logs\cla
> ss8.log
log type: text
closed on: 15 Oct 2014, 12:42:25
-----------------------------------------------------------------------------------------