--------------------------------------------------------------------------------------------

name:  <unnamed>

log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2014_logs\class8

> .log

log type:  text

opened on:  15 Oct 2014, 10:49:13

. use "C:\Users\Michael\Documents\current class files\intro soc methods\cps_mar_2000_new with additional vars.dta", clear

. summarize incwage if lawyers==1, detail

                   Wage and salary income
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                 441
25%        17000              0       Sum of Wgt.         441

50%        61000                      Mean           74044.33
                        Largest       Std. Dev.      69032.96
75%       100960         279376
90%       197387         279376       Variance       4.77e+09
95%       229339         279376       Skewness       1.132374
99%       257525         364302       Kurtosis       3.973892

. summarize incwage if nurses==1, detail

                   Wage and salary income
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%         6500              0
10%        12000              0       Obs                 966
25%        25000              0       Sum of Wgt.         966

50%        37000                      Mean           37536.85
                        Largest       Std. Dev.      21839.96
75%        48000         100000
90%        61000         132000       Variance       4.77e+08
95%        70000         229339       Skewness       3.506697
99%        89468         333564       Kurtosis       43.18005

* Note that there is more to the distributions than the 25th and 75th percentiles. Lawyers are also more likely to report zero earnings: the 10th percentile of incwage is 0 for lawyers but 12000 for nurses.

* Back to our equal-variance t-tests from HW2

. ttest incwage if lawyers==1 | sociologists ==1, by(occ1990)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Sociolog |       6    41508.33    2842.722    6963.219    34200.88    48815.78

Lawyers |     441    74044.33    3287.284    69032.96     67583.6    80505.06

---------+--------------------------------------------------------------------

combined |     447     73607.6    3248.139    68673.38    67224.04    79991.16

---------+--------------------------------------------------------------------

diff |           -32535.99    28215.44               -87988.05    22916.07

------------------------------------------------------------------------------

diff = mean(Sociolog) - mean(Lawyers)                         t =  -1.1531

Ho: diff = 0                                     degrees of freedom =      445

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.1247         Pr(|T| > |t|) = 0.2495          Pr(T > t) = 0.8753
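* The table above can be rebuilt from its own summary statistics. The sketch below is an illustration, not part of the original session: it feeds the printed group sizes, means, and standard deviations to ttesti (Stata's immediate form of ttest) and then repeats the pooled-variance arithmetic by hand. The scalar names sp_pooled, se_diff, and t_hand are made up for this example. Because the inputs are rounded as printed, the results should agree with the table only up to rounding.

```stata
* Inputs copied from the ttest table: n, mean, SD for each group.
* Argument order is n1 mean1 sd1 n2 mean2 sd2 (group 1 = sociologists).
ttesti 6 41508.33 6963.219 441 74044.33 69032.96

* The same statistic by hand.
* Pooled SD: weight each group's variance by its degrees of freedom.
scalar sp_pooled = sqrt(((6-1)*6963.219^2 + (441-1)*69032.96^2)/(6 + 441 - 2))
* Standard error of the difference in means under equal variances.
scalar se_diff = sp_pooled*sqrt(1/6 + 1/441)
* t = (mean of sociologists - mean of lawyers) / SE.
scalar t_hand = (41508.33 - 74044.33)/se_diff

display "pooled SD = " sp_pooled
display "SE of diff = " se_diff
display "t = " t_hand
```

* The pooled SD computed here is the quantity that the regression output reports as Root MSE.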

. regress incwage lawyers if lawyers==1 | sociologists==1

Source |       SS       df       MS              Number of obs =     447

-------------+------------------------------           F(  1,   445) =    1.33

Model |  6.2663e+09     1  6.2663e+09           Prob > F      =  0.2495

Residual |  2.0971e+12   445  4.7125e+09           R-squared     =  0.0030

-------------+------------------------------           Adj R-squared =  0.0007

Total |  2.1034e+12   446  4.7160e+09           Root MSE      =   68648

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

lawyers |   32535.99   28215.44     1.15   0.249    -22916.07    87988.05

_cons |   41508.33   28025.43     1.48   0.139    -13570.31    96586.97

------------------------------------------------------------------------------

* The regression appears to give us the same t-test, but how can we be sure when the regression output shows the t-statistic to only three significant digits (1.15)? One way is to compare the two pieces that make up the t-statistic, the coefficient and its standard error: apart from the sign of the difference (ttest reports sociologists minus lawyers, the regression reports lawyers minus sociologists), they match the ttest output to 7 digits. Another way is to recover the t-statistic in full from the regression, which requires pulling out the coefficient vector and the variance-covariance matrix.

* First, we ask Stata to store the coefficient vector, e(b), and the variance-covariance matrix, e(V), in two Stata matrices that we name betas and VCM.

. matrix betas=e(b)

. matrix VCM=e(V)

. matrix list betas

betas[1,2]

lawyers      _cons

y1  32535.993  41508.333

. matrix list VCM

symmetric VCM[2,2]

lawyers       _cons

lawyers   7.961e+08

_cons  -7.854e+08   7.854e+08

. display 32535.993/((7.961e+08)^.5)

1.1531353

* If we want more accuracy, we can rely not on the printed version of the VCM but on the stored version, by asking Stata to use the [1,1] element of the VCM matrix:

. display 32535.993/((VCM[1,1])^.5)

1.1531274
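* The same check can be done without retyping any printed number. The sketch below is an illustration only: it uses Stata's bundled auto dataset (price by foreign) rather than the CPS extract, so that it runs on any installation, and the names b_auto, V_auto, t_ttest, and t_reg are made up for this example. It stores the t-statistic that ttest leaves in r(t), then rebuilds the regression t-statistic from the stored e(b) and e(V).

```stata
* Illustration on Stata's bundled auto data, not the CPS file used in class.
sysuse auto, clear

* Equal-variance two-sample t-test; ttest leaves the statistic in r(t).
ttest price, by(foreign)
scalar t_ttest = r(t)

* The same comparison as a regression on a 0/1 indicator.
regress price foreign
matrix b_auto = e(b)
matrix V_auto = e(V)

* Coefficient divided by the square root of its own variance.
scalar t_reg = b_auto[1,1]/sqrt(V_auto[1,1])

* ttest reports group 0 minus group 1, so the two signs are opposite.
display "ttest t      = " %12.7f t_ttest
display "regression t = " %12.7f t_reg

* Shortcut: _b[] and _se[] read the stored estimates directly.
display "shortcut     = " %12.7f _b[foreign]/_se[foreign]
```

* Using the stored matrices for both the numerator and the denominator avoids the rounding that comes from typing in the printed coefficient.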

. regress incwage lawyers if lawyers==1 | sociologists==1

Source |       SS       df       MS              Number of obs =     447

-------------+------------------------------           F(  1,   445) =    1.33

Model |  6.2663e+09     1  6.2663e+09           Prob > F      =  0.2495

Residual |  2.0971e+12   445  4.7125e+09           R-squared     =  0.0030

-------------+------------------------------           Adj R-squared =  0.0007

Total |  2.1034e+12   446  4.7160e+09           Root MSE      =   68648

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

lawyers |   32535.99   28215.44     1.15   0.249    -22916.07    87988.05

_cons |   41508.33   28025.43     1.48   0.139    -13570.31    96586.97

------------------------------------------------------------------------------

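* In this next case, the comparison category pools sociologists and nurses, so the lawyers coefficient and its standard error both change: the coefficient is now the gap between the lawyers' mean and the combined mean of the other two groups (the constant).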

. regress incwage lawyers if lawyers==1 | sociologists==1 |nurses==1

Source |       SS       df       MS              Number of obs =    1413

-------------+------------------------------           F(  1,  1411) =  222.77

Model |  4.0378e+11     1  4.0378e+11           Prob > F      =  0.0000

Residual |  2.5575e+12  1411  1.8125e+09           R-squared     =  0.1364

-------------+------------------------------           Adj R-squared =  0.1357

Total |  2.9612e+12  1412  2.0972e+09           Root MSE      =   42574

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

lawyers |   36482.96   2444.332    14.93   0.000     31688.04    41277.88

_cons |   37561.37   1365.553    27.51   0.000     34882.64     40240.1

------------------------------------------------------------------------------
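* The constant and the coefficient in this regression can be checked with arithmetic on numbers already printed in this log. The sketch below is an illustration; the scalar names other_mean and gap are made up for this example. It averages the sociologists' and nurses' means, weighting by group size, and subtracts the result from the lawyers' mean. Because the inputs are rounded as printed, expect agreement with _cons and the lawyers coefficient only to about the second decimal place.

```stata
* Group sizes and means copied from earlier output in this log:
*   sociologists: n = 6,   mean = 41508.33
*   nurses:       n = 966, mean = 37536.85
*   lawyers:               mean = 74044.33
scalar other_mean = (6*41508.33 + 966*37536.85)/(6 + 966)
scalar gap = 74044.33 - other_mean

display "combined mean of sociologists and nurses: " %10.2f other_mean
display "lawyers' mean minus combined mean:        " %10.2f gap
```

* With 966 nurses and only 6 sociologists in the comparison category, the constant sits very close to the nurses' mean.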

* In this next case, we compare the income of lawyers to the income of everyone else in the sample with a nonmissing value of incwage, so again the coefficient and its standard error change.

. regress incwage lawyers

Source |       SS       df       MS              Number of obs =  103226

-------------+------------------------------           F(  1,103224) = 1610.72

Model |  1.3194e+12     1  1.3194e+12           Prob > F      =  0.0000

Residual |  8.4558e+13 103224   819166021           R-squared     =  0.0154

-------------+------------------------------           Adj R-squared =  0.0154

Total |  8.5877e+13 103225   831940347           Root MSE      =   28621

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

lawyers |   54815.92   1365.829    40.13   0.000     52138.91    57492.92

_cons |   19228.41    89.2732   215.39   0.000     19053.43    19403.38

------------------------------------------------------------------------------

* Now back to our first example, the two-group regression that gives exactly the same results as our two-sample t-test:

. regress incwage lawyers if lawyers==1 | sociologists==1

Source |       SS       df       MS              Number of obs =     447

-------------+------------------------------           F(  1,   445) =    1.33

Model |  6.2663e+09     1  6.2663e+09           Prob > F      =  0.2495

Residual |  2.0971e+12   445  4.7125e+09           R-squared     =  0.0030

-------------+------------------------------           Adj R-squared =  0.0007

Total |  2.1034e+12   446  4.7160e+09           Root MSE      =   68648

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

lawyers |   32535.99   28215.44     1.15   0.249    -22916.07    87988.05

_cons |   41508.33   28025.43     1.48   0.139    -13570.31    96586.97

------------------------------------------------------------------------------

* What if we add a third group to the model but keep sociologists as the comparison category?

. regress incwage lawyers nurses if lawyers==1 | sociologists==1 |nurses==1

Source |       SS       df       MS              Number of obs =    1413

-------------+------------------------------           F(  2,  1410) =  111.34

Model |  4.0387e+11     2  2.0194e+11           Prob > F      =  0.0000

Residual |  2.5574e+12  1410  1.8137e+09           R-squared     =  0.1364

-------------+------------------------------           Adj R-squared =  0.1352

Total |  2.9612e+12  1412  2.0972e+09           Root MSE      =   42588

------------------------------------------------------------------------------

incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

lawyers |   32535.99   17504.37     1.86   0.063    -1801.409     66873.4

nurses |  -3971.481    17440.4    -0.23   0.820    -38183.41    30240.45

_cons |   41508.33   17386.49     2.39   0.017     7402.162     75614.5

------------------------------------------------------------------------------

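* We get the same coefficient as before (for the lawyer-sociologist comparison), but the standard error changes. The standard error is built from the residual variance, which is now pooled across all three groups: adding the 966 nurses, whose incomes vary much less than the lawyers', lowers the Root MSE from 68648 to 42588, and the standard error on lawyers falls from 28215.44 to 17504.37.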
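* The change in the standard error can be reproduced by hand. With only group indicators in the model, the standard error of a group coefficient is Root MSE times sqrt(1/n_group + 1/n_comparison). The sketch below is an illustration using the Root MSE values and group sizes printed in this log; the scalar names se_two, se_three, and se_nurses are made up for this example, and the results should match the regression tables only up to rounding of the printed Root MSE.

```stata
* Root MSE: 68648 (lawyers + sociologists), 42588 (all three groups).
* Group sizes: 6 sociologists, 441 lawyers, 966 nurses.
scalar se_two    = 68648*sqrt(1/6 + 1/441)
scalar se_three  = 42588*sqrt(1/6 + 1/441)
* The nurses row follows the same pattern with the nurses' group size.
scalar se_nurses = 42588*sqrt(1/6 + 1/966)

display "SE on lawyers, two-group model:   " %10.2f se_two
display "SE on lawyers, three-group model: " %10.2f se_three
display "SE on nurses, three-group model:  " %10.2f se_nurses
```

* The group sizes inside the square root are identical in the first two lines, so the entire drop in the lawyers' standard error comes from the smaller Root MSE.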

. log close

name:  <unnamed>

log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2014_logs\cla

> ss8.log

log type:  text

closed on:  15 Oct 2014, 12:42:25

-----------------------------------------------------------------------------------------