--------------------------------------------------------------------------------------------------

name:  <unnamed>

> og.log

log type:  text

opened on:  18 Apr 2019, 14:40:54

. use "C:\Users\mexmi\Desktop\cps_mar_2000_new.dta", clear

. *class starts here

. ttest yrsed if age>24 & age<35, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* above is our classic t-test. Below is the regression version of the same test. The “i.sex” part is just to tell stata that sex is a categorical variable, so not to take the actual values seriously, but to create a 0-1 dummy variable for sex, the way we do below by hand.

. regress yrsed i.sex if age>24 & age<35

Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

Model |  276.742433     1  276.742433           Prob > F      =  0.0000

Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------

yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

sex |

Female  |   .2444469   .0427623     5.72   0.000     .1606289    .3282649

_cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216

------------------------------------------------------------------------------

. codebook sex

--------------------------------------------------------------------------------------------------

sex                                                                                            Sex

--------------------------------------------------------------------------------------------------

type:  numeric (byte)

label:  sexlbl

range:  [1,2]                        units:  1

unique values:  2                        missing .:  0/133710

tabulation:  Freq.   Numeric  Label

64791         1  Male

68919         2  Female

* creating the dummy variable by hand.

. gen byte male=0 if sex==2

(64791 missing values generated)

. replace male=1 if sex==1

. tabulate sex male

|         male

Sex |         0          1 |     Total

-----------+----------------------+----------

Male |         0     64,791 |    64,791

Female |    68,919          0 |    68,919

-----------+----------------------+----------

Total |    68,919     64,791 |   133,710

. regress yrsed male if age>24 & age<35

Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

Model |  276.742433     1  276.742433           Prob > F      =  0.0000

Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------

yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

male |  -.2444469   .0427623    -5.72   0.000    -.3282649   -.1606289

_cons |   13.55657   .0298401   454.31   0.000     13.49808    13.61506

------------------------------------------------------------------------------

------------------------------------------------------------------------------

* We did not quite get to it in class, but it is easy to add weights to a regression. Aweights gives you a reasonable answer.

. regress yrsed i.sex if age>24 & age<35 [aweight= perwt_rounded]

(sum of wgt is   3.7786e+07)

Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   25.52

Model |  195.741395     1  195.741395           Prob > F      =  0.0000

Residual |  142186.809 18536  7.67084641           R-squared     =  0.0014

Total |  142382.551 18537   7.6809921           Root MSE      =  2.7696

------------------------------------------------------------------------------

yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

sex |

Female  |   .2055446   .0406899     5.05   0.000     .1257887    .2853005

_cons |    13.5574   .0290221   467.14   0.000     13.50051    13.61429

------------------------------------------------------------------------------

* Fweights gives you a very unreasonable answer (in the SE an T stats), because it unreasonably inflates the sample size by a factor of 2000.

. regress yrsed i.sex if age>24 & age<35 [fweight= perwt_rounded]

Source |       SS       df       MS              Number of obs =37785945

-------------+------------------------------           F(  1,37785943) =52018.00

Model |  398979.047     1  398979.047           Prob > F      =  0.0000

Residual |   28981891037785943  7.67001924           R-squared     =  0.0014

Total |   29021788937785944  7.68057796           Root MSE      =  2.7695

------------------------------------------------------------------------------

yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

sex |

Female  |   .2055446   .0009012   228.07   0.000     .2037782    .2073109

_cons |    13.5574   .0006428  2.1e+04   0.000     13.55614    13.55866

------------------------------------------------------------------------------

. log close

name:  <unnamed>