-----------------------------------------------------------------------------------
      name:  <unnamed>
> 1_logs\class5.log
  log type:  text
 opened on:   8 Oct 2013, 13:37:43

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

. table sex if age>24 & age<35, contents (mean yrsed sd yrsed freq)

-------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)        Freq.
----------+--------------------------------------
     Male |    13.31212     2.967666        9,027
   Female |    13.55657     2.854472        9,511
-------------------------------------------------

* Our old friend: the table of average educational attainment (years of schooling) for young men and women, aged 25 to 34.

. table sex if age>24 & age<35, contents (mean yrsed sd yrsed semean yrsed freq)

--------------------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)   sem(yrsed)        Freq.
----------+---------------------------------------------------
     Male |    13.31212     2.967666     .0312351        9,027
   Female |    13.55657     2.854472     .0292693        9,511
--------------------------------------------------------------

* The same table with the standard error of the mean added; recall that semean = sd/sqrt(n).
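* A minimal Python sketch (outside the Stata session) of the semean formula, using the sd(yrsed) and Freq. values copied from the table above; the helper name se_mean is hypothetical. It should reproduce the sem(yrsed) column up to display rounding.

```python
import math

def se_mean(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# sd(yrsed) and Freq. copied from the table above (ages 25-34).
groups = {"Male": (2.967666, 9027), "Female": (2.854472, 9511)}

for name, (sd, n) in groups.items():
    print(f"{name:6s} semean = {se_mean(sd, n):.7f}")
```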

. ttest yrsed if age>24 & age<35, by (sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* Here are the equal-variance (above) and unequal-variance (below) versions of the t-test. They give very similar results because the men's and women's standard deviations are so similar, and so are their sample sizes.
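* A Python sketch of both standard errors, computed only from the group sizes, means, and standard deviations printed in the output (the variable names are hypothetical). The pooled formula is the one behind the equal-variance test; the unequal-variance test adds the two squared standard errors of the means and uses Satterthwaite's approximation for the degrees of freedom. Because the inputs are rounded, the results match Stata's output only up to rounding.

```python
import math

# Summary statistics copied from the ttest output (ages 25-34).
n_m, mean_m, sd_m = 9027, 13.31212, 2.967666    # Male
n_f, mean_f, sd_f = 9511, 13.55657, 2.854472    # Female
diff = mean_m - mean_f

# Equal variances: pool the two sample variances, weighting by degrees of freedom.
pooled_var = ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / (n_m + n_f - 2)
se_equal = math.sqrt(pooled_var * (1 / n_m + 1 / n_f))

# Unequal variances: add the squared standard errors of the two means.
v_m, v_f = sd_m**2 / n_m, sd_f**2 / n_f
se_unequal = math.sqrt(v_m + v_f)

# Satterthwaite's approximation to the degrees of freedom.
df_satt = (v_m + v_f) ** 2 / (v_m**2 / (n_m - 1) + v_f**2 / (n_f - 1))

print(f"equal:   se = {se_equal:.7f}, t = {diff / se_equal:.4f}, df = {n_m + n_f - 2}")
print(f"unequal: se = {se_unequal:.7f}, t = {diff / se_unequal:.4f}, df = {df_satt:.1f}")
```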

. ttest yrsed if age>24 & age<35, by (sex) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* First check: change the scale of the outcome variable (from years to months of education).

. gen months_ed=yrsed*12

(30484 missing values generated)

. ttest months_ed if age>24 & age<35, by (sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    159.7454    .3748215    35.61199    159.0107    160.4802
  Female |    9511    162.6788    .3512319    34.25366    161.9903    163.3673
---------+--------------------------------------------------------------------
combined |   18538    161.2504    .2567052    34.95152    160.7472    161.7536
---------+--------------------------------------------------------------------
    diff |           -2.933363    .5131471               -3.939178   -1.927547
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

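* Under a change of scale, the means, standard deviations, and standard errors all change (each is multiplied by 12, because they are measured in the units of Y, whatever Y is), but the t-statistic stays the same, because it is a ratio of two quantities in the same units and is therefore unit-free.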
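* A Python sketch of the scale argument, reusing the summary statistics from the yrsed t-test (pooled_t is a hypothetical helper): multiplying every mean and standard deviation by 12 multiplies both the difference in means and its standard error by 12, so their ratio is unchanged.

```python
import math

def pooled_t(n1, m1, s1, n2, m2, s2):
    """Two-sample t statistic with a pooled (equal-variance) standard error."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Years of schooling: (n, mean, sd) for men, then for women, from the ttest output.
years = (9027, 13.31212, 2.967666, 9511, 13.55657, 2.854472)

# Rescale to months: means and sds are multiplied by 12; the counts are not.
c = 12
months = (years[0], years[1] * c, years[2] * c, years[3], years[4] * c, years[5] * c)

t_years = pooled_t(*years)
t_months = pooled_t(*months)
print(t_years, t_months)  # equal up to floating-point error
```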

. gen byte male=0

. replace male=1 if sex==1

* Generate a new dummy variable, male (1 for men, 0 for women), which we will use in our regressions.

. tabulate sex male

           |         male
       Sex |         0          1 |     Total
-----------+----------------------+----------
      Male |         0     64,791 |    64,791
    Female |    68,919          0 |    68,919
-----------+----------------------+----------
     Total |    68,919     64,791 |   133,710

. regress yrsed male if age>24 & age<35

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   32.68
       Model |  276.742433     1  276.742433           Prob > F      =  0.0000
    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018
       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.2444469   .0427623    -5.72   0.000    -.3282649   -.1606289
       _cons |   13.55657   .0298401   454.31   0.000     13.49808    13.61506
------------------------------------------------------------------------------

. regress months_ed male if age>24 & age<35

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   32.68
       Model |  39850.9104     1  39850.9104           Prob > F      =  0.0000
    Residual |  22605108.7 18536  1219.52464           R-squared     =  0.0018
       Total |  22644959.6 18537  1221.60865           Root MSE      =  34.922

------------------------------------------------------------------------------
   months_ed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -2.933363   .5131471    -5.72   0.000    -3.939178   -1.927547
       _cons |   162.6788   .3580818   454.31   0.000     161.9769    163.3807
------------------------------------------------------------------------------
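* Note that both regressions reproduce the equal-variance t-tests: the coefficient on male equals diff (-.2444469 in years, -2.933363 in months), its standard error equals the pooled standard error (.0427623, .5131471), the t-statistic is the same (-5.72), and _cons equals the women's mean (13.55657, 162.6788). The Python sketch below, on made-up toy data, shows why: with a single 0/1 regressor, OLS puts the intercept at the mean of the group coded 0 and the slope at the difference between the two group means. ols_simple is a hypothetical helper.

```python
# Made-up toy data: years of schooling and a male dummy (1 = male, 0 = female).
y    = [12, 16, 14, 12, 13, 18, 16, 12]
male = [ 1,  1,  1,  1,  0,  0,  0,  0]

def ols_simple(y, x):
    """Closed-form OLS for y = a + b*x + e; returns (a, b)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b

a, b = ols_simple(y, male)
mean_male = sum(yi for yi, d in zip(y, male) if d == 1) / male.count(1)
mean_female = sum(yi for yi, d in zip(y, male) if d == 0) / male.count(0)
print(a, mean_female)               # intercept = mean of the group coded 0
print(b, mean_male - mean_female)   # slope = difference in group means
```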

. table sex if age>24 & age<35, contents (mean yrsed sd yrsed semean yrsed freq)

--------------------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)   sem(yrsed)        Freq.
----------+---------------------------------------------------
     Male |    13.31212     2.967666     .0312351        9,027
   Female |    13.55657     2.854472     .0292693        9,511
--------------------------------------------------------------

. table sex if age>24 & age<35 [aweight= perwt_rounded], contents (mean yrsed sd yrsed semean yrsed freq)

--------------------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)   sem(yrsed)        Freq.
----------+---------------------------------------------------
     Male |     13.5574     2.819247      .029673        9,027
   Female |    13.76295     2.720855     .0278992        9,511
--------------------------------------------------------------

* The aweighted table has similar (but not identical) means and standard deviations, and a weighted N exactly equal to the unweighted N, because Stata rescales aweights (“analytic weights”) so that the average weight is 1, which leaves the sample size unchanged.
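* A Python sketch of the rescaling described above, with made-up values and weights standing in for yrsed and perwt_rounded: dividing each weight by the mean weight makes the weights sum to n, and because every weight is multiplied by the same constant, the weighted mean is unchanged.

```python
# Made-up toy data: schooling values and raw person weights (like perwt_rounded).
y = [12, 16, 14, 12, 13]
w = [1500, 2500, 1000, 3000, 2000]

n = len(y)
scale = n / sum(w)               # 1 / (mean weight)
w_a = [wi * scale for wi in w]   # rescaled so the weights sum to n (mean weight 1)

def wmean(y, w):
    """Weighted mean of y with weights w."""
    return sum(yi * wi for yi, wi in zip(y, w)) / sum(w)

print(sum(w_a))                    # equals n, so the "sample size" is unchanged
print(wmean(y, w), wmean(y, w_a))  # the weighted mean is unaffected by rescaling
```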

. regress yrsed male if age>24 & age<35

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   32.68
       Model |  276.742433     1  276.742433           Prob > F      =  0.0000
    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018
       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.2444469   .0427623    -5.72   0.000    -.3282649   -.1606289
       _cons |   13.55657   .0298401   454.31   0.000     13.49808    13.61506
------------------------------------------------------------------------------

. regress yrsed male if age>24 & age<35 [aweight= perwt_rounded]

(sum of wgt is   3.7786e+07)

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   25.52
       Model |  195.741395     1  195.741395           Prob > F      =  0.0000
    Residual |  142186.809 18536  7.67084641           R-squared     =  0.0014
       Total |  142382.551 18537   7.6809921           Root MSE      =  2.7696

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.2055446   .0406899    -5.05   0.000    -.2853005   -.1257887
       _cons |   13.76294   .0285199   482.57   0.000     13.70704    13.81885
------------------------------------------------------------------------------

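* The aweighted regression is similar to the unweighted regression, but not exactly the same, because the weights make some cases relatively more important and others less important. The coefficient on male (-.2055) is the difference between the weighted means in the aweighted table above (13.5574 - 13.76295).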
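* A Python sketch of weighted least squares with a single 0/1 regressor, on made-up toy data and weights (wls_simple and group_wmean are hypothetical helpers). The intercept is the weighted mean of the group coded 0 and the slope is the difference in weighted group means; multiplying every weight by the same constant leaves both unchanged, which is why aweights and fweights built from the same perwt_rounded give identical coefficients and differ only in their standard errors.

```python
def wls_simple(y, x, w):
    """Closed-form weighted least squares for y = a + b*x + e with weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b

# Made-up toy data and weights (not CPS values).
y    = [12, 16, 14, 12, 13, 18, 16, 12]
male = [ 1,  1,  1,  1,  0,  0,  0,  0]
w    = [1500, 2500, 1000, 3000, 2000, 1000, 2500, 1500]

def group_wmean(g):
    """Weighted mean of y among observations with male == g."""
    num = sum(wi * yi for wi, yi, d in zip(w, y, male) if d == g)
    den = sum(wi for wi, d in zip(w, male) if d == g)
    return num / den

a, b = wls_simple(y, male, w)
print(a, group_wmean(0))                   # intercept = weighted female mean
print(b, group_wmean(1) - group_wmean(0))  # slope = difference in weighted means

# Multiplying every weight by the same constant leaves the coefficients unchanged.
a2, b2 = wls_simple(y, male, [wi * 0.001 for wi in w])
print(a2, b2)
```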

. regress yrsed male if age>24 & age<35 [fweight= perwt_rounded]

      Source |       SS       df       MS              Number of obs =37785945
-------------+------------------------------           F(  1,37785943) =52018.00
       Model |  398979.047     1  398979.047           Prob > F      =  0.0000
    Residual |   289818910 37785943  7.67001924        R-squared     =  0.0014
       Total |   290217889 37785944  7.68057796        Root MSE      =  2.7695

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.2055446   .0009012  -228.07   0.000    -.2073109   -.2037782
       _cons |   13.76294   .0006317  2.2e+04   0.000     13.76171    13.76418
------------------------------------------------------------------------------

* The regression with fweights has the same coefficients as the aweighted regression, but its sample size is larger by a factor of about 2,000 (37,785,945 / 18,538 is about 2,038). Its t-statistics therefore grow by a factor of about sqrt(2,038), or roughly 45, taking the t on male from -5.05 to -228. The key thing to know is that the fweighted regression produces a wildly unrealistic t-statistic, because it pretends that we have nearly 38 million young people in our sample instead of the roughly 18,500 we actually have. Fweights are useful and correct for some applications (we use them with the CPS to generate national totals), but used this way the fweighted regression is wrong and misleading.
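* The arithmetic above as a Python sketch: holding the residual spread fixed (Root MSE is 2.7696 versus 2.7695), standard errors scale like 1/sqrt(N), so inflating N by a factor k inflates the t-statistic by about sqrt(k).

```python
import math

n_actual = 18538        # observations actually used in the regression
n_fweight = 37785945    # "Number of obs" reported with fweights (sum of perwt_rounded)

factor = n_fweight / n_actual   # how much the fweights inflate N
inflate = math.sqrt(factor)     # how much they shrink SEs and inflate t

t_aweight = -5.05               # t on male in the aweighted regression
print(round(factor), round(inflate, 1), round(t_aweight * inflate, 1))
```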

. gen random_uniform=uniform()

. summarize random_uniform

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
random_uni~m |    133710    .5006203    .2884588   .0000219   .9999971

. display 0.2884588^2

.08320848

* Recall that we proved earlier that a Uniform(0,1) variable has mean 0.5 and variance 1/12 (about .0833). It is nice to see that the sample mean (.5006) and sample variance (.0832) are close to both.
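* A Python sketch of both the derivation (in the comments) and a simulation check, drawing the same number of Uniform(0,1) values as the Stata run. The seed is arbitrary, and Python's random numbers will of course differ from Stata's.

```python
import random

# Analytically, for U ~ Uniform(0, 1):
#   E[U]   = integral of u   du from 0 to 1 = 1/2
#   E[U^2] = integral of u^2 du from 0 to 1 = 1/3
#   Var[U] = E[U^2] - E[U]^2 = 1/3 - 1/4    = 1/12 (about 0.0833)
rng = random.Random(2013)   # fixed, arbitrary seed so the draw is reproducible
n = 133710                  # same number of draws as random_uniform in the log
draws = [rng.random() for _ in range(n)]

mean = sum(draws) / n
var = sum((u - mean) ** 2 for u in draws) / (n - 1)
print(f"mean = {mean:.4f} (theory 0.5), variance = {var:.4f} (theory {1/12:.4f})")
```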

* We cannot legitimately increase our sample size, but we can decrease it arbitrarily, for example by keeping only a random subsample.

. regress yrsed male if age>24 & age<35 [aweight= perwt_rounded]

(sum of wgt is   3.7786e+07)

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   25.52
       Model |  195.741395     1  195.741395           Prob > F      =  0.0000
    Residual |  142186.809 18536  7.67084641           R-squared     =  0.0014
       Total |  142382.551 18537   7.6809921           Root MSE      =  2.7696

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.2055446   .0406899    -5.05   0.000    -.2853005   -.1257887
       _cons |   13.76294   .0285199   482.57   0.000     13.70704    13.81885
------------------------------------------------------------------------------

. regress yrsed male if age>24 & age<35 & random_uniform <=0.25 [aweight= perwt_rounded]

(sum of wgt is   9.2846e+06)

      Source |       SS       df       MS              Number of obs =    4578
-------------+------------------------------           F(  1,  4576) =    4.52
       Model |  34.4468815     1  34.4468815           Prob > F      =  0.0336
    Residual |  34890.4634  4576  7.62466419           R-squared     =  0.0010
       Total |  34924.9102  4577  7.63052441           Root MSE      =  2.7613

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.1735029   .0816285    -2.13   0.034    -.3335342   -.0134715
       _cons |   13.72653    .057329   239.43   0.000     13.61413    13.83892
------------------------------------------------------------------------------

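* If we arbitrarily limit ourselves to ¼ of the CPS data, we expect the t-statistic to be about half as large, because standard errors scale with 1/sqrt(n): a quarter of the data roughly doubles the standard error (.0816 here versus .0407 in the full sample). But because this is a random subsample, the t-statistic can be bigger or smaller than we expect.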
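* The expected effect, sketched in Python with the full-sample numbers from the aweighted regression: the predicted standard error (about .082) is close to the .0816 that Stata reports, so the smaller-than-expected t here comes mainly from the smaller coefficient (-.1735 versus -.2055).

```python
import math

n_full, se_full, t_full = 18538, 0.0406899, -5.05   # aweighted regression, full sample
n_quarter = 4578                                      # random_uniform <= 0.25 subsample

ratio = math.sqrt(n_full / n_quarter)   # about 2, since SE scales like 1/sqrt(n)
se_expected = se_full * ratio
t_expected = t_full / ratio
print(f"expected se = {se_expected:.4f}, expected t = {t_expected:.2f}")
```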

. regress yrsed male if age>24 & age<35 & random_uniform >=0.75 [aweight= perwt_rounded]

(sum of wgt is   9.6623e+06)

      Source |       SS       df       MS              Number of obs =    4719
-------------+------------------------------           F(  1,  4717) =   11.48
       Model |  87.7286053     1  87.7286053           Prob > F      =  0.0007
    Residual |  36055.7534  4717  7.64378914           R-squared     =  0.0024
       Total |   36143.482  4718  7.66076345           Root MSE      =  2.7647

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.2727883   .0805211    -3.39   0.001    -.4306472   -.1149294
       _cons |    13.8423   .0561835   246.38   0.000     13.73216    13.95245
------------------------------------------------------------------------------

* And here is a different random ¼ subsample (random_uniform >= 0.75). The results differ somewhat from the previous subsample (t = -3.39 versus -2.13), but they would still lead to the same substantive answer: young women in the US have significantly more education than young men.
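* A Python simulation of this point on synthetic data (not the CPS: the sample size, share of men, group means, and standard deviation are assumptions chosen to resemble this sample; mean_var and t_diff are hypothetical helpers). It draws 50 random quarter subsamples and computes each one's t-statistic; they scatter around roughly half the full-sample t, so a gap like the one between -2.13 and -3.39 is unsurprising.

```python
import math
import random

rng = random.Random(5)  # arbitrary fixed seed

# Synthetic stand-in for the CPS sample (assumed values, not real data):
# 18,538 people, about 49% men, women averaging 0.2 more years of schooling,
# and a standard deviation of 2.8 years in both groups.
n = 18538
male = [1 if rng.random() < 0.49 else 0 for _ in range(n)]
yrsed = [rng.gauss(13.76 - 0.2 * m, 2.8) for m in male]

def mean_var(v):
    """Sample mean and sample variance (n - 1 denominator)."""
    m = sum(v) / len(v)
    return m, sum((x - m) ** 2 for x in v) / (len(v) - 1)

def t_diff(y, d):
    """Pooled two-sample t statistic for mean(y | d == 1) - mean(y | d == 0)."""
    g1 = [yi for yi, di in zip(y, d) if di == 1]
    g0 = [yi for yi, di in zip(y, d) if di == 0]
    (m1, v1), (m0, v0) = mean_var(g1), mean_var(g0)
    n1, n0 = len(g1), len(g0)
    sp2 = ((n1 - 1) * v1 + (n0 - 1) * v0) / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(sp2 * (1 / n1 + 1 / n0))

t_full = t_diff(yrsed, male)

# 50 random quarter subsamples, each person kept with probability 0.25
# (the analogue of "& random_uniform <= 0.25").
t_quarters = []
for _ in range(50):
    keep = [rng.random() <= 0.25 for _ in range(n)]
    t_quarters.append(t_diff([yi for yi, k in zip(yrsed, keep) if k],
                             [di for di, k in zip(male, keep) if k]))

print(f"full-sample t = {t_full:.2f}, half of it = {t_full / 2:.2f}")
print(f"quarter-sample t: mean {sum(t_quarters) / len(t_quarters):.2f}, "
      f"min {min(t_quarters):.2f}, max {max(t_quarters):.2f}")
```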

. log close

name:  <unnamed>