-------------------------------------------------------------------------------------------

name:  <unnamed>

> lass5.log

log type:  text

opened on:   9 Oct 2012, 13:30:28

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

. table sex if age>24 & age<35, contents (mean yrsed sd yrsed freq)

-------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)        Freq.
----------+--------------------------------------
     Male |    13.31212     2.967666        9,027
   Female |    13.55657     2.854472        9,511
-------------------------------------------------

* All the t-tests below, and the regression coefficients and their t-statistics, are computed entirely from the mean, SD, and N of the two subsamples.
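
* A minimal Python sketch (not part of the original Stata session) of how the equal-variance t statistic can be rebuilt from nothing but the means, SDs, and Ns in the table above. The function name pooled_t is an illustrative assumption; because the inputs are rounded, the result should only match Stata's output to a few decimals.

```python
import math

# Summary statistics copied from the table above (ages 25-34).
n_m, mean_m, sd_m = 9027, 13.31212, 2.967666   # Male
n_f, mean_f, sd_f = 9511, 13.55657, 2.854472   # Female


def pooled_t(n1, m1, s1, n2, m2, s2):
    """Equal-variance two-sample t statistic from summary statistics only."""
    df = n1 + n2 - 2
    # Pooled variance: df-weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, se, df


t, se, df = pooled_t(n_m, mean_m, sd_m, n_f, mean_f, sd_f)
print(f"diff = {mean_m - mean_f:.5f}  se = {se:.7f}  t = {t:.4f}  df = {df}")
```

* No row-level data is needed here, which is the point of the comment above.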

. ttest yrsed if age>24 & age<35, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

. ttest yrsed if age>24 & age<35, by(sex) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* For education, the unequal-variance and equal-variance t-tests give nearly identical results, because the variances of the two subsamples are so similar to begin with (SDs of about 2.97 and 2.85).
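
* A minimal Python sketch of the two standard errors being compared, again using only the summary statistics from the log. The Satterthwaite formula used here is the textbook one; the helper names welch_se_df and pooled_se are illustrative assumptions.

```python
import math

# SDs and Ns copied from the t-test output above (ages 25-34).
n_m, sd_m = 9027, 2.967666   # Male
n_f, sd_f = 9511, 2.854472   # Female


def welch_se_df(n1, s1, n2, s2):
    """Unequal-variance SE of the difference and Satterthwaite's df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df


def pooled_se(n1, s1, n2, s2):
    """Equal-variance SE of the difference."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


welch_se, welch_df = welch_se_df(n_m, sd_m, n_f, sd_f)
equal_se = pooled_se(n_m, sd_m, n_f, sd_f)
print(f"unequal: se = {welch_se:.7f}, df = {welch_df:.1f}")
print(f"equal:   se = {equal_se:.7f}, df = {n_m + n_f - 2}")
```

* With similar SDs and similar group sizes, the two standard errors differ only in the fourth significant digit.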

. gen months_ed=yrsed*12

(30484 missing values generated)

. ttest months_ed if age>24 & age<35, by(sex) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    159.7454    .3748215    35.61199    159.0107    160.4802
  Female |    9511    162.6788    .3512319    34.25366    161.9903    163.3673
---------+--------------------------------------------------------------------
combined |   18538    161.2504    .2567052    34.95152    160.7472    161.7536
---------+--------------------------------------------------------------------
    diff |           -2.933363    .5136682                 -3.9402   -1.926525
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* Note the effect of the change of scale (years to months, a factor of 12): the means, SDs, and standard errors are all multiplied by 12, but the t-statistic is unchanged, because it is unit-free.
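
* A minimal Python sketch of why rescaling cannot move the t statistic: the factor of 12 multiplies both the difference in means and its standard error, so it cancels in the ratio. The helper name welch_t is an illustrative assumption; the inputs are the summary statistics from the log.

```python
import math


def welch_t(n1, m1, s1, n2, m2, s2):
    """Unequal-variance t statistic from summary statistics."""
    return (m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)


# (n, mean, sd) for Male then Female, in years of education.
years = (9027, 13.31212, 2.967666, 9511, 13.55657, 2.854472)

k = 12  # months per year
# Rescale means and SDs by k; the sample sizes are left alone.
months = (years[0], k * years[1], k * years[2],
          years[3], k * years[4], k * years[5])

t_years = welch_t(*years)
t_months = welch_t(*months)
print(f"t in years = {t_years:.4f}, t in months = {t_months:.4f}")
```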

. tabulate sex male

           |         male
       Sex |         0          1 |     Total
-----------+----------------------+----------
      Male |         0     64,791 |    64,791
    Female |    68,919          0 |    68,919
-----------+----------------------+----------
     Total |    68,919     64,791 |   133,710

* male is a dummy variable for gender (1 = Male, 0 = Female), as the cross-tabulation above confirms.

. ttest months_ed if age>24 & age<35, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    159.7454    .3748215    35.61199    159.0107    160.4802
  Female |    9511    162.6788    .3512319    34.25366    161.9903    163.3673
---------+--------------------------------------------------------------------
combined |   18538    161.2504    .2567052    34.95152    160.7472    161.7536
---------+--------------------------------------------------------------------
    diff |           -2.933363    .5131471               -3.939178   -1.927547
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

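* Regression on the male dummy is the same as the equal-variance t-test: the coefficient on male equals the difference in means, with the same standard error and t-statistic, and _cons is the female mean.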

. regress yrsed male if age>24 & age<35

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   32.68
       Model |  276.742433     1  276.742433           Prob > F      =  0.0000
    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018
       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.2444469   .0427623    -5.72   0.000    -.3282649   -.1606289
       _cons |   13.55657   .0298401   454.31   0.000     13.49808    13.61506
------------------------------------------------------------------------------

. ttest yrsed if age>24 & age<35, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
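
* The regression and the equal-variance t-test above report the same difference (-.2444469), standard error (.0427623), and t. A minimal Python sketch of that equivalence on hypothetical toy data (these numbers are made up for illustration and are NOT from the CPS file); OLS with a single 0/1 regressor is written out by hand using only the standard library.

```python
import math
from statistics import mean, variance

# Hypothetical toy data: years of schooling for two small groups.
male_yrs = [12, 16, 11, 14, 13]
female_yrs = [13, 16, 12, 15, 14, 18]

# Stack into y and a 0/1 dummy x (1 = male), as in "regress yrsed male".
y = male_yrs + female_yrs
x = [1] * len(male_yrs) + [0] * len(female_yrs)
n = len(y)

# OLS with one regressor: slope = Sxy / Sxx, intercept = ybar - slope * xbar.
xbar, ybar = mean(x), mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = ybar - slope * xbar
rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
se_slope = math.sqrt(rss / (n - 2) / sxx)
t_ols = slope / se_slope

# Equal-variance two-sample t test from the group summaries.
n1, n2 = len(male_yrs), len(female_yrs)
pooled_var = ((n1 - 1) * variance(male_yrs)
              + (n2 - 1) * variance(female_yrs)) / (n1 + n2 - 2)
se_ttest = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_ttest = (mean(male_yrs) - mean(female_yrs)) / se_ttest

print(f"OLS:    slope = {slope:.4f}, se = {se_slope:.4f}, t = {t_ols:.4f}")
print(f"t-test: diff  = {mean(male_yrs) - mean(female_yrs):.4f}, "
      f"se = {se_ttest:.4f}, t = {t_ttest:.4f}")
print(f"intercept = {intercept:.4f}, female mean = {mean(female_yrs):.4f}")
```

* The residual mean square of this regression is the pooled variance, which is why the two standard errors coincide.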

. table sex if age>24 & age<35, contents (mean yrsed sd yrsed freq)

-------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)        Freq.
----------+--------------------------------------
     Male |    13.31212     2.967666        9,027
   Female |    13.55657     2.854472        9,511
-------------------------------------------------

* Now weight by analytic weights (aweight), which keeps the same sample size but gives slightly different means and SDs.

. table sex if age>24 & age<35 [aweight= perwt_rounded], contents (mean yrsed sd yrsed freq)

-------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)        Freq.
----------+--------------------------------------
     Male |     13.5574     2.819247        9,027
   Female |    13.76295     2.720855        9,511
-------------------------------------------------

. regress yrsed male if age>24 & age<35 [aweight= perwt_rounded]

(sum of wgt is   3.7786e+07)

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   25.52
       Model |  195.741395     1  195.741395           Prob > F      =  0.0000
    Residual |  142186.809 18536  7.67084641           R-squared     =  0.0014
       Total |  142382.551 18537   7.6809921           Root MSE      =  2.7696

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.2055446   .0406899    -5.05   0.000    -.2853005   -.1257887
       _cons |   13.76294   .0285199   482.57   0.000     13.70704    13.81885
------------------------------------------------------------------------------

* The regression with aweights is similar to, but not exactly the same as, the unweighted regression.

. table sex if age>24 & age<35 [aweight= perwt_rounded], contents (mean yrsed sd yrsed freq)

-------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)        Freq.
----------+--------------------------------------
     Male |     13.5574     2.819247        9,027
   Female |    13.76295     2.720855        9,511
-------------------------------------------------

* Using perwt_rounded as an fweight gives the same means as using it as an aweight, but multiplies the N by about 2,000 (37,785,945 / 18,538, or roughly 2,038).

. table sex if age>24 & age<35 [fweight= perwt_rounded], contents (mean yrsed sd yrsed freq)

-------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)        Freq.
----------+--------------------------------------
     Male |     13.5574     2.819091     1.86e+07
   Female |    13.76295     2.720712     1.92e+07
-------------------------------------------------
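
* The aweight and fweight tables above share the same means but differ slightly in SD and enormously in N. A minimal Python sketch of why, assuming that aweights are rescaled to sum to the number of rows while fweights treat each row as w identical observations. The three-row data set is hypothetical; only the last check uses numbers from the log.

```python
import math

# Hypothetical toy data: three sample rows with integer person weights.
x = [12.0, 16.0, 10.0]
w = [1500, 2500, 2000]

n = len(x)   # number of sample rows
W = sum(w)   # sum of weights (the "N" reported under fweights)

# The weighted mean is identical under either interpretation.
wmean = sum(wi * xi for wi, xi in zip(w, x)) / W
wss = sum(wi * (xi - wmean) ** 2 for wi, xi in zip(w, x))

# fweight: each row stands for w_i identical observations, so N = W.
sd_fweight = math.sqrt(wss / (W - 1))

# aweight (assumed normalization): weights rescaled to sum to n, so N = n.
sd_aweight = math.sqrt((wss * n / W) / (n - 1))

print(f"mean = {wmean:.4f}, N(aweight) = {n}, N(fweight) = {W}")
print(f"sd(aweight) = {sd_aweight:.4f}, sd(fweight) = {sd_fweight:.4f}")

# Check against the log: with n = 9,027 men the implied correction is tiny.
implied_male_sd = 2.819091 * math.sqrt(9027 / 9026)
print(f"implied aweight SD for men = {implied_male_sd:.6f}")
```

* Under this assumption the two SDs differ only by the degrees-of-freedom correction, which is negligible in a sample of this size.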

. regress yrsed male if age>24 & age<35 [fweight= perwt_rounded]

      Source |       SS       df       MS              Number of obs = 37785945
-------------+------------------------------           F(  1, 37785943) = 52018.00
       Model |  398979.047        1  398979.047        Prob > F      =  0.0000
    Residual |   289818910 37785943  7.67001924        R-squared     =  0.0014
       Total |   290217889 37785944  7.68057796        Root MSE      =  2.7695

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.2055446   .0009012  -228.07   0.000    -.2073109   -.2037782
       _cons |   13.76294   .0006317  2.2e+04   0.000     13.76171    13.76418
------------------------------------------------------------------------------

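* The fweighted regression yields a t-statistic larger by a factor of about sqrt(2,000), i.e. about 45 times larger (-228.07 versus -5.05), which is totally unrealistic and unreasonable: the sample contains 18,538 actual observations, not 37.8 million.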
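
* A minimal Python sketch of the inflation factor, using only the Ns and standard errors printed in the two regressions in this log. The variable names are illustrative assumptions.

```python
import math

n_sample = 18538        # observations in the aweighted regression
n_fweight = 37785945    # "observations" in the fweighted regression

se_aweight = 0.0406899  # Std. Err. on male, aweighted
se_fweight = 0.0009012  # Std. Err. on male, fweighted

n_ratio = n_fweight / n_sample
se_ratio = se_aweight / se_fweight

print(f"N ratio       = {n_ratio:.1f}")
print(f"sqrt(N ratio) = {math.sqrt(n_ratio):.2f}")
print(f"SE ratio      = {se_ratio:.2f}")
```

* The coefficient is identical in both regressions, so the t-statistic grows by the same factor that the standard error shrinks.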

. regress yrsed male if age>24 & age<35 [aweight= perwt_rounded]

(sum of wgt is   3.7786e+07)

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   25.52
       Model |  195.741395     1  195.741395           Prob > F      =  0.0000
    Residual |  142186.809 18536  7.67084641           R-squared     =  0.0014
       Total |  142382.551 18537   7.6809921           Root MSE      =  2.7696

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.2055446   .0406899    -5.05   0.000    -.2853005   -.1257887
       _cons |   13.76294   .0285199   482.57   0.000     13.70704    13.81885
------------------------------------------------------------------------------

. table sex if age>24 & age<35 [aweight= perwt_rounded], contents (mean yrsed sd yrsed freq)

-------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)        Freq.
----------+--------------------------------------
     Male |     13.5574     2.819247        9,027
   Female |    13.76295     2.720855        9,511
-------------------------------------------------

. gen random_uniform_2=uniform()

* generate a uniform random variable.

. summarize  random_uniform_2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
random_uni~2 |    133710    .5006203    .2884588   .0000219   .9999971

* Use that uniform random variable to reduce the sample to about ¼ of its prior size; note that the means and SDs change a little, because of randomness.

. table sex if age>24 & age<35 &  random_uniform_2 <=.25 [aweight= perwt_rounded], contents (mean yrsed sd yrsed freq)

-------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)        Freq.
----------+--------------------------------------
     Male |    13.55302     2.804218        2,248
   Female |    13.72653     2.718835        2,330
-------------------------------------------------

. regress yrsed male if age>24 & age<35 &  random_uniform_2<=.25 [aweight= perwt_rounded]

(sum of wgt is   9.2846e+06)

      Source |       SS       df       MS              Number of obs =    4578
-------------+------------------------------           F(  1,  4576) =    4.52
       Model |  34.4468815     1  34.4468815           Prob > F      =  0.0336
    Residual |  34890.4634  4576  7.62466419           R-squared     =  0.0010
       Total |  34924.9102  4577  7.63052441           Root MSE      =  2.7613

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.1735029   .0816285    -2.13   0.034    -.3335342   -.0134715
       _cons |   13.72653    .057329   239.43   0.000     13.61413    13.83892
------------------------------------------------------------------------------

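* The t-statistic is roughly one half as large, i.e. about sqrt(1/4) times as large as before (-2.13 versus -5.05); it falls somewhat below one half because the coefficient in the random subsample also happens to be smaller.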

. regress yrsed male if age>24 & age<35  [aweight= perwt_rounded]

(sum of wgt is   3.7786e+07)

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   25.52
       Model |  195.741395     1  195.741395           Prob > F      =  0.0000
    Residual |  142186.809 18536  7.67084641           R-squared     =  0.0014
       Total |  142382.551 18537   7.6809921           Root MSE      =  2.7696

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.2055446   .0406899    -5.05   0.000    -.2853005   -.1257887
       _cons |   13.76294   .0285199   482.57   0.000     13.70704    13.81885
------------------------------------------------------------------------------
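
* A minimal Python sketch comparing the quarter-sample and full-sample regressions above, using only the coefficients, standard errors, and Ns printed in the log. It separates the part of the drop in t that comes from the larger standard error from the part that comes from the smaller coefficient; the variable names are illustrative assumptions.

```python
import math

# Values copied from the two aweighted regressions in the log.
n_full, n_quarter = 18538, 4578
b_full, b_quarter = -0.2055446, -0.1735029      # coefficient on male
se_full, se_quarter = 0.0406899, 0.0816285      # its standard error

# With a quarter of the data the SE should roughly double.
expected_se_ratio = math.sqrt(n_full / n_quarter)
observed_se_ratio = se_quarter / se_full

t_full = b_full / se_full
t_quarter = b_quarter / se_quarter
t_ratio = t_quarter / t_full
coef_ratio = b_quarter / b_full

print(f"expected SE ratio = {expected_se_ratio:.3f}, "
      f"observed = {observed_se_ratio:.3f}")
print(f"t ratio = {t_ratio:.3f} = coef ratio {coef_ratio:.3f} "
      f"/ SE ratio {observed_se_ratio:.3f}")
```

* A different random draw would give a different coefficient, so only the SE ratio of about 2 is systematic.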

. log close

name:  <unnamed>