-----------------------------------------------------------------------------------

log:  C:\AAA Miker Files\newer web pages\soc_meth_proj3\class4_2009.log

log type:  text

opened on:   5 Feb 2009, 11:24:23

. set mem 200m

Current memory allocation

current                                 memory usage

settable          value     description                 (1M = 1024k)

--------------------------------------------------------------------

set maxvar         5000     max. variables allowed           1.909M

set memory          200M    max. data space                200.000M

set matsize         400     max. RHS vars in models          1.254M

-----------

203.163M

. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear

. ttest yrsed if age>25 & age<35, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    8229    13.32689    .0328611    2.980953    13.26248    13.39131

Female |    8643    13.56369    .0307005    2.854159    13.50351    13.62387

---------+--------------------------------------------------------------------

combined |   16872     13.4482    .0224725    2.919003    13.40415    13.49225

---------+--------------------------------------------------------------------

diff |           -.2368005    .0449229                -.324854   -.1487469

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -5.2713

Ho: diff = 0                                     degrees of freedom =    16870

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

*If I want the statistics to match up exactly, I have to use the same age range. The condition age>25 left out the 25-year-olds, so the test is rerun with age>24.
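
* A quick way to see what changed between the two tests. This is a sketch, not part of the original session; it assumes age is stored in whole years, so that age>24 & age<35 selects the same cases as inrange(age, 25, 34).

```stata
* Cases used by the first test (ages 26 to 34) and by the rerun (ages 25 to 34).
* ttest drops cases with missing yrsed or sex, so the counts do the same.
count if inrange(age, 26, 34) & !missing(yrsed, sex)
count if inrange(age, 25, 34) & !missing(yrsed, sex)

* The gap between the two counts is the number of 25-year-olds added.
count if age == 25 & !missing(yrsed, sex)
```

* If the assumption holds, the first two counts should equal the combined Obs reported by the two t tests.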

. ttest yrsed if age>24 & age<35, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
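
* The t test above can be rebuilt from its own summary statistics, which shows where the standard error and t come from. This is a sketch, not part of the original session; the scalar names (n1, n2, sp2, se_hand, t_hand) are made up for illustration, and the inputs are the rounded values printed in the table.

```stata
* Immediate form of ttest: n1 mean1 sd1 n2 mean2 sd2 (equal variances assumed).
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472

* The same calculation by hand.
scalar n1 = 9027
scalar n2 = 9511
* Pooled variance: the two group variances weighted by their degrees of freedom.
scalar sp2 = ((n1 - 1)*2.967666^2 + (n2 - 1)*2.854472^2) / (n1 + n2 - 2)
* Standard error of the difference in means.
scalar se_hand = sqrt(sp2*(1/n1 + 1/n2))
* t statistic: difference in means over its standard error.
scalar t_hand = (13.31212 - 13.55657) / se_hand
display "pooled SE = " se_hand "   t = " t_hand
```

* Both results should agree with the logged Std. Err. and t for diff, up to rounding of the inputs.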

. table sex if age>24 & age<35 [aweight= perwt_rounded], contents (mean yrsed sd yrsed freq)

-------------------------------------------------

Sex | mean(yrsed)    sd(yrsed)        Freq.

----------+--------------------------------------

Male |     13.5574     2.819247        9,027

Female |    13.76295     2.720855        9,511

-------------------------------------------------

* See my Excel file for an explanation of the different cases.

. gen byte female=0

. replace female=1 if sex==2

(68919 real changes made)

. regress yrsed female if age>24 & age<35 [aweight= perwt_rounded]

(sum of wgt is   3.7786e+07)

Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   25.52

Model |  195.741395     1  195.741395           Prob > F      =  0.0000

Residual |  142186.809 18536  7.67084641           R-squared     =  0.0014

-------------+------------------------------           Adj R-squared =  0.0013

Total |  142382.551 18537   7.6809921           Root MSE      =  2.7696

------------------------------------------------------------------------------

yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

female |   .2055446   .0406899     5.05   0.000     .1257887    .2853005

_cons |    13.5574   .0290221   467.14   0.000     13.50051    13.61429

------------------------------------------------------------------------------
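
* With a single 0/1 predictor, the regression repeats the weighted table of means: _cons is the male mean and the female coefficient is the female mean minus the male mean. A sketch of the arithmetic, using only numbers printed above:

```stata
* Female mean minus male mean from the weighted table.
display 13.76295 - 13.5574

* With one predictor, the square of its t statistic equals the model F.
display (.2055446/.0406899)^2
```

* The first value should match the female coefficient and the second the reported F, up to rounding.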

. table sex if age>24 & age<35 [fweight= perwt_rounded], contents (mean yrsed sd yrsed freq)

-------------------------------------------------

Sex | mean(yrsed)    sd(yrsed)        Freq.

----------+--------------------------------------

Male |     13.5574     2.819091     1.86e+07

Female |    13.76295     2.720712     1.92e+07

-------------------------------------------------

. regress yrsed female if age>24 & age<35 [fweight= perwt_rounded]

Source |       SS       df       MS              Number of obs = 37785945

-------------+------------------------------           F(  1, 37785943) = 52018.00

Model |  398979.047     1  398979.047           Prob > F      =  0.0000

Residual |   289818910  37785943  7.67001924           R-squared     =  0.0014

-------------+------------------------------           Adj R-squared =  0.0014

Total |   290217889  37785944  7.68057796           Root MSE      =  2.7695

------------------------------------------------------------------------------

yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

female |   .2055446   .0009012   228.07   0.000     .2037782    .2073109

_cons |    13.5574   .0006428        .   0.000     13.55614    13.55866

------------------------------------------------------------------------------

. *If we used the fweights here, we would be claiming that the sample really contained 18.6 million men and 19.2 million women, which is not true, and we would get a totally misleading view of the significance of the difference (i.e., a t value of 228.07 instead of 5.05).
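
* The two regressions give the same coefficient; only the standard error moves. A sketch of why, assuming (as I understand Stata's handling) that regress rescales aweights to sum to the number of cases: the fweight standard error is then the aweight standard error times the square root of the ratio of the two residual degrees of freedom.

```stata
* aweight SE times sqrt(residual df with aweights / residual df with fweights).
display .0406899 * sqrt(18536/37785943)

* The inflated t statistic follows directly from the shrunken standard error.
display .2055446 / .0009012
```

* The first value should reproduce the fweight Std. Err. for female, and the second its t value, up to rounding.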

. display invnormal(1-.025)

1.959964

. *The classic normal-curve cutoff for a 5% two-tailed test leaves 2.5% in each tail and falls at a value of 1.96.

. display invnormal(1-.005)

2.5758293

. *This is the key value for the 1% two-tailed test, leaving half of 1% in each tail.

. display invnormal(1-.0005)

3.2905267

. *This is the key value for the 0.1% two-tailed test, leaving 0.05% in each tail.
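
* The three cutoffs above follow one pattern: for a two-tailed test at level alpha, put alpha/2 in each tail and take invnormal(1 - alpha/2). A minimal sketch of that pattern (the local macro name alpha is chosen for illustration):

```stata
* Two-tailed normal critical values for the 5%, 1%, and 0.1% levels.
foreach alpha of numlist .05 .01 .001 {
    display "alpha = " `alpha' "   critical z = " invnormal(1 - `alpha'/2)
}
```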

. display 2*(1-normal(5.716))

1.091e-08

. *What this tells us is that, if there were no real difference, we would get a test statistic of 5.716 or larger in absolute value (that is, in either direction) only about once in 100 million tries. That is, very unlikely.

. *In other words, we can be very confident that women's average education is greater than men's in this age group. The probability of getting this much difference by chance with this large a sample, if there were in fact no real difference, is tiny.
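
* The p-value above uses the Normal curve. A sketch of the same two-tailed calculation with the t distribution, using the degrees of freedom from the t test (18536); ttail(df, t) returns the upper-tail probability.

```stata
* Two-tailed p-value from the t distribution with 18536 degrees of freedom.
display 2*ttail(18536, 5.7164)

* Normal-curve version for comparison.
display 2*(1 - normal(5.7164))
```

* With this many degrees of freedom, the two should differ only slightly, with the t version a little larger.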

. *Now a couple of quick demonstrations about how the T-distribution compares to the Normal

. display invttail(20,.025)

2.0859634

. *With 20 degrees of freedom, the critical .025 value of the t distribution is larger than the critical value of the Normal distribution; that is, 2.086 is larger than 1.96.

. display invttail(18000,.025)

1.9600958

. *With large N, as we have in this case, the t distribution and the Normal are basically identical.
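
* To watch the t cutoff close in on the Normal cutoff, the two demonstrations above can be extended to a range of degrees of freedom. A minimal sketch; the particular df values are picked only for illustration.

```stata
* Upper .025 critical value of t for increasing degrees of freedom,
* and how far it sits above the Normal cutoff invnormal(.975).
foreach df of numlist 5 20 100 1000 18000 {
    display "df = " `df' "   t cutoff = " invttail(`df', .025) ///
        "   excess over Normal = " invttail(`df', .025) - invnormal(.975)
}
```

* The /// line continuation works in a do-file; when typing interactively, put the display command on one line.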

. save "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta",  replace

file C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta saved

. exit, clear