4th class Stata log

-----------------------------------------------------------------------------------

log: C:\AAA Miker Files\newer web pages\soc_meth_proj3\class4_2009.log

log type: text

opened on: 5 Feb 2009, 11:24:23

. set mem 200m

Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory          200M    max. data space                200.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                               203.163M

. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear

. ttest yrsed if age>25 & age<35, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    8229    13.32689    .0328611    2.980953    13.26248    13.39131
  Female |    8643    13.56369    .0307005    2.854159    13.50351    13.62387
---------+--------------------------------------------------------------------
combined |   16872     13.4482    .0224725    2.919003    13.40415    13.49225
---------+--------------------------------------------------------------------
    diff |           -.2368005    .0449229                -.324854   -.1487469
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.2713
Ho: diff = 0                                     degrees of freedom =    16870

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

*If I want the statistics to match up exactly, I have to use the same age range: age>24 (ages 25 to 34) rather than the age>25 (ages 26 to 34) used above.

. ttest yrsed if age>24 & age<35, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
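
* An added aside, not part of the original session: the t statistic above can be rebuilt by hand from the group sizes, means, and standard deviations in the table. The scalar names sp_hand, se_hand, and t_hand are made up for this illustration. Because the inputs are rounded, the results should come out close to the reported .0427623 and -5.7164 rather than matching digit for digit.

```stata
* Pooled standard deviation from the two group SDs (each weighted by n-1)
scalar sp_hand = sqrt((9026*2.967666^2 + 9510*2.854472^2)/18536)

* Standard error of the difference in means under equal variances
scalar se_hand = sp_hand*sqrt(1/9027 + 1/9511)
display se_hand

* t statistic = (mean(Male) - mean(Female)) / standard error
scalar t_hand = (13.31212 - 13.55657)/se_hand
display t_hand
```

* The degrees of freedom are 9027 + 9511 - 2 = 18536, as reported.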

. table sex if age>24 & age<35 [aweight= perwt_rounded], contents (mean yrsed sd yrsed freq)

-------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)        Freq.
----------+--------------------------------------
     Male |     13.5574     2.819247        9,027
   Female |    13.76295     2.720855        9,511
-------------------------------------------------

* See my Excel file for an explanation of the different cases.

. gen byte female=0

. replace female=1 if sex==2

(68919 real changes made)

. regress yrsed female if age>24 & age<35 [aweight= perwt_rounded]

(sum of wgt is   3.7786e+07)

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   25.52
       Model |  195.741395     1  195.741395           Prob > F      =  0.0000
    Residual |  142186.809 18536  7.67084641           R-squared     =  0.0014
-------------+------------------------------           Adj R-squared =  0.0013
       Total |  142382.551 18537   7.6809921           Root MSE      =  2.7696

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .2055446   .0406899     5.05   0.000     .1257887    .2853005
       _cons |    13.5574   .0290221   467.14   0.000     13.50051    13.61429
------------------------------------------------------------------------------
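
* An added aside, not part of the original session: with a single 0/1 predictor, the regression reproduces the weighted table of means. The constant is the male mean (13.5574), and the coefficient on female is the female mean minus the male mean. The two display commands below check this with numbers copied from the output above; they should give roughly .2055 and 5.05.

```stata
* Coefficient on female = weighted female mean - weighted male mean
display 13.76295 - 13.5574

* t for female = coefficient / standard error
display .2055446/.0406899
```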

. table sex if age>24 & age<35 [fweight= perwt_rounded], contents (mean yrsed sd yrsed freq)

-------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)        Freq.
----------+--------------------------------------
     Male |     13.5574     2.819091     1.86e+07
   Female |    13.76295     2.720712     1.92e+07
-------------------------------------------------

. regress yrsed female if age>24 & age<35 [fweight= perwt_rounded]

      Source |       SS        df        MS            Number of obs  = 37785945
-------------+--------------------------------         F(1, 37785943) = 52018.00
       Model |  398979.047         1  398979.047       Prob > F       =   0.0000
    Residual |   289818910  37785943  7.67001924       R-squared      =   0.0014
-------------+--------------------------------         Adj R-squared  =   0.0014
       Total |   290217889  37785944  7.68057796       Root MSE       =   2.7695

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .2055446   .0009012   228.07   0.000     .2037782    .2073109
       _cons |    13.5574   .0006428        .   0.000     13.55614    13.55866
------------------------------------------------------------------------------

. *Here, if we used the fweights, we would be claiming that we really had 18.6 million men and 19.2 million women in the sample, which is not true, and we would get a totally misleading view of the significance of the difference (i.e. a t value of 228.07 instead of 5.05).
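
* An added aside, not part of the original session: the size of the distortion can be checked from the two regressions. The coefficient is the same in both (.2055446), but the fweight run treats the sum of weights (37785945) as the number of observations, so the standard error shrinks by about the square root of the ratio of the two sample sizes. The numbers below are copied from the output above.

```stata
* How many times larger the fweight "sample" is than the real one
display 37785945/18538

* Standard errors shrink by roughly the square root of that ratio
display sqrt(37785945/18538)

* Observed ratio of the two standard errors on female (aweight / fweight)
display .0406899/.0009012
```

* Both of the last two values should be close to 45, which is also about how much the t value is inflated (228.07 versus 5.05).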

. display invnormal(1-.025)

1.959964

. *The classic normal curve cutoff for a 5% two-tailed test leaves 2.5% in each tail and is reached at a value of 1.96.

. display invnormal(1-.005)

2.5758293

. *This is the key value for the 1% two-tailed test, leaving half of 1% in each tail.

. display invnormal(1-.0005)

3.2905267

. *This is the key value for the .1% two-tailed test, leaving .05% in each tail.

. display 2*(1-normal(5.716))

1.091e-08

. *What this tells us is that we would get a value of 5.716 or higher, in either direction, only about once in 100 million tries. That is, very unlikely.

. *In other words, we can be extremely confident that women's education is greater in the population this sample represents. The probability of getting this large a difference by chance with this large a sample, if there were in fact no real difference, is tiny.
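
* An added aside, not part of the original session: the same two-tailed probability can be taken from the t distribution with the test's own degrees of freedom (18536) using ttail(), which returns the upper-tail probability. With this many degrees of freedom it should land very near the normal-curve answer, only slightly larger.

```stata
* Two-tailed p-value from the t distribution, df = 18536
display 2*ttail(18536, 5.7164)

* The normal-curve version, for comparison
display 2*(1 - normal(5.7164))
```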

. *Now a couple of quick demonstrations about how the T-distribution compares to the Normal

. display invttail(20,.025)

2.0859634

. *The critical .025 value of the t distribution is larger than the critical value of the Normal distribution when there are 20 degrees of freedom; that is, 2.086 is larger than 1.96.

. display invttail(18000,.025)

1.9600958

. *With large N, as we have in this case, the t distribution and the Normal are basically identical.
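
* An added aside, not part of the original session: a short loop makes the convergence visible by printing the upper 2.5% critical value of t for several degrees of freedom, followed by the normal benchmark. The particular degrees of freedom are picked only for illustration.

```stata
* Upper 2.5% critical value of t for increasing degrees of freedom
foreach df of numlist 5 20 100 1000 18000 {
    display "df = `df'" _col(14) invttail(`df', .025)
}

* Normal benchmark (1.96)
display "normal" _col(14) invnormal(.975)
```

* The critical values should fall steadily toward 1.96 as the degrees of freedom grow.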

. save "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", replace

file C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta saved

. exit, clear