-----------------------------------------------------------------------------------
log: C:\AAA Miker Files\newer web pages\soc_meth_proj3\class4_2009.log
log type: text
opened on: 5 Feb 2009, 11:24:23
. set mem 200m
Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory         200M     max. data space                200.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                               203.163M
. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear
. ttest yrsed if age>25 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 8229 13.32689 .0328611 2.980953 13.26248 13.39131
Female | 8643 13.56369 .0307005 2.854159 13.50351 13.62387
---------+--------------------------------------------------------------------
combined | 16872 13.4482 .0224725 2.919003 13.40415 13.49225
---------+--------------------------------------------------------------------
diff | -.2368005 .0449229 -.324854 -.1487469
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.2713
Ho: diff = 0 degrees of freedom = 16870
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
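As a cross-check, the t statistic in this output can be recomputed from the Male and Female summary rows alone. This is a minimal Python sketch that uses the pooled (equal-variance) formula named in the output header. The function name pooled_t is my own and hypothetical. Because the displayed means are rounded, the difference comes out as -.2368 rather than -.2368005, but the standard error and t should agree with the log (.0449229 and -5.2713).

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t test: returns (diff, std. err., t, df)."""
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se, diff / se, df

# Male and Female rows from the ttest above (age>25 & age<35)
diff, se, t, df = pooled_t(8229, 13.32689, 2.980953, 8643, 13.56369, 2.854159)
print(f"diff={diff:.7f}  se={se:.7f}  t={t:.4f}  df={df}")
```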
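*For the statistics to match up exactly, I have to use the same age range everywhere: ages 25 to 34 (age>24 & age<35). The command above used age>25, which leaves out the 25-year-olds.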
. ttest yrsed if age>24 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
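The 95% confidence interval on the diff row is the difference plus or minus a critical value times its standard error. This short Python sketch substitutes the normal 0.975 quantile (1.96) for the t critical value, which is a large-sample approximation. With 18536 degrees of freedom, the two critical values differ only in the fourth decimal place, so the endpoints should match the log to about five decimal places.

```python
from statistics import NormalDist

diff = -0.2444469   # diff row, ages 25 to 34 (age>24 & age<35)
se = 0.0427623      # its standard error

# Large-sample approximation: normal quantile instead of the t(18536) quantile
z = NormalDist().inv_cdf(0.975)
lower, upper = diff - z * se, diff + z * se
print(f"[{lower:.7f}, {upper:.7f}]")
```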
. table sex if age>24 & age<35 [aweight= perwt_rounded], contents (mean yrsed sd yrsed freq)
-------------------------------------------------
Sex | mean(yrsed) sd(yrsed) Freq.
----------+--------------------------------------
Male | 13.5574 2.819247 9,027
Female | 13.76295 2.720855 9,511
-------------------------------------------------
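Note that the Freq. column above still shows the unweighted counts (9,027 and 9,511), even though the means and SDs are weighted. The following is a minimal Python sketch of an analytic-weight mean and SD. It assumes the formula in the Methods and formulas section of Stata's [R] summarize entry: weights are rescaled to sum to n, and the weighted sum of squares is divided by n - 1. The function name and the toy data are hypothetical and are included for illustration only.

```python
import math

def aweighted_mean_sd(x, w):
    """Mean and SD with analytic weights, rescaled so they sum to len(x)."""
    n = len(x)
    scale = n / sum(w)
    wn = [wi * scale for wi in w]          # normalized weights, sum(wn) == n
    mean = sum(wi * xi for wi, xi in zip(wn, x)) / n
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(wn, x)) / (n - 1)
    return mean, math.sqrt(var)

# Hypothetical toy data: years of education and person weights
mean, sd = aweighted_mean_sd([12, 16, 14], [1000, 3000, 2000])
print(mean, sd)
```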
* See my Excel file for an explanation of the different cases.
. gen byte female=0
. replace female=1 if sex==2
(68919 real changes made)
. regress yrsed female if age>24 & age<35 [aweight= perwt_rounded]
(sum of wgt is 3.7786e+07)
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 25.52
Model | 195.741395 1 195.741395 Prob > F = 0.0000
Residual | 142186.809 18536 7.67084641 R-squared = 0.0014
-------------+------------------------------ Adj R-squared = 0.0013
Total | 142382.551 18537 7.6809921 Root MSE = 2.7696
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | .2055446 .0406899 5.05 0.000 .1257887 .2853005
_cons | 13.5574 .0290221 467.14 0.000 13.50051 13.61429
------------------------------------------------------------------------------
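Because female is a 0/1 dummy and is the only regressor, the weighted regression should reproduce the weighted group means from the table command above. The _cons term is the men's mean, and the female coefficient is the women's mean minus the men's. The following quick Python check uses only numbers already in this log. The two values agree up to the rounding in the table's displayed means.

```python
# Weighted means by sex from the [aweight=perwt_rounded] table above
mean_male, mean_female = 13.5574, 13.76295

# Coefficients from the weighted regression of yrsed on female
b_cons, b_female = 13.5574, 0.2055446

gap = mean_female - mean_male   # women's mean minus men's mean
print(f"difference in means = {gap:.5f}, female coefficient = {b_female:.7f}")
```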
. table sex if age>24 & age<35 [fweight= perwt_rounded], contents (mean yrsed sd yrsed freq)
-------------------------------------------------
Sex | mean(yrsed) sd(yrsed) Freq.
----------+--------------------------------------
Male | 13.5574 2.819091 1.86e+07
Female | 13.76295 2.720712 1.92e+07
-------------------------------------------------
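The means are identical under both weight types. The SDs differ slightly because of the divisor. With fweights, the weighted sum of squares is divided by W - 1, where W is the sum of the weights. With aweights, it is effectively divided by (n - 1)*W/n. The ratio of the two SDs is therefore sqrt[(n/(n - 1)) * ((W - 1)/W)]. Since W is about 1.9e7 here, this ratio is essentially sqrt(n/(n - 1)). The following Python check compares this prediction with the two tables. It assumes the summarize-style formulas described for the previous table.

```python
import math

# (n, sd with fweights, sd with aweights) for each sex, from the two tables
rows = {
    "Male":   (9027, 2.819091, 2.819247),
    "Female": (9511, 2.720712, 2.720855),
}

for sex, (n, sd_fw, sd_aw) in rows.items():
    # W is roughly 1.9e7, so the (W - 1)/W part of the ratio is ignored
    predicted_aw = sd_fw * math.sqrt(n / (n - 1))
    print(f"{sex}: predicted {predicted_aw:.6f}, logged {sd_aw:.6f}")
```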
. regress yrsed female if age>24 & age<35 [fweight= perwt_rounded]
Source | SS df MS Number of obs =37785945
-------------+------------------------------ F( 1,37785943) =52018.00
Model | 398979.047 1 398979.047 Prob > F = 0.0000
Residual | 289818910 37785943 7.67001924 R-squared = 0.0014
-------------+------------------------------ Adj R-squared = 0.0014
Total | 290217889 37785944 7.68057796 Root MSE = 2.7695
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | .2055446 .0009012 228.07 0.000 .2037782 .2073109
_cons | 13.5574 .0006428 . 0.000 13.55614 13.55866
------------------------------------------------------------------------------
. *If we used the fweights here, we would be claiming that we really had 18.6 million men and 19.2 million women in the sample, which is not true. We would also get a badly misleading picture of how significant the difference is (a t value of 228.07 instead of 5.05).
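The size of this distortion can be predicted. The coefficients are identical, and the Root MSE is essentially the same (2.7696 vs. 2.7695). However, with fweights Stata treats the sample as W = 37,785,945 observations instead of n = 18,538. By my reading of the weighted least-squares algebra, this shrinks each standard error by a factor of sqrt[(W - 2)/(n - 2)], about 45, and inflates each t statistic by the same factor. The following Python sketch checks that reasoning against the two regressions.

```python
import math

n = 18538          # actual respondents aged 25 to 34
W = 37785945       # sum of perwt_rounded, treated as the N under fweights

factor = math.sqrt((W - 2) / (n - 2))   # ratio of residual degrees of freedom

se_aw, coef = 0.0406899, 0.2055446      # female row, aweight regression
t_aw = coef / se_aw
print(f"factor = {factor:.3f}")
print(f"predicted fweight SE = {se_aw / factor:.7f}  (log: .0009012)")
print(f"predicted fweight t  = {t_aw * factor:.2f}  (log: 228.07)")
```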
. display invnormal(1-.025)
1.959964
. *The classic normal-curve cutoff for a 5% two-tailed test, which leaves 2.5% in each tail, is 1.96.
. display invnormal(1-.005)
2.5758293
. *This is the critical value for a 1% two-tailed test, leaving 0.5% in each tail.
. display invnormal(1-.0005)
3.2905267
. *This is the critical value for a 0.1% two-tailed test, leaving 0.05% in each tail.
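The three display invnormal() calls above all follow one rule: for a two-tailed test at level alpha, the cutoff is the standard normal quantile at 1 - alpha/2. The following Python sketch computes the same values with the standard library's statistics.NormalDist.inv_cdf, which is available in Python 3.8 and later.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, sd 1

# Two-tailed test at level alpha: put alpha/2 in each tail
crits = {alpha: std_normal.inv_cdf(1 - alpha / 2) for alpha in (0.05, 0.01, 0.001)}
for alpha, crit in crits.items():
    print(f"alpha = {alpha:<6} critical value = {crit:.7f}")
```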
. display 2*(1-normal(5.716))
1.091e-08
. *This tells us that, if there were no real difference, we would get a t of 5.716 or more in either direction only about once in 92 million tries (1/1.091e-08). That is very unlikely.
. *In other words, we can be very confident that, in the population this sample represents, women aged 25 to 34 average more years of education than men. If there were in fact no real difference, the probability of getting a difference this large by chance with a sample this big would be tiny.
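In Python, the two-tailed p-value 2*(1 - normal(5.716)) is better written with the complementary error function, since 2(1 - Phi(z)) = erfc(z/sqrt(2)). This form avoids subtracting two numbers that are both almost exactly 1. The following sketch takes 5.716 from the t = -5.7164 of the ages 25 to 34 ttest.

```python
import math

z = 5.716                                   # |t| from the ages 25 to 34 ttest
p_two_tailed = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
print(f"p = {p_two_tailed:.4g}, about 1 in {1 / p_two_tailed:,.0f}")
```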
. *Now a couple of quick demonstrations of how the t-distribution compares with the normal.
. display invttail(20,.025)
2.0859634
. *With 20 degrees of freedom, the t value that leaves .025 in the upper tail is larger than the corresponding normal value: 2.086 versus 1.96.
. display invttail(18000,.025)
1.9600958
. *With a large N, as we have here (18,536 degrees of freedom in the ttest), the t and normal critical values are basically identical.
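Python's standard library has no inverse t function. As a self-contained sketch, the following code reproduces what invttail(df, p) returns. It integrates the t density with Simpson's rule and bisects on the upper-tail probability. The function names are hypothetical, and this is an illustration only; Stata does not compute invttail() this way. In practice, scipy.stats.t.isf(p, df) does the same job.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=1000):
    """P(T > t) for t >= 0: 0.5 minus a Simpson's-rule integral over [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def invttail(df, p, lo=0.0, hi=50.0):
    """Value t with P(T > t) = p (0 < p < 0.5), found by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:  # tail still too big, so t must be larger
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(1 - 0.025)
for df in (20, 18000):
    print(f"df = {df:>5}: t critical = {invttail(df, 0.025):.7f}  vs normal {z:.7f}")
```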
. save "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", replace
file C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta saved
. exit, clear