-----------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2010_s381_logs\class4.log
log type: text
opened on: 30 Sep 2010, 14:40:52
. table sex if age>24 & age<35, contents (freq mean yrsed sd yrsed)
-------------------------------------------------
Sex | Freq. mean(yrsed) sd(yrsed)
----------+--------------------------------------
Male | 9,027 13.31212 2.967666
Female | 9,511 13.55657 2.854472
-------------------------------------------------
. table sex if age>24 & age<35, contents (mean yrsed sd yrsed freq)
-------------------------------------------------
Sex | mean(yrsed) sd(yrsed) Freq.
----------+--------------------------------------
Male | 13.31212 2.967666 9,027
Female | 13.55657 2.854472 9,511
-------------------------------------------------
* Take a look at my Web-posted Excel file, under ttests, for a full account of the hand calculations of these numbers and statistics.
. ttest yrsed if age>24 & age<35, by(sex) unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0428057 -.32835 -.1605438
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7106
Ho: diff = 0 Satterthwaite's degrees of freedom = 18383.6
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
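The Excel file mentioned above is not reproduced here. As a stand-in, here is a minimal Python sketch that recomputes the unequal-variance t-statistic and Satterthwaite's degrees of freedom from the summary table at the top of the log. The function name `welch_t` is hypothetical. Because the sketch starts from the rounded means and SDs that Stata displays, expect it to agree with Stata's output to about three or four significant digits.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance t-statistic, SE of the difference, and Satterthwaite df."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    se_diff = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se_diff
    # Satterthwaite's approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, se_diff, df

# Summary statistics for yrsed, ages 25-34, from the table above (male, female)
t, se, df = welch_t(13.31212, 2.967666, 9027, 13.55657, 2.854472, 9511)
print(round(t, 4), round(se, 7), round(df, 1))  # compare with -5.7106, .0428057, 18383.6
```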
. ttest yrsed if age>24 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
* A few points to note. First, the default for ttest (and for regression, below) is the equal-variance test; if you want the unequal-variance t-test, you have to ask for it by name with the unequal option. The formulas I presented last class were for the unequal-variance t-test, which I think is a little more intuitive. Second, the degrees of freedom of the two tests differ, but not in a way that matters: the t distribution with 18,383 df is so close to the t distribution with 18,536 df that the results are unaffected. Lastly, in this case the variances (and SDs) of the two samples (men's and women's education) are so close that it does not matter much which assumption we make about the variance of the two groups. Note that the t-statistics above differ only slightly (-5.7106 vs. -5.7164). If the variances of the two samples were very different, the equal- and unequal-variance tests could give very different results.
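The sketch below computes both t-statistics from summary statistics to show why the variance assumption barely matters here. It is written in Python, and the helper name is hypothetical. The second call uses made-up numbers, a small high-variance group against a large low-variance group. They are there only to illustrate how far the two tests can diverge when the variances really differ.

```python
import math

def unequal_and_pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (unequal-variance t, equal-variance t) from summary statistics."""
    diff = mean1 - mean2
    se_unequal = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Equal-variance test: pool the two sample variances, weighted by their df
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se_equal = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return diff / se_unequal, diff / se_equal

# Education example from this log: SDs nearly equal, so the two t's nearly agree
t_u, t_e = unequal_and_pooled_t(13.31212, 2.967666, 9027, 13.55657, 2.854472, 9511)
print(t_u, t_e)

# Hypothetical groups (not from the data): very different variances and sizes
h_u, h_e = unequal_and_pooled_t(10.0, 6.0, 50, 12.0, 1.0, 5000)
print(h_u, h_e)
```

In the hypothetical case, the pooled variance is dominated by the large low-variance group. That makes the equal-variance standard error far too small for the noisy small group.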
. regress yrsed male if age>24 & age<35
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 276.742433 1 276.742433 Prob > F = 0.0000
Residual | 156979.922 18536 8.46892111 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 157256.664 18537 8.48339343 Root MSE = 2.9101
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2444469 .0427623 -5.72 0.000 -.3282649 -.1606289
_cons | 13.55657 .0298401 454.31 0.000 13.49808 13.61506
------------------------------------------------------------------------------
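* Compare to the equal-variance t-test above: the coefficient on male is the difference in means, and its standard error, t-statistic, and confidence interval match the equal-variance test.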
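Here is a sketch of why the regression reproduces the equal-variance t-test, using the summary statistics from this log. Three things line up:

- The pooled within-group variance is the residual mean square, so its square root is Root MSE.
- The slope on the 0/1 dummy is the difference in means.
- With one regressor, F equals t squared.

Python is used here only as a calculator, and small discrepancies come from the rounded inputs.

```python
import math

# Summary statistics for yrsed, ages 25-34, from the table at the top of the log
n_m, mean_m, sd_m = 9027, 13.31212, 2.967666
n_f, mean_f, sd_f = 9511, 13.55657, 2.854472

# Pooled within-group variance: should match the Residual MS (8.46892111)
pooled_var = ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / (n_m + n_f - 2)
root_mse = math.sqrt(pooled_var)

# With one 0/1 regressor, _cons is the mean of the omitted group (female)
# and the slope on male is the male-minus-female difference in means
slope = mean_m - mean_f
se_slope = math.sqrt(pooled_var * (1 / n_m + 1 / n_f))
t = slope / se_slope

print(pooled_var, root_mse)  # compare with 8.46892111 and Root MSE 2.9101
print(slope, se_slope, t**2)  # compare with -.2444469, .0427623, F = 32.68
```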
. gen monthsed=yrsed*12
(30484 missing values generated)
. ttest monthsed if age>24 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 159.7454 .3748215 35.61199 159.0107 160.4802
Female | 9511 162.6788 .3512319 34.25366 161.9903 163.3673
---------+--------------------------------------------------------------------
combined | 18538 161.2504 .2567052 34.95152 160.7472 161.7536
---------+--------------------------------------------------------------------
diff | -2.933363 .5131471 -3.939178 -1.927547
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
* Note: the t-test is unit-free. It doesn't matter whether we measure education in years or in microseconds; the resulting t-statistic is the same.
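Here is a quick numerical check of the unit-free claim, sketched in Python with a hypothetical helper `pooled_t`. Multiplying every education value by a constant k multiplies the means, the SDs, and the standard error of the difference by k, so their ratio does not change. The sample sizes are not rescaled.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t-statistic from summary statistics."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

years = (13.31212, 2.967666, 9027, 13.55657, 2.854472, 9511)
k = 12  # years -> months; any positive constant gives the same t
months = (years[0] * k, years[1] * k, years[2], years[3] * k, years[4] * k, years[5])

# The mean difference and its standard error both scale by k, so t is unchanged
print(pooled_t(*years), pooled_t(*months))
```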
. ttest yrsed if age>24 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
. display ttail(18000, 5.7164)
5.526e-09
* The syntax is ttail(df, t-statistic), and the output gives you the remaining right-hand tail probability.
. display 2*ttail(18000, 5.7164)
1.105e-08
* Usually we think about two-tailed tests, which means doubling the one-tail probability, because the t distribution is symmetric, with two equal tails.
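Python's standard library has no t distribution, so the function below computes the tail numerically. It is named `ttail` after the Stata function, but it is not Stata's algorithm. It integrates the Student's t density with Simpson's rule, after substituting x = t/s to map the infinite tail onto (0, 1]. Under that assumption, it should agree with Stata's one- and two-tailed results to several significant digits.

```python
import math

def ttail(df, t, n=2000):
    """P(T > t) for Student's t with df degrees of freedom, by Simpson's rule."""
    if t < 0:
        return 1.0 - ttail(df, -t, n)
    if t == 0:
        return 0.5
    # Log of the t density's normalizing constant
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def g(s):
        # Integrand after substituting x = t / s, which maps (t, inf) onto (0, 1]
        if s == 0.0:
            return 0.0  # the density vanishes as x -> infinity
        x = t / s
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df)) * t / (s * s)

    h = 1.0 / n  # n must be even for Simpson's rule
    total = g(0.0) + g(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

one_tail = ttail(18000, 5.7164)
print(one_tail, 2 * one_tail)  # compare with Stata's 5.526e-09 and 1.105e-08
```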
. display normal (5.716)
normal not found
r(111);
* Stata didn't like the space between the function name and the parenthesis.
. display normal(5.716)
.99999999
* normal(z) gives you the cumulative standard normal distribution up to z. To get the probability in the two tails, we subtract from one and double it.
. display 2*(1-normal(5.716))
1.091e-08
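Here is the same normal-tail calculation using Python's standard library (`statistics.NormalDist` and `math.erfc`). It is a sketch, not a reproduction of Stata's internals. It also shows a numerical point: 1 - normal(z) subtracts two nearly equal numbers when z is large. The complementary error function computes the tail directly, since the two-tailed area is erfc(z/sqrt(2)).

```python
import math
from statistics import NormalDist

z = 5.716
cdf = NormalDist().cdf(z)                    # analogue of Stata's normal(5.716)
two_tail = 2 * (1 - cdf)                     # analogue of 2*(1-normal(5.716))
two_tail_erfc = math.erfc(z / math.sqrt(2))  # same area, no cancellation

print(cdf, two_tail, two_tail_erfc)  # compare with .99999999 and 1.091e-08
```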
. display invnormal(1-.025)
1.959964
* invnormal takes a cumulative (left-tail) probability and gives you the corresponding z-score; here 1-.025 = .975 leaves .025 in the upper tail. 1.96 is the key cutoff because if each tail has .025 left, the two-tailed test has .05 probability in the tails.
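For comparison, Python's `statistics.NormalDist.inv_cdf` plays the role of invnormal here. This is an analogy, not the same implementation. The round trip confirms that the cutoff leaves .025 in each tail, .05 in total.

```python
from statistics import NormalDist

alpha = 0.05
# inv_cdf, like invnormal, takes a cumulative (left-tail) probability
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
both_tails = 2 * (1 - NormalDist().cdf(z_crit))

print(z_crit, both_tails)  # compare with Stata's 1.959964; both_tails should be .05
```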
. display invttail(5,.025)
2.5705818
* The t-statistic for the same upper-tail probability is higher, but how much higher depends on the degrees of freedom. When df is small (like 5), the difference is larger.
. display invttail(10000,.025)
1.9602012
* At df of 10,000, the t distribution is practically indistinguishable from the normal distribution (1.9602 vs. 1.9600).
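Python's standard library has no inverse t function either. The sketch below finds the cutoff by bisection on a numerically integrated tail probability, assuming Simpson's rule is accurate enough. Both `ttail` and `invttail` are hypothetical helpers named after the Stata functions; they do not use Stata's algorithms. The sketch should reproduce the two Stata results above to several decimal places.

```python
import math

def ttail(df, t, n=2000):
    """P(T > t) for Student's t with df degrees of freedom, by Simpson's rule."""
    if t < 0:
        return 1.0 - ttail(df, -t, n)
    if t == 0:
        return 0.5
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def g(s):
        # Integrand after substituting x = t / s, which maps (t, inf) onto (0, 1]
        if s == 0.0:
            return 0.0
        x = t / s
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df)) * t / (s * s)

    h = 1.0 / n  # n must be even for Simpson's rule
    total = g(0.0) + g(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

def invttail(df, p, lo=0.0, hi=50.0, iters=60):
    """Return t such that ttail(df, t) = p, for 0 < p < 0.5, by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if ttail(df, mid) > p:
            lo = mid  # too much probability beyond mid: the cutoff lies further right
        else:
            hi = mid
    return (lo + hi) / 2

print(invttail(5, 0.025), invttail(10000, 0.025))  # compare with 2.5705818 and 1.9602012
```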
. regress yrsed male if age>24 & age<35
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 276.742433 1 276.742433 Prob > F = 0.0000
Residual | 156979.922 18536 8.46892111 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 157256.664 18537 8.48339343 Root MSE = 2.9101
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2444469 .0427623 -5.72 0.000 -.3282649 -.1606289
_cons | 13.55657 .0298401 454.31 0.000 13.49808 13.61506
------------------------------------------------------------------------------
* First, the unweighted regression (above).
. regress yrsed male if age>24 & age<35 [aweight= perwt_rounded]
(sum of wgt is 3.7786e+07)
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 25.52
Model | 195.741395 1 195.741395 Prob > F = 0.0000
Residual | 142186.809 18536 7.67084641 R-squared = 0.0014
-------------+------------------------------ Adj R-squared = 0.0013
Total | 142382.551 18537 7.6809921 Root MSE = 2.7696
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2055446 .0406899 -5.05 0.000 -.2853005 -.1257887
_cons | 13.76294 .0285199 482.57 0.000 13.70704 13.81885
------------------------------------------------------------------------------
* Second, regress with aweights. Aweights preserve our sample size by rescaling the weights to an average of 1 (note that the number of observations is the same as in the unweighted regression). The coefficient and t-statistic are a little different because the weighted data are a little different from the unweighted data. Note that ttest does not accept weights.
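Here is a minimal sketch of the aweight rescaling just described, using hypothetical toy data rather than the CPS file. Rescaling the weights so they average 1 leaves the weighted mean unchanged. The weights then sum to the number of observations, so the sample size stays n.

```python
# Hypothetical toy data (NOT from the CPS file): years of education and weights
yrsed = [12, 16, 12, 14, 18]
perwt = [1500, 2500, 1800, 2200, 2000]

n = len(yrsed)
# Rescale so the weights sum to n, i.e. average 1, as described for aweights
scale = n / sum(perwt)
aw = [w * scale for w in perwt]

weighted_mean_raw = sum(w * y for w, y in zip(perwt, yrsed)) / sum(perwt)
weighted_mean_aw = sum(w * y for w, y in zip(aw, yrsed)) / sum(aw)

print(sum(aw), weighted_mean_raw, weighted_mean_aw)  # weights sum to n; means agree
```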
. table sex if age>24 & age<35 [aweight= perwt_rounded], contents (mean yrsed sd yrsed freq)
-------------------------------------------------
Sex | mean(yrsed) sd(yrsed) Freq.
----------+--------------------------------------
Male | 13.5574 2.819247 9,027
Female | 13.76295 2.720855 9,511
-------------------------------------------------
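With a single 0/1 regressor, weighted least squares reproduces the weighted group means. That is why the aweighted coefficient on male matches the difference between the weighted means in the table above. The sketch below shows this with hypothetical toy data (the helper names are illustrative), then runs the same check on the table's numbers.

```python
def wls_dummy(y, male, w):
    """Weighted least-squares fit of y on a constant and a 0/1 dummy."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, male)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, male, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, male))
    slope = sxy / sxx
    return ybar - slope * xbar, slope  # (_cons, coefficient on male)

def weighted_mean(vals, wts):
    return sum(v * w for v, w in zip(vals, wts)) / sum(wts)

# Hypothetical toy data (not from the CPS file): first three are men
y = [12, 16, 12, 14, 18, 13]
male = [1, 1, 1, 0, 0, 0]
w = [1.5, 2.5, 1.8, 2.2, 2.0, 1.0]

cons, slope = wls_dummy(y, male, w)
m_mean = weighted_mean(y[:3], w[:3])
f_mean = weighted_mean(y[3:], w[3:])
print(cons, f_mean)            # intercept equals the weighted female mean
print(slope, m_mean - f_mean)  # slope equals the weighted difference in means

# Same identity using the weighted means from the table above
table_diff = 13.5574 - 13.76295
print(table_diff)  # compare with the aweighted coefficient on male, -.2055446
```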
. regress yrsed male if age>24 & age<35 [fweight= perwt_rounded]
Source | SS df MS Number of obs = 37785945
-------------+------------------------------ F( 1, 37785943) = 52018.00
Model | 398979.047 1 398979.047 Prob > F = 0.0000
Residual | 289818910 37785943 7.67001924 R-squared = 0.0014
-------------+------------------------------ Adj R-squared = 0.0014
Total | 290217889 37785944 7.68057796 Root MSE = 2.7695
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2055446 .0009012 -228.07 0.000 -.2073109 -.2037782
_cons | 13.76294 .0006317 2.2e+04 0.000 13.76171 13.76418
------------------------------------------------------------------------------
* If we use fweights instead, the number of observations is increased by a factor of about 2,000, and the t-statistic is increased by a factor of the square root of 2,000, or about 45. And this is wrong, wrong, wrong, because we don't have 38 million cases; we have 18 thousand cases.
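Here is the arithmetic behind the factor of about 45, sketched in Python with the numbers from the two weighted regressions above. Fweights tell Stata that each row stands for perwt_rounded identical people. N therefore becomes the sum of the weights, and standard errors shrink by roughly the square root of that inflation factor.

```python
import math

n_obs = 18538      # actual respondents (Number of obs in the aweight regression)
sum_wt = 37785945  # Number of obs reported under fweights (sum of the weights)

inflation = sum_wt / n_obs
se_factor = math.sqrt(inflation)  # SEs shrink, and t-stats grow, by about this much

t_ratio = 228.07 / 5.05           # fweight t vs. aweight t for male
se_ratio = 0.0406899 / 0.0009012  # aweight SE vs. fweight SE for male
print(inflation, se_factor, t_ratio, se_ratio)
```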
. save "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", replace
file C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta saved
. log close
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2010_s381_logs\class4.log
log type: text
closed on: 30 Sep 2010, 15:50:00
-----------------------------------------------------------------------------------