. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)
--------------------------------------------------------------
      Sex |      Freq.  mean(yrsed)    sd(yrsed)   sem(yrsed)
----------+---------------------------------------------------
     Male |      9,027     13.31212     2.967666     .0312351
   Female |      9,511     13.55657     2.854472     .0292693
--------------------------------------------------------------
. display 2.967666/sqrt(9027)
.03123513
* Key point: standard error of the mean is sd/(sqrt(n))
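* The same check can be done from Stata's stored results instead of retyping numbers. A minimal sketch, assuming the same dataset is in memory with yrsed, age, and sex coded as in this log (sex==2 is Female):

```stata
* summarize leaves the standard deviation in r(sd) and the sample size in r(N)
quietly summarize yrsed if age>=25 & age<=34 & sex==2
* standard error of the mean = sd/sqrt(n)
display r(sd)/sqrt(r(N))
```

* For women this should reproduce the sem(yrsed) value shown in the table above.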
. ttest yrsed if age>=25 & age<=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
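* The diff row above can be reproduced by hand from the group summaries. A minimal sketch, typing in the n, mean, and sd values from the table, so the results should agree only up to rounding of those inputs:

```stata
* pooled sd: each group's variance weighted by its degrees of freedom (n-1)
display sqrt((9026*2.967666^2 + 9510*2.854472^2)/18536)
* standard error of the difference = pooled sd * sqrt(1/n1 + 1/n2)
display sqrt((9026*2.967666^2 + 9510*2.854472^2)/18536)*sqrt(1/9027 + 1/9511)
* t statistic = difference in means / standard error of the difference
display (13.31212 - 13.55657)/.0427623
* 95% confidence interval for the difference
display -.2444469 - invttail(18536, 0.025)*.0427623
display -.2444469 + invttail(18536, 0.025)*.0427623
```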
* Note: there are two kinds of two-sample t-tests, equal variance and unequal variance. Stata assumes the equal variance version unless you tell it otherwise (as I do below with the unequal option).
. ttest yrsed if age>=25 & age<=34, by(sex) unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
* For men’s and women’s education, the standard deviations (and so the variances) of the two groups are so similar that it makes hardly any difference whether you use the equal or the unequal variance t-test. But in some cases in HW2 it will matter. See my Excel file and my PDF file for more information about how the t-tests are calculated, including the standard error of the difference (the denominator of the t-statistic) and the degrees of freedom.
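* For the unequal variance version, the standard error of the difference and Satterthwaite's degrees of freedom can be checked the same way. A minimal sketch, using the sd and n values from the table above; the scalar names v_m and v_f are my own, chosen for illustration:

```stata
* each group's squared standard error of the mean: sd^2/n
scalar v_m = 2.967666^2/9027
scalar v_f = 2.854472^2/9511
* standard error of the difference, without pooling the variances
display sqrt(scalar(v_m) + scalar(v_f))
* Satterthwaite's approximate degrees of freedom
display (scalar(v_m) + scalar(v_f))^2/(scalar(v_m)^2/9026 + scalar(v_f)^2/9510)
* the immediate form of ttest takes n, mean, and sd for each group
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472, unequal
```

* The ttesti form is handy when you have only summary statistics rather than the raw data. Results should match the output above up to rounding of the inputs.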
. display ttail(18536,-5.7164)
.99999999
* Keep in mind that Stata’s ttail function gives the right-hand (upper-tail) probability, Pr(T > t), so if you start with a negative t-statistic, you get a value very close to 1.
. display (1-ttail(18356,-5.7164))
5.525e-09
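* This gives us the lower-tail probability, Pr(T < t). (In this and the later ttail commands the df was typed as 18356 rather than the 18536 reported by ttest; with this many degrees of freedom the effect on the result is negligible.)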
. display 2*(1-ttail(18356,-5.7164))
1.105e-08
* And this above gives us the probability of the 2 tails added together.
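* Subtracting from 1 is not strictly needed: by symmetry, the lower tail at t equals the upper tail at -t. A minimal sketch of that form, plus the results that ttest stores, assuming the same dataset is in memory:

```stata
* lower-tail probability for t = -5.7164, taken as the upper tail at +5.7164
display ttail(18536, 5.7164)
* two-tailed probability for a t statistic of either sign
display 2*ttail(18536, abs(-5.7164))
* after ttest, the t statistic, df, and two-sided p-value are stored in r()
quietly ttest yrsed if age>=25 & age<=34, by(sex)
display r(t) "  " r(df_t) "  " r(p)
```

* Working with the small tail directly avoids subtracting a number very close to 1, which can lose precision for extreme statistics.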
* Now let’s generate a proper 0-1 dummy variable for gender that we can use as a predictor in a regression.
. tabulate sex
        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
       Male |     64,791       48.46       48.46
     Female |     68,919       51.54      100.00
------------+-----------------------------------
      Total |    133,710      100.00
. codebook sex
----------------------------------------------------------------------------------------
sex Sex
----------------------------------------------------------------------------------------
                  type:  numeric (byte)
                 label:  sexlbl
                 range:  [1,2]                        units:  1
         unique values:  2                        missing .:  0/133710
            tabulation:  Freq.   Numeric  Label
                        64791         1  Male
                        68919         2  Female
. gen byte female=0
female already defined
r(110);
. replace female=1 if sex==2
(0 real changes made)
. tabulate sex female
           |        female
       Sex |         0          1 |     Total
-----------+----------------------+----------
      Male |    64,791          0 |    64,791
    Female |         0     68,919 |    68,919
-----------+----------------------+----------
     Total |    64,791     68,919 |   133,710
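* When you generate a new variable, it is always important to cross tabulate it with the old variable. Here our new female variable is 0-1, rather than 1-2. (The gen command above returned an error because female already existed in the data, already coded 1 for women, which is also why the replace made no changes.)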
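* A minimal sketch of building the dummy from scratch in one step, assuming sex is coded 1=Male and 2=Female as the codebook output shows:

```stata
* remove any earlier copy; capture suppresses the error if there is none
capture drop female
* (sex==2) evaluates to 1 when true and 0 when false;
* the if qualifier leaves female missing wherever sex is missing
generate byte female = (sex==2) if !missing(sex)
* cross-tabulate against the original variable, showing any missing values
tabulate sex female, missing
```

* Without the if qualifier, any observation with missing sex would be coded 0. In this dataset sex has no missing values, so it makes no difference here, but it is a good habit.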
. regress yrsed female if age>=25&age<=34
      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   32.68
       Model |  276.742433     1  276.742433           Prob > F      =  0.0000
    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018
-------------+------------------------------           Adj R-squared =  0.0017
       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101
------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649
       _cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216
------------------------------------------------------------------------------
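* The OLS regression gives us a t-statistic equal in magnitude to the one from the equal variance t-test (5.72 here, -5.7164 there). The sign is flipped because ttest computes mean(Male) - mean(Female), while the female coefficient is the female mean minus the male mean. The standard error of the difference (.0427623) and the degrees of freedom (18536) are identical.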
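* The regression coefficients map directly onto the group means. A minimal sketch, assuming the same dataset and the female dummy created above:

```stata
quietly regress yrsed female if age>=25 & age<=34
* the constant is the mean of yrsed for men (female==0)
display _b[_cons]
* constant plus slope is the mean of yrsed for women (female==1)
display _b[_cons] + _b[female]
* the t statistic is the coefficient divided by its standard error
display _b[female]/_se[female]
```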
. ttest yrsed if age>=25 & age<=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
. display 2*(ttail(18356, 5.7164))
1.105e-08
. display 2*(1-normal(5.7164))
1.088e-08
* Because 18K degrees of freedom is a lot, the t and Normal distributions give almost exactly the same tail probability at this value of the statistic. Notice the 1-normal(): Stata codes normal() as the left-hand (lower-tail) cumulative distribution, while ttail() is coded as the right-hand (upper-tail) probability, for a reason I cannot guess. Stata help is your guide to syntax.
. display invnormal(1-.025)
1.959964
* 1.96 is the key critical value of the Normal distribution: it leaves 2.5% in each tail. How many degrees of freedom do we need before the t distribution yields nearly the same critical value for a 2.5% single-tail probability?
. display invttail(2, 0.025)
4.3026527
. display invttail(10, 0.025)
2.2281389
. display invttail(25, 0.025)
2.0595386
. display invttail(50, 0.025)
2.0085591
. display invttail(100, 0.025)
1.9839715
. display invttail(1000, 0.025)
1.9623391
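* The same comparison can be run in one loop. A minimal sketch; the list of df values repeats the ones above and adds the 18536 from our t-test:

```stata
* columns: df, t critical value for a 2.5% upper tail,
* and its gap from the Normal critical value
foreach df of numlist 2 10 25 50 100 1000 18536 {
    display %8.0f `df' "  " %9.7f invttail(`df', 0.025) "  " %9.7f (invttail(`df', 0.025) - invnormal(0.975))
}
```

* The gap in the last column shrinks toward zero as the degrees of freedom grow.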
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2014_logs\cl
> ass4.log
log type: text
closed on: 1 Oct 2014, 12:32:36
----------------------------------------------------------------------------------------