--------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fal
> l_2013_381_logs\class4.log
log type: text
opened on: 3 Oct 2013, 13:37:20
. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear
. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)
--------------------------------------------------------------
      Sex |      Freq.  mean(yrsed)    sd(yrsed)   sem(yrsed)
----------+---------------------------------------------------
     Male |      9,027     13.31212     2.967666     .0312351
   Female |      9,511     13.55657     2.854472     .0292693
--------------------------------------------------------------
. display 2.967666/(sqrt(9027))
.03123513
* Note that the standard error of the mean is just the standard deviation divided by the square root of N.
. ttest yrsed if age>=25 & age<=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
. ttest yrsed if age>=25 & age<=34, by(sex) unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
* There are two versions of the two-sample t-test: equal variance and unequal variance. The option “unequal” gives you the unequal variance t-test; otherwise Stata gives you the equal variance t-test. Note that in this case the test statistics and the degrees of freedom are almost identical, because the variances of men’s and women’s educations, and the Ns of the two samples, are so similar.
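* The two standard errors of the difference can be rebuilt by hand from the summary statistics in the tables above. What follows is a sketch that was not run in this session: the scalar names (n_m, sd_m, n_f, sd_f, sd_pool) are made up for illustration, and the numbers are copied from the ttest output. Up to rounding, the two displayed values should agree with the Std. Err. reported on the diff line of each test.

```stata
* Summary statistics copied by hand from the ttest tables (illustrative names)
scalar n_m  = 9027
scalar sd_m = 2.967666
scalar n_f  = 9511
scalar sd_f = 2.854472

* Equal variance: pool the two variances, then scale by sqrt(1/n1 + 1/n2)
scalar sd_pool = sqrt(((n_m-1)*sd_m^2 + (n_f-1)*sd_f^2)/(n_m + n_f - 2))
display "equal variance SE   = " sd_pool*sqrt(1/n_m + 1/n_f)

* Unequal variance: each group contributes its own variance over its own N
display "unequal variance SE = " sqrt(sd_m^2/n_m + sd_f^2/n_f)

* The immediate form of ttest takes N, mean, and sd for each group directly
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472, unequal
```

* With Ns and standard deviations this close, the pooled and unpooled formulas are averaging nearly the same quantities, which is why the two tests barely differ here.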
. ttest yrsed if age>=25 & age<=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
. display ttail(18536,-5.7164)
.99999999
* The Stata function ttail(df, t) gives you the right-hand tail probability P, which in this case is the probability of all values larger than -5.7164. Because t is negative here, the left-hand tail probability is 1-P, and the two-tail probability is 2*(1-P). But note that if we had compared women to men, rather than men to women, we would have had a positive 5.7164 statistic, and we wouldn’t have had to do the “one minus P” part.
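. display 2*(1-ttail(18356,-5.7164))
1.105e-08
* Note: the df in this display command (and in the similar one after the regression below) was typed as 18356 rather than 18536. With df this large, the mistyped value has no practical effect on the probability.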
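* One way to avoid the “one minus P” step altogether is to take the absolute value of t before calling ttail(). The lines below are a sketch that was not run in this session; they assume the CPS file is still in memory and use the results that ttest stores in r().

```stata
* abs() folds either sign of t onto the right-hand tail, so no 1-P is needed
display 2*ttail(18536, abs(-5.7164))

* After ttest, the statistic and its df are stored, so nothing is retyped
quietly ttest yrsed if age>=25 & age<=34, by(sex)
display 2*ttail(r(df_t), abs(r(t)))

* ttest also stores the two-sided p-value directly
display r(p)
```

* Working from r(t) and r(df_t) also removes the chance of mistyping the degrees of freedom.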
* In order to generate a regression version of the above t-test, we need first to generate a 0-1 dummy variable for gender.
. tabulate sex
        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
       Male |     64,791       48.46       48.46
     Female |     68,919       51.54      100.00
------------+-----------------------------------
      Total |    133,710      100.00
. codebook sex
-----------------------------------------------------------------------------------
sex                                                                             Sex
-----------------------------------------------------------------------------------
                  type:  numeric (byte)
                 label:  sexlbl
                 range:  [1,2]                        units:  1
         unique values:  2                        missing .:  0/133710
            tabulation:  Freq.   Numeric  Label
                        64791         1  Male
                        68919         2  Female
. gen byte female=0
. replace female=1 if sex==2
(68919 real changes made)
. tabulate sex female
           |        female
       Sex |         0          1 |     Total
-----------+----------------------+----------
      Male |    64,791          0 |    64,791
    Female |         0     68,919 |    68,919
-----------+----------------------+----------
     Total |    64,791     68,919 |   133,710
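* The same dummy can be built in one line from a logical expression, and the regression that follows could also be run without building a dummy at all. This is a sketch that was not run in this session: female2 is a made-up variable name, and the factor-variable form assumes Stata 11 or later.

```stata
* (sex == 2) evaluates to 1 when true and 0 when false;
* the if qualifier keeps any missing sex values missing rather than 0
generate byte female2 = (sex == 2) if !missing(sex)

* Stops with an error if the two dummies ever disagree
assert female2 == female

* Factor-variable notation: Stata builds the indicator itself, using the
* lowest value of sex (1 = Male) as the omitted base category
regress yrsed i.sex if age>=25 & age<=34
```

* With i.sex the coefficient is labeled with the Female value label rather than a variable name, but it should be the same women-minus-men difference.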
. regress yrsed female if age>=25&age<=34
      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   32.68
       Model |  276.742433     1  276.742433           Prob > F      =  0.0000
    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018
-------------+------------------------------           Adj R-squared =  0.0017
       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101
------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649
       _cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216
------------------------------------------------------------------------------
* The t-test for female here is identical to the equal variance t-test above, apart from the sign, which flips because the coefficient is women minus men (check that the standard error of the estimate matches exactly, since t = coefficient / Std. Err.). The null hypothesis is that gender does not influence years of education. The t-statistic allows us to reject that null hypothesis because the probability associated with the test is very small, about 1 in 100 million. In other words, if men and women in the US had the same levels of education, the chance of getting a difference at least this large (0.244) in a sample this big would be about 1 in 100 million. Since that chance is small, we reject the null hypothesis.
* And note that in these regression results we have a second test, the test of the constant term (t=434.62). The null hypothesis of the second test is that the constant is zero. Since the constant here is men’s average education, that second null hypothesis is a dopey one we are happy to reject.
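* The pieces of the regression table can be recomputed from the stored estimates. This is a sketch that was not run in this session; it assumes the female dummy and the CPS file are still in memory. _b[], _se[], e(df_r), and e(F) are results that regress leaves behind.

```stata
quietly regress yrsed female if age>=25 & age<=34

* t is the coefficient divided by its standard error
display _b[female]/_se[female]

* Two-tail probability for that t, using the residual degrees of freedom
display 2*ttail(e(df_r), abs(_b[female]/_se[female]))

* With a single predictor, the overall F statistic is the square of t
display (_b[female]/_se[female])^2
display e(F)
```

* The last two lines should agree with each other and with the F(1, 18536) reported in the regression header.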
. display 2*(ttail(18356, 5.7164))
1.105e-08
. display 2*(1-normal(5.7164))
1.088e-08
* The normal two-tail probability associated with 5.7164 is a tiny bit smaller than the t probability. The t-distribution with more than 18,000 df is very close to the normal, but not exactly the same.
. display invnormal(1-.025)
1.959964
* The key value of the normal distribution is 1.96: that is the value beyond which one tail has P=0.025, meaning the two tails together yield P=5%. Anything that is less than 5% likely we deem (arbitrarily) to be too unlikely to have happened by chance.
* The critical values of the t-distribution for the same tail probability are always larger than the corresponding normal value, but the difference only matters for small N.
. display invttail(2, 0.025)
4.3026527
. display invttail(10, 0.025)
2.2281389
. display invttail(25, 0.025)
2.0595386
. display invttail(1000, 0.025)
1.9623391
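* The same comparison can be made in one pass with a loop. This is a sketch that was not run in this session; the df values are the four used above plus 18536, the df from the t-test.

```stata
* Two-tail 5% critical values of t for a range of degrees of freedom
foreach df of numlist 2 10 25 1000 18536 {
    display %8.0f `df' "   " %9.6f invttail(`df', 0.025)
}

* The normal value that the t critical values approach as df grows
display "  normal   " %9.6f invnormal(0.975)
```

* Reading down the list shows how quickly the t critical value falls toward 1.96 as the degrees of freedom increase.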
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2013_38
> 1_logs\class4.log
log type: text
closed on: 3 Oct 2013, 15:56:06
-----------------------------------------------------------------------------------