--------------------------------------------------------------------------------------------------

name:  <unnamed>

log type:  text

opened on:  11 Apr 2019, 14:34:47

. use "C:\Users\mexmi\Desktop\cps_mar_2000_new.dta", clear

. *class starts here

. ttest yrsed if age>24 & age<35, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* This is our friendly t-test comparing the educational attainment of men and women aged 25-34. One thing I want you to recognize is how the t-statistic relates to the difference in means and the standard error of that difference: t is simply the difference divided by its standard error.

. display -.2444469/0.0427623

-5.7164114

* For each group on its own, the standard error of the mean is SD/sqrt(n). For men:

. *standard error= SD/sqrt(n)

. display 2.967666/(9027^0.5)

.03123513
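* The same logic extends to the standard error of the difference. As a sketch (the scalar names n1, sd1, sp, and se_diff are my own labels, not anything Stata creates), the equal-variance t-test pools the two SDs and then scales by sqrt(1/n1 + 1/n2). Plugging in the values from the table should come out very close to the .0427623 that Stata reports:

```stata
* Values copied from the ttest table above (1 = Male, 2 = Female)
scalar n1  = 9027
scalar sd1 = 2.967666
scalar n2  = 9511
scalar sd2 = 2.854472

* Pooled SD: a weighted average of the two variances, with weights n-1
scalar sp = sqrt(((n1-1)*sd1^2 + (n2-1)*sd2^2)/(n1+n2-2))

* Standard error of the difference under equal variances
scalar se_diff = sp*sqrt(1/n1 + 1/n2)
display se_diff

* t-statistic: difference in means divided by its standard error
display (13.31212 - 13.55657)/se_diff
```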

* If we don't specify otherwise, Stata runs the equal-variance t-test. Adding the option unequal after the comma gives the unequal-variance t-test, which in this case is only very slightly different: t = -5.7106 below, compared with -5.7164 in the equal-variance test above. The two results are so close because the SDs of men's and women's education are nearly the same (2.97 and 2.85), as you can see from the table:

. ttest yrsed if age>24 & age<35, by(sex) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
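* Where the unequal-variance standard error comes from: instead of pooling the SDs, it adds the two groups' squared standard errors. This is a sketch using the Std. Err. column from the table (the scalar names are my own). The last line is the usual Satterthwaite approximation for the degrees of freedom, which should land near the 18383.6 reported above:

```stata
* Standard errors and group sizes copied from the table (1 = Male, 2 = Female)
scalar se1 = .0312351
scalar se2 = .0292693
scalar n1  = 9027
scalar n2  = 9511

* SE of the difference without assuming equal variances
scalar se_uneq = sqrt(se1^2 + se2^2)
display se_uneq

* t-statistic using the unpooled standard error
display -.2444469/se_uneq

* Satterthwaite's approximate degrees of freedom
display (se1^2 + se2^2)^2/(se1^4/(n1-1) + se2^4/(n2-1))
```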

* The middle of the 3 P values above, Pr(|T| > |t|), is the one we care about. It is the probability that a T statistic with an expected value of zero would end up, just by chance, at least as far from zero as -5.7. In the above output, Stata lists that probability as "0.0000", but that is only rounding to four decimal places: the actual probability has to be larger than zero. So how large is it?

. display ttail(18000, 5.7164)

5.526e-09

. display ttail(18536, 5.7164)

5.524e-09

* ttail(df, t) is the syntax of the ttail() function. It returns the probability of getting a t-statistic that large or larger, that is, a one-tailed probability. Once the degrees of freedom get large (here 18000), the exact number doesn't really matter, but we might as well report it correctly: 18536 df.

* We want to multiply that P value by 2, because the test is two-sided and counts both tails of the t-distribution. We end up with P = 1.105 x 10^-8:

. display 2*ttail(18536, 5.7164)

1.105e-08
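* One way to see why the exact df barely matters: with this many degrees of freedom, the t-distribution is almost indistinguishable from the standard normal. As a rough check (not part of the class session), normal(z) is Stata's standard normal cumulative probability, so the two lines below should give nearly the same two-tailed P value:

```stata
* Two-tailed P value from the t-distribution with 18536 df
display 2*ttail(18536, 5.7164)

* Same calculation with the standard normal: lower tail at -5.7164, doubled
display 2*normal(-5.7164)
```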

* Also note: the sign of the t-statistic is arbitrary here, because the order of men and women in the sex variable is arbitrary, and that order determines whether the educational difference is men minus women or women minus men. If we put the actual value, -5.7164, into ttail(), we get a number close to 1, because ttail() returns the probability of that value or greater.

. display ttail(18536, -5.7164)

.99999999

* If we want the lower-tail probability, the probability of a value of -5.7164 or below, we subtract that from 1.

. display 1-ttail(18536, -5.7164)

5.524e-09

* and then multiply that single tail probability by 2:

. display 2*(1-ttail(18536, -5.7164))

1.105e-08
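* A sketch of two shortcuts (neither was run in class). Wrapping the t-statistic in abs() makes the sign irrelevant, so the same line works whether the difference is men minus women or women minus men. Also, ttest leaves its results behind in r(), so the unrounded two-sided P value is available as r(p). The second part uses ttesti, the immediate form that takes n, mean, and SD for each group, so it runs without the data set loaded:

```stata
* Two-tailed P value that ignores the sign of t
display 2*ttail(18536, abs(-5.7164))

* Rerun the equal-variance test from the summary numbers in the table
quietly ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472

* Stored results: t-statistic, degrees of freedom, two-sided P value
display r(t)
display r(df_t)
display %10.4e r(p)
```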

. log close

name:  <unnamed>