-----------------------------------------------------------------------------------------------------

name:  <unnamed>

log type:  text

opened on:   1 Oct 2018, 10:12:32

. use "C:\Users\mexmi\Desktop\cps_mar_2000_new.dta", clear

. *class starts here

. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)

--------------------------------------------------------------
      Sex |       Freq.  mean(yrsed)    sd(yrsed)   sem(yrsed)
----------+---------------------------------------------------
     Male |       9,027     13.31212     2.967666     .0312351
   Female |       9,511     13.55657     2.854472     .0292693
--------------------------------------------------------------

. display 2.967666/sqrt(9027)

.03123513

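·        SE = SD/sqrt(n): the standard error of the mean is the standard deviation divided by the square root of the sample size. Here 2.967666/sqrt(9027) reproduces the sem(yrsed) value for males. This is a crucial relationship, one that you need to know.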
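·        The same check can be run for both groups. The sketch below only re-enters the SDs and sample sizes from the table above; the local macro names (sd_m, n_m, sd_f, n_f) are made up for illustration. The two results should agree with the sem(yrsed) column.

```stata
* Recompute each group's standard error of the mean as SD/sqrt(n).
* Inputs are copied from the table output; macro names are hypothetical.
local sd_m = 2.967666
local n_m  = 9027
local sd_f = 2.854472
local n_f  = 9511

display "SE, male:   " `sd_m'/sqrt(`n_m')
display "SE, female: " `sd_f'/sqrt(`n_f')
```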

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

. display -0.2444469/0.0427623

-5.7164114

·        Also, and crucially, t = difference/(SE of difference). Here -0.2444469/0.0427623 reproduces the t statistic of -5.7164 reported by ttest.
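·        Where does the SE of the difference come from? Because the test assumes equal variances, it should be the pooled standard deviation times sqrt(1/n1 + 1/n2). The sketch below applies that standard textbook formula to the group SDs and sample sizes printed by ttest. The local macro names are hypothetical, and the result is meant to be compared with the .0427623 in the diff row, not taken as Stata's internal computation.

```stata
* Pooled-variance SE of the difference in means (equal-variance t test).
* Inputs are copied from the ttest output; macro names are hypothetical.
local n1 = 9027
local s1 = 2.967666
local n2 = 9511
local s2 = 2.854472

* Pooled SD: weighted average of the two variances, with df = n1 + n2 - 2.
local sp = sqrt(((`n1'-1)*`s1'^2 + (`n2'-1)*`s2'^2)/(`n1'+`n2'-2))
local se_diff = `sp'*sqrt(1/`n1' + 1/`n2')

display "pooled SD        = " `sp'
display "SE of difference = " `se_diff'

* t statistic: difference in means divided by its standard error.
display -0.2444469/`se_diff'
```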

. display ttail(18536,-5.7164)

.99999999

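·        Look up the Stata function ttail. ttail(df, t) returns the upper-tail probability P(T > t), so ttail(18536, -5.7164) covers everything from -5.7 upward and is nearly 1. We want the lower-tail probability instead, which is 1-ttail(df, -5.7) or, by symmetry, ttail(df, 5.7). Note that the next three commands type the df as 18356 rather than 18536; with df this large the effect on the p-value is negligible.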

. display (1-ttail(18356,-5.7164))

5.525e-09

. display 2*(1-ttail(18356,-5.7164))

1.105e-08

. display 2*(ttail(18356, 5.7164))

1.105e-08
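·        A sketch of the same two-sided p-value using the df that ttest actually reported (18536). The local macro names t and df are made up for illustration. Taking abs() of the t statistic means the sign never has to be handled by hand, and calling ttail directly on the positive value avoids subtracting from a number very close to 1.

```stata
* Two-sided p-value from a t statistic and its degrees of freedom.
* Values are copied from the ttest output; macro names are hypothetical.
local t  = -5.7164
local df = 18536

* Preferred form: upper tail at |t|, doubled for two tails.
display 2*ttail(`df', abs(`t'))

* Same quantity in exact arithmetic, written as 1 minus the upper tail at t.
display 2*(1 - ttail(`df', `t'))
```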

. display 2*(1-normal(5.7164))

1.088e-08

* normal(z) returns the cumulative probability up to z (the lower tail), which is (for no logical reason) the opposite of ttail, which returns the upper tail. So here we have 1-normal(5.7164), times 2 because we have two tails. Note that the normal probability is close to, but not exactly the same as, the t probability with about 18,500 degrees of freedom (1.088e-08 versus 1.105e-08).
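* The two conventions can be put side by side. This is a minimal sketch using only normal() and ttail() with the t statistic from above. The first two lines are the same normal tail probability written two ways, and the third is the t-distribution counterpart.

```stata
* normal(z) is the lower tail P(Z <= z); ttail(df, t) is the upper tail P(T > t).

* Two-sided normal p-value, as 1 minus the lower tail at +5.7164:
display 2*(1 - normal(5.7164))

* Same thing by symmetry, reading the lower tail at -5.7164 directly:
display 2*normal(-5.7164)

* Two-sided t p-value with the df reported by ttest, for comparison:
display 2*ttail(18536, 5.7164)
```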

. display invnormal(1-.025)

1.959964

* 1.96 is the two-sided 5% critical value of the standard normal: the upper and lower tails each hold 2.5%, adding up to 5% probability.
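* One way to confirm this reading of the critical value is to go in the other direction. The sketch below, using only normal() and invnormal(), checks that the area between -1.959964 and +1.959964 is about 95% and that the lower critical value is the mirror image of the upper one.

```stata
* Central area between the two critical values (should be close to 0.95).
display normal(1.959964) - normal(-1.959964)

* Lower 2.5% critical value: the negative of invnormal(0.975).
display invnormal(0.025)
```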

. display invttail(2, 0.025)

4.3026527

* For small df, the t critical value is much larger (4.30 with df = 2), but as df grows, it converges to the normal value of 1.96.

. display invttail(100, 0.025)

1.9839715

. display invttail(10000, 0.025)

1.9602012
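* These critical values are what turn a standard error into a confidence interval. As a sketch, the 95% interval for the difference in means should be diff +/- invttail(df, 0.025)*SE. The inputs below are copied from the diff row of the ttest output and the local macro names are made up for illustration. The two bounds can be compared with the interval ttest printed (-.3282649 to -.1606289).

```stata
* 95% confidence interval for the difference in means.
* Inputs are copied from the ttest output; macro names are hypothetical.
local diff  = -0.2444469
local se    = 0.0427623
local tcrit = invttail(18536, 0.025)

display "critical t = " `tcrit'

* Lower and upper bounds: estimate minus/plus critical value times SE.
local lo = `diff' - `tcrit'*`se'
local hi = `diff' + `tcrit'*`se'
display `lo'
display `hi'
```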

. log close

name:  <unnamed>