--------------------------------------------------------------------------

name:  <unnamed>

> l_2013_381_logs\class4.log

log type:  text

opened on:   3 Oct 2013, 13:37:20

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)

--------------------------------------------------------------
      Sex |       Freq.  mean(yrsed)    sd(yrsed)   sem(yrsed)
----------+---------------------------------------------------
     Male |       9,027     13.31212     2.967666     .0312351
   Female |       9,511     13.55657     2.854472     .0292693
--------------------------------------------------------------

. display 2.967666/(sqrt(9027))

.03123513

* Note that the standard error of the mean is just the standard deviation divided by the square root of N.

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

. ttest yrsed if age>=25 & age<=34, by(sex) unequal

Two-sample t test with unequal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* There are two versions of the two-sample t-test: equal variance and unequal variance. Use the option “unequal” to get the unequal variance t-test; otherwise Stata gives you the equal variance t-test. Note that in this case the test statistic and the degrees of freedom are almost identical, because the variances of men’s and women’s education, and the N of the two samples, are so similar.
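
* As a check on where the two standard errors come from, both can be rebuilt from the group summary statistics printed above. This is a sketch using the rounded values in the table, so agreement is only to the displayed digits. ttesti is Stata's immediate form of ttest, which takes N, mean, and SD for each group instead of a dataset.

```stata
* Pooled (equal variance) standard error of the difference
display sqrt(((9027-1)*2.967666^2 + (9511-1)*2.854472^2)/(9027+9511-2)) * sqrt(1/9027 + 1/9511)

* Unequal variance (Welch) standard error of the difference
display sqrt(2.967666^2/9027 + 2.854472^2/9511)

* The same two tests run from summary statistics alone
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472, unequal
```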

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

. display ttail(18536,-5.7164)

.99999999

* The Stata function ttail(df, t) gives you the right-hand tail probability, which in this case is the probability of all values larger than -5.7164. If you want the left-hand tail probability, you need 1-P, and if you want the two-tail probability, you need 2*(1-P). But note that if we had compared women to men, rather than men to women, we would have had a positive 5.7164 statistic, and we wouldn’t have had to do the “one minus P” part.
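
* Because the t distribution is symmetric, one sign-proof way to get the two-tail probability is to pass the absolute value of the statistic to ttail. This sketch avoids the “one minus P” step whichever group is listed first.

```stata
* Two-tail probability for t = -5.7164 with 18536 degrees of freedom
display 2*ttail(18536, abs(-5.7164))
```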

. display 2*(1-ttail(18356,-5.7164))

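1.105e-08

* Note: the command above was typed with 18356 degrees of freedom rather than the 18536 reported by ttest (the same slip recurs in the later ttail command). With df this large the effect on the probability is negligible.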

* In order to generate a regression version of the above t-test, we need first to generate a 0-1 dummy variable for gender.

. tabulate sex

        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
       Male |     64,791       48.46       48.46
     Female |     68,919       51.54      100.00
------------+-----------------------------------
      Total |    133,710      100.00

. codebook sex

-----------------------------------------------------------------------------------

sex                                                                             Sex

-----------------------------------------------------------------------------------

type:  numeric (byte)

label:  sexlbl

range:  [1,2]                        units:  1

unique values:  2                        missing .:  0/133710

tabulation:  Freq.   Numeric  Label

64791         1  Male

68919         2  Female

. gen byte female=0

. replace female=1 if sex==2

. tabulate sex female

           |        female
       Sex |         0          1 |     Total
-----------+----------------------+----------
      Male |    64,791          0 |    64,791
    Female |         0     68,919 |    68,919
-----------+----------------------+----------
     Total |    64,791     68,919 |   133,710
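
* The same dummy can be built in one line from a logical expression. This is a sketch; the name female2 is made up here for illustration, and the if qualifier keeps any missing values of sex missing rather than coding them 0.

```stata
* (sex==2) evaluates to 1 when true and 0 when false
generate byte female2 = (sex==2) if !missing(sex)

* Confirm the one-line version matches the two-step version
assert female2 == female
```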

. regress yrsed female if age>=25&age<=34

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   32.68
       Model |  276.742433     1  276.742433           Prob > F      =  0.0000
    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018
       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101
------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649
       _cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216
------------------------------------------------------------------------------

* The t-test for female here is identical to the equal variance t-test above (check that the standard error of the estimate matches exactly, since t = coefficient / standard error); only the sign flips, because the regression measures women relative to men. The null hypothesis is that gender does not influence years of education. The t-statistic allows us to reject that null hypothesis because the probability associated with the test is very small, about 1 in 100 million. In other words, if men and women in the US had the same levels of education, the probability of getting a difference at least this large (0.244) by chance in a sample this big is about 1 in 100 million. Since that probability is so small, we reject the null hypothesis.
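
* To see that the regression t is just the coefficient over its standard error, and that with a single predictor the F statistic is the square of that t, the arithmetic can be done by hand from the values printed above. This is a sketch using rounded output, so expect agreement only to the displayed digits.

```stata
* t = coefficient / standard error (compare with 5.72 in the table)
display .2444469/.0427623

* t squared (compare with F(1, 18536) = 32.68)
display (.2444469/.0427623)^2
```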

* And note that in these regression results we have a second test, the test of the constant term (t=434.62). The null hypothesis of the second test is that the constant is zero. Since the constant here is men’s average education, that second null hypothesis is a dopey one we are happy to reject.
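
* Since the constant is men’s mean and the female coefficient is the gap, women’s mean is the sum of the two. A sketch: the display line works from the printed coefficients, while lincom assumes the regression above is still the most recent estimation in memory, and it also returns a standard error and confidence interval.

```stata
* Women's mean from the printed coefficients (compare with 13.55657)
display 13.31212 + .2444469

* Same quantity with a standard error, run after the regress command
lincom _cons + female
```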

. display 2*(ttail(18356, 5.7164))

1.105e-08

. display 2*(1-normal(5.7164))

1.088e-08

* The Normal two-tail probability associated with 5.7164 is a tiny bit smaller than the t probability. The t distribution with about 18,500 df is very close to the Normal, but not exactly the same.
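
* A sketch of an equivalent way to write the Normal calculation: normal(-z) gives the left-hand tail directly, so there is no subtraction from 1. The ratio on the second line shows how close the t and Normal tail probabilities are at this df.

```stata
* Two-tail Normal probability without the 1-P subtraction
display 2*normal(-5.7164)

* Ratio of the t to the Normal two-tail probability, df = 18536
display (2*ttail(18536, 5.7164))/(2*normal(-5.7164))
```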

. display invnormal(1-.025)

1.959964

* The key value of the Normal distribution is 1.96: that is the value at which each tail has P=0.025, meaning the two tails together yield P=5%. Anything that is less than 5% likely we deem (arbitrarily) to be too unlikely to have happened by chance.
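
* A quick check, as a sketch, that 1.96 does leave about 5% in the two tails combined:

```stata
* Probability beyond 1.96 in either direction under the Normal
display 2*normal(-1.96)
```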

* The t critical value for a given tail probability is always larger than the corresponding Normal critical value, but the difference only matters for small N.

. display invttail(2, 0.025)

4.3026527

. display invttail(10, 0.025)

2.2281389

. display invttail(25, 0.025)

2.0595386

. display invttail(1000, 0.025)

1.9623391
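
* The four commands above can be collapsed into a loop, shown here as a sketch; the final line prints the Normal value for comparison.

```stata
* Two-tail 5% critical values of t for increasing degrees of freedom
foreach df of numlist 2 10 25 1000 {
    display "df = `df'   t = " invttail(`df', 0.025)
}
display "Normal      z = " invnormal(1-.025)
```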

. log close

name:  <unnamed>