class4 log

--------------------------------------------------------------------------------------------------

name: <unnamed>

log: C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\Soc180B_spr2019_logs\class4_log.log

log type: text

opened on: 11 Apr 2019, 14:34:47

. use "C:\Users\mexmi\Desktop\cps_mar_2000_new.dta", clear

*Always start with a log, and then open the data.

. *class starts here

. ttest yrsed if age>24 & age<35, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

*This is our friendly t-test comparing the educational attainment of men and women ages 25-34. One thing I want you to recognize is how the t-statistic is related to the difference in means and the standard error of that difference: t = difference / standard error.

. display -.2444469/0.0427623

-5.7164114

* For each group's mean, the standard error = SD/sqrt(n). For men:

. *standard error= SD/sqrt(n)

. display 2.967666/(9027^0.5)

.03123513
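
* The same kind of hand check works for the standard error of the difference. The lines below are a sketch, assuming the usual pooled-variance formula for the equal-variance t-test; the scalar name sp2 is just a label chosen here. The result should land very close to the .0427623 on the diff row of the table.

```stata
* pooled variance, built from the two group SDs and sample sizes in the table
scalar sp2 = ((9027-1)*2.967666^2 + (9511-1)*2.854472^2) / (9027+9511-2)

* standard error of the difference under the equal-variance assumption
display sqrt(sp2) * sqrt(1/9027 + 1/9511)
```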

* If we don’t specify, Stata gives us the equal-variance t-test. If we specify “unequal” after the comma, we get the unequal-variance t-test, which in this case is only very slightly different. The t-statistic below (-5.7106, compared to -5.7164 above in the equal-variance t-test) is very similar because the SDs of men’s and women’s education are nearly the same, as you can see from the table:

. ttest yrsed if age>24 & age<35, by(sex) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
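
* The unequal-variance numbers can be rebuilt from the two group standard errors in the table. This is a sketch, assuming the standard formulas: the SE of the difference is the square root of the sum of the squared group SEs, and the degrees of freedom follow Satterthwaite's approximation. The scalar names v1 and v2 are labels chosen here. The three results should come out close to .0428057, -5.7106, and 18383.6.

```stata
* squared standard errors of the two group means (Male, Female)
scalar v1 = .0312351^2
scalar v2 = .0292693^2

* standard error of the difference without assuming equal variances
display sqrt(v1 + v2)

* t-statistic = difference in means / that standard error
display -.2444469 / sqrt(v1 + v2)

* Satterthwaite's approximate degrees of freedom
display (v1 + v2)^2 / (v1^2/(9027-1) + v2^2/(9511-1))
```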

*The middle of the 3 P values above, Pr(|T| > |t|), is the one we care about: it is the probability that a T statistic with an expected value of zero would end up, just by chance, at least as far from zero as -5.7. In the above output, Stata lists that probability as “0.0000”, but we want to know more, because the actual probability has to be larger than zero. So how large is it?

. display ttail(18000, 5.7164)

5.526e-09

. display ttail(18536, 5.7164)

5.524e-09

* ttail(df, t) is the syntax of the ttail function, and it returns the probability of getting a t-statistic that large or larger, in a one-tailed test. Once you get into large degrees of freedom, here 18000, the exact number doesn’t really matter, but we might as well report it correctly: 18536 df.

* And we want to multiply the P value by 2, because the t-distribution has 2 tails. We end up with a value of P = 1.105x10^-8:

. display 2*ttail(18536, 5.7164)

1.105e-08
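
* Because the degrees of freedom are so large, the standard normal distribution should give nearly the same answer as the t distribution. A quick sketch using Stata's normal() function, which returns the cumulative probability below a given z value; by symmetry, the area below -5.7164 equals the area above +5.7164. Expect a value of the same order of magnitude as the P value just above, not an exact match.

```stata
* two-tailed P value from the standard normal instead of the t distribution
display 2*normal(-5.7164)
```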

* Also note: the sign of the t-statistic is arbitrary here, because the order of men and women in the sex variable is arbitrary, and that order determines whether we are computing men minus women or women minus men for the educational difference. If we put the actual value, -5.7164, into ttail, we get a number close to 1, because ttail gives the probability of that value or greater.

. display ttail(18536, -5.7164)

.99999999

* If we want the tail probability we subtract that from 1.

. display 1-ttail(18536, -5.7164)

5.524e-09

* and then multiply that single tail probability by 2:

. display 2*(1-ttail(18536, -5.7164))

1.105e-08
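
* Two shortcuts that avoid retyping numbers and worrying about the sign. This is a sketch, assuming the same CPS file from the top of the log is still in memory. abs() strips the sign of t, and after ttest runs, Stata keeps the t-statistic, the degrees of freedom, and the two-sided P value in r(t), r(df_t), and r(p).

```stata
* abs() makes the sign of the t-statistic irrelevant
display 2*ttail(18536, abs(-5.7164))

* rerun the test quietly, then list everything ttest saved
quietly ttest yrsed if age>24 & age<35, by(sex)
return list

* rebuild the two-tailed P value from the stored results
display 2*ttail(r(df_t), abs(r(t)))

* the two-sided P value that ttest itself stores
display r(p)
```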

. log close

name: <unnamed>

log: C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\Soc180B_spr2019_logs\class4_l

> og.log

log type: text

closed on: 11 Apr 2019, 16:24:45

--------------------------------------------------------------------------------------------------