------------------------------------------------------------------------------------------

name: <unnamed>

log: C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class3.log

log type: text

opened on: 27 Sep 2021, 10:01:19

. use "C:\Users\mexmi\Desktop\cps_mar_2000_new.dta"

. *class starts here

. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)

--------------------------------------------------------------
      Sex |       Freq.  mean(yrsed)    sd(yrsed)   sem(yrsed)
----------+---------------------------------------------------
     Male |       9,027     13.31212     2.967666     .0312351
   Female |       9,511     13.55657     2.854472     .0292693
--------------------------------------------------------------

*So what is the relationship between SD, n and SEM (standard error of the mean)? SEM=sd/(sqrt(n))

. display 2.967666/sqrt(9027)

.03123513

* True!
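* The same check can be repeated for the women's row of the table. This is only a sketch using the rounded values printed above, so the result should agree with the reported sem(yrsed) of .0292693 up to rounding.

```stata
* SEM = sd/sqrt(n), using the Female row of the table above
display 2.854472/sqrt(9511)
```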

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |   9,027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |   9,511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |  18,538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

*The other key relationship we want to verify is that t=(diff of means)/(SE of the diff of means).

. display -0.2444469/0.0427623

-5.7164114

* Again true!
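* Instead of retyping rounded numbers, the same ratio can be built from the results that ttest leaves behind in r(). A minimal sketch, assuming the same dataset is still in memory; r(mu_1), r(mu_2), r(se) and r(t) are the stored results documented for ttest.

```stata
* Re-run the test quietly so that its stored results are available
quietly ttest yrsed if age>=25 & age<=34, by(sex)

* t = (difference of means) / (SE of the difference of means)
display (r(mu_1) - r(mu_2)) / r(se)

* The t statistic as stored by ttest, for comparison
display r(t)
```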

. ttest yrsed if age>=25 & age<=34, by(sex) unequal

Two-sample t test with unequal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |   9,027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |   9,511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |  18,538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* In today’s lecture I talked about the fact that there are two different ways (actually there are many, but I mentioned two) of calculating the SE of the difference of means in t-tests: the equal variance assumption (the default) and the unequal variance assumption. In this case the equal and unequal variance assumptions yield very similar results (similar SE of the difference, similar t-stats) because the SDs of men’s and women’s education are in fact very similar. So the equal variance assumption is very close to true.
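* Both standard errors can be rebuilt by hand from the group rows of the ttest tables. This is a sketch using the standard textbook formulas and the rounded values printed in the log, so the results should match the reported .0428057 (unequal) and .0427623 (equal) only up to rounding. The scalar name sp is just an illustrative label for the pooled SD.

```stata
* Unequal variances: square root of the sum of the squared group SEs
display sqrt(.0312351^2 + .0292693^2)

* Equal variances: pooled SD times sqrt(1/n1 + 1/n2)
scalar sp = sqrt(((9027-1)*2.967666^2 + (9511-1)*2.854472^2) / (9027 + 9511 - 2))
display sp*sqrt(1/9027 + 1/9511)
```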

*Then we looked at Rice’s T and Normal tables.

. display ttail(18536,-5.7164)

.99999999

* The ttail function takes degrees of freedom and a t statistic and returns the upper-tail probability: the area from the statistic to positive infinity. Our statistic is negative, so the upper tail is nearly 1. If we want the lower tail (the area below the negative value), we have to subtract the ttail result from 1, and that is what we do below:

. display (1-ttail(18356,-5.7164))

5.525e-09

. display 2*(1-ttail(18356,-5.7164))

1.105e-08

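*And then we multiply by 2 because we care about the tails in both directions: the two-tailed test. (Note that the two commands above typed the degrees of freedom as 18356 rather than the 18536 reported by ttest; with df this large the transposition makes no practical difference to the result.)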
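* Because the t distribution is symmetric, the two-tailed probability can also be written without subtracting from 1, by taking the upper tail at the absolute value of t. A sketch, assuming the dataset is still in memory; r(df_t), r(t) and r(p) are stored results of ttest, and using them avoids retyping the degrees of freedom by hand.

```stata
* Re-run the equal-variance test quietly to populate r()
quietly ttest yrsed if age>=25 & age<=34, by(sex)

* Two-tailed p-value: twice the upper tail at |t|
display 2*ttail(r(df_t), abs(r(t)))

* The two-sided p-value as stored by ttest, for comparison
display r(p)
```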

* How does the two-tailed probability in the t distribution with 18K degrees of freedom compare to the Normal distribution at the same value of the statistic, 5.7?

. display 2*(ttail(18356, 5.7164))

1.105e-08

. display 2*(1-normal(5.7164))

1.088e-08

*The t probability is a tiny bit higher, but very close, and the substantive conclusion (p<0.05) is the same. The t-distribution has fatter tails than the Normal, but at large n the difference hardly matters. Also note: for no particularly good reason, ttail() returns the upper-tail probability (from the statistic to positive infinity) while normal() returns the cumulative probability from negative infinity up to the statistic. It’s arbitrary, but it would be nice if they worked the same way.
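* Since normal() is a lower-tail function and the Normal distribution is symmetric, the same two-tailed probability can be written by evaluating normal() at the negative of the statistic. A small sketch; the two lines should agree, and the first form avoids subtracting a number very close to 1 from 1.

```stata
* Two-tailed Normal probability using the lower tail at -5.7164
display 2*normal(-5.7164)

* Equivalent form used in the log above
display 2*(1-normal(5.7164))
```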

* We can look at this the other way: what statistic corresponds to a single tail probability of 2.5%?

. display invnormal(1-.025)

1.959964

. display invttail(2, 0.025)

4.3026527

* At df=2 the critical value of the t-distribution is higher than the Normal critical value.

. display invttail(100, 0.025)

1.9839715

* But at df=100 they are quite close.

. display invttail(10000, 0.025)

1.9602012

* and at df=10K, they are almost identical.
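* The convergence can be seen in one pass by looping over several degrees of freedom. A minimal sketch: the list of df values is arbitrary, chosen only for illustration, and the last line prints the Normal critical value for comparison.

```stata
* Upper 2.5% critical value of the t distribution at increasing df
foreach df in 2 10 30 100 1000 10000 {
    display %8.0f `df' "   " %9.6f invttail(`df', 0.025)
}

* Normal critical value that the t values approach
display %9.6f invnormal(1-.025)
```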

. log close

name: <unnamed>

log: C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class3.log

log type: text

closed on: 27 Sep 2021, 13:04:58

------------------------------------------------------------------------------------------