------------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class3

> .log

  log type:  text

 opened on:  27 Sep 2021, 10:01:19

 

. use "C:\Users\mexmi\Desktop\cps_mar_2000_new.dta"

 

 

. *class starts here

 

. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)

 

--------------------------------------------------------------

      Sex |       Freq.  mean(yrsed)    sd(yrsed)   sem(yrsed)

----------+---------------------------------------------------

     Male |       9,027     13.31212     2.967666     .0312351

   Female |       9,511     13.55657     2.854472     .0292693

--------------------------------------------------------------

 

* So what is the relationship between the SD, n, and the SEM (standard error of the mean)? SEM = sd/sqrt(n). We can check this with the values for men:

 

. display 2.967666/sqrt(9027)

.03123513

 

* True!
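
* The same check can be sketched for both groups at once. This is a minimal sketch, assuming the n and sd values are typed in by hand from the table above; the local macro names (n_m, sd_m, and so on) are made up for illustration.

```stata
* Values copied from the table above; macro names are illustrative
local n_m  9027
local sd_m 2.967666
local n_f  9511
local sd_f 2.854472

* SEM = sd / sqrt(n), computed separately for each group
local sem_m = `sd_m' / sqrt(`n_m')
local sem_f = `sd_f' / sqrt(`n_f')
display "SEM, men:   " `sem_m'
display "SEM, women: " `sem_f'

* Compare with the sem(yrsed) column, allowing for rounding in the table
assert abs(`sem_m' - .0312351) < 1e-6
assert abs(`sem_f' - .0292693) < 1e-6
```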

 

. ttest yrsed if age>=25 & age<=34, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |   9,027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |   9,511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |  18,538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

 

*The other key relationship we want to verify is that t=(diff of means)/(SE of the diff of means).

 

. display -0.2444469/0.0427623

-5.7164114

 

* Again true!
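
* Where does the SE of the difference (.0427623) come from? Under the equal variance assumption it is the pooled SD times sqrt(1/n1 + 1/n2). A minimal sketch, assuming the group sizes and SDs are copied by hand from the ttest table; the macro names are illustrative.

```stata
* Group sizes and standard deviations copied from the ttest table above
local n1 9027
local s1 2.967666
local n2 9511
local s2 2.854472

* Pooled variance: average of the two group variances, weighted by n-1
local sp2 = ((`n1'-1)*`s1'^2 + (`n2'-1)*`s2'^2) / (`n1' + `n2' - 2)

* SE of the difference of means under the equal-variance assumption
local se_eq = sqrt(`sp2') * sqrt(1/`n1' + 1/`n2')
display "pooled SE = " `se_eq'
display "t         = " -0.2444469/`se_eq'

* Both should match the ttest output up to rounding of the inputs
assert abs(`se_eq' - .0427623) < 1e-6
assert abs(-0.2444469/`se_eq' - (-5.7164)) < 1e-3
```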

 

. ttest yrsed if age>=25 & age<=34, by(sex) unequal

 

Two-sample t test with unequal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |   9,027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |   9,511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |  18,538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0428057                 -.32835   -.1605438

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7106

Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

 

* In today’s lecture I talked about the fact that there are two different ways (actually there are many, but I mentioned two) of calculating the SE of the difference of means in t-tests: the equal variance assumption (the default) and the unequal variance assumption. In this case the two assumptions yield very similar results (SE of the difference .0427623 vs .0428057, t-statistic -5.7164 vs -5.7106) because the SDs of men’s and women’s education are in fact very similar (2.97 vs 2.85). So the equal variance assumption is very close to true.
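
* The unequal variance SE and Satterthwaite's degrees of freedom can be reproduced from the group summaries. A minimal sketch, assuming the standard Welch/Satterthwaite formulas and values typed in by hand from the table; the macro names are illustrative.

```stata
* Group sizes and standard deviations copied from the ttest table above
local n1 9027
local s1 2.967666
local n2 9511
local s2 2.854472

* Each group's squared standard error of the mean, s^2/n
local v1 = `s1'^2 / `n1'
local v2 = `s2'^2 / `n2'

* Unequal-variance SE of the difference: add the two pieces, no pooling
local se_uneq = sqrt(`v1' + `v2')

* Satterthwaite's approximate degrees of freedom
local df_sat = (`v1' + `v2')^2 / (`v1'^2/(`n1'-1) + `v2'^2/(`n2'-1))

display "unequal-variance SE = " `se_uneq'
display "Satterthwaite df    = " `df_sat'

* Compare with the values ttest reported with the unequal option
assert abs(`se_uneq' - .0428057) < 1e-6
assert abs(`df_sat' - 18383.6) < 1
```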

 

*Then we looked at Rice’s T and Normal tables.

 

. display ttail(18536,-5.7164)

.99999999

 

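* The ttail function takes degrees of freedom and a t statistic and returns the upper-tail probability, that is, the area from the statistic to positive infinity. For a negative statistic we want the lower tail instead, so we subtract the ttail value from 1, which is what we do below. (The commands below were typed with df=18356 rather than the 18536 reported by ttest; with this many degrees of freedom the slip makes no practical difference to the result.)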

 

. display (1-ttail(18356,-5.7164))

5.525e-09

 

. display 2*(1-ttail(18356,-5.7164))

1.105e-08

 

* And then we multiply by 2 because we care about the tails in both directions: the two-tailed test.
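
* The same tail areas can be obtained without subtracting from 1. A minimal sketch, assuming a recent Stata version that provides the cumulative function t(df, t) alongside ttail, and using the df of 18536 that ttest reported; the macro names are illustrative.

```stata
local df 18536
local t  -5.7164

* Lower tail P(T < t) directly from the cumulative t function
local p_lower = t(`df', `t')

* By symmetry, the same area is the upper tail beyond +|t|
local p_upper = ttail(`df', abs(`t'))

display "one-tailed p = " `p_lower'
display "two-tailed p = " 2*`p_lower'

* The two routes should agree up to floating-point error
assert abs(`p_lower'/`p_upper' - 1) < 1e-6

* and the two-tailed value should match the log to the digits shown
assert abs(2*`p_lower' - 1.105e-08) < 1e-10
```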

 

* How does the two-tailed probability in the t distribution with about 18K degrees of freedom compare to the Normal distribution at the same value of the statistic, 5.7164?

 

. display 2*(ttail(18356, 5.7164))

1.105e-08

 

. display 2*(1-normal(5.7164))

1.088e-08

 

* The t probability is a tiny bit higher (1.105e-08 vs 1.088e-08), but very close, and the substantive conclusion (p<0.05) is the same. The t distribution has fatter tails than the Normal, but at large n the difference hardly matters. Also note: for no particularly good reason, ttail returns the upper-tail probability (from the statistic to positive infinity) while normal returns the cumulative probability from negative infinity up to the statistic. It’s arbitrary, but it would be nice if they worked the same way.
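
* One way to keep the two distributions pointing the same direction is to use lower-tail functions for both. A minimal sketch, assuming a recent Stata version in which t(df, t) is available as the lower-tail counterpart of normal(z); the macro name z is illustrative.

```stata
local z 5.7164

* normal() is a lower-tail CDF, so the area beyond +z equals normal(-z)
display "Normal, two-tailed:   " 2*normal(-`z')

* t() is the matching lower-tail CDF for the t distribution
display "t(18536), two-tailed: " 2*t(18536, -`z')

* Fatter tails: the t probability should exceed the Normal one
assert 2*t(18536, -`z') > 2*normal(-`z')

* The Normal value should match the log to the digits shown
assert abs(2*normal(-`z') - 1.088e-08) < 1e-10
```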

 

* We can look at this the other way: what statistic corresponds to a single tail probability of 2.5%?

 

. display invnormal(1-.025)

1.959964

 

. display invttail(2, 0.025)

4.3026527

 

* At df=2 the critical value of the t distribution (4.30) is much higher than the Normal critical value (1.96).

 

. display invttail(100, 0.025)

1.9839715

 

* But at df=100 they are quite close.

 

. display invttail(10000, 0.025)

1.9602012

 

* And at df=10K they are almost identical.
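
* The convergence can be seen in one pass by looping over several values of df. A minimal sketch; the particular list of df values and the macro names are arbitrary choices for illustration.

```stata
* Normal critical value for a single-tail probability of 2.5%
local z_crit = invnormal(0.975)

* Start at missing, which Stata treats as larger than any number
local prev = .

foreach df of numlist 2 5 10 30 100 1000 10000 {
    local t_crit = invttail(`df', 0.025)
    display %6.0f `df' "   t crit = " %9.6f `t_crit' ///
        "   excess over Normal = " %9.6f (`t_crit' - `z_crit')

    * The t critical value stays above the Normal one
    assert `t_crit' > `z_crit'

    * and shrinks as the degrees of freedom grow
    assert `t_crit' < `prev'
    local prev = `t_crit'
}
```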

 

. log close

      name:  <unnamed>

       log:  C:\Users\mexmi\Documents\newer web pages\soc_meth_proj3\fall_2021_logs\class3

> .log

  log type:  text

 closed on:  27 Sep 2021, 13:04:58

------------------------------------------------------------------------------------------