. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)

 

--------------------------------------------------------------

      Sex |       Freq.  mean(yrsed)    sd(yrsed)   sem(yrsed)

----------+---------------------------------------------------

     Male |       9,027     13.31212     2.967666     .0312351

   Female |       9,511     13.55657     2.854472     .0292693

--------------------------------------------------------------

 

. display 2.967666/sqrt(9027)

.03123513

 

* Key point: the standard error of the mean is sd/sqrt(n). The display command above reproduces the sem(yrsed) that Stata reports for men.
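* A minimal sketch of the same calculation for both groups, with the standard deviations and counts copied by hand from the table above. The scalar names (sd_m, n_m, se_m, and so on) are my own labels for illustration; Stata does not create them.

```stata
* Summary statistics copied from the table output for ages 25-34
scalar sd_m = 2.967666
scalar n_m  = 9027
scalar sd_f = 2.854472
scalar n_f  = 9511

* Standard error of the mean = sd / sqrt(n)
scalar se_m = sd_m/sqrt(n_m)
scalar se_f = sd_f/sqrt(n_f)

display "SE of mean, Male:   " se_m
display "SE of mean, Female: " se_f
```

* The two results should agree with the sem(yrsed) column of the table, up to rounding of the inputs.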

 

 

. ttest yrsed if age>=25 & age<=34, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

* Note: there are two kinds of two-sample t-tests, equal variance and unequal variance. Stata uses the equal variance version unless you tell it otherwise, which I do below with the unequal option.

 

. ttest yrsed if age>=25 & age<=34, by(sex) unequal

 

Two-sample t test with unequal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0428057                 -.32835   -.1605438

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7106

Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
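* A short sketch, in case you have only the summary statistics and not the raw data: the immediate command ttesti takes n, mean, and sd for each group, in that order. The numbers here are copied from the output above, so the results should match it up to rounding.

```stata
* Arguments: n1 mean1 sd1 n2 mean2 sd2 (Male first, then Female)
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472

* Same comparison without assuming equal variances
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472, unequal

* ttesti leaves its results in r(); these come from the last (unequal) run
display "t  = " r(t)
display "df = " r(df_t)
```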

 

* In the case of men’s and women’s education, the standard deviations (and therefore the variances) of the two groups are so similar that it makes hardly any difference whether you use the equal or the unequal variance t-test: the standard error of the difference is .0427623 in one and .0428057 in the other. But in some cases in HW2 it will matter. See my Excel file and my PDF file for more information about how the t-tests are calculated, how the standard error of the difference (the denominator of the t-statistic) is calculated, and how the df are calculated.
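* A minimal sketch of those calculations, using the standard textbook formulas for the pooled standard error, the unequal variance standard error, and Satterthwaite's degrees of freedom. The inputs are copied from the ttest output above; the scalar names are my own and purely illustrative.

```stata
* Summary statistics copied from the ttest output (1 = Male, 2 = Female)
scalar n1 = 9027
scalar s1 = 2.967666
scalar n2 = 9511
scalar s2 = 2.854472

* Equal variance version: pool the two variances, weighting each by its df
scalar sp2       = ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1 + n2 - 2)
scalar se_pooled = sqrt(sp2*(1/n1 + 1/n2))

* Unequal variance version: add the two squared standard errors of the mean
scalar v1         = s1^2/n1
scalar v2         = s2^2/n2
scalar se_unequal = sqrt(v1 + v2)

* Satterthwaite's approximate degrees of freedom
scalar df_satt = (v1 + v2)^2/(v1^2/(n1-1) + v2^2/(n2-1))

display "SE of difference, equal variances:   " se_pooled
display "SE of difference, unequal variances: " se_unequal
display "Satterthwaite df:                    " df_satt
```

* Dividing the difference in means (-.2444469) by each standard error gives the two t-statistics reported above.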

 

. display ttail(18536,-5.7164)

.99999999

 

* Keep in mind that Stata’s ttail function gives you the right hand (upper tail) probability, Pr(T > t), so if you start with a negative statistic, you get a value very close to 1.

 

. display (1-ttail(18356,-5.7164))

5.525e-09

 

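* This gives us the left hand tail probability, Pr(T < -5.7164). Note that the command above was typed with 18356 degrees of freedom rather than the 18536 reported by ttest (the same slip recurs in later display commands); with this many degrees of freedom it makes no practical difference to the result.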

 

. display 2*(1-ttail(18356,-5.7164))

1.105e-08

 

* And the command above gives us the probability of the 2 tails added together, which is what ttest reports as Pr(|T| > |t|).
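* A small sketch of the three tail probabilities, using the t-statistic and degrees of freedom that ttest reported (18536). Because the t distribution is symmetric, the lower tail at t equals the upper tail at -t, so there is no need to subtract from 1. The scalar names are illustrative only.

```stata
scalar tstat = -5.7164
scalar df    = 18536

* Upper tail, Pr(T > t): close to 1 because tstat is negative
scalar p_upper = ttail(df, tstat)

* Lower tail, Pr(T < t): by symmetry, the upper tail at -tstat
scalar p_lower = ttail(df, -tstat)

* Two-tailed probability, Pr(|T| > |t|)
scalar p_two = 2*ttail(df, abs(tstat))

display "Pr(T > t)     = " p_upper
display "Pr(T < t)     = " p_lower
display "Pr(|T| > |t|) = " p_two
```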

 

*Now let’s generate a proper dummy variable for gender, that we can use as an input in a regression.

 

. tabulate sex

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

       Male |     64,791       48.46       48.46

     Female |     68,919       51.54      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. codebook sex

 

----------------------------------------------------------------------------------------

sex                                                                                  Sex

----------------------------------------------------------------------------------------

 

                  type:  numeric (byte)

                 label:  sexlbl

 

                 range:  [1,2]                        units:  1

         unique values:  2                        missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                         64791         1  Male

                         68919         2  Female

 

. gen byte female=0

female already defined

r(110);

 

. replace female=1 if sex==2

(0 real changes made)

 

. tabulate sex female

 

           |        female

       Sex |         0          1 |     Total

-----------+----------------------+----------

      Male |    64,791          0 |    64,791

    Female |         0     68,919 |    68,919

-----------+----------------------+----------

     Total |    64,791     68,919 |   133,710

 

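* When you generate a new variable, it is always important to cross tabulate it with the old variable. Here our new female variable is coded 0-1, rather than 1-2 like sex. (The gen command above returned r(110) because a variable named female was already defined, which is also why replace made 0 real changes; the cross tabulation confirms that the existing variable was already coded correctly.)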
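* A hedged sketch of one way to build such a dummy in a single step. It runs on a tiny made-up dataset (five hypothetical rows, not the class data) so that it is self-contained; note that clear drops whatever data is in memory. The if !missing(sex) condition is my addition: it keeps a missing sex from being coded as 0.

```stata
* Toy data for illustration only: 1 = Male, 2 = Female, . = missing
clear
input byte sex
1
2
2
1
.
end

* The logical expression (sex == 2) evaluates to 1 if true and 0 if false
generate byte female = (sex == 2) if !missing(sex)

* Always check the new variable against the old one
tabulate sex female, missing
```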

 

 

. regress yrsed female if age>=25&age<=34

 

      Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

       Model |  276.742433     1  276.742433           Prob > F      =  0.0000

    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

-------------+------------------------------           Adj R-squared =  0.0017

       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

      female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649

       _cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216

------------------------------------------------------------------------------

 

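* The OLS regression gives us a t-test exactly equal to the equal variance t-test: the coefficient on female (.2444469) is the difference in means with the sign reversed, the standard error (.0427623) is identical, and the t-statistic is 5.72 on 18536 degrees of freedom. The constant (13.31212) is the mean for men.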
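* A minimal sketch of this equivalence on simulated data (not the class dataset; the variable names, the seed, and the numbers in the generate line are arbitrary choices of mine). Note that clear drops whatever data is in memory.

```stata
* Simulated data for illustration only
clear
set obs 200
set seed 12345
generate byte female = mod(_n, 2)
generate y = 13 + 0.25*female + rnormal(0, 3)

* Equal variance t-test: diff = mean(female==0) - mean(female==1)
quietly ttest y, by(female)
scalar t_ttest = r(t)

* OLS with the dummy as the only regressor
quietly regress y female
scalar t_reg = _b[female]/_se[female]

* Same magnitude, opposite sign
display "t from ttest:   " t_ttest
display "t from regress: " t_reg
```

* The signs differ only because ttest subtracts the second group from the first, while the regression coefficient measures the second group relative to the first.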

 

. ttest yrsed if age>=25 & age<=34, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

. display 2*(ttail(18356, 5.7164))

1.105e-08

 

. display 2*(1-normal(5.7164))

1.088e-08

 

* Because 18K degrees of freedom is a lot of degrees of freedom, the t and Normal distributions are almost exactly the same at the given statistic value. But notice the 1-normal(): Stata codes normal() as the left hand cumulative distribution, while ttail() is coded as the right hand cumulative distribution (for an arbitrary reason I cannot guess). Stata help is your guide to syntax.
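* A short sketch of the same comparison that avoids subtracting from 1: by the symmetry of the Normal distribution, the upper tail at z equals normal(-z). The scalar names are illustrative, and I use the 18536 degrees of freedom reported by ttest.

```stata
scalar z = 5.7164

* Two-tailed probability from the t distribution (upper tail, doubled)
scalar p_t = 2*ttail(18536, z)

* Two-tailed probability from the Normal distribution, written two ways
scalar p_norm_a = 2*(1 - normal(z))
scalar p_norm_b = 2*normal(-z)

display "t distribution:      " p_t
display "Normal, 1-normal(z): " p_norm_a
display "Normal, normal(-z):  " p_norm_b
```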

 

. display invnormal(1-.025)

1.959964

 

* 1.96 is the key value of the Normal distribution. How many degrees of freedom do we need before the t distribution yields nearly the same critical value for a 2.5% single tail probability?

 

. display invttail(2, 0.025)

4.3026527

 

. display invttail(10, 0.025)

2.2281389

 

. display invttail(25, 0.025)

2.0595386

 

. display invttail(50, 0.025)

2.0085591

 

. display invttail(100, 0.025)

1.9839715

 

. display invttail(1000, 0.025)

1.9623391
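* The same comparison can be sketched as a loop that prints each critical value next to its gap from the Normal value. The list of degrees of freedom repeats the ones tried above plus 10000, which is my addition; the scalar names are illustrative.

```stata
* Normal critical value for a 2.5% upper tail
scalar z_crit = invnormal(0.975)

foreach df of numlist 2 10 25 50 100 1000 10000 {
    * t critical value for a 2.5% upper tail with `df' degrees of freedom
    scalar t_crit = invttail(`df', 0.025)
    scalar gap    = t_crit - z_crit
    display %6.0f `df' "   " %9.6f t_crit "   " %9.6f gap
}
```

* The gap shrinks steadily as the degrees of freedom grow, which is the pattern in the individual display commands above.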

 

. log close

      name:  <unnamed>

       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2014_logs\cl

> ass4.log

  log type:  text

 closed on:   1 Oct 2014, 12:32:36

----------------------------------------------------------------------------------------