---------------------------------------------------------------------------------------

      name:  <unnamed>

       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2016_logs\c

> lass3.log

  log type:  text

 opened on:   3 Oct 2016, 10:06:28

 

. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear

 

. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)

 

--------------------------------------------------------------

      Sex |       Freq.  mean(yrsed)    sd(yrsed)   sem(yrsed)

----------+---------------------------------------------------

     Male |       9,027     13.31212     2.967666     .0312351

   Female |       9,511     13.55657     2.854472     .0292693

--------------------------------------------------------------

 

. display 2.967666/sqrt(9027)

.03123513

* SE=SD/(sqrt(n))

 

. ttest yrsed if age>=25 & age<=34, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

. ttest yrsed if age>=25 & age<=34, by(sex) unequal

 

Two-sample t test with unequal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0428057                 -.32835   -.1605438

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7106

Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

* Note that there are two kinds of two-sample t-test: the equal-variance test and the unequal-variance test. In this case the difference between them is very small (t = -5.7164 versus t = -5.7106) because men's and women's standard deviations of education are nearly the same (2.97 versus 2.85), so assuming equal variances makes hardly any difference.
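
* A sketch of where the two standard errors of the difference come from, using only the summary statistics printed in the tables above. The scalar names are made up for this illustration. The equal-variance test pools the two variances; the unequal-variance test adds the two squared standard errors.

```stata
* summary statistics copied from the ttest table (scalar names are hypothetical)
scalar n_male   = 9027
scalar sd_male  = 2.967666
scalar n_female = 9511
scalar sd_female = 2.854472

* equal-variance (pooled) SE of the difference
scalar pooled_var = ((n_male-1)*sd_male^2 + (n_female-1)*sd_female^2)/(n_male+n_female-2)
display sqrt(pooled_var*(1/n_male + 1/n_female))

* unequal-variance SE of the difference
display sqrt(sd_male^2/n_male + sd_female^2/n_female)

* the immediate form of ttest reruns both tests from the summary statistics alone
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472, unequal
```

* The two displayed values should agree with the Std. Err. of diff in the two ttest tables (.0427623 and .0428057), up to rounding of the inputs.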

 

. display -.2444469/.0428057

-5.7106156

 

* And note T=diff/(SE of diff)

 

. display ttail(18536,-5.7164)

.99999999

 

. display (1-ttail(18356,-5.7164))

5.525e-09

 

. display 2*(1-ttail(18356,-5.7164))

1.105e-08

 

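* To get a tail probability, you have to know whether the function in question (here ttail) returns the cumulative probability up to t or the probability from t to infinity. ttail(df,t) returns the upper tail, P(T > t) (see Stata help), so for t = -5.7164 it returns almost 1, and the lower tail is 1 minus that value. Doubling the lower tail gives the two-tailed p-value. (The last two commands were typed with 18356 rather than the actual 18536 degrees of freedom; with a sample this large the slip has a negligible effect on the result.)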
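
* A short sketch of the same tail calculations with the df from the t-test table. It assumes a version of Stata that has the cumulative function t(df,t) alongside ttail(df,t); by the symmetry of the t distribution, the upper tail at +5.7164 equals the lower tail at -5.7164.

```stata
* upper tail P(T > t): this is what ttail() returns
display ttail(18536, 5.7164)

* lower tail P(T < t) at t = -5.7164, two equivalent ways
display 1 - ttail(18536, -5.7164)
display t(18536, -5.7164)

* two-tailed p-value, written so that the sign of t does not matter
display 2*ttail(18536, abs(-5.7164))
```

* Using abs() in the last line avoids having to remember which tail a negative t-statistic falls in.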

 

. tabulate sex

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

       Male |     64,791       48.46       48.46

     Female |     68,919       51.54      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. codebook sex

 

-------------------------------------------------------------------------------------------------------------------------

sex                                                                                                                   Sex

-------------------------------------------------------------------------------------------------------------------------

 

                  type:  numeric (byte)

                 label:  sexlbl

 

                 range:  [1,2]                        units:  1

         unique values:  2                        missing .:  0/133710

 

            tabulation:  Freq.   Numeric  Label

                         64791         1  Male

                         68919         2  Female

 

* Now I am going to generate a “dummy” variable, i.e. a 0-1 coded variable for female, and enter the dummy variable into the regression predicting years of education.

 

. gen byte female=0

 

. replace female=1 if sex==2

(68919 real changes made)

 

. tabulate sex female

 

           |        female

       Sex |         0          1 |     Total

-----------+----------------------+----------

      Male |    64,791          0 |    64,791

    Female |         0     68,919 |    68,919

-----------+----------------------+----------

     Total |    64,791     68,919 |   133,710

 

* Always cross-tabulate a new variable against the variable it was built from, to make sure the recode worked as intended.
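
* As a sketch, the dummy can also be built in one step and checked without reading a table. The variable name female2 is hypothetical and is dropped at the end. Unlike gen female=0 followed by replace, this version would leave female2 missing wherever sex is missing; here codebook shows sex has no missing values, so the two versions should agree.

```stata
* the logical expression (sex==2) evaluates to 1 when true and 0 when false
generate byte female2 = (sex==2) if !missing(sex)

* assert stops with an error if the two versions ever disagree
assert female2 == female

* remove the illustrative copy
drop female2
```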

 

 

. regress yrsed female if age>=25 & age<=34

 

      Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

       Model |  276.742433     1  276.742433           Prob > F      =  0.0000

    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

-------------+------------------------------           Adj R-squared =  0.0017

       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

      female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649

       _cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216

------------------------------------------------------------------------------

 

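* The regression gives the same coefficient magnitude, standard error, and t-statistic as the equal-variance t-test. The sign is reversed because the t-test reports mean(Male) - mean(Female), while the coefficient on female is mean(Female) - mean(Male). The constant, 13.31212, is the male mean.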
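
* A minimal sketch of how the regression coefficients map back to the two group means. It assumes it is run right after the regress command above, while those estimates are still in memory.

```stata
* the constant is the mean for the group coded 0 (men)
display _b[_cons]

* constant plus the female coefficient is the mean for women
display _b[_cons] + _b[female]

* lincom reports the same sum with a standard error and confidence interval
lincom _cons + female
```

* The two displayed values should match the Male and Female means in the ttest table (13.31212 and 13.55657). The standard error from lincom rests on the pooled residual variance, so it need not equal the Female standard error in that table.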

 

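* To get the t-statistic to more decimal places, take the variance-covariance matrix stored by the regression, e(V). The square root of its first diagonal element is the standard error of the female coefficient, and dividing the coefficient by that standard error gives the t-statistic.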

 

. matrix var_covar_regress=e(V)

 

. matrix list var_covar_regress

 

symmetric var_covar_regress[2,2]

            female       _cons

female   .00182861

 _cons  -.00093818   .00093818

 

. display var_covar_regress[1,1]^0.5

.04276226

 

. display 0.2444469/0.04276226

5.7164168
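
* The same ratio can also be read straight from the stored results, which avoids retyping rounded numbers. A sketch, assuming the regress estimates above are still in memory:

```stata
* coefficient divided by its standard error
display _b[female]/_se[female]

* the same ratio with an explicit display format for more digits
display %12.8f _b[female]/_se[female]
```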

 

* That is our t-statistic in more detail. Compare to the equal variance t-test:

 

. ttest yrsed if age>=25 & age<=34, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

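* Now compare the t distribution with the Normal distribution (the df in the first command should read 18536, not 18356; at this sample size the difference is negligible):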

 

. display 2*(ttail(18356, 5.7164))

1.105e-08

 

. display 2*(1-normal(5.7164))

1.088e-08

 

. display invnormal(1-.025)

1.959964

 

. display invttail(2, 0.025)

4.3026527

 

. display invttail(10, 0.025)

2.2281389

 

. display invttail(100, 0.025)

1.9839715

 

. display invttail(1000, 0.025)

1.9623391

 

. display invttail(18000, 0.025)

1.9600958
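
* The commands above show the t critical value shrinking toward the Normal value of 1.959964 as the degrees of freedom grow. As a sketch, the same comparison in one loop, followed by a reconstruction of the Male confidence interval from the ttest table (that group interval uses df = n - 1 = 9026):

```stata
* columns: df, t critical value, excess over the Normal critical value
foreach df of numlist 2 10 100 1000 18000 {
    display %6.0f `df' "   " %9.6f invttail(`df', 0.025) "   " %9.6f invttail(`df', 0.025) - invnormal(0.975)
}

* confidence interval = mean -/+ critical value * standard error
display 13.31212 - invttail(9026, 0.025)*.0312351
display 13.31212 + invttail(9026, 0.025)*.0312351
```

* The last two values should reproduce the Male interval from the ttest table, 13.25089 to 13.37335, up to rounding.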

 

. log close

      name:  <unnamed>

       log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2016_logs\class3.lo

> g

  log type:  text

 closed on:   3 Oct 2016, 12:59:22

-----------------------------------------------------------------------------------------------