-----------------------------------------------------------------------------------

       log:  C:\AAA Miker Files\newer web pages\soc_meth_proj3\class3_2009.log

  log type:  text

 opened on:   3 Feb 2009, 11:35:54

 

. set mem 200m

 

Current memory allocation

 

                    current                                 memory usage

    settable          value     description                 (1M = 1024k)

    --------------------------------------------------------------------

    set maxvar         5000     max. variables allowed           1.909M

    set memory          200M    max. data space                200.000M

    set matsize         400     max. RHS vars in models          1.254M

                                                            -----------

                                                               203.163M

 

. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear

 

. table sex if age>24 & age<35, contents(mean yrsed sd yrsed freq)

 

-------------------------------------------------

      Sex | mean(yrsed)    sd(yrsed)        Freq.

----------+--------------------------------------

     Male |    13.31212     2.967666        9,027

   Female |    13.55657     2.854472        9,511

-------------------------------------------------

 

. *For the moment, we are ignoring the weights.

 

. *Is this difference significant? Do we really believe that women have more education?

 

. sort sex

 

. by sex: summarize yrsed if age>24 & age<35

 

 

-> sex = Male

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

       yrsed |      9027    13.31212    2.967666          0         17

 

-> sex = Female

 

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

       yrsed |      9511    13.55657    2.854472          0         17

 

 

. ttest yrsed if age >24 & age<35, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

    diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

. tabulate sex

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

       Male |     64,791       48.46       48.46

     Female |     68,919       51.54      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. tabulate sex, nolab

 

        Sex |      Freq.     Percent        Cum.

------------+-----------------------------------

          1 |     64,791       48.46       48.46

          2 |     68,919       51.54      100.00

------------+-----------------------------------

      Total |    133,710      100.00

 

. gen byte male=0

 

. replace male=1 if sex==1

(64791 real changes made)

 

* Here we are creating a dummy variable for gender: a new variable which =0 for women and =1 for men. Why is this better than the sex variable we had before, coded 1 and 2? Stata will assume (unless told otherwise) that any variable entered into a regression is a continuous variable, so categorical variables have to be coded into 0-1 dummy variables. With 0-1 coding, the constant is the mean for the group coded 0 (women) and the coefficient on the dummy is the difference between the two groups. Stata can also create dummy variables automatically, for example with the xi prefix or the generate() option of tabulate.
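
* A minimal sketch of creating the dummy variables automatically, not part of the original session. The names sexdum and male2 are hypothetical, chosen only for illustration. Note that xi omits the lowest-valued category by default (here sex==1, Male), so its coefficient would be the female-minus-male difference, with the opposite sign from the male dummy.

```stata
* Sketch only: sexdum and male2 are hypothetical names, not from the original log.
* tabulate with generate() creates one 0-1 indicator per category:
* sexdum1 (=1 if sex==1) and sexdum2 (=1 if sex==2).
tabulate sex, generate(sexdum)

* The xi prefix expands i.sex into indicator variables on the fly,
* leaving one category out as the reference group.
xi: regress yrsed i.sex if age>24 & age<35

* One-line alternative to gen/replace; male2 is left missing when sex is missing.
generate byte male2 = (sex==1) if !missing(sex)
```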

 

. regress yrsed male if age>24 & age<35

 

      Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

       Model |  276.742433     1  276.742433           Prob > F      =  0.0000

    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

-------------+------------------------------           Adj R-squared =  0.0017

       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

 

------------------------------------------------------------------------------

       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

        male |  -.2444469   .0427623    -5.72   0.000    -.3282649   -.1606289

       _cons |   13.55657   .0298401   454.31   0.000     13.49808    13.61506

------------------------------------------------------------------------------

 

. *This is the same contrast (a difference of -.2444, std error of .04276, and t-statistic of -5.72), in ttest and regression form, that we calculated by hand in our Excel spreadsheet.
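
* A sketch of how the hand calculation could be repeated inside Stata from the summary statistics alone, assuming equal variances as in the ttest output. The scalar names sp2, se and t are hypothetical. The results should agree with the diff row of the ttest table and the male row of the regression table, up to rounding of the inputs.

```stata
* Immediate form of ttest: obs, mean, sd for group 1 (Male), then group 2 (Female).
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472

* The same calculation step by step (scalar names are hypothetical).
* Pooled variance: weighted average of the two group variances.
scalar sp2 = ((9027-1)*2.967666^2 + (9511-1)*2.854472^2) / (9027 + 9511 - 2)
* Standard error of the difference in means.
scalar se = sqrt(sp2) * sqrt(1/9027 + 1/9511)
* t-statistic: difference in means (Male minus Female) over its standard error.
scalar t = (13.31212 - 13.55657) / se
display "pooled variance = " sp2 "   se = " se "   t = " t
```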

 

. gen months_ed=yrsed*12

(30484 missing values generated)

 

. *If I do the same analysis with months of education instead of years of education, will I get the same answer? How will the results differ?

 

. table sex if age>24 & age<35, contents(mean months_ed sd months_ed freq)

 

----------------------------------------------------------

      Sex | mean(months~d)    sd(months~d)           Freq.

----------+-----------------------------------------------

     Male |       159.7454        35.61199           9,027

   Female |       162.6788        34.25366           9,511

----------------------------------------------------------

 

*Sample means and std deviations go up by a factor of 12 (for example, 13.31212*12 = 159.7454).

 

. ttest months_ed if age >24 & age<35, by(sex)

 

Two-sample t test with equal variances

------------------------------------------------------------------------------

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

    Male |    9027    159.7454    .3748215    35.61199    159.0107    160.4802

  Female |    9511    162.6788    .3512319    34.25366    161.9903    163.3673

---------+--------------------------------------------------------------------

combined |   18538    161.2504    .2567052    34.95152    160.7472    161.7536

---------+--------------------------------------------------------------------

    diff |           -2.933363    .5131471               -3.939178   -1.927547

------------------------------------------------------------------------------

    diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

 

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

 

. *The t-statistic is invariant to changes of scale: the difference and its std error are both multiplied by 12, so their ratio is unchanged (we get the same -5.716).
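
* A small arithmetic check of the scale invariance, using only numbers already printed in the two ttest tables. This is an illustration with display, not a command from the original session.

```stata
* Multiplying the outcome by 12 multiplies the difference and its std error by 12.
display 12 * (-.2444469)
display 12 * .0427623

* The factor of 12 cancels in the ratio, so both lines give the same t-statistic.
display (-.2444469) / .0427623
display (-2.933363) / .5131471
```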

 

. regress months_ed male if age>24 & age<35

 

      Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

       Model |  39850.9104     1  39850.9104           Prob > F      =  0.0000

    Residual |  22605108.7 18536  1219.52464           R-squared     =  0.0018

-------------+------------------------------           Adj R-squared =  0.0017

       Total |  22644959.6 18537  1221.60865           Root MSE      =  34.922

 

------------------------------------------------------------------------------

   months_ed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

        male |  -2.933363   .5131471    -5.72   0.000    -3.939178   -1.927547

       _cons |   162.6788   .3580818   454.31   0.000     161.9769    163.3807

------------------------------------------------------------------------------

 

. exit, clear