---------------------------------------------------------------------------------

name:  <unnamed>

log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\2010_logs\fourth_class.log

log type:  text

opened on:   4 Feb 2010, 14:42:34

. display normal(2)

.97724987

* In the standard Normal distribution, only about 2.3% of the probability lies more than 2 standard deviations above the mean (zero).

. display 1-normal(2)

.02275013

. display 1-normal(1.96)

.0249979

* We are usually concerned with the 5% significance level, which is arbitrary but standard. Since 2.5% of the distribution lies more than 1.96 standard deviations above the mean, the two-tailed critical value in Z-scores is about 1.96: a Normally distributed statistic falls 1.96 or more standard deviations from zero (in either direction) about 5% of the time.

. display 2*(1-normal(1.96))

.04999579

. display invnormal(1-.025)

1.959964

* The function normal takes a Z score and returns the cumulative probability up to that score, whereas invnormal takes a cumulative probability and returns the corresponding Z score. Check Freedman’s tables. Also, use Stata help to look up normal and invnormal to remind yourself of how they work. Stata online help is useful!
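
* A minimal sketch of how normal and invnormal undo each other, and of how the same recipe yields two-tailed critical values at other significance levels. The 10% and 1% levels are illustrative assumptions; only the 5% level appears in the session.

```stata
* normal() and invnormal() are inverses, so the round trip returns the starting Z score
display invnormal(normal(1.96))

* two-tailed critical values: put half of the significance level in each tail
* 10% level (illustrative)
display invnormal(1 - .10/2)
* 5% level, the case worked in the session
display invnormal(1 - .05/2)
* 1% level (illustrative)
display invnormal(1 - .01/2)
```

* The smaller the significance level, the farther out the critical value sits.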

. display normal(5.716)

.99999999

. display 1-normal(5.716)

5.453e-09

. display 2*(1-normal(5.716))

1.091e-08

. * That number, about 10 to the minus 8, or 1 in 100 million, is the chance that a Normally distributed statistic would be at least as far from zero (in either direction) as 5.716, which is what we got by comparing men's and women's years of education. Since this P value is so low, we can reject the null hypothesis that the two groups have equal mean education.
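
* A hedged sketch of an equivalent way to get the two-sided P value. Because the Normal distribution is symmetric, the area above z equals the area below -z, so no subtraction from 1 is needed. The scalar name z_stat is hypothetical. Both forms should agree here; for much larger Z scores, 1-normal(z) can round to zero while normal(-z) still returns a tiny positive number.

```stata
* test statistic from the session
scalar z_stat = 5.716

* form used in the log: subtracts a number very close to 1 from 1
display 2*(1 - normal(z_stat))

* symmetric form: twice the lower tail below -|z|
display 2*normal(-abs(z_stat))
```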

. display invnormal(1-.025)

1.959964

. *the critical 97.5% cumulative density point for the Normal distribution is at 1.96 standard deviations above the mean. For the T distribution the critical value depends on the degrees of freedom.

. display invttail(16,.025)

2.1199053

. display invttail(100,.025)

1.9839715

. display invttail(1800,.025)

1.9612828

. * As the degrees of freedom increase, the T and Normal distributions come to be pretty much the same. For the equal variance t-test, the degrees of freedom are the combined sample size of the two groups minus 2. Compare Freedman’s tables.
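
* A small sketch of the same comparison done in one loop. The loop macro df is a hypothetical name; the first three values are the ones tried above, and 18536 is the degrees of freedom of the CPS t-test later in this log.

```stata
* print the two-tailed 5% critical value of the T distribution for several df
foreach df of numlist 16 100 1800 18536 {
    display "df = " `df' "   T critical value = " invttail(`df', .025)
}

* the Normal critical value that the T values approach
display "Normal critical value = " invnormal(1 - .025)
```

* Note that invttail takes the upper-tail probability (.025), whereas invnormal takes the cumulative probability (.975).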

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta",

>  clear

. gen mos_education= yrsed*12

(30484 missing values generated)

. ttest yrsed if age>24 & age<35, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

. ttest  mos_education if age>24 & age<35, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    9027    159.7454    .3748215    35.61199    159.0107    160.4802

Female |    9511    162.6788    .3512319    34.25366    161.9903    163.3673

---------+--------------------------------------------------------------------

combined |   18538    161.2504    .2567052    34.95152    160.7472    161.7536

---------+--------------------------------------------------------------------

diff |           -2.933363    .5131471               -3.939178   -1.927547

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

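. * The T statistic is not affected by changes in scale (that is, by changing from yrsed to mos_education): multiplying the variable by 12 multiplies both the difference in means and its standard error by 12, so their ratio is unchanged.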
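
* A quick arithmetic check of the scale point, using only the diff and Std. Err. values printed in the two tables above. The T statistic is the difference in means divided by its standard error.

```stata
* years of education: difference / standard error
display -.2444469/.0427623

* months of education: difference / standard error
display -2.933363/.5131471

* rescaling by 12 hits numerator and denominator alike, so the ratio cannot move
display (-.2444469*12)/(.0427623*12)
```

* All three lines should reproduce the reported t of about -5.716, up to rounding in the printed table values.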

. save "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", replace

file C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta saved

. exit

---------------------------------------------------------------------------------

name:  <unnamed>

log:  C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web p

> ages\soc_meth_proj3\2010_logs\fourth_class.log

log type:  text

opened on:   4 Feb 2010, 15:18:51

. * That is where class ended, but I want to reopen this log to demonstrate a few additional things that I did not get to in class and that are relevant to HW2.

. use "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", clear

. *load up the 2000 CPS data again.

. * One thing I mentioned is that there are two different t-tests: one that assumes the two subsamples have equal variance (the equal variance t-test), and one that uses the actual variances of the two samples, be they similar or different (the unequal variance t-test).

. * In my Excel file, I use the unequal variance t-test.

. ttest yrsed if age>24 & age<35, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

diff |           -.2444469    .0427623               -.3282649   -.1606289

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -5.7164

Ho: diff = 0                                     degrees of freedom =    18536

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

. * By default, Stata's ttest command runs the equal variance t-test, as above.

. ttest yrsed if age>24 & age<35, by(sex) unequal

Two-sample t test with unequal variances

------------------------------------------------------------------------------

Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]

---------+--------------------------------------------------------------------

Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335

Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394

---------+--------------------------------------------------------------------

combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946

---------+--------------------------------------------------------------------

diff |           -.2444469    .0428057                 -.32835   -.1605438

------------------------------------------------------------------------------

diff = mean(Male) - mean(Female)                              t =  -5.7106

Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0

Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

. * The unequal variance t-test, which we invoked by adding the word "unequal" after the comma, gives us exactly what we got in the Excel file. In this case the equal and unequal variance t-tests are very similar because, as you can see from the t-test table, the standard deviations of education for men and women are very similar.
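
* A sketch of how the unequal variance result can be rebuilt from the summary numbers in the table, without the raw data. It uses ttesti, the immediate form of ttest, and then repeats the arithmetic by hand. The scalar names se_diff, var_m, and var_f are hypothetical, and small rounding differences from the table are to be expected because the inputs are the rounded printed values.

```stata
* immediate t-test: n, mean, sd for men, then n, mean, sd for women
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472, unequal

* by hand: the two squared standard errors add, with no pooling of variances
scalar var_m = .0312351^2
scalar var_f = .0292693^2
scalar se_diff = sqrt(var_m + var_f)
display "SE of difference = " se_diff
display "t = " (13.31212 - 13.55657)/se_diff

* Satterthwaite's approximate degrees of freedom
display "df = " (var_m + var_f)^2/(var_m^2/(9027 - 1) + var_f^2/(9511 - 1))
```

* Dropping the unequal option from the ttesti line gives the equal variance version instead.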

. * Now let's look at a simple regression version of this.

. *First we generate a new dummy variable for gender.

. codebook sex

-------------------------------------------------------------------------------------------------

sex                                                                                           Sex

-------------------------------------------------------------------------------------------------

type:  numeric (byte)

label:  sexlbl

range:  [1,2]                        units:  1

unique values:  2                        missing .:  0/133710

tabulation:  Freq.   Numeric  Label

64791         1  Male

68919         2  Female

. * OK, sex=1 for men and 2 for women. We are going to create a new variable that =1 for women and =0 otherwise.

. gen female=0

. replace female=1 if sex==2
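
* A hedged alternative for building the dummy in one step. A logical expression in Stata evaluates to 1 when true and 0 when false, so it can be stored directly. The toy data and the name female_alt are assumptions for illustration; the coding of sex (1 = Male, 2 = Female) follows the codebook output above. Unlike gen female=0, this version leaves the dummy missing wherever sex is missing, which makes no difference in the CPS file because sex has no missing values there.

```stata
* toy data standing in for the CPS file
clear
input byte sex
1
2
2
.
end

* 1 if female, 0 if male, missing if sex is missing
gen byte female_alt = (sex == 2) if !missing(sex)

* check the new dummy against the original variable
tabulate sex female_alt, missing
```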

. regress yrsed female if age>24 & age<35

Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

Model |  276.742433     1  276.742433           Prob > F      =  0.0000

Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------

yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649

_cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216

------------------------------------------------------------------------------

. * You can see here that the coefficient for female is 0.244; that is the additional educational attainment that women have, on average, over men. The constant (13.31) corresponds to men's mean education, i.e. the predicted educational attainment when the variable female=0. The t-statistic (5.72) matches the equal variance t-test above, apart from the sign, because ttest computed male minus female.
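
* A self-contained sketch of the same equivalence on a different dataset, so it can be run without the CPS file. It assumes auto.dta, the example dataset that ships with Stata, where foreign is a 0/1 dummy; the scalar names are hypothetical. The regression coefficient on the dummy should equal the difference in group means, and its t-statistic should equal the equal variance t-test's t with the sign flipped.

```stata
sysuse auto, clear

* equal variance t-test; ttest reports group 0 minus group 1
quietly ttest mpg, by(foreign)
scalar t_from_ttest = r(t)
scalar diff_means = r(mu_2) - r(mu_1)

* regression of the outcome on the dummy
quietly regress mpg foreign

display "difference in means = " diff_means "   coefficient = " _b[foreign]
display "t from ttest = " t_from_ttest "   t from regress = " _b[foreign]/_se[foreign]
```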

. *This is the simple OLS regression, using gender to predict yrsed.

. regress yrsed male if age>24 & age<35

Source |       SS       df       MS              Number of obs =   18538

-------------+------------------------------           F(  1, 18536) =   32.68

Model |  276.742433     1  276.742433           Prob > F      =  0.0000

Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018

Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------

yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

male |  -.2444469   .0427623    -5.72   0.000    -.3282649   -.1606289

_cons |   13.55657   .0298401   454.31   0.000     13.49808    13.61506

------------------------------------------------------------------------------

. * And if we use a dummy variable that =1 for men and =0 for women, we get the same coefficient and T-statistic but with signs reversed, and here the constant term is women's mean education.
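
* A sketch of the sign reversal using auto.dta, the example dataset that ships with Stata, so it runs without the CPS file. This log does not show how the variable male was created; in the CPS data a line such as gen byte male = 1 - female would be one way to build it, which is an assumption. The variable domestic and the scalar names below are hypothetical.

```stata
sysuse auto, clear

* regression on the original dummy
quietly regress mpg foreign
scalar b_foreign = _b[foreign]
scalar cons_foreign = _b[_cons]

* flipped dummy: 1 for the group that was 0 before
gen byte domestic = 1 - foreign
quietly regress mpg domestic

display "coefficient flips sign: " b_foreign "  vs  " _b[domestic]
display "new constant vs old constant + old coefficient: " _b[_cons] "  vs  " cons_foreign + b_foreign
```

* The constant always reports the mean of whichever group is coded 0.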

. save "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", replace

file C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta saved

. exit, clear