----------------------------------------------------------------------------------------------------------------------------------

name:  <unnamed>

log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2015_381_logs\class4.log

log type:  text

opened on:  30 Sep 2015, 10:03:10

. *class starts here

. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)

--------------------------------------------------------------
      Sex |       Freq.  mean(yrsed)    sd(yrsed)   sem(yrsed)
----------+---------------------------------------------------
     Male |       9,027     13.31212     2.967666     .0312351
   Female |       9,511     13.55657     2.854472     .0292693
--------------------------------------------------------------

*It is crucial to understand the relationship between sd, n, and standard error of the mean. Specifically, SE=SD/(sqrt(n))

. display 2.967666/sqrt(9027)

.03123513
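
* The same check should work for women. A minimal sketch using the SD and n from the table above (not run in class, so no output is shown):

```stata
* SE of the mean for women: SD / sqrt(n), numbers copied from the table above
display 2.854472/sqrt(9511)
* compare the result with sem(yrsed) for Female in the table (.0292693)
```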

. ttest yrsed if age>=25 & age<=34, by(sex)

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

. ttest yrsed if age>=25 & age<=34, by(sex) unequal

Two-sample t test with unequal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

* One thing to know, which will be relevant for HW2 but matters less here, is that there are two different ways to calculate the standard error of the difference between the means. The equal variance t-test (the default, above) assumes that men’s and women’s educational distributions have the same population variance and SD, which in this case is very nearly true. The unequal variance t-test drops that assumption and allows the two groups to have different SDs. Since men’s and women’s education have similar SDs and SEs here, the equal and unequal variance t-tests yield almost exactly the same standard error of the difference and almost exactly the same t-statistic.
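
* The two ways of computing the standard error of the difference can be sketched by hand from the summary statistics in the t-test output. This was not run in class; the scalar names n1, s1, n2, s2 and sp2 are my own, and the sketch assumes no variables with those names are in memory. The two results should come out close to the Std. Err. values on the diff rows above (.0427623 and .0428057), up to rounding of the inputs.

```stata
* Summary statistics copied from the ttest output above
scalar n1 = 9027
scalar s1 = 2.967666
scalar n2 = 9511
scalar s2 = 2.854472

* Equal variance: pool the two variances, then scale by sqrt(1/n1 + 1/n2)
scalar sp2 = ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1 + n2 - 2)
display sqrt(sp2)*sqrt(1/n1 + 1/n2)

* Unequal variance: add the two squared standard errors of the means
display sqrt(s1^2/n1 + s2^2/n2)

* The immediate form of ttest takes n, mean, and SD for each group
ttesti 9027 13.31212 2.967666 9511 13.55657 2.854472, unequal
```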

. display ttail(18536,-5.7164)

.99999999

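* ttail(df, t) gives you the right-hand (upper) tail probability, Pr(T > t). Since the statistic is a large negative value (-5.72), ttail in this case yields almost 1. If we want the left-hand tail probability, we have to take 1-ttail(df, t). (Note that the display commands that follow type 18356 for the degrees of freedom, where the t-test reports 18536; with this many degrees of freedom the slip should make no practical difference.) So: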

. display (1-ttail(18356,-5.7164))

5.525e-09

* And generally we want both tails, so:

. display 2*(1-ttail(18356,-5.7164))

1.105e-08
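
* Because the t distribution is symmetric, the two-tailed probability can also be sketched without the 1-ttail() step, by taking the upper tail at the absolute value of the statistic. This was not run in class, and it uses the 18536 degrees of freedom reported by the t-test.

```stata
* two-tailed p-value: twice the upper tail at |t|
display 2*ttail(18536, abs(-5.7164))
```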

*Another key relationship to always keep in mind: T=Coef/SE

. display -0.2444469/0.0428057

-5.7106156

*OK, now we want to run the regression version of the above t-test, and first we need to create a dummy variable for gender. Stata has several built-in ways that we will talk about. For now, I will generate the dummy variable “female” by hand, so you can see how it works.

. tabulate sex

        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
       Male |     64,791       48.46       48.46
     Female |     68,919       51.54      100.00
------------+-----------------------------------
      Total |    133,710      100.00

. codebook sex

--------------------------------------------------------------------------------
sex                                                                          Sex
--------------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  sexlbl

                 range:  [1,2]                        units:  1
         unique values:  2                        missing .:  0/133710

            tabulation:  Freq.   Numeric  Label
                         64791         1  Male
                         68919         2  Female

. gen byte female=0

. replace female=1 if sex==2

(68919 real changes made)

. tabulate sex female

           |        female
       Sex |         0          1 |     Total
-----------+----------------------+----------
      Male |    64,791          0 |    64,791
    Female |         0     68,919 |    68,919
-----------+----------------------+----------
     Total |    64,791     68,919 |   133,710

*When you make a new variable, always cross tabulate the new with the old, to make sure it does what you want.
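
* For reference, here is a sketch of a few other ways to get the same dummy variable. None of this was run in class; the names female2 and sexdum are hypothetical, chosen only for illustration. One caution about the by-hand method above: it would code anyone with missing sex as 0. That does not matter here, since codebook shows no missing values on sex.

```stata
* One-line version: the logical expression is 1 when true and 0 when false;
* the if qualifier leaves the dummy missing wherever sex is missing
generate byte female2 = (sex == 2) if !missing(sex)

* tabulate can generate one dummy per category (here sexdum1 and sexdum2)
tabulate sex, generate(sexdum)

* check the new variables against the original, as always
tabulate sex female2
tabulate sex sexdum2

* factor-variable notation builds the dummy inside the regression itself
regress yrsed i.sex if age>=25 & age<=34
```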

. regress yrsed female if age>=25&age<=34

      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   32.68
       Model |  276.742433     1  276.742433           Prob > F      =  0.0000
    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018
-------------+------------------------------           Adj R-squared =  0.0017
       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101

------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649
       _cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216
------------------------------------------------------------------------------

. * The t-statistic here (along with the female coefficient and its standard error) is exactly the same, apart from the sign, as in our equal variance t-test. The sign flips because the t-test reports male minus female, while the regression coefficient is female minus male. Regression assumes equal variance, also known as the assumption of homoscedasticity. Also note that there is a constant term here, and a t-test associated with the constant. What is the null hypothesis of the constant term? The null hypothesis for the constant is a ridiculous one. The constant term in this case represents men’s average education, so the null hypothesis would be that men have zero education on average, which of course is false.
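
* If we wanted regression standard errors that do not rely on the equal variance assumption, one option is robust standard errors. This is only a sketch and was not run in class. I would expect the resulting standard error for female to be close to the one from the unequal variance t-test, though not necessarily identical.

```stata
* Same regression, with standard errors that do not assume equal variance
regress yrsed female if age>=25 & age<=34, vce(robust)
```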

. display 2*(ttail(18356, 434.6))

0

*When Stata says that the tail probability of the t distribution for values higher than 434.6 is zero, what Stata means is that the value is too small to calculate or too small to display. The true probability must be greater than zero.

. display 2*(ttail(18356, 5.7164))

1.105e-08

. display 2*(1-normal(5.7164))

1.088e-08

* Even with 18K degrees of freedom, the t-distribution tail probability is a little bit higher than the Normal tail probability.

. *If we want the t-statistic to a totally unreasonable level of precision, Stata stores the coefficients and the variance-covariance matrix in more detail than the regression output displays. Type help regress, and you will get a description of the saved scalars, vectors, and matrices.

. matrix var_covar_regress=e(V)

* e(V) stores the variance covariance matrix of the last regression.

. matrix list var_covar_regress

symmetric var_covar_regress[2,2]
            female       _cons
female   .00182861
 _cons  -.00093818   .00093818

* The first element in this matrix is the variance of the female coefficient, or the square of the SE of the female coefficient.

. display var_covar_regress[1,1]^0.5

.04276226

. display 0.2444469/0.04276226

5.7164168
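
* A shorter route to the same numbers, sketched here but not run in class: after regress, Stata also makes each coefficient and standard error available directly as _b[] and _se[]. This assumes the regression of yrsed on female is still the most recent estimation command.

```stata
* coefficient and standard error straight from the stored results
display _b[female]
display _se[female]

* t-statistic at full precision, and its two-sided p-value
display %18.0g _b[female]/_se[female]
display 2*ttail(e(df_r), abs(_b[female]/_se[female]))

* list everything regress left behind in e()
ereturn list
```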

. display invnormal(1-.025)

1.959964

* The Normal distribution value that yields an upper tail probability of 0.025 is Z=1.96. How many degrees of freedom do we need before the t distribution value for a tail probability of 0.025 gets close to 1.96?

. display invttail(2, 0.025)

4.3026527

. display invttail(10, 0.025)

2.2281389

. display invttail(100, 0.025)

1.9839715

. display invttail(1000, 0.025)

1.9623391

. display invttail(18000, 0.025)

1.9600958
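
* The same comparison can be sketched as a loop rather than typed one line at a time. This was not run in class; the display formats are just one reasonable choice.

```stata
* critical t value for an upper tail probability of 0.025, by degrees of freedom
foreach df of numlist 2 10 100 1000 18000 {
    display %8.0f `df' "   " %10.7f invttail(`df', 0.025)
}

* Normal benchmark for comparison
display %10.7f invnormal(1-.025)
```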

. log close

name:  <unnamed>

log:  C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_201

> 5_381_logs\class4.log

log type:  text

closed on:  30 Sep 2015, 13:44:51

-------------------------------------------------------------------------------