---------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2016_logs\c
> lass3.log
log type: text
opened on: 3 Oct 2016, 10:06:28
. use "C:\Users\Michael\Desktop\cps_mar_2000_new_unchanged.dta", clear
. table sex if age>=25 & age<=34, contents (freq mean yrsed sd yrsed semean yrsed)
--------------------------------------------------------------
      Sex |       Freq.  mean(yrsed)    sd(yrsed)   sem(yrsed)
----------+---------------------------------------------------
     Male |       9,027     13.31212     2.967666     .0312351
   Female |       9,511     13.55657     2.854472     .0292693
--------------------------------------------------------------
. display 2.967666/sqrt(9027)
.03123513
* SE = SD/sqrt(n), which reproduces the sem(yrsed) value for men in the table above.
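* A sketch (not part of the logged session): the same formula should reproduce the standard error for women, using the SD and n from the table above.
display 2.854472/sqrt(9511)
* It can also be computed for both groups from the data with summarize, which leaves r(sd) and r(N) behind. The codes 1 and 2 for sex are taken from the codebook output later in this log.
forvalues s = 1/2 {
    quietly summarize yrsed if age>=25 & age<=34 & sex==`s'
    display "sex = `s'   SE = " r(sd)/sqrt(r(N))
}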
. ttest yrsed if age>=25 & age<=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
. ttest yrsed if age>=25 & age<=34, by(sex) unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0428057                 -.32835   -.1605438
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7106
Ho: diff = 0                     Satterthwaite's degrees of freedom =  18383.6
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
* Note that there are two kinds of t-test here, equal-variance and unequal-variance. In this case the difference between them (compare the t-statistics, -5.7164 and -5.7106) is very small, because men's and women's educational variances are nearly the same (standard deviations of 2.97 and 2.85) and the two groups are of similar size, so assuming equal variances makes hardly any difference.
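* A sketch (not run in this session) of where the two standard errors of the difference come from, using the group SDs, SEs, and ns reported in the tables above.
* Unequal variances: the SE of the difference is the square root of the sum of the squared group SEs. This should come out close to the .0428057 reported above.
display sqrt(.0312351^2 + .0292693^2)
* Equal variances: pool the two variances first, weighting each by its degrees of freedom, then scale by sqrt(1/n1 + 1/n2). This should come out close to .0427623. The scalar name sp2 is made up for this illustration.
scalar sp2 = ((9027-1)*2.967666^2 + (9511-1)*2.854472^2)/(9027 + 9511 - 2)
display sqrt(sp2)*sqrt(1/9027 + 1/9511)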
. display -.2444469/.0428057
-5.7106156
* Note that t = diff/(SE of diff).
. display ttail(18536,-5.7164)
.99999999
. display (1-ttail(18356,-5.7164))
5.525e-09
. display 2*(1-ttail(18356,-5.7164))
1.105e-08
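* To get the probability in a tail, you have to know whether the function in question (here ttail) returns the cumulative probability up to t or the probability from t to infinity. ttail returns the probability from t to infinity (see the Stata help for ttail), so for t = -5.7164 we take 1 minus that probability to get the lower tail, and double it for the two-tailed test. Note that the last two commands above were typed with 18356 degrees of freedom rather than 18536, and the same transposition appears in the t versus Normal comparison near the end of the log; with samples this large the difference is negligible.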
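* A sketch (not run in this session): because the t distribution is symmetric, the lower tail below -5.7164 equals the upper tail above +5.7164, so ttail() can be applied to the absolute value directly. This avoids subtracting a number very close to 1 from 1, which loses precision.
display ttail(18536, 5.7164)
display 2*ttail(18536, abs(-5.7164))
* ttest also leaves its p-values behind as stored results: r(p_l), r(p), and r(p_u) are the lower-tail, two-sided, and upper-tail probabilities.
quietly ttest yrsed if age>=25 & age<=34, by(sex)
display r(p_l) "   " r(p) "   " r(p_u)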
. tabulate sex
        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
       Male |     64,791       48.46       48.46
     Female |     68,919       51.54      100.00
------------+-----------------------------------
      Total |    133,710      100.00
. codebook sex
-------------------------------------------------------------------------------------------------------------------------
sex Sex
-------------------------------------------------------------------------------------------------------------------------
                  type:  numeric (byte)
                 label:  sexlbl
                 range:  [1,2]                        units:  1
         unique values:  2                        missing .:  0/133710
            tabulation:  Freq.   Numeric  Label
                         64791         1  Male
                         68919         2  Female
* Now I am going to generate a “dummy” variable, i.e. a 0-1 coded variable for female, and enter the dummy variable into the regression predicting years of education.
. gen byte female=0
. replace female=1 if sex==2
(68919 real changes made)
. tabulate sex female
           |        female
       Sex |         0          1 |     Total
-----------+----------------------+----------
      Male |    64,791          0 |    64,791
    Female |         0     68,919 |    68,919
-----------+----------------------+----------
     Total |    64,791     68,919 |   133,710
* Always cross-tabulate your new variable against the old one to make sure the recode worked.
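* A sketch (not run in this session): the dummy can also be built in one step from a logical expression, which evaluates to 1 when true and 0 when false. The name female2 is made up for this illustration. The if qualifier keeps any missing values of sex missing rather than coding them 0 (sex has no missing values in this file, so it changes nothing here).
gen byte female2 = (sex==2) if !missing(sex)
* assert stops with an error if the two versions ever disagree and is silent otherwise.
assert female == female2
drop female2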
. regress yrsed female if age>=25 & age<=34
      Source |       SS       df       MS              Number of obs =   18538
-------------+------------------------------           F(  1, 18536) =   32.68
       Model |  276.742433     1  276.742433           Prob > F      =  0.0000
    Residual |  156979.922 18536  8.46892111           R-squared     =  0.0018
-------------+------------------------------           Adj R-squared =  0.0017
       Total |  157256.664 18537  8.48339343           Root MSE      =  2.9101
------------------------------------------------------------------------------
       yrsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .2444469   .0427623     5.72   0.000     .1606289    .3282649
       _cons |   13.31212   .0306297   434.62   0.000     13.25208    13.37216
------------------------------------------------------------------------------
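* The regression gives the same coefficient, standard error, and t-statistic as the equal-variance t-test, with the sign reversed: ttest reports mean(Male) - mean(Female), while the coefficient on female is mean(Female) - mean(Male). The constant, 13.31212, is the mean for men.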
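* A sketch (not run in this session): with a 0-1 predictor, the constant is the mean for the group coded 0 (men), and the constant plus the coefficient is the mean for the group coded 1 (women). After regress, the coefficients are available as _b[].
quietly regress yrsed female if age>=25 & age<=34
display _b[_cons]
display _b[_cons] + _b[female]
* These should match the male and female means in the ttest table, 13.31212 and 13.55657.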
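* To get the t-statistic to more decimal places, take the variance-covariance matrix stored by the regression, take its first diagonal element (the variance of the female coefficient), take the square root to get the standard error, and divide the coefficient by it.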
. matrix var_covar_regress=e(V)
. matrix list var_covar_regress
symmetric var_covar_regress[2,2]
            female       _cons
female   .00182861
 _cons  -.00093818   .00093818
. display var_covar_regress[1,1]^0.5
.04276226
. display 0.2444469/0.04276226
5.7164168
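* A sketch (not run in this session): the same t-statistic is available without going through the matrix, because regress stores each coefficient in _b[] and each standard error in _se[].
quietly regress yrsed female if age>=25 & age<=34
display _b[female]/_se[female]
* The two-sided p-value then follows from ttail() with the residual degrees of freedom, e(df_r).
display 2*ttail(e(df_r), abs(_b[female]/_se[female]))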
* That is our t-statistic in more detail. Compare to the equal variance t-test:
. ttest yrsed if age>=25 & age<=34, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |    9027    13.31212    .0312351    2.967666    13.25089    13.37335
  Female |    9511    13.55657    .0292693    2.854472    13.49919    13.61394
---------+--------------------------------------------------------------------
combined |   18538    13.43753    .0213921    2.912627     13.3956    13.47946
---------+--------------------------------------------------------------------
    diff |           -.2444469    .0427623               -.3282649   -.1606289
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -5.7164
Ho: diff = 0                                     degrees of freedom =    18536
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
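* Now compare the t and Normal distributions. With this many degrees of freedom the two-sided p-values nearly coincide, and the t critical value for a 5% two-tailed test falls toward the Normal value of 1.96 as the degrees of freedom grow: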
. display 2*(ttail(18356, 5.7164))
1.105e-08
. display 2*(1-normal(5.7164))
1.088e-08
. display invnormal(1-.025)
1.959964
. display invttail(2, 0.025)
4.3026527
. display invttail(10, 0.025)
2.2281389
. display invttail(100, 0.025)
1.9839715
. display invttail(1000, 0.025)
1.9623391
. display invttail(18000, 0.025)
1.9600958
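* A sketch (not run in this session): the same comparison written as a loop, printing the two-tailed 5% critical value of t at each of the degrees of freedom tried above, followed by the Normal value.
foreach df of numlist 2 10 100 1000 18000 {
    display "df = `df'   t critical = " invttail(`df', 0.025)
}
display "Normal critical = " invnormal(1-.025)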
. log close
name: <unnamed>
log: C:\Users\Michael\Documents\newer web pages\soc_meth_proj3\fall_2016_logs\class3.lo
> g
log type: text
closed on: 3 Oct 2016, 12:59:22
-----------------------------------------------------------------------------------------------