-----------------------------------------------------------------------------------
log: C:\AAA Miker Files\newer web pages\soc_meth_proj3\class3_2009.log
log type: text
opened on: 3 Feb 2009, 11:35:54
. set mem 200m
Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory          200M    max. data space                200.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                               203.163M
. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear
. table sex if age>24 & age<35, contents(mean yrsed sd yrsed freq)
-------------------------------------------------
Sex | mean(yrsed) sd(yrsed) Freq.
----------+--------------------------------------
Male | 13.31212 2.967666 9,027
Female | 13.55657 2.854472 9,511
-------------------------------------------------
. *For the moment, we are ignoring the weights.
. *Is this difference significant? Do we really believe that women have more education?
. sort sex
. by sex: summarize yrsed if age>24 & age<35
-> sex = Male
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 9027 13.31212 2.967666 0 17
-> sex = Female
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
yrsed | 9511 13.55657 2.854472 0 17
. ttest yrsed if age >24 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 13.31212 .0312351 2.967666 13.25089 13.37335
Female | 9511 13.55657 .0292693 2.854472 13.49919 13.61394
---------+--------------------------------------------------------------------
combined | 18538 13.43753 .0213921 2.912627 13.3956 13.47946
---------+--------------------------------------------------------------------
diff | -.2444469 .0427623 -.3282649 -.1606289
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
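*The t-statistic above can be rebuilt from the summary statistics alone. The lines below are a minimal sketch, assuming the equal-variance (pooled) formula that ttest reports by default; the local macro names (n1, m1, s1, ...) are my own, and the numbers are copied from the table above. Because the inputs are rounded, the result should agree with the ttest output only to about four digits. Run the block as a single do-file so the local macros stay defined.

```stata
* Summary statistics copied from the ttest table (1 = Male, 2 = Female)
local n1 = 9027
local m1 = 13.31212
local s1 = 2.967666
local n2 = 9511
local m2 = 13.55657
local s2 = 2.854472

* Pooled variance: weighted average of the two group variances
local sp2 = ((`n1'-1)*`s1'^2 + (`n2'-1)*`s2'^2) / (`n1'+`n2'-2)

* Standard error of the difference in means, and the t-statistic
local se = sqrt(`sp2'*(1/`n1' + 1/`n2'))
local t  = (`m1'-`m2')/`se'

display "pooled variance   = " `sp2'
display "std. err. of diff = " `se'
display "t-statistic       = " `t'

* The immediate form of ttest does the same calculation from typed-in numbers
ttesti `n1' `m1' `s1' `n2' `m2' `s2'
```

*The pooled variance should also match the Residual MS in a regression of yrsed on a gender dummy, since both estimate the same within-group variance.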
. tabulate sex
Sex | Freq. Percent Cum.
------------+-----------------------------------
Male | 64,791 48.46 48.46
Female | 68,919 51.54 100.00
------------+-----------------------------------
Total | 133,710 100.00
. tabulate sex, nolab
Sex | Freq. Percent Cum.
------------+-----------------------------------
1 | 64,791 48.46 48.46
2 | 68,919 51.54 100.00
------------+-----------------------------------
Total | 133,710 100.00
. gen byte male=0
. replace male=1 if sex==1
(64791 real changes made)
* Here we create a dummy variable for gender: a new variable which =0 for women and =1 for men. Why is this better than the sex variable we had before, coded 1 and 2? Stata will assume (unless told otherwise) that any variable entered into a regression is continuous, so categorical variables have to be coded into 0-1 dummy variables. With 0-1 coding the constant is the mean for women (male=0) and the coefficient on male is the difference between men and women. There are some functions for doing this automatically within Stata...
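*A few other ways to get the same dummy, sketched under the assumption that the data set is still in memory. The names male2 and sexdum are hypothetical, chosen only for illustration. The xi prefix builds the dummies on the fly; by default it omits the lowest category (here sex==1, Male), so I would expect its coefficient to be the female minus male difference, the same size as the coefficient on male but with the opposite sign.

```stata
* One-line version: the logical test returns 1/0, and the if qualifier
* leaves the dummy missing (instead of 0) wherever sex itself is missing
gen byte male2 = (sex==1) if !missing(sex)

* tabulate can generate one dummy per category: sexdum1 (Male), sexdum2 (Female)
tabulate sex, generate(sexdum)

* xi creates the dummies automatically inside the estimation command
xi: regress yrsed i.sex if age>24 & age<35
```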
. regress yrsed male if age>24 & age<35
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 276.742433 1 276.742433 Prob > F = 0.0000
Residual | 156979.922 18536 8.46892111 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 157256.664 18537 8.48339343 Root MSE = 2.9101
------------------------------------------------------------------------------
yrsed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -.2444469 .0427623 -5.72 0.000 -.3282649 -.1606289
_cons | 13.55657 .0298401 454.31 0.000 13.49808 13.61506
------------------------------------------------------------------------------
. *This is the same contrast that we calculated by hand in our Excel spreadsheet, now in ttest and regression form: a difference of -.2444 (men minus women), a std error of .04276, and a t-statistic of -5.72.
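*The link between the two outputs can be checked from the stored regression results. This is a small sketch, assuming the regression on the male dummy is re-run (quietly) so that _b[], _se[] and e(F) refer to it. With a single predictor, the F statistic is the square of the t-statistic, and the constant plus the coefficient on male gives back the mean for men.

```stata
quietly regress yrsed male if age>24 & age<35

* t-statistic for male, rebuilt from the coefficient and its standard error
display "t for male         = " _b[male]/_se[male]

* With one predictor, F equals t squared
display "t squared          = " (_b[male]/_se[male])^2
display "F from regression  = " e(F)

* The two group means implied by the regression
display "mean for women     = " _b[_cons]
display "mean for men       = " _b[_cons] + _b[male]
```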
. gen months_ed=yrsed*12
(30484 missing values generated)
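*The missing values come straight from yrsed: 12 times a missing value is still missing. A quick check, assuming the data set is still in memory, is to count the missing cases in the source variable and then within the age range we are analyzing.

```stata
* Should match the number of missing values that gen reported
count if missing(yrsed)

* How many of the missing cases fall inside the 25-34 age range
count if missing(yrsed) & age>24 & age<35
```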
. *If I do the same analysis with months of education instead of years of education, will I get the same answer? How will the results differ?
. table sex if age>24 & age<35, contents(mean months_ed sd months_ed freq)
----------------------------------------------------------
Sex | mean(months~d) sd(months~d) Freq.
----------+-----------------------------------------------
Male | 159.7454 35.61199 9,027
Female | 162.6788 34.25366 9,511
----------------------------------------------------------
*Group means and std deviations go up by a factor of 12.
. ttest months_ed if age >24 & age<35, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Male | 9027 159.7454 .3748215 35.61199 159.0107 160.4802
Female | 9511 162.6788 .3512319 34.25366 161.9903 163.3673
---------+--------------------------------------------------------------------
combined | 18538 161.2504 .2567052 34.95152 160.7472 161.7536
---------+--------------------------------------------------------------------
diff | -2.933363 .5131471 -3.939178 -1.927547
------------------------------------------------------------------------------
diff = mean(Male) - mean(Female) t = -5.7164
Ho: diff = 0 degrees of freedom = 18536
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
. *The t-statistic is invariant to changes of scale (we get the same -5.7164)
. regress months_ed male if age>24 & age<35
Source | SS df MS Number of obs = 18538
-------------+------------------------------ F( 1, 18536) = 32.68
Model | 39850.9104 1 39850.9104 Prob > F = 0.0000
Residual | 22605108.7 18536 1219.52464 R-squared = 0.0018
-------------+------------------------------ Adj R-squared = 0.0017
Total | 22644959.6 18537 1221.60865 Root MSE = 34.922
------------------------------------------------------------------------------
months_ed | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | -2.933363 .5131471 -5.72 0.000 -3.939178 -1.927547
_cons | 162.6788 .3580818 454.31 0.000 161.9769 163.3807
------------------------------------------------------------------------------
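*Why nothing important changes: multiplying the outcome by 12 multiplies the coefficient and its standard error by 12, so their ratio (the t-statistic) is unchanged. Sums of squares go up by 12 squared = 144 in both the model and the total, so R-squared and F are unchanged as well. The arithmetic below is a sketch using numbers copied from the years-of-education output; small differences in the last digit are rounding.

```stata
* Coefficient and standard error scale by 12
display "coef in months      = " 12*(-.2444469)
display "std. err. in months = " 12*.0427623

* The factor of 12 cancels in the t-statistic
display "t in months         = " (12*(-.2444469))/(12*.0427623)

* Sums of squares scale by 144, so their ratio (R-squared) does not move
display "model SS in months  = " 144*276.742433
display "R-squared in years  = " 276.742433/157256.664
display "R-squared in months = " (144*276.742433)/(144*157256.664)
```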
. exit, clear