-----------------------------------------------------------------------------------
log: C:\AAA Miker Files\newer web pages\soc_meth_proj3\clas4_2009.log
log type: text
opened on: 10 Feb 2009, 11:16:26
*Actually, this was class 5... So I renamed the log later.
. set mem 200m
Current memory allocation
                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory          200M    max. data space                200.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                               203.163M
. use "C:\AAA Miker Files\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear
. tabulate metro
  Metropolitan central city |
                     status |      Freq.     Percent        Cum.
----------------------------+-----------------------------------
           Not identifiable |        340        0.25        0.25
          Not in metro area |     29,658       22.18       22.44
               Central city |     32,481       24.29       46.73
       Outside central city |     51,468       38.49       85.22
Central city status unknown |     19,763       14.78      100.00
----------------------------+-----------------------------------
                      Total |    133,710      100.00
. tabulate metro, nolab
Metropolita |
  n central |
city status |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        340        0.25        0.25
          1 |     29,658       22.18       22.44
          2 |     32,481       24.29       46.73
          3 |     51,468       38.49       85.22
          4 |     19,763       14.78      100.00
------------+-----------------------------------
      Total |    133,710      100.00
. table metro if age>29 & age<65 & sex==1, contents (mean incwage)
-------------------------------------------
  Metropolitan central city |
                     status | mean(incwage)
----------------------------+--------------
           Not identifiable |   31743.04255
          Not in metro area |    27189.6465
               Central city |   34445.35841
       Outside central city |    43203.0348
Central city status unknown |   35557.95997
-------------------------------------------
. *Suburbs (outside central city) have the highest mean wage income, rural areas (not in metro area) have the lowest, and central city falls in between.
. xi i.metro
i.metro _Imetro_0-4 (naturally coded; _Imetro_0 omitted)
. table metro, contents (mean _Imetro_1 mean _Imetro_2 mean _Imetro_3)
----------------------------------------------------------------------------
  Metropolitan central city |
                     status | mean(_Imetr~1)  mean(_Imetr~2)  mean(_Imetr~3)
----------------------------+-----------------------------------------------
           Not identifiable |              0               0               0
          Not in metro area |              1               0               0
               Central city |              0               1               0
       Outside central city |              0               0               1
Central city status unknown |              0               0               0
----------------------------------------------------------------------------
. *I want to change the comparison group for these dummy variables to rural (metro==1), because the default omitted category, metro==0 (not identifiable), is nearly empty: 340 cases, 0.25% of the sample.
. char metro[omit] 1
*Change the omitted category to metro==1, i.e. rural.
. xi i.metro
i.metro _Imetro_0-4 (naturally coded; _Imetro_1 omitted)
. regress incwage _Imetro* if age>29 & age<65 & sex==1 & metro~=0
      Source |       SS       df       MS              Number of obs =   29241
-------------+------------------------------           F(  3, 29237) =  252.70
       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000
    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253
-------------+------------------------------           Adj R-squared =  0.0252
       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Imetro_0 |  (dropped)
   _Imetro_2 |   7255.712   668.0533    10.86   0.000     5946.297    8565.127
   _Imetro_3 |   16013.39   593.9852    26.96   0.000     14849.15    17177.63
   _Imetro_4 |   8368.313   758.7058    11.03   0.000     6881.216    9855.411
       _cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97
------------------------------------------------------------------------------
. *See my excel file for an explanation of why this is the same as the result from our simple table.
. *For instance, for central city:
. display 27189+7255
34444
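. *That's our central city average. The table showed 34445.36; the 1-dollar gap is only because I dropped the decimals (27189.65 + 7255.712 = 34445.36).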
. *Men in every other metro category earn more than men in rural America, and the t statistics are all significant (P>|t| = 0.000 for each).
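. *A side note that was not part of this session: in Stata 11 and later, factor-variable notation lets you pick the base category inside the regression itself, with no char/xi step. The sketch below assumes that newer version; it should reproduce the coefficients above, but I have not run it here.

```stata
* Assumes Stata 11 or later (factor-variable notation); not run in this log.
* ib1. makes metro==1 (not in metro area, i.e. rural) the base category.
regress incwage ib1.metro if age>29 & age<65 & sex==1 & metro~=0
```

. *Because metro==0 is excluded by the if condition, only the dummies for categories 2, 3 and 4 should be estimated, just as with the _Imetro_ variables.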
. *What if we want to compare category 4 and category 2?
. lincom _Imetro_4-_Imetro_2
 ( 1)  - _Imetro_2 + _Imetro_4 = 0
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419
------------------------------------------------------------------------------
. *The difference between category 4 and category 2 is 1112.60 dollars of income, but it is not statistically significant (t = 1.47, P = 0.141, and the 95% confidence interval includes zero).
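. *The same comparison can also be framed as a Wald test of equality of the two coefficients. This is a sketch that was not run in this session; it has to follow the xi regression above. For a single restriction like this one, the F statistic that test reports is the square of lincom's t, with the same p-value.

```stata
* Run right after: regress incwage _Imetro* if age>29 & age<65 & sex==1 & metro~=0
* Tests the same restriction that lincom evaluated: _Imetro_4 - _Imetro_2 = 0.
test _Imetro_4 = _Imetro_2
```

. *lincom is the more convenient of the two when you also want the size of the difference and its confidence interval; test only gives the significance test.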
. *One thing you definitely do not want to do is put the categorical variable straight into the regression, because regress then treats the codes 1 through 4 as a linear scale:
. regress incwage metro if age>29 & age<65 & sex==1 & metro~=0
      Source |       SS       df       MS              Number of obs =   29241
-------------+------------------------------           F(  1, 29239) =  400.78
       Model |  6.0432e+11     1  6.0432e+11           Prob > F      =  0.0000
    Residual |  4.4088e+13 29239  1.5078e+09           R-squared     =  0.0135
-------------+------------------------------           Adj R-squared =  0.0135
       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38831
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       metro |   4563.546   227.9541    20.02   0.000     4116.745    5010.346
       _cons |   25213.42    605.392    41.65   0.000     24026.83    26400.02
------------------------------------------------------------------------------
. *Please don't do that
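. *To see why, compare what this model predicts for each category with the actual means from the table earlier in the log. The display lines below are an illustrative check that was not run in this session; they use only the coefficients printed above (_cons = 25213.42, metro = 4563.546).

```stata
* Fitted mean for each metro code under the linear-in-codes model.
* Actual means from the earlier table are given in the comments for comparison.
* metro==1, rural: fitted about 29,777 (actual 27,190)
display 25213.42 + 4563.546*1
* metro==2, central city: fitted about 34,341 (actual 34,445)
display 25213.42 + 4563.546*2
* metro==3, suburb: fitted about 38,904 (actual 43,203)
display 25213.42 + 4563.546*3
* metro==4, central city status unknown: fitted about 43,468 (actual 35,558)
display 25213.42 + 4563.546*4
```

. *The linear model forces each step in the code to add the same 4563.55 dollars, so it makes category 4 the richest when the suburbs (category 3) actually are. The codes are just labels, not a scale.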
. tabulate metro
  Metropolitan central city |
                     status |      Freq.     Percent        Cum.
----------------------------+-----------------------------------
           Not identifiable |        340        0.25        0.25
          Not in metro area |     29,658       22.18       22.44
               Central city |     32,481       24.29       46.73
       Outside central city |     51,468       38.49       85.22
Central city status unknown |     19,763       14.78      100.00
----------------------------+-----------------------------------
                      Total |    133,710      100.00
*Make my own set of dummy variables, with rural as the excluded category
. gen cent_city=0
. replace cent_city=1 if metro==2
(32481 real changes made)
. gen suburb=0
. replace suburb=1 if metro==3
(51468 real changes made)
. gen metro_cent_city_unkown=0
. replace metro_cent_city_unkown=1 if metro==4
(19763 real changes made)
. regress incwage cent_city suburb metro_cent_city_unkown if age>29 & age<65 & sex==1 & metro~=0
      Source |       SS       df       MS              Number of obs =   29241
-------------+------------------------------           F(  3, 29237) =  252.70
       Model |  1.1296e+12     3  3.7652e+11           Prob > F      =  0.0000
    Residual |  4.3563e+13 29237  1.4900e+09           R-squared     =  0.0253
-------------+------------------------------           Adj R-squared =  0.0252
       Total |  4.4692e+13 29240  1.5285e+09           Root MSE      =   38600
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   cent_city |   7255.712   668.0533    10.86   0.000     5946.297    8565.127
      suburb |   16013.39   593.9852    26.96   0.000     14849.15    17177.63
metro_cent~n |   8368.313   758.7058    11.03   0.000     6881.216    9855.411
       _cons |   27189.65   474.1327    57.35   0.000     26260.33    28118.97
------------------------------------------------------------------------------
. *this has some relevance for HW2
. lincom metro_cent_city_unkown- cent_city
 ( 1)  - cent_city + metro_cent_city_unkown = 0
------------------------------------------------------------------------------
     incwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1112.602   756.5223     1.47   0.141    -370.2164    2595.419
------------------------------------------------------------------------------
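. *The hand-made dummies above took two commands each (gen, then replace). Two shorter ways to get the same 0/1 variables are sketched below; this was not run in this session, and the names cent_city2, suburb2, unknown2 and the stub metro_d are made up for illustration so they do not clash with the variables already created.

```stata
* A logical expression evaluates to 1 when true and 0 when false,
* so each dummy can be built in one line.
generate byte cent_city2 = (metro==2)
generate byte suburb2    = (metro==3)
generate byte unknown2   = (metro==4)

* Or let tabulate build one dummy per category in a single step.
* With five categories this creates metro_d1 through metro_d5,
* numbered in category order: metro_d1 is metro==0, metro_d2 is metro==1 (rural), and so on.
tabulate metro, generate(metro_d)
```

. *With the tabulate version, leave the rural dummy (metro_d2) out of the regression to keep rural as the comparison group.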
. *let me change gears a bit, and talk about random subsets
. gen random=runiform()
. summarize random
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      random |    133710    .5010981    .2889151   3.11e-06   .9999956
. histogram random
(bin=51, start=3.108e-06, width=.0196077)
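. *OK, so the random variable really is uniform from 0 to 1: the mean is about .5 and the standard deviation is about .289, which matches the theoretical value 1/sqrt(12) = .2887.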
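. *One caution about random subsets: runiform() gives different numbers each time it runs, so the subset (and every result based on it) will change from run to run unless the seed is set first. A minimal sketch, not run in this session; the seed value and the variable name random2 are arbitrary choices for illustration.

```stata
* Fix the random-number seed so the same draws come back on every run.
* Any integer works; the value is arbitrary.
set seed 20090210
generate random2 = runiform()
* random2<.05 then selects the same roughly 5% subset each time the do-file is run.
```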
. table sex if age>24 & age<35 & random<.05, contents (mean yrsed sd yrsed freq)
-------------------------------------------------
      Sex | mean(yrsed)    sd(yrsed)        Freq.
----------+--------------------------------------
     Male |    13.46273     3.079119          483
   Female |    13.47561     3.012099          451
-------------------------------------------------
. ttest yrsed if age>24 & age<35 & random<.05, by(sex)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |     483    13.46273    .1401048    3.079119    13.18744    13.73802
  Female |     451    13.47561    .1418342    3.012098    13.19687    13.75435
---------+--------------------------------------------------------------------
combined |     934    13.46895    .0996458    3.045317    13.27339    13.66451
---------+--------------------------------------------------------------------
    diff |           -.0128768    .1995152               -.4044279    .3786742
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -0.0645
Ho: diff = 0                                     degrees of freedom =      932
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.4743         Pr(|T| > |t|) = 0.9486          Pr(T > t) = 0.5257
. *This random 5% subset shows hardly any difference in yrsed between men and women (about 0.013 years), and that small difference is nowhere near significant (t = -0.06, two-sided P = 0.949).
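. *The t test needs nothing beyond the n, mean and standard deviation of each group, which are exactly what the table command reported. The immediate form of the command, ttesti, takes those numbers directly. A sketch that was not run in this session; it should reproduce the test above up to rounding.

```stata
* ttesti takes: n, mean, sd for group 1 (male), then n, mean, sd for group 2 (female).
* The values are copied from the table output in this log.
ttesti 483 13.46273 3.079119 451 13.47561 3.012099
```

. *This is handy when all you have is a published table of means and standard deviations rather than the data.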
. graph hbox yrsed if age>24 & age<35, over(sex)
. *The boxplot of yrsed by sex is not particularly informative, because the two sexes have the same boxplot.
. log close
log: C:\AAA Miker Files\newer web pages\soc_meth_proj3\clas4_2009.log
log type: text
closed on: 10 Feb 2009, 12:09:08
---------------------------------------------------------------------------------