--------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\fall_2011_381_logs\class5.log
log type: text
opened on: 11 Oct 2011, 13:02:26
* First of all, since I will later talk about methods for creating dummy variables, one useful free add-in to get is desmat.
. ssc install desmat, replace
checking desmat consistency and verifying not already installed...
the following files will be replaced:
c:\ado\plus\d\desmat.ado
c:\ado\plus\d\desmat.hlp
c:\ado\plus\d\desrep.ado
c:\ado\plus\d\desrep.hlp
c:\ado\plus\d\destest.ado
c:\ado\plus\d\destest.hlp
c:\ado\plus\o\outshee2.ado
c:\ado\plus\o\outshee2.hlp
c:\ado\plus\s\showtrms.ado
c:\ado\plus\s\showtrms.hlp
installing into c:\ado\plus\...
installation complete.
. which desmat
c:\ado\plus\d\desmat.ado
*! version 3.2, 17Sep2004, John_Hendrickx@yahoo.com
. use "C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pages\soc_meth_proj3\cps_mar_2000_new.dta", clear
* take a look at the Excel page for “understanding dummy variables”
. table metro if age >=30 & age <=64 & sex==1, contents (freq mean incwage)
----------------------------------------------------------
Metropolitan central city   |
status                      |      Freq.  mean(incwage)
----------------------------+-----------------------------
           Not identifiable |         94    31743.04255
          Not in metro area |      6,628     27189.6465
               Central city |      6,727    34445.35841
       Outside central city |     11,639     43203.0348
Central city status unknown |      4,247    35557.95997
----------------------------------------------------------
. * update all
* The “update all” command is something you ought to run on your Stata installation every once in a while.
. codebook metro
--------------------------------------------------------------------------------
metro Metropolitan central city status
--------------------------------------------------------------------------------
type: numeric (byte)
label: metrolbl
range: [0,4] units: 1
unique values: 5 missing .: 0/133710
tabulation: Freq. Numeric Label
340 0 Not identifiable
29658 1 Not in metro area
32481 2 Central city
51468 3 Outside central city
19763 4 Central city status unknown
. regress incwage metro if age >=30 & age<=64 & sex==1 & metro~=0
Source | SS df MS Number of obs = 29241
-------------+------------------------------ F( 1, 29239) = 400.78
Model | 6.0432e+11 1 6.0432e+11 Prob > F = 0.0000
Residual | 4.4088e+13 29239 1.5078e+09 R-squared = 0.0135
-------------+------------------------------ Adj R-squared = 0.0135
Total | 4.4692e+13 29240 1.5285e+09 Root MSE = 38831
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
metro | 4563.546 227.9541 20.02 0.000 4116.745 5010.346
_cons | 25213.42 605.392 41.65 0.000 24026.83 26400.02
------------------------------------------------------------------------------
* Please don’t do this! metro is a categorical variable with nominal (rather than ordinal) categories, and regress treats every predictor variable as continuous unless you tell it otherwise, so this regression command is a crime against proper data analysis: it uses a categorical variable as a continuous one. To treat metro properly as a categorical variable, we need Stata to generate proper dummy variables for us. One way is to use Stata’s built-in i.variable syntax, which Stata calls factor variable syntax; it is available in Stata versions 11 and up and can be looked up under help fvvarlist.
. regress incwage i.metro if age>=30 & age <=64 & sex==1 & metro ~=0
Source | SS df MS Number of obs = 29241
-------------+------------------------------ F( 3, 29237) = 252.70
Model | 1.1296e+12 3 3.7652e+11 Prob > F = 0.0000
Residual | 4.3563e+13 29237 1.4900e+09 R-squared = 0.0253
-------------+------------------------------ Adj R-squared = 0.0252
Total | 4.4692e+13 29240 1.5285e+09 Root MSE = 38600
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
metro |
2 | 7255.712 668.0533 10.86 0.000 5946.297 8565.127
3 | 16013.39 593.9852 26.96 0.000 14849.15 17177.63
4 | 8368.313 758.7058 11.03 0.000 6881.216 9855.411
|
_cons | 27189.65 474.1327 57.35 0.000 26260.33 28118.97
------------------------------------------------------------------------------
* Stata’s built-in factor variable syntax has some advantages; for example, it is easy to set the excluded category with ib#, where # is the category value. On the other hand, the built-in factor syntax does not produce dummy variables in your variable list, which are sometimes handy to have.
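* A minimal sketch (not run in this session) of one way to get dummy variables into your variable list: tabulate with the generate() option. The stub name metro_d is a hypothetical choice. The new variables are numbered by category order, so metro_d1 is metro==0 and metro_d2 is metro==1.
    tabulate metro, generate(metro_d)
    regress incwage metro_d3 metro_d4 metro_d5 if age>=30 & age<=64 & sex==1 & metro~=0
* Leaving metro_d2 out of the predictor list makes “Not in metro area” the excluded category, so this should reproduce the i.metro coefficients above.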
* Note, and this is important, that the choice of excluded category is arbitrary: the model fit is the same (same R-square, same F-test), and the output only looks different because the comparison category has changed.
. regress incwage ib2.metro if age>=30 & age <=64 & sex==1 & metro ~=0
Source | SS df MS Number of obs = 29241
-------------+------------------------------ F( 3, 29237) = 252.70
Model | 1.1296e+12 3 3.7652e+11 Prob > F = 0.0000
Residual | 4.3563e+13 29237 1.4900e+09 R-squared = 0.0253
-------------+------------------------------ Adj R-squared = 0.0252
Total | 4.4692e+13 29240 1.5285e+09 Root MSE = 38600
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
metro |
1 | -7255.712 668.0533 -10.86 0.000 -8565.127 -5946.297
3 | 8757.676 591.1938 14.81 0.000 7598.91 9916.443
4 | 1112.602 756.5223 1.47 0.141 -370.2164 2595.419
|
_cons | 34445.36 470.6309 73.19 0.000 33522.9 35367.82
------------------------------------------------------------------------------
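* A quick arithmetic check (a sketch, not run in this session) that the two sets of coefficients describe the same model: each ib2.metro coefficient is the matching i.metro coefficient minus the i.metro coefficient for category 2, and the new constant is the old constant plus that coefficient.
    display 16013.39 - 7255.712
    display 8368.313 - 7255.712
    display 27189.65 + 7255.712
* These come to about 8757.68, 1112.60 and 34445.36, which match the rows for categories 3 and 4 and the _cons row above up to rounding.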
* The older Stata syntax for creating dummy variables is xi (this works in all versions through Stata 11, and it remains available in later releases as well). xi can be used as a stand-alone command or as the prefix for a regression command.
. xi i.metro
i.metro _Imetro_0-4 (naturally coded; _Imetro_2 omitted)
* With the xi syntax, you need a separate command to specify which category value will be the excluded, or omitted, category.
. char metro[omit] 0
. xi i.metro
i.metro _Imetro_0-4 (naturally coded; _Imetro_0 omitted)
. table metro, contents(mean _Imetro_1 mean _Imetro_2 mean _Imetro_3 mean _Imetro_4)
--------------------------------------------------------------------------------
Metropolitan central city |
status | __000002 __000003 __000004 __000005
----------------------------+---------------------------------------------------
Not identifiable | 0 0 0 0
Not in metro area | 1 0 0 0
Central city | 0 1 0 0
Outside central city | 0 0 1 0
Central city status unknown | 0 0 0 1
--------------------------------------------------------------------------------
* This is what the dummy variables actually look like.
. char metro[omit] 1
* Change the excluded category, then run the regression again with xi creating the dummies.
. xi: regress incwage i.metro if age>=30 & age <=64 & sex==1 & metro ~=0
i.metro _Imetro_0-4 (naturally coded; _Imetro_1 omitted)
note: _Imetro_0 omitted because of collinearity
Source | SS df MS Number of obs = 29241
-------------+------------------------------ F( 3, 29237) = 252.70
Model | 1.1296e+12 3 3.7652e+11 Prob > F = 0.0000
Residual | 4.3563e+13 29237 1.4900e+09 R-squared = 0.0253
-------------+------------------------------ Adj R-squared = 0.0252
Total | 4.4692e+13 29240 1.5285e+09 Root MSE = 38600
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Imetro_0 | (omitted)
_Imetro_2 | 7255.712 668.0533 10.86 0.000 5946.297 8565.127
_Imetro_3 | 16013.39 593.9852 26.96 0.000 14849.15 17177.63
_Imetro_4 | 8368.313 758.7058 11.03 0.000 6881.216 9855.411
_cons | 27189.65 474.1327 57.35 0.000 26260.33 28118.97
------------------------------------------------------------------------------
* When we have the dummy variables on hand, as we do here after xi created them, we can test other contrasts in addition to the contrasts against the excluded category, which are all that the regression output reports.
. lincom _Imetro_4- _Imetro_2
( 1) - _Imetro_2 + _Imetro_4 = 0
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | 1112.602 756.5223 1.47 0.141 -370.2164 2595.419
------------------------------------------------------------------------------
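* The same contrast can also be tested without xi. A minimal sketch (not run in this session), assuming Stata 11 or later: after a regression that uses factor variable syntax, lincom accepts the #.variable form for the individual category coefficients.
    regress incwage i.metro if age>=30 & age<=64 & sex==1 & metro~=0
    lincom 4.metro - 2.metro
* Because the model is the same, this should give the same estimate and standard error as the lincom on the _Imetro_ dummies above.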
* Now the same model using desmat. Note that when you use the desmat prefix, desmat assumes that all predictor variables are categorical unless you mark a variable as continuous with the prefix “@”.
. desmat: regress incwage metro=ind(2) if age>=30 & age <=64 & sex==1 & metro ~=0
--------------------------------------------------------------------------------
Linear regression
--------------------------------------------------------------------------------
Dependent variable incwage
Number of observations: 29241
F statistic: 252.703
Model degrees of freedom: 3
Residual degrees of freedom: 29237
R-squared: 0.025
Adjusted R-squared: 0.025
Root MSE 38600.339
Prob: 0.000
--------------------------------------------------------------------------------
nr Effect Coeff s.e.
--------------------------------------------------------------------------------
metro
1 Not identifiable 0.000 .
2 Central city 7255.712** 668.053
3 Outside central city 16013.388** 593.985
4 Central city status unknown 8368.313** 758.706
5 _cons 27189.646** 474.133
--------------------------------------------------------------------------------
* p < .05
** p < .01
. lincom _x_4-_x_2
( 1) - _x_2 + _x_4 = 0
------------------------------------------------------------------------------
incwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | 1112.602 756.5223 1.47 0.141 -370.2164 2595.419
------------------------------------------------------------------------------
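* A sketch (not run in this session) of the “@” prefix mentioned above. The choice of age as a continuous predictor is my assumption for illustration; without the “@”, desmat would treat age as categorical and create a dummy for every age value.
    desmat: regress incwage metro=ind(2) @age if age>=30 & age<=64 & sex==1 & metro~=0
* See help desmat for the exact treatment of continuous variables in your installed version.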
. log close
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web
> pages\soc_meth_proj3\fall_2011_381_logs\class5.log
log type: text
closed on: 11 Oct 2011, 15:34:29
--------------------------------------------------------------------------------