-----------------------------------------------------------------------------------
name: <unnamed>
log: C:\Documents and Settings\Michael Rosenfeld\My Documents\newer web pag
> es\soc_meth_proj3\2010_logs\fifth class.log
log type: text
opened on: 9 Feb 2010, 14:05:46
. tabulate vetlast
Veteran's most recent |
period of service | Freq. Percent Cum.
---------------------------+-----------------------------------
NIU | 30,904 23.11 23.11
No service | 91,149 68.17 91.28
World War II | 2,428 1.82 93.10
Korean War | 1,716 1.28 94.38
Vietnam Era | 3,683 2.75 97.14
Other service | 3,830 2.86 100.00
---------------------------+-----------------------------------
Total | 133,710 100.00
. table vetlast if sex==1, contents(freq p25 age mean age p75 age)
------------------------------------------------------------------------
Veteran's |
most recent |
period of |
service | Freq. p25(age) mean(age)
--------------+---------------------------------------------------------
NIU | 15,810 4 7.7223911285400391
No service | 37,926 25 38.128620147705078
World War II | 2,339 74 77.200088500976563
Korean War | 1,681 66 67.854255676269531
Vietnam Era | 3,584 49 52.687778472900391
Other service | 3,451 35 45.978267669677734
------------------------------------------------------------------------
----------------------------------
Veteran's |
most recent |
period of |
service | p75(age)
--------------+-------------------
NIU | 11
No service | 48
World War II | 80
Korean War | 70
Vietnam Era | 55
Other service | 60
----------------------------------
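*The means above print with far more digits than anyone needs. A minimal sketch of one way to tidy them, assuming the same data are in memory: the format() option of table sets the display format of the cells. The %9.1f format is just a choice for illustration. The contents() option belongs to the table syntax used in this log; Stata 17 and later redesigned table, so there this older syntax may need a version prefix such as "version 16:".

```stata
* Same table as above, with a fixed one-decimal display format.
* Assumes the class CPS extract (vetlast, sex, age) is in memory.
table vetlast if sex==1, contents(freq p25 age mean age p75 age) format(%9.1f)
```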
. table vetlast if sex==1 & age>65 & age<71, contents(freq mean inctot)
------------------------------------------
Veteran's |
most recent |
period of |
service | Freq. mean(inctot)
--------------+---------------------------
No service | 805 27938.16646
World War II | 38 17378.52632
Korean War | 1,037 32267.98457
Vietnam Era | 47 37251.76596
Other service | 141 42317.05674
------------------------------------------
*One question that students struggled with a bit in HW1 (it was not part of the grade) was whether there were enough vets and non-vets of the same age to make a statistically sound comparison. Most students said "no," but I want to suggest that the answer is "yes": among men aged 66 to 70, the table above shows 1,037 Korean War vets and 805 non-vets.
. codebook vetlast
---------------------------------------------------------------------------------
vetlast Veteran's most recent period of service
---------------------------------------------------------------------------------
type: numeric (byte)
label: vetlastlbl
range: [0,9] units: 1
unique values: 6 missing .: 0/133710
tabulation: Freq. Numeric Label
30904 0 NIU
91149 1 No service
2428 4 World War II
1716 6 Korean War
3683 8 Vietnam Era
3830 9 Other service
. *I am going to generate a home-made dummy variable that contrasts Korean War vets (coded 1) with persons who never served (coded 0). It is missing for everyone else, so it is useful only for the contrast between Korean vets and non-vets.
. gen Korean_vet=0 if vetlast==1
(42561 missing values generated)
. replace Korean_vet=1 if vetlast==6
(1716 real changes made)
. tabulate vetlast Korean_vet, missing
Veteran's most recent | Korean_vet
period of service | 0 1 . | Total
----------------------+---------------------------------+----------
NIU | 0 0 30,904 | 30,904
No service | 91,149 0 0 | 91,149
World War II | 0 0 2,428 | 2,428
Korean War | 0 1,716 0 | 1,716
Vietnam Era | 0 0 3,683 | 3,683
Other service | 0 0 3,830 | 3,830
----------------------+---------------------------------+----------
Total | 91,149 1,716 40,845 | 133,710
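*The gen/replace pair above can also be written in one line. A sketch, assuming the same data are in memory; Korean_vet2 is a name made up here so that the original variable is left alone.

```stata
* 1 for Korean War vets, 0 for no service, missing for every other group.
* (vetlast==6) evaluates to 1 or 0; the if qualifier leaves all other rows missing.
gen byte Korean_vet2 = (vetlast==6) if inlist(vetlast, 1, 6)

* Check against the hand-made version; assert is silent when every row agrees.
assert Korean_vet2 == Korean_vet
```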
. display 32267-27938
4329
. *Among men aged 66 to 70, the Korean vets have a mean total income (inctot) about 4329 higher than the non-vets.
. regress inctot Korean_vet if sex==1 & age>65 & age<71
Source | SS df MS Number of obs = 1842
-------------+------------------------------ F( 1, 1840) = 8.52
Model | 8.4962e+09 1 8.4962e+09 Prob > F = 0.0035
Residual | 1.8342e+12 1840 996824336 R-squared = 0.0046
-------------+------------------------------ Adj R-squared = 0.0041
Total | 1.8427e+12 1841 1.0009e+09 Root MSE = 31573
------------------------------------------------------------------------------
inctot | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Korean_vet | 4329.818 1483.088 2.92 0.004 1421.106 7238.53
_cons | 27938.17 1112.785 25.11 0.000 25755.71 30120.62
------------------------------------------------------------------------------
. *The t-statistic indicates a significant difference between Korean vets and non-vets of the same age (see the t test I perform below, which yields the same t statistic in absolute value).
. *The coefficient of 4329.818 equals the difference in mean inctot between the two groups.
*R-squared is the proportion of the overall variance in inctot explained by our predictor variables. In this case our only predictor is Korean_vet, which explains 0.0046, or less than 1%, of the variance in inctot.
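*The t statistic, F, and R-squared can be checked by hand from the numbers in the regression table above. A sketch using display as a calculator; the inputs are copied from the printed output, so the results match only up to rounding.

```stata
* t statistic = coefficient / standard error (about 2.92)
display 4329.818/1483.088

* With a single predictor, F is the square of that t (about 8.52)
display (4329.818/1483.088)^2

* R-squared = Model SS / Total SS (about 0.0046)
display 8.4962e+09/1.8427e+12
```

*Immediately after regress, the same quantities are available without retyping, as _b[Korean_vet]/_se[Korean_vet] and e(r2).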
. *stata has several ways to generate dummy variables. One way is the xi command.
. xi i.vetlast
i.vetlast _Ivetlast_0-9 (naturally coded; _Ivetlast_0 omitted)
*By itself, the xi command generates a set of dummy variables, one for each level of the categorical variable except one (by default the lowest value), which is left out to serve as the comparison group. Each dummy variable is a zero-one contrast, like the dummy variable we made by hand.
. tabulate vetlast _Ivetlast_6
Veteran's most recent | vetlast==6
period of service | 0 1 | Total
----------------------+----------------------+----------
NIU | 30,904 0 | 30,904
No service | 91,149 0 | 91,149
World War II | 2,428 0 | 2,428
Korean War | 0 1,716 | 1,716
Vietnam Era | 3,683 0 | 3,683
Other service | 3,830 0 | 3,830
----------------------+----------------------+----------
Total | 131,994 1,716 | 133,710
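*Another way to generate dummy variables is the generate() option of tabulate. A sketch, assuming the same data are in memory; the stub name vet_ is made up here. Unlike xi, this makes a dummy for every category and numbers them 1, 2, 3, ... in sorted order rather than by the numeric code, so Korean War (code 6, the fourth category) becomes vet_4.

```stata
* Creates vet_1 ... vet_6, one 0/1 dummy per category of vetlast.
tabulate vetlast, generate(vet_)

* The fourth category is Korean War, so vet_4 should be 1 exactly when vetlast is 6.
assert vet_4 == (vetlast==6)
```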
*More often, we will use xi and regress together: the xi: prefix tells Stata to generate dummy variables for each level of any i. term, here i.vetlast.
. xi: regress inctot i.vetlast if sex==1 & age>65 & age<71
i.vetlast _Ivetlast_0-9 (naturally coded; _Ivetlast_0 omitted)
note: _Ivetlast_8 omitted because of collinearity
Source | SS df MS Number of obs = 2068
-------------+------------------------------ F( 4, 2063) = 8.34
Model | 3.6137e+10 4 9.0341e+09 Prob > F = 0.0000
Residual | 2.2337e+12 2063 1.0828e+09 R-squared = 0.0159
-------------+------------------------------ Adj R-squared = 0.0140
Total | 2.2699e+12 2067 1.0982e+09 Root MSE = 32905
------------------------------------------------------------------------------
inctot | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Ivetlast_1 | -9313.599 4937.879 -1.89 0.059 -18997.35 370.1461
_Ivetlast_4 | -19873.24 7178.542 -2.77 0.006 -33951.18 -5795.297
_Ivetlast_6 | -4983.781 4907.314 -1.02 0.310 -14607.59 4640.023
_Ivetlast_8 | (omitted)
_Ivetlast_9 | 5065.291 5542.273 0.91 0.361 -5803.742 15934.32
_cons | 37251.77 4799.749 7.76 0.000 27838.91 46664.62
------------------------------------------------------------------------------
*After running the regression, take a look at your variable window: a bunch of new variables appear, all starting with _I.
*The contrast we really want is between _Ivetlast_6 (value 6 corresponds to Korean War vets) and _Ivetlast_1 (non-vets). One important thing to remember is that the omitted category is arbitrary, and often it won't be the one we really want to compare to. Here xi omitted vetlast==0 (NIU), but there are no NIU respondents among men aged 66 to 70, so Stata also dropped _Ivetlast_8 for collinearity. The effective comparison group is therefore the Vietnam Era vets: the constant, 37251.77, is their mean inctot. We can recover the contrast we want with lincom, which estimates a linear combination of the coefficients.
. lincom _Ivetlast_6- _Ivetlast_1
( 1) - _Ivetlast_1 + _Ivetlast_6 = 0
------------------------------------------------------------------------------
inctot | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | 4329.818 1545.699 2.80 0.005 1298.525 7361.111
------------------------------------------------------------------------------
* Here we recover the Korean vet versus non-vet comparison, with the same difference in means (4329.818). The t statistic is slightly different (2.80 instead of 2.92) because the other groups in the regression (vetlast==4, 8, and 9) contribute to the pooled residual variance, which changes the standard error.
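*To see where the two standard errors come from: for a contrast between two group means, the standard error is Root MSE times the square root of (1/n0 + 1/n1). The group sizes (805 non-vets and 1,037 Korean vets) are the same in both models; only Root MSE changes. A sketch with display, using rounded values from the output above, so the results are approximate.

```stata
* Two-group regression: Root MSE = 31573 (gives roughly 1483)
display 31573*sqrt(1/805 + 1/1037)

* Regression with all the groups: Root MSE = 32905 (gives roughly 1546)
display 32905*sqrt(1/805 + 1/1037)
```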
. char vetlast [omit] 1
*the above command tells stata to use vetlast=1, i.e. non-vets as the default omitted value, which makes more sense for us.
. xi: regress inctot i.vetlast if sex==1 & age>65 & age<71
i.vetlast _Ivetlast_0-9 (naturally coded; _Ivetlast_1 omitted)
note: _Ivetlast_0 omitted because of collinearity
Source | SS df MS Number of obs = 2068
-------------+------------------------------ F( 4, 2063) = 8.34
Model | 3.6137e+10 4 9.0341e+09 Prob > F = 0.0000
Residual | 2.2337e+12 2063 1.0828e+09 R-squared = 0.0159
-------------+------------------------------ Adj R-squared = 0.0140
Total | 2.2699e+12 2067 1.0982e+09 Root MSE = 32905
------------------------------------------------------------------------------
inctot | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Ivetlast_0 | (omitted)
_Ivetlast_4 | -10559.64 5462.501 -1.93 0.053 -21272.23 152.9501
_Ivetlast_6 | 4329.818 1545.699 2.80 0.005 1298.525 7361.111
_Ivetlast_8 | 9313.599 4937.879 1.89 0.059 -370.1461 18997.35
_Ivetlast_9 | 14378.89 3004.039 4.79 0.000 8487.626 20270.15
_cons | 27938.17 1159.764 24.09 0.000 25663.74 30212.6
------------------------------------------------------------------------------
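. *The char command set the default omitted value for vetlast to 1, the no service value, which is what we wanted. _Ivetlast_0 drops out because there are no NIU respondents in this subsample, and the _Ivetlast_6 row now matches the lincom result above.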
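*In Stata 11 and later, factor-variable notation offers another route to the same regression without xi or char. A sketch, assuming the same data are in memory; it was not run in class. The ib1. prefix makes vetlast==1 (no service) the base category.

```stata
* Same model as above, with no service (vetlast==1) as the base category.
regress inctot ib1.vetlast if sex==1 & age>65 & age<71

* Contrasts between two non-base groups still use lincom,
* for example Korean War (6) versus World War II (4).
lincom 6.vetlast - 4.vetlast
```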
* If we go back to looking at only two groups, Korean vets versus non-vets of the same age (remember that Korean_vet is missing for all the other levels of vetlast), the t test gives the same difference and t statistic as our first regression above. The signs are reversed only because ttest reports mean(0) - mean(1).
. ttest inctot if sex==1 & age>65 & age<71, by( Korean_vet)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
0 | 805 27938.17 1229.848 34893.89 25524.07 30352.26
1 | 1037 32267.98 892.2152 28731.55 30517.23 34018.74
---------+--------------------------------------------------------------------
combined | 1842 30375.75 737.1402 31636.97 28930.03 31821.46
---------+--------------------------------------------------------------------
diff | -4329.818 1483.088 -7238.53 -1421.106
------------------------------------------------------------------------------
diff = mean(0) - mean(1) t = -2.9195
Ho: diff = 0 degrees of freedom = 1840
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0018 Pr(|T| > |t|) = 0.0035 Pr(T > t) = 0.9982
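*The test above assumes that the two groups share a common variance, yet their standard deviations differ somewhat (34893.89 versus 28731.55). A sketch of a robustness check that was not run in class, assuming the same data are in memory: the unequal option of ttest drops the equal-variance assumption.

```stata
* Two-sample t test without assuming equal variances.
ttest inctot if sex==1 & age>65 & age<71, by(Korean_vet) unequal
```

*The difference in means is unchanged; only the standard error and the degrees of freedom are recomputed.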
. save "C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta", replace
file C:\Documents and Settings\Michael Rosenfeld\Desktop\cps_mar_2000_new.dta saved
. clear
. exit, clear
* And now a few comments that relate to what I said at the end of class, but which didn't make it into the class log.
* In order to use xi sensibly with our 3 occupations for HW2, while excluding all the other occupations and also generating output that is readable, here is what I suggest.
* First, create a new variable that has 3 categories (nurses, sociologists, and lawyers) and is missing for all other occupations.
. tabulate occ1990 if occ1990==178|occ1990==95|occ1990==125
Occupation, 1990 basis | Freq. Percent Cum.
----------------------------------------+-----------------------------------
Registered nurses | 966 68.37 68.37
Sociology instructors | 6 0.42 68.79
Lawyers | 441 31.21 100.00
----------------------------------------+-----------------------------------
Total | 1,413 100.00
. tabulate occ1990 if occ1990==178|occ1990==95|occ1990==125, nolab
Occupation, |
1990 basis | Freq. Percent Cum.
------------+-----------------------------------
95 | 966 68.37 68.37
125 | 6 0.42 68.79
178 | 441 31.21 100.00
------------+-----------------------------------
Total | 1,413 100.00
. gen hw2_occ=1 if occ1990==95
(132744 missing values generated)
. replace hw2_occ=2 if occ1990==125
(6 real changes made)
. replace hw2_occ=3 if occ1990==178
(441 real changes made)
. label define hw2_occ 1 "nurses" 2 "sociologists" 3 "lawyers"
. label val hw2_occ hw2_occ
. tabulate occ1990 hw2_occ
Occupation, 1990 | hw2_occ
basis | nurses sociologi lawyers | Total
----------------------+---------------------------------+----------
Registered nurses | 966 0 0 | 966
Sociology instructors | 0 6 0 | 6
Lawyers | 0 0 441 | 441
----------------------+---------------------------------+----------
Total | 966 6 441 | 1,413
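*The gen, replace, and label steps above can be collapsed into a single recode. A sketch, assuming the same data are in memory; hw2_occ_alt is a name made up here so that it does not collide with hw2_occ.

```stata
* Map the three occ1990 codes to 1, 2, 3 with value labels attached;
* every other occupation becomes missing.
recode occ1990 (95=1 "nurses") (125=2 "sociologists") (178=3 "lawyers") (else=.), generate(hw2_occ_alt)

* Check against the hand-made version; assert is silent when every row agrees.
assert hw2_occ_alt == hw2_occ
```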
*now we are ready to use xi and regress on hw2_occ…
. xi: regress inctot i.hw2_occ
i.hw2_occ _Ihw2_occ_1-3 (naturally coded; _Ihw2_occ_1 omitted)
Source | SS df MS Number of obs = 1413
-------------+------------------------------ F( 2, 1410) = 262.68
Model | 1.0359e+12 2 5.1795e+11 Prob > F = 0.0000
Residual | 2.7802e+12 1410 1.9718e+09 R-squared = 0.2715
-------------+------------------------------ Adj R-squared = 0.2704
Total | 3.8161e+12 1412 2.7026e+09 Root MSE = 44405
------------------------------------------------------------------------------
inctot | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Ihw2_occ_2 | 3576.166 18184.46 0.20 0.844 -32095.35 39247.68
_Ihw2_occ_3 | 58455.42 2551.942 22.91 0.000 53449.4 63461.43
_cons | 40787.17 1428.706 28.55 0.000 37984.55 43589.79
------------------------------------------------------------------------------
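*Here nurses are the omitted comparison category, so each coefficient is that group's mean inctot minus the nurses' mean (the constant, 40787.17). With only 6 sociologists in the data, their coefficient has a very large standard error. In the HW you will use incwage instead of inctot.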
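*To compare lawyers with sociologists directly, rather than each with nurses, lincom works here just as it did for vetlast. A sketch, to be run immediately after the regression above.

```stata
* Lawyers (3) minus sociologists (2).
lincom _Ihw2_occ_3 - _Ihw2_occ_2
```

*The point estimate is simply 58455.42 - 3576.166, or about 54879.25. With so few sociologists, expect a wide confidence interval around it.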