Soc 382, HW1
For reference: an earlier OLS regression predicting income for Vietnam Veterans and others.
All models are Ordinary Least Squares regression models (i.e., Stata regress) predicting incwage for adults ages 25-64.

| Predictors: coefficient (SE) [t statistic] | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 |
|---|---|---|---|---|---|
| vietnamveteran (dummy var) | 12,634 (557) [22.7***] | 4,216 (554) [7.61***] | 4,417 (567) [7.78***] | 3,656 (563) [6.49***] | 1,035 (532) [1.94] |
| Female (dummy var) | | 16,465 (244) [67.4***] | 16,441 (244) [67.2***] | 16,436 (242) [67.89***] | 16,607 (229) [72.54***] |
| age | | | 18.6 (11.5) [1.62] | 3,178 (92.4) [34.4***] | 2,848 (87.3) [32.61***] |
| age squared | | | | 36.58 (1.05) [34.8***] | 31.9 (.992) [32.2***] |
| years of education | | | | | 3,540 (38.5) [92.0***] |
| Subject is US born (dummy var) | | | | | |
| Weeks worked last year | | | | | |
| Constant | 26,818*** | 35,663*** | 36,435*** | 29,197*** | 71,687*** |
| Unweighted N | 69,305 | 69,305 | 69,305 | 69,305 | 69,305 |
| Adjusted R-squared | 0.0073 | 0.068 | 0.068 | 0.084 | 0.184 |
* P< .05
** P< .01
*** P< .001, two-tailed tests
1) (for Soc 381) Propensity Score Matching. In the OLS regressions from Soc 381 HW3 above, we used regression to control for differences between Vietnam veterans and other workers. Propensity score matching takes a different approach: a regression is used to find the other survey subjects who are most like the Vietnam veterans on a set of control variables (age, gender, education, etc.), and the Vietnam veterans’ income is then compared directly to the incomes of those most similar others.
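To make the idea of a propensity score concrete, here is a minimal sketch of how the scores themselves could be generated by hand. It assumes the variable names used in the psmatch2 command given later in this problem (vietnam_vet, female, age, age_sq, yrsed); pscore_demo is a hypothetical variable name chosen only for this illustration. The sketch stops before the matching step, which psmatch2 handles for you.

```stata
* Step 1: logit model of Vietnam veteran status on the controls
* (variable names assumed to match the psmatch2 command in 1b)
logit vietnam_vet female age age_sq yrsed if age>=25 & age<=64

* Step 2: predicted probability of being a Vietnam veteran,
* i.e. the propensity score; pscore_demo is a hypothetical name
predict pscore_demo if e(sample), pr

* Step 3: compare the score distributions of veterans and others
tabstat pscore_demo, by(vietnam_vet) statistics(n mean min max)
```

Subjects who are not Vietnam veterans but have scores close to a veteran's score are the "most similar others" whose incomes are used in the comparison.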
1a) How many Vietnam veterans and how many other subjects were there in the March 2000 CPS with age>=25 and age<=64?
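One way to get these counts, as a sketch: it assumes the veteran indicator is the variable vietnam_vet used in the psmatch2 command in 1b, coded 1 for Vietnam veterans and 0 for everyone else.

```stata
* Unweighted counts of Vietnam veterans (1) and other subjects (0), ages 25-64
tabulate vietnam_vet if age>=25 & age<=64
```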
1b) Use the user-written add-on command psmatch2, which you will have to install by using the following command:
ssc install psmatch2, replace
Note that unlike the OLS regressions in the table above, which are solved instantly, propensity score matching takes a while, so you may have to be patient while waiting for Stata to produce results. Also note that we are doing this analysis without weights. See Treiman pp. 392-393 for a brief discussion of propensity score matching.
psmatch2 vietnam_vet female age age_sq yrsed if age>=25 & age<=64, logit out(incwage) ties
The above syntax is for our Model 5 equivalent, the rightmost model in the table below. The controls are female, age, age_sq, and yrsed. The logit option means that the regression used to generate the propensity scores (i.e., each subject’s predicted probability of being a Vietnam veteran, given the controls) is a logit model, or logistic regression. The “ties” option means that Stata will use all the matches that have the same propensity score, meaning all the matches that have the same values on the controls, so here we don’t have to worry about sorting. After entering the results for Model 5 below, work your way left by dropping yrsed (Model 4), then dropping age_sq (Model 3). Then go back to Model 5 and repeat it with the adjusted age filter of 40-64 years.
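Following those instructions, the three remaining commands would look something like the sketch below. Each one is the Model 5 command with only the control list or the age filter changed.

```stata
* Model 4 analog: drop yrsed
psmatch2 vietnam_vet female age age_sq if age>=25 & age<=64, logit out(incwage) ties

* Model 3 analog: also drop age_sq
psmatch2 vietnam_vet female age if age>=25 & age<=64, logit out(incwage) ties

* Model 5 with the adjusted age filter (ages 40-64)
psmatch2 vietnam_vet female age age_sq yrsed if age>=40 & age<=64, logit out(incwage) ties
```

In the psmatch2 output, the ATT row should give the matched difference in incwage along with its standard error and T-stat, which are the numbers to enter in the table.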
| Analog to which regression model | Model 5, adjusted age filter | Model 3 | Model 4 | Model 5 |
|---|---|---|---|---|
| Controls | female age age_sq yrsed | female age | female age age_sq | female age age_sq yrsed |
| Age restriction | 40-64 | 25-64 | 25-64 | 25-64 |
| Vietnam veteran coefficient (SE) and [Z score] or [T score] | | | | |
1c) How do the Vietnam veteran coefficients in Models 3-5 compare to the Vietnam veteran coefficients in Models 3-5 in the OLS regression table above? Why are the results different?
1d) Note that the propensity score results above for Models 3 and 4 are the same, whereas the OLS regression results above for Models 3 and 4 are different. Why do you think the propensity score result for the effect of Vietnam veteran status on income is the same for Models 3 and 4?
1e) And why do you think the propensity score results for Model 5 (ages 25-64) and Model 5 with the adjusted age filter (ages 40-64) produce the same coefficient for Vietnam veterans’ income?