HW1, Propensity score

Soc 382, HW1

For reference: an earlier OLS regression predicting income for Vietnam veterans and others.

All models are Ordinary Least Squares regression models (i.e., Stata's regress command) predicting incwage for adults aged 25-64.

	Model 1	Model 2	Model 3	Model 4	Model 5

	predictors:
vietnamveteran (dummy var)	12,634 (557) [22.7***]	4,216 (554) [7.61***]	4,417 (567) [7.78***]	3,656 (563) [6.49***]	1,035 (532) [1.94]
Female (dummy var)		-16,465 (244) [-67.4***]	-16,441 (244) [-67.2***]	-16,436 (242) [-67.89***]	-16,607 (229) [-72.54***]
age			-18.6 (11.5) [-1.62]	3178 (92.4) [34.4***]	2848 (87.3) [32.61***]
age squared				-36.58 (1.05) [-34.8***]	-31.9 (.992) [-32.2***]
years of education					3540 (38.5) [92.0***]
Subject is US born (dummy var)
Weeks worked last year

Constant	26,818***	35,663***	36,435***	-29,197***	-71,687***

Unweighted N	69,305	69,305	69,305	69,305	69,305
Adjusted R-square	0.0073	0.068	0.068	0.084	0.184

* P< .05

** P< .01

*** P< .001, two-tailed tests

1) (for Soc 381) Propensity Score Matching. In the OLS regressions from Soc 381 HW3 above, we used regression to control for differences between Vietnam veterans and other workers. Propensity score matching takes a different approach: a regression is used to find the other survey subjects who are most like the Vietnam veterans on a set of control variables (age, gender, education, etc.), and then the Vietnam veterans’ income is compared directly to the income of those most similar others.
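As a rough illustration of the first stage described above, the propensity score can be estimated by hand with a logit model. This is only a sketch: it assumes the variable names used in the psmatch2 command later in this assignment (vietnam_vet, female, age, age_sq, yrsed), and the new variable name pscore is hypothetical. psmatch2 carries out this estimation internally, so these commands are not required for the homework.

```stata
* First stage only: model the probability of being a Vietnam veteran
* (variable names assumed from the psmatch2 command in this assignment)
logit vietnam_vet female age age_sq yrsed if age>=25 & age<=64

* Store each subject's predicted probability; "pscore" is a hypothetical name
predict pscore if e(sample), pr

* Compare the distribution of scores for veterans and for everyone else
summarize pscore if vietnam_vet==1
summarize pscore if vietnam_vet==0
```

Subjects whose scores are close to a veteran's score are the "most similar others" that the matching step then compares on income.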

1a) How many Vietnam veterans and how many other subjects were there in the March, 2000 CPS, with age>=25 and age<=64?
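One possible way to get these counts is sketched below. It assumes that veteran status is stored in vietnam_vet (the name used in the psmatch2 command in this assignment) and that it is coded 1 for Vietnam veterans and 0 for everyone else; check the coding in your own data before relying on it.

```stata
* Frequency table of veteran status within the age window
tabulate vietnam_vet if age>=25 & age<=64

* Equivalent counts, one group at a time (assumes 0/1 coding)
count if vietnam_vet==1 & age>=25 & age<=64
count if vietnam_vet==0 & age>=25 & age<=64
```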

1b) Use the user-written command psmatch2, which you must first install with the following command:

ssc install psmatch2, replace

Note that unlike the OLS regressions in the reference table above, which are solved instantly, propensity score matching takes a while, so you may have to be patient while waiting for Stata to produce results. Also note that we are doing this analysis without weights. See Treiman pp. 392-393 for a brief discussion of propensity score matching.

psmatch2 vietnam_vet female age age_sq yrsed if age>=25 & age<=64, logit out(incwage) ties

The above syntax is for our Model 5 equivalent, the right-most model in the table below. The controls are female, age, age_sq, and yrsed. The logit option means that the regression used to generate the propensity scores (i.e. each subject's predicted probability of being a Vietnam veteran, given the controls) is a logit model, or logistic regression. The “ties” option means that Stata will use all the matches that share the same propensity score, meaning all the matches that have the same values on the controls, so the results do not depend on how the data happen to be sorted. After entering the results for Model 5 below, work your way left by dropping yrsed (Model 4), then dropping age_sq (Model 3). Then go back to Model 5 and repeat it with the adjusted age filter of 40-64 years.
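The remaining runs described above can be sketched as follows. Each command is the Model 5 command from this assignment with only the control list or the age filter changed; no other options are assumed.

```stata
* Model 4 analog: drop yrsed
psmatch2 vietnam_vet female age age_sq if age>=25 & age<=64, logit out(incwage) ties

* Model 3 analog: also drop age_sq
psmatch2 vietnam_vet female age if age>=25 & age<=64, logit out(incwage) ties

* Model 5 analog with the adjusted age filter (40-64)
psmatch2 vietnam_vet female age age_sq yrsed if age>=40 & age<=64, logit out(incwage) ties
```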

Analog to which regression model	Model 5, adjusted age filter	Model 3	Model 4	Model 5
Controls	female age age_sq yrsed	female age	female age age_sq	female age age_sq yrsed
Age restriction	40-64	25-64	25-64	25-64
Vietnam veteran coefficient (SE) and [Z score] or [T score]

1c) How do the Vietnam veteran coefficients in Models 3-5 compare to the Vietnam veteran coefficients in Models 3-5 of the OLS regression table above? Why are the results different?

1d) Note that the propensity score results above for Models 3 and 4 are the same, whereas the OLS regression results above for Models 3 and 4 are different. Why do you think the propensity score result for the effect of Vietnam veteran status on income is the same for Models 3 and 4?

1e) Why do you think the propensity score results for Model 5 (age 25-64) and Model 5 with the adjusted age filter (age 40-64) yield the same coefficient for Vietnam veterans’ income?