Homework 3
Draft (updated Oct 26, 2015)
Use the March 2000 CPS, which we are all familiar with.
1) We are going to reanalyze the influence of Vietnam veteran status on income (sort of a return to HW1, but this time with better tools). Generate a new variable, age_sq=age^2 (i.e. age squared). Use aweights to get the correct unbiased parameters. The aweights should not affect the number of observations reported.
Fill in the following Table with the relevant regression output.
Use the following style for filling in the table:
regression coefficient
(standard error)
[T-statistic, with asterisks indicating statistical significance if appropriate; see note below table]
So, a coefficient of 3.2 with a std error of 1.5 and a resulting T-statistic of 2.1, yielding a two-tailed P of just under 0.05, would look like this:
3.2
(1.5)
[2.1*]
All models are Ordinary Least Squares regression models (i.e. Stata regress) predicting incwage for adults ages 25-64 (inclusive). For the Vietnam veteran dummy variable, it might be easiest to create it by hand (=1 when the subject is a Vietnam veteran, =0 otherwise). For your own control variables, know whether they are continuous (like yrsed) or categorical (like race) and treat them accordingly.
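As a rough sketch of the setup described above (one possible approach, not the required one), the first two models might be run as follows. The names vietnam_vet and female follow the psmatch2 syntax used later in this homework. The weight variable name perwt is a hypothetical placeholder, so substitute whatever the weight variable is called in your copy of the data.

```stata
* Assumes vietnam_vet and female already exist as 0/1 dummy variables,
* and that the person weight is stored in a variable called perwt
* (a hypothetical name; substitute the weight variable in your data).
generate age_sq = age^2

* Model 1: Vietnam veteran status only
regress incwage vietnam_vet [aweight=perwt] if age >= 25 & age <= 64

* Model 2: add sex
regress incwage vietnam_vet female [aweight=perwt] if age >= 25 & age <= 64
```

Models 3 through 6 follow the same pattern, adding age, age_sq, yrsed, and your own controls in turn.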

| Variable | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| Vietnam veteran | dummy var for Vietnam veteran status | dummy var for Vietnam veteran status | dummy var for Vietnam veteran status | dummy var for Vietnam veteran status | dummy var for Vietnam veteran status | dummy var for Vietnam veteran status |
| Sex (specify which gender you are comparing to which) | | sex | sex | sex | sex | sex |
| Age | | | age | age | age | age |
| Age squared | | | | age squared | age squared | age squared |
| Years of education (yrsed) | | | | | years of education | years of education |
| Your control variable 1 | | | | | | 1 or 2 other variables that you think are appropriate controls (explain why) |
| Your control variable 2 | | | | | | |
| Constant | | | | | | |
| Unweighted N | | | | | | |
| R-square | | | | | | |
| F-stat comparison with previous model (Soc 381) | | | | | | |
| Adjusted R-square | | | | | | |
* P < .05
** P < .01
*** P < .001, two-tailed tests
2) Questions:
a) Was there an advantage in the 1999 labor market to being a Vietnam veteran? How sure are we that Vietnam veteran status made a difference in individual income in 1999? Justify your answer by reference to the filled-out table above.
b) Which are the control variables that seem to make the most difference to the income contrast between Vietnam veterans and others?
c) Which model fits the best by the adjusted R-square?
d) How do you interpret the constant in Model 1, and how do you interpret the constant term in the subsequent models? Why is the meaning of the constant more relevant in Models 1 and 2 than in Models 3-6 (in other words, why does the constant term correspond to the real income of a relevant subset in Models 1 and 2, but not in Models 3-6)?
e) Across these 6 models, which coefficient has the largest T-score in absolute value (ignore the constant term)? How would you interpret the magnitude of this T-statistic?
f) Why is the age coefficient insignificant in Model 3, but significant in Model 4? For all the models with age and age squared, determine the age at which predicted income is highest.
g) How do you interpret the coefficient for Vietnam veteran status in Model 1, and in Model 5?
h) How do you interpret the coefficient for years of education in Model 5? Compare the Vietnam veteran coefficient in Models 4 and 5. What does the difference tell you about the educational distribution of Vietnam veterans compared to non-Vietnam veterans?
i) Explain your choice of one or two additional control variables for Model 6. Explain the coefficients for these variables, and explain their effect (if any) on the coefficient for Vietnam veteran status.
j) Why do Models 2 and 3 have adjusted R-squares that are so similar?
k) Do women get the same income benefit from education that men get? Create a new model to answer this question, building on model 5 (i.e. adding something to model 5), and report the results and explain them. How much of an income benefit do women get for each additional year of education, and how much income benefit do men get for each additional year of education?
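For the second part of question (f), one way to locate the age of peak predicted income is to treat the fitted equation as a quadratic in age. The symbols b_age and b_age_sq below are notation introduced here for the estimated coefficients on age and age squared; all other predictors are held fixed.

```latex
\begin{align*}
\widehat{\text{incwage}} &= \dots + b_{\text{age}}\,\text{age} + b_{\text{age\_sq}}\,\text{age}^{2} \\
\frac{\partial\,\widehat{\text{incwage}}}{\partial\,\text{age}} &= b_{\text{age}} + 2\,b_{\text{age\_sq}}\,\text{age} = 0 \\
\text{age}^{*} &= -\frac{b_{\text{age}}}{2\,b_{\text{age\_sq}}}
\end{align*}
```

The turning point is a maximum only if b_age_sq is negative, so check the sign of that coefficient before interpreting the result.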
3) (For Soc 381 only) Use the formula (in my “notes on mean and variance”) to generate the F-statistic for the comparison of the R-square for each successive model (i.e. comparing Model 2 to Model 1, then Model 3 to Model 2, etc.), including the degrees of freedom, the statistic, and the P value of this statistic. Report as F(df1, df2) = value, (P=…). Explain the meaning of the F-statistics, and which null hypotheses are accepted or rejected.
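The formula from the course notes is not reproduced here. A commonly used form of the F-test for nested OLS models, which may differ in notation from the notes, is sketched below. All symbols are assumptions for illustration: R_r^2 and R_f^2 are the R-squares of the smaller (previous) and larger (current) model, q is the number of predictors added, k is the number of predictors in the larger model excluding the constant, and N is the number of observations.

```latex
\[
F(q,\; N-k-1) \;=\; \frac{\left(R_f^{2} - R_r^{2}\right)/q}{\left(1 - R_f^{2}\right)/\left(N-k-1\right)},
\qquad df_1 = q, \quad df_2 = N-k-1
\]
```

The null hypothesis in each comparison is that the coefficients on the q added predictors are all zero.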
4) (For Soc 381 only) For this problem I want you to recreate Model 5 of problem 1 above, but with a variety of different ways of estimating the model, and with different assumptions about the weights. In this table, enter the coefficients, the standard errors, and the T-statistics or Z-scores that go with each model. For Model 5d, run the regression with the option vce(bootstrap, reps(100)). For Model 5e, use the Stata command glm instead of regress, and use the options family(gaussian) link(identity).
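The four variants of Model 5 might look roughly like the sketch below. The variable names vietnam_vet and female follow the psmatch2 syntax in problem 5, and perwt is a hypothetical name for the person weight variable; adjust both to match your own data.

```stata
* perwt is a hypothetical name for the person weight variable.

* Model 5b: pweight instead of aweight
regress incwage vietnam_vet female age age_sq yrsed [pweight=perwt] if age >= 25 & age <= 64

* Model 5c: no weights
regress incwage vietnam_vet female age age_sq yrsed if age >= 25 & age <= 64

* Model 5d: no weights, bootstrap standard errors
regress incwage vietnam_vet female age age_sq yrsed if age >= 25 & age <= 64, vce(bootstrap, reps(100))

* Model 5e: no weights, fit by maximum likelihood via glm
glm incwage vietnam_vet female age age_sq yrsed if age >= 25 & age <= 64, family(gaussian) link(identity)
```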

| Variable | Model 5 | Model 5b | Model 5c | Model 5d | Model 5e |
|---|---|---|---|---|---|
| Notes: | just as in Q1 Model 5, above | with pweight instead of aweight | without weights | without weights, and with bootstrap standard errors | without weights and with Likelihood Maximization rather than OLS |
| Vietnam veteran (dummy var) | | | | | |
| Female (dummy var) | | | | | |
| Age | | | | | |
| Age squared | | | | | |
| Years of education | | | | | |
| Constant | | | | | |
| Unweighted N | | | | | |
| R-square | | | | | |
4a) How do the models above differ, substantively, and in their coefficients and standard errors? Comment on the differences and similarities.
4b) If you run Model 5d a second time, do you get the same standard errors? Why or why not?
4c) Model 5e and model 5c coefficients are arrived at in different ways: Model 5c, using the Stata command regress derives its results via OLS, Ordinary Least Squares. Model 5e is fit iteratively using a method of maximum likelihood. Can you guess why Model 5e and Model 5c are so similar?
4d) Generate a new model predicting incwage, with educrec (categorical) as the only predictor variable, for subjects ages 25-64, without weights. Use this syntax if you want to make sure that you don’t get a harmless phantom column in your variance-covariance matrix for the missing value of 0 for educrec:
replace educrec=. if educrec==0
xi: regress incwage i.educrec if age >= 25 & age <=64
In postestimation analysis, test (using lincom) whether the difference in income between people with 11th grade and people with 10th grade educations is statistically significant; show your results and explain them. Then, still in postestimation mode (meaning the regression from 4d is the last model you have run), create a matrix variable equal to the variance-covariance matrix of the estimators, using syntax like
matrix my_matrix=e(V)
Then list this new matrix (and include the matrix in your homework), using syntax
matrix list my_matrix
Use the variance-covariance matrix to generate the standard error of the comparison between 11th and 10th grade subjects. Then compare this to the standard error you derived from lincom.
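As a reminder of the arithmetic involved, the standard error of a difference between two coefficients combines two diagonal entries and one off-diagonal entry of the variance-covariance matrix. The symbols b_11 and b_10 are notation introduced here for the estimated coefficients on the 11th grade and 10th grade categories of educrec.

```latex
\[
\operatorname{SE}\!\left(b_{11} - b_{10}\right)
= \sqrt{\operatorname{Var}(b_{11}) + \operatorname{Var}(b_{10}) - 2\,\operatorname{Cov}(b_{11},\, b_{10})}
\]
```

The two variances are read from the diagonal of my_matrix and the covariance from the cell where the two categories' row and column intersect.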
5) (This question about Propensity Score Matching has been moved to Soc 382, so Soc 381 Fall 2018 can ignore all the parts of Q5) Propensity Score Matching. In question 1 above, and throughout this homework, we have been using regression to control for differences between Vietnam veterans and other workers. Propensity score matching takes a different approach. In propensity score matching, the regression is used to find the other survey subjects who are most like the Vietnam veterans on a series of control variables (age, gender, education, etc), and then the Vietnam veterans’ income is compared to the most similar others’ incomes directly.
5a) How many Vietnam veterans and how many other subjects were there in the March, 2000 CPS, with age>=25 and age<=64?
5b) Use the user-written add-in command psmatch2, which you will have to install by using the following command:
ssc install psmatch2, replace
Note that unlike the regressions above in problem 1, which are solved instantly, the propensity score matching takes a while, so you may have to be patient waiting for Stata to produce results. Also note that we are doing this analysis without weights. See Treiman pp. 392-393 for a brief discussion of Propensity Score Matching.
psmatch2 vietnam_vet female age age_sq yrsed if age>=25 & age<=64, logit out(incwage) ties
The above syntax is for our Model 5 equivalent, the rightmost model in the table below. The controls are female, age, age_sq, and yrsed. The logit specification means that the regression to generate the propensity scores (i.e. the propensity to be just like the Vietnam veterans) is a logit model, or logistic regression. The “ties” option means that Stata will use all the matches that have the same propensity score, meaning all the matches that have the same controls, so here we don’t have to worry about sorting. After entering the results for Model 5 below, work your way left by dropping yrsed (Model 4), then dropping age_sq (Model 3). Then go back to Model 5 and repeat it with the adjusted age filter of 40-64 years.
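Following the steps just described, the remaining three columns of the table could be produced with variations on the same command, roughly as sketched here. Only the control list and the age filter change.

```stata
* Model 4 analog: drop yrsed
psmatch2 vietnam_vet female age age_sq if age>=25 & age<=64, logit out(incwage) ties

* Model 3 analog: also drop age_sq
psmatch2 vietnam_vet female age if age>=25 & age<=64, logit out(incwage) ties

* Model 5 analog with the adjusted age filter (40-64)
psmatch2 vietnam_vet female age age_sq yrsed if age>=40 & age<=64, logit out(incwage) ties
```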
| Analog to which regression model | Model 5, adjusted age filter | Model 3 | Model 4 | Model 5 |
|---|---|---|---|---|
| Controls | female age age_sq yrsed | female age | female age age_sq | female age age_sq yrsed |
| Age restriction | 40-64 | 25-64 | 25-64 | 25-64 |
| Vietnam veteran coefficient (SE) and [Z score] or [T score] | | | | |
5c) How do the Vietnam veteran coefficients in Models 3-5 compare to the Vietnam veteran coefficients in Models 3-5 in the regression results in problem 1, above? Why are the results different?
5d) Note that the results above for Models 3 and 4 are the same, whereas the regression results above in problem 1 for Models 3 and 4 are different. Why do you think the propensity score result for the effect of Vietnam veteran status on income is the same for Models 3 and 4?
5e) And why do you think the propensity score results for Model 5 (ages 25-64) and Model 5 with the adjusted age filter (ages 40-64) result in the same coefficient for Vietnam veterans’ income?