Homework 3

 

Draft (updated Aug 4, 2013)

 

Use the March 2000 CPS data that we are all familiar with.

 

 

1) We are going to re-analyze the influence of Vietnam veteran status on income (sort of a return to HW1, but this time with better tools). Generate a new variable, age_sq=age^2 (i.e., age squared). Use aweights to get the correct unbiased parameters. The aweights should not affect the number of observations reported.

Fill in the following Table with the relevant regression output.

 

Use the following style for filling in the table:

 

regression coefficient 

(standard error)

[T-statistic with asterisks indicating statistical significance, if appropriate; see note below table]

 

 

So, a coefficient of 3.2 with a standard error of 1.5 and a resulting T-statistic of 2.1, yielding a two-tailed P of just under 0.05, would look like this:

3.2

(1.5)

[2.1*]
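The cell style above can be checked mechanically. The following is a minimal sketch in Python, not part of the assignment and not Stata output: the helper names (two_tailed_p, stars, format_cell) are made up for illustration, and the P value uses the normal approximation to the t distribution, which is an assumption that is only reasonable for large samples such as the CPS.

```python
import math

def two_tailed_p(t_stat):
    """Two-tailed P value, using the normal approximation to the t distribution."""
    return math.erfc(abs(t_stat) / math.sqrt(2))

def stars(p):
    """Significance asterisks following the note below the table."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

def format_cell(coef, se):
    """Return the three-line cell: coefficient, (standard error), [T-statistic with stars]."""
    t_stat = coef / se
    return f"{coef:.1f}\n({se:.1f})\n[{t_stat:.1f}{stars(two_tailed_p(t_stat))}]"

print(format_cell(3.2, 1.5))
```

For the example values 3.2 and 1.5, this prints the same three lines as the sample cell above.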

 

 

All models are Ordinary Least Squares regression models (i.e., Stata regress) predicting incwage for adults age 25-64. For the Vietnam veteran dummy variable, it might be easiest to create it by hand (=1 when the subject is a Vietnam veteran, =0 otherwise). For your own control variables, know whether they are continuous (like yrsed) or categorical (like race) and treat them accordingly.
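The data preparation described so far (age restriction, hand-made dummy, squared age) can be sketched as follows. This is an illustration in Python rather than Stata, and everything about the records is hypothetical: the field name vet_period, its codes, and the income values are assumptions, not the actual CPS variable names or coding. In Stata itself the squared term would be created with generate.

```python
# Hypothetical records; field names and veteran codes are assumptions,
# not the real CPS variables.
people = [
    {"age": 52, "vet_period": "vietnam", "incwage": 41000},
    {"age": 30, "vet_period": "none", "incwage": 28000},
    {"age": 70, "vet_period": "vietnam", "incwage": 12000},
    {"age": 45, "vet_period": "other", "incwage": 39000},
]

def prepare(records):
    """Keep adults age 25-64 and add the Vietnam veteran dummy and age squared."""
    sample = []
    for r in records:
        if not 25 <= r["age"] <= 64:
            continue  # outside the analysis sample
        row = dict(r)
        row["vietvet"] = 1 if r["vet_period"] == "vietnam" else 0
        row["age_sq"] = r["age"] ** 2
        sample.append(row)
    return sample

sample = prepare(people)
for row in sample:
    print(row["age"], row["age_sq"], row["vietvet"])
```

Note that the 70-year-old Vietnam veteran is dropped by the age restriction, so the dummy is only ever evaluated inside the 25-64 sample.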

 

 

 

An X marks each predictor included in that model; replace each X, and fill in the remaining rows, with your results in the style shown above. In Model 6, the control variables are 1 or 2 other variables that you think are appropriate controls (explain why).

Variable                                               | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6
-------------------------------------------------------+---------+---------+---------+---------+---------+--------
Vietnam veteran (dummy var for Vietnam veteran status) | X       | X       | X       | X       | X       | X
Sex (specify which gender you are comparing to which)  |         | X       | X       | X       | X       | X
Age                                                    |         |         | X       | X       | X       | X
Age squared                                            |         |         |         | X       | X       | X
Years of education (yrsed)                             |         |         |         |         | X       | X
Your control variable 1                                |         |         |         |         |         | X
Your control variable 2                                |         |         |         |         |         | X
Constant                                               |         |         |         |         |         |
Unweighted N                                           |         |         |         |         |         |
R-square                                               |         |         |         |         |         |
F-stat comparison with previous model (Soc 381)        |         |         |         |         |         |
Adjusted R-square                                      |         |         |         |         |         |

* P < .05

** P < .01

*** P < .001 (all two-tailed tests)

 

 

2) Questions:

            a) Was there an advantage in the 1999 labor market to being a Vietnam veteran? How sure are we that Vietnam veteran status made a difference in individual income in 1999? Justify your answer by reference to the filled-out table above.

            b) Which are the control variables that seem to make the most difference to the income contrast between Vietnam veterans and others?

            c) Which model fits the best by the adjusted R-square?

            d) How do you interpret the constant in Model 1, and how do you interpret the constant term in the subsequent models? Why is the meaning of the constant more relevant in Models 1 and 2 than in models 3-6 (in other words, why does the constant term correspond to the real income of a relevant subset in models 1 and 2, but not in models 3-6)?

            e) Across these 6 models, which coefficient has the largest T-score in absolute value (ignore the constant term)? How would you interpret the magnitude of this T-statistic?

            f) Why is the age coefficient insignificant in Model 3, but significant in Model 4? For all the models with age and age-squared, determine the age at which predicted income is highest.

            g) How do you interpret the coefficient for Vietnam veteran status in Model 1, and in Model 5?

            h) How do you interpret the coefficient for years of education in Model 5? Compare the Vietnam veteran coefficient in models 4 and 5. What does the difference tell you about the educational distribution of Vietnam veterans compared to non-Vietnam veterans?

            i) Explain your choice of one or two additional control variables for Model 6. Explain the coefficients for these variables, and explain their effect (if any) on the coefficient for Vietnam veteran status.

            j) Why do models 2 and 3 have such similar adjusted R-square values?

            k) Do women get the same income benefit from education that men get? Create a new model to answer this question, starting with model 5, and report the results and explain them. How much of an income benefit do women get for each additional year of education, and how much income benefit do men get for each additional year of education?
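For question f, the age at which predicted income peaks follows from the quadratic form: if predicted income contains b_age*age + b_age_sq*age^2, setting the derivative with respect to age to zero gives age = -b_age / (2*b_age_sq), which is a maximum only when b_age_sq is negative. A minimal sketch in Python, with made-up coefficients that are not results from the CPS:

```python
def peak_age(b_age, b_age_sq):
    """Age at which b_age*age + b_age_sq*age**2 is largest.

    Requires a negative age-squared coefficient; otherwise the curve
    has no interior maximum.
    """
    if b_age_sq >= 0:
        raise ValueError("age-squared coefficient must be negative for a peak")
    return -b_age / (2 * b_age_sq)

# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
print(peak_age(2000.0, -20.0))
```

The other coefficients in the model shift the whole curve up or down but do not move the peak, as long as age enters only through these two terms.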

 

3) (For Soc 381 only) Use the formula (in my “notes on mean and variance”) to generate the F-statistic for the comparison of the R-square for each successive model (i.e., comparing model 2 to model 1, then model 3 to model 2, etc.), including the degrees of freedom, the statistic, and the P value of this statistic. Report as F (df1, df2)= value, (P=…). Explain the meaning of the F-statistics, and which null hypotheses are accepted or rejected.
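As a cross-check on the hand calculation, here is a minimal Python sketch of the standard F test for nested OLS models based on R-square. It is assumed, not confirmed, that this matches the formula in the course notes; the notation there may differ. The function name and the input values are hypothetical.

```python
def nested_f(r2_reduced, r2_full, n, k_full, q):
    """F-statistic comparing a full model to a nested reduced model.

    r2_reduced, r2_full: R-square of the smaller and larger model
    n: number of observations
    k_full: number of predictors in the full model (excluding the constant)
    q: number of predictors added when going from reduced to full
    """
    df1 = q
    df2 = n - k_full - 1
    f_stat = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f_stat, df1, df2

# Made-up values: adding one predictor raises R-square from 0.10 to 0.12.
f_stat, df1, df2 = nested_f(0.10, 0.12, n=1000, k_full=2, q=1)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

The P value is the upper-tail probability of the F distribution with (df1, df2) degrees of freedom; in Stata it can be obtained with the Ftail() function.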

 

4) (For Soc 381 only) For this problem I want you to re-create Model 5 of problem 1 above, but with a variety of different ways of estimating the model, and with different assumptions about the weights. In this table, enter the weights, the standard errors, and the T-statistics or Z-scores that go with each model. For Model 5d, run the regression with the option vce(bootstrap, reps(100)). For Model 5e, use the Stata command glm instead of regress, and use the options family(gaussian) link(identity).

 

 

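Notes:

Model 5: just as in problem 1, above

Model 5b: with pweight instead of aweight

Model 5c: without weights

Model 5d: without weights, and with bootstrap standard errors

Model 5e: without weights, and with likelihood maximization rather than OLS

Variable                    | Model 5  | Model 5b | Model 5c | Model 5d | Model 5e
----------------------------+----------+----------+----------+----------+---------
Vietnam veteran (dummy var) |          |          |          |          |
Female (dummy var)          |          |          |          |          |
Age                         |          |          |          |          |
Age squared                 |          |          |          |          |
Years of education          |          |          |          |          |
Constant                    |          |          |          |          |
Unweighted N                |          |          |          |          |
R-square                    |          |          |          |          |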

4a) How do the models above differ, substantively, and in their coefficients and standard errors? Comment on the differences and similarities.

4b) If you run Model 5d a second time, do you get the same standard errors? Why?

4c) The coefficients of Model 5e and Model 5c are arrived at in different ways: Model 5c, using the Stata command regress, derives its results via OLS (Ordinary Least Squares). Model 5e is fit iteratively by maximum likelihood. Can you guess why Model 5e and Model 5c are so similar?
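To experiment with how bootstrap standard errors behave outside Stata, the resampling idea can be sketched in Python. This is a simplified illustration, not what vce(bootstrap, reps(100)) does internally: it bootstraps the standard error of a mean rather than of regression coefficients, the income values are made up, and the function name is hypothetical.

```python
import random
import statistics

def bootstrap_se_of_mean(values, reps=100, seed=None):
    """Bootstrap standard error of the mean.

    Draws `reps` resamples with replacement, each the size of the original
    data, and returns the standard deviation of the resampled means.
    """
    rng = random.Random(seed)  # seed=None means a different stream each run
    n = len(values)
    means = []
    for _ in range(reps):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)

incomes = [12000, 18000, 25000, 31000, 40000, 52000, 75000, 110000]  # made-up data

print(bootstrap_se_of_mean(incomes, seed=1))  # fixed seed
print(bootstrap_se_of_mean(incomes, seed=1))  # same seed again
print(bootstrap_se_of_mean(incomes))          # no seed
```

Comparing the three printed values is a quick way to see what role the random number seed plays; in Stata, the seed is controlled with set seed.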