HW 2, Soc 388

Due Thursday, October 18, in class

Late homeworks will generally not be accepted, because I will post answers to my website soon after the homework is due.  If you're stuck, email me.  If you still can't figure it out, just do the best you can and don't panic.

Reading: Hout Chapters 2-4; Agresti Ch 6.

Note: What I refer to as the independence model, Hout refers to as the model of 'perfect mobility', and my Model 4, the independence model plus one term for each diagonal cell, is Hout's Quasi-Perfect Mobility model, or QPM. Model 3, the independence model plus a single term shared by all diagonal cells, is what Hout refers to on p. 30 as QPM-C.
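Models 2 through 5 differ only in which extra dummy variables are added to the independence model's row and column terms. The sketch below builds those extra 0/1 columns over the 25 cells of the intermarriage table used in this assignment. It assumes the cells are listed wife by husband in row-major order (wives as rows, husbands as columns). The names GROUPS, CELLS, diag_any, diag_each and pair_term are hypothetical. In Stata you would create the same variables with generate.

```python
# Hypothetical helper names; group labels follow the LA 1990 table.
GROUPS = ["NH Black", "Mexican", "Other Hisp", "All Others", "NH White"]

# Cells in row-major order: wives are rows, husbands are columns (assumed layout).
CELLS = [(wife, husband) for wife in GROUPS for husband in GROUPS]


def diag_any():
    """Model 3 (QPM-C): a single dummy equal to 1 on every diagonal cell."""
    return [1 if wife == husband else 0 for wife, husband in CELLS]


def diag_each():
    """Model 4 (QPM): one dummy per group, equal to 1 only on that group's diagonal cell."""
    return {g: [1 if wife == husband == g else 0 for wife, husband in CELLS] for g in GROUPS}


def pair_term(a, b, symmetric=True):
    """Intermarriage dummy for groups a and b, as added in Model 5.

    symmetric=True marks both (wife a, husband b) and (wife b, husband a);
    symmetric=False marks only (wife a, husband b), one gender-specific direction.
    """
    wanted = {(a, b), (b, a)} if symmetric else {(a, b)}
    return [1 if cell in wanted else 0 for cell in CELLS]


print(sum(diag_any()), len(diag_each()), sum(pair_term("NH Black", "NH White")))  # prints 5 5 2
```

Model 3 adds diag_any(). Model 4 adds the five diag_each() columns. Model 5 adds pair_term("NH Black", "NH White") and pair_term("Mexican", "Other Hisp") to Model 4. A gender-specific version of a pairing can be added as pair_term(..., symmetric=False) alongside the symmetric term.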

Also note that some of the characteristics of the models that Hout describes, especially the parts about coefficients summing to zero, are not characteristics of the models themselves but of the way some programs construct dummy variables. Stata's built-in xi command always constructs dummy variables with one excluded category (coded zero) for each variable. The user-written Stata command desmat can construct dummy variables in a number of ways, including the way Hout describes (the dev option) and, by default, the way xi does it (the ind option). You don't have to experiment with the dummy variable options unless you want to; I just thought you'd like to know that you could.
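The difference between the two conventions shows up in a small sketch for a five-category variable. Indicator coding leaves the excluded category as all zeros, as with xi. Effect (deviation) coding sets the excluded category to -1 in every column, so the category effects sum to zero, which is the convention Hout describes. The function names are hypothetical, and this is not desmat's code. It only illustrates the two schemes.

```python
def indicator_coding(categories, reference):
    """Indicator (dummy) coding, the scheme Stata's xi uses.

    One 0/1 column per non-reference category; the reference category is
    all zeros, so its effect is absorbed into the constant.
    """
    levels = [c for c in dict.fromkeys(categories) if c != reference]
    return [[1 if c == level else 0 for level in levels] for c in categories]


def effect_coding(categories, reference):
    """Effect (deviation) coding: the reference category is -1 in every column.

    The implied effect of the reference category is minus the sum of the
    estimated effects, so the effects of all categories sum to zero.
    """
    levels = [c for c in dict.fromkeys(categories) if c != reference]
    return [
        [-1] * len(levels) if c == reference else [1 if c == level else 0 for level in levels]
        for c in categories
    ]


groups = ["NH Black", "Mexican", "Other Hisp", "All Others", "NH White"]
for name, row in zip(groups, effect_coding(groups, reference="NH White")):
    print(f"{name:<12}", row)
```

Both codings give the same fitted values and the same goodness of fit. Only the meaning of the individual coefficients changes.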

BIC (since it isn't defined in either text):

$$\mathrm{BIC} = \mathrm{LRT} - df \cdot \ln(N)$$

where LRT is the goodness of fit (likelihood-ratio) chi-square, df is the residual degrees of freedom, and N is the sample size of the whole dataset. The syllabus contains references that define BIC (Raftery 1986) and critique it (Weakliem 1999). Lower BIC indicates better fit, and BIC < 0 indicates a model that is preferred to the saturated model.
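The formula translates directly into code, and the sketch also shows why 0 is the cutoff. The saturated model has LRT = 0 and df = 0, so its BIC is exactly 0, and any model with a negative BIC beats it. The function name bic is hypothetical.

```python
import math


def bic(lrt, df, n):
    """BIC = LRT - df * ln(N).

    lrt: goodness of fit (likelihood-ratio) chi-square of the model
    df:  residual degrees of freedom of the model
    n:   sample size of the whole table (sum of all cell counts)
    """
    return lrt - df * math.log(n)


# Saturated model: LRT = 0 and df = 0, so BIC = 0 whatever N is.
# 42428 is the total of the LA 1990 table in this assignment.
print(bic(0.0, 0, 42428))  # prints 0.0
```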

ID, or Index of Dissimilarity (0 ≤ ID ≤ 100), is a simple measure of what percentage of the predicted counts of a model would have to be moved to other cells to reproduce the actual data. ID is the sum over all cells of 50 times the absolute difference between the predicted and actual proportions:

$$\mathrm{ID} = \sum_{\text{cells}} 50 \left| \frac{\text{predicted}}{N} - \frac{\text{actual}}{N} \right|$$

where N is the sample size, predicted are the predicted values of the model, and actual are the actual cell counts.
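Here is a minimal sketch of the ID formula. It assumes the actual and predicted counts are given as flat lists in the same cell order. The function name is hypothetical.

```python
def index_of_dissimilarity(actual, predicted):
    """ID = sum over cells of 50 * |predicted/N - actual/N|, N = total actual count.

    actual and predicted are flat sequences of cell counts in the same cell order.
    For predicted counts that sum to N, the result lies between 0 and 100.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must list the same cells")
    n = sum(actual)
    return sum(50 * abs(p / n - a / n) for a, p in zip(actual, predicted))
```

For example, with actual counts [30, 70] and predicted counts [50, 50], the function returns 20 (up to floating-point rounding). Twenty percent of the cases would have to move from the first cell to the second for the prediction to match.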

Important ideas:  Goodness of fit measures, hypothesis testing.
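Most of the hypothesis tests in this assignment compare two nested models. The likelihood-ratio statistic is the drop in goodness of fit chi-square, and its degrees of freedom are the drop in residual df. If you fit the models with Stata's poisson command, its "LR chi2" line is this test against the constant-only model (question 2). Models 3 and 4 differ by 4 df (question 3). The sketch below computes the p-value with only the standard library. The function names are hypothetical, and df is assumed to be a positive integer.

```python
import math


def chi2_sf(x, df):
    """Upper-tail p-value P(X > x) for a chi-square variable with integer df >= 1.

    Uses the recurrence Q(x | k + 2) = Q(x | k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1),
    starting from Q(x | 1) = erfc(sqrt(x/2)) for odd df or Q(x | 2) = e^(-x/2) for even df.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        # Each term is computed on the log scale so large chi-squares do not overflow.
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def lr_test(g2_restricted, df_restricted, g2_full, df_full):
    """Likelihood-ratio test of a restricted model nested within a fuller model.

    The statistic is the drop in goodness of fit chi-square between the two models,
    and its degrees of freedom are the drop in residual df. Returns (chi2, df, p).
    """
    chi2 = g2_restricted - g2_full
    df = df_restricted - df_full
    return chi2, df, chi2_sf(chi2, df)
```

For instance, lr_test(g2_model1, 24, g2_model2, 16) tests the independence model against the constant-only model on 8 df. Here g2_model1 and g2_model2 stand for your own goodness of fit chi-squares.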

Consider the Los Angeles intermarriage dataset:

Intermarriage, LA 1990

                    Husbands:
Wives               NH Black     Mexican  Other Hisp  All Others    NH White
Non Hisp Black          4074          63          32          42         215
Mexican                   25        3947         143          95        1009
Other Hispanic            16         132         239          18         304
All Others                19          78          18        1022         360
Non Hisp White           103        1156         373         492       28453
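A log-linear model is fitted as a Poisson regression on one record per cell, with the count as the outcome. The sketch below enters the table and reshapes it into that 25-record layout. It also computes the margins and N used in BIC and ID. It assumes wives are the rows and husbands are the columns, as the labels above suggest. The variable names are hypothetical.

```python
GROUPS = ["NH Black", "Mexican", "Other Hisp", "All Others", "NH White"]

# Rows are wives, columns are husbands (assumed from the table layout).
LA_1990 = [
    [4074, 63, 32, 42, 215],
    [25, 3947, 143, 95, 1009],
    [16, 132, 239, 18, 304],
    [19, 78, 18, 1022, 360],
    [103, 1156, 373, 492, 28453],
]

# Long format: one record per cell, the layout a Poisson log-linear
# regression expects (count as the outcome, wife and husband as factors).
records = [
    {"wife": wife, "husband": husband, "count": LA_1990[i][j]}
    for i, wife in enumerate(GROUPS)
    for j, husband in enumerate(GROUPS)
]

row_totals = [sum(row) for row in LA_1990]
col_totals = [sum(col) for col in zip(*LA_1990)]
n = sum(row_totals)  # N used in BIC and ID
print(n)  # prints 42428
```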

Fill in the following Table

Model #  Model description                                                     Terms in model  Residual df  Goodness of fit chi-square  Goodness of fit chi-square p  BIC  ID  Notes
1        Constant only
2        Independence model
3        Independence plus single level of endogamy (same for all groups)
4        Independence plus separate endogamy term for each group
5        Same as 4, plus Black-White and Mexican-Other Hispanic interactions
6        Your best fitting model here
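Some rows of the table can be checked without a regression program. Models 1 and 2 have closed-form predicted values. Model 4 can be fitted by iterative proportional fitting (IPF) with the diagonal blanked out. IPF is an algorithm swapped in here, not the iterative maximum-likelihood fit Stata runs, but it should reach the same fitted values. You can compare the results with the deviance from Stata's estat gof after poisson. Models 3 and 5 are not covered by this sketch. The function names are hypothetical. Note that Model 4 reproduces every diagonal cell exactly. Under Poisson maximum likelihood, the fitted values reproduce the observed total of every dummy variable in the model, and each diagonal cell has a dummy of its own.

```python
import math

# Rows are wives, columns are husbands, in the order
# NH Black, Mexican, Other Hisp, All Others, NH White.
LA_1990 = [
    [4074, 63, 32, 42, 215],
    [25, 3947, 143, 95, 1009],
    [16, 132, 239, 18, 304],
    [19, 78, 18, 1022, 360],
    [103, 1156, 373, 492, 28453],
]


def flatten(table):
    """Cells in row-major order."""
    return [x for row in table for x in row]


def g_squared(observed, fitted):
    """Goodness of fit (likelihood-ratio) chi-square: 2 * sum(obs * ln(obs / fit))."""
    return 2 * sum(o * math.log(o / f) for o, f in zip(observed, fitted) if o > 0)


def constant_model(table):
    """Model 1: every cell gets N / (number of cells). Returns (fitted, residual df)."""
    cells = flatten(table)
    return [sum(cells) / len(cells)] * len(cells), len(cells) - 1


def independence_model(table):
    """Model 2: fitted count = row total * column total / N."""
    n = sum(flatten(table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    fitted = [r * c / n for r in rows for c in cols]
    return fitted, (len(rows) - 1) * (len(cols) - 1)


def quasi_independence_model(table, tol=1e-8, max_iter=1000):
    """Model 4 (QPM) by iterative proportional fitting.

    Off-diagonal cells are fitted to independence with the diagonal blanked out
    (start values 0 on the diagonal, 1 elsewhere); diagonal cells are then set
    to their observed counts, since each has its own parameter.
    """
    k = len(table)
    row_t = [sum(table[i][j] for j in range(k) if j != i) for i in range(k)]
    col_t = [sum(table[i][j] for i in range(k) if i != j) for j in range(k)]
    fit = [[0.0 if i == j else 1.0 for j in range(k)] for i in range(k)]
    for _ in range(max_iter):
        for i in range(k):  # match off-diagonal row totals
            s = sum(fit[i])
            fit[i] = [x * row_t[i] / s for x in fit[i]]
        for j in range(k):  # match off-diagonal column totals
            s = sum(fit[i][j] for i in range(k))
            for i in range(k):
                fit[i][j] *= col_t[j] / s
        if max(abs(sum(fit[i]) - row_t[i]) for i in range(k)) < tol:
            break
    for i in range(k):
        fit[i][i] = float(table[i][i])
    # Parameters: constant, (k - 1) row, (k - 1) column, and k diagonal terms.
    return flatten(fit), k * k - (3 * k - 1)


observed = flatten(LA_1990)
for name, model in [("Model 1", constant_model), ("Model 2", independence_model),
                    ("Model 4", quasi_independence_model)]:
    fitted, df = model(LA_1990)
    print(name, "df =", df, "G2 =", round(g_squared(observed, fitted), 1))
```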

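1) Fill in the above table for models 1-5, leaving the 'notes' column blank for now. In model 5, the Black-White and Mexican-Other Hispanic terms are gender symmetric: a single term covers both directions of each pairing (for example, a Black wife with a White husband and a White wife with a Black husband).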

2) Verify that model 1, the 'constant' model, is the comparison model for the likelihood ratio chi-square that Stata lists as the second line of output for each subsequent model. How do you interpret that chi-square test?

3) Does racial endogamy vary significantly between groups?  What is the statistical test that answers that question?

4) In Model 4, which group has the strongest ethnic or racial endogamy? Which group has the weakest? Is the difference between the strongest and the weakest statistically significant?

5) Generate the predicted values for Model 5.  Where do the predicted values and the actual values correspond exactly?

6) How do you interpret the coefficients for Black-White and Mexican-Other Hispanic intermarriage in Model 5?

7) If you add a gender-specific dimension to Black-White intermarriage in Model 5, is it significant?

8) Make a new model 7, which consists of Model 2, the independence model, plus the gender-symmetric Black-White interaction term. Compare the resulting Black-White interaction term to the same term from Model 5. Why is it different? Think about how the comparison group differs in the two cases.

9) Of models 1-5, which is the best fitting by BIC? Which fits best by the goodness of fit chi-square? Which fits best by the Index of Dissimilarity?

10) Find a model that fits better than model 5 by either BIC or the goodness of fit chi-square.
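For question 4, one way to test whether two endogamy coefficients from the same model differ is a Wald z test on their difference. The variance of the difference needs the covariance of the two estimates as well as their variances. After fitting the model, Stata's estat vce displays the estimated covariance matrix. Stata's lincom or test commands give the equivalent test directly; test reports it as a 1-df chi-square, the square of z. The sketch below is a minimal illustration. The function name is hypothetical, and the numbers you pass in would be read from your own output.

```python
import math


def wald_difference(b1, b2, var1, var2, cov12):
    """z test of H0: b1 == b2 for two coefficients from the same fitted model.

    Var(b1 - b2) = Var(b1) + Var(b2) - 2 * Cov(b1, b2). The variances and the
    covariance come from the model's estimated covariance matrix; the
    covariance is generally not zero.
    Returns (z, two-sided p-value).
    """
    se = math.sqrt(var1 + var2 - 2.0 * cov12)
    z = (b1 - b2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p
```

An equivalent alternative is a likelihood-ratio test. Refit Model 4 with the two diagonal terms constrained to be equal, and compare the goodness of fit chi-squares on 1 df.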