HW 3, Soc 382
Rev: 2/12/2019
For reference: Treiman Chapter 12. See also Hout Chapters 2-4 and the Agresti sections on loglinear models for two-way contingency tables.
Note: What I refer to as the independence model, Hout refers to as the model of 'perfect mobility'. My Model 4, the independence model plus one term for each diagonal cell, is Hout's Quasi-Perfect Mobility (QPM) and Treiman’s quasi-independence model. Model 3, with one term for all diagonal cells, is what Hout refers to on p. 30 as QPMC.
You will find the LEM 1.0 software at https://jeroenvermunt.nl/
BIC (since it isn't defined in either text): BIC = LRT - df*ln(N), where LRT is the goodness-of-fit chi-square, df is the residual degrees of freedom, and N is the sample size of the whole dataset. The syllabus contains references that define BIC (Raftery 1986) and critique it (Weakliem 1999). Lower BIC indicates better fit, and BIC < 0 indicates a model that is preferred to the saturated model.
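As a quick check on the arithmetic, the BIC definition can be sketched in a few lines of Python. This is only an illustration: the function name bic and the input values are hypothetical and are not results from the homework data.

```python
import math

def bic(lrt, df, n):
    """BIC = LRT - df * ln(N), following the definition in this handout."""
    return lrt - df * math.log(n)

# Hypothetical values, for illustration only (not results from the LA data).
example = bic(lrt=25.0, df=4, n=1000)
print(round(example, 2))

# A negative value means the model is preferred to the saturated model.
print(example < 0)
```

Note that N here is the total count over all cells of the table, not the number of cells.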
ID, or Index of Dissimilarity, 0 ≤ ID ≤ 100, is a simple measure that describes what percentage of the predicted counts of a model would have to be changed to reach the actual data. ID = sum (over all cells) of the quantity
50*abs((predicted/N) - (actual/N)), where N is the sample size, predicted are the predicted values of the model, and actual are the actual cell counts.
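The ID formula can be sketched the same way. The function name index_of_dissimilarity and the 2x2 table below are hypothetical, chosen only so the arithmetic is easy to follow by hand; the predicted values are what the independence model gives for that toy table (row total times column total, divided by N).

```python
def index_of_dissimilarity(actual, predicted):
    """ID = sum over cells of 50 * |predicted/N - actual/N|, on a 0-100 scale."""
    n = sum(sum(row) for row in actual)
    total = 0.0
    for a_row, p_row in zip(actual, predicted):
        for a, p in zip(a_row, p_row):
            total += 50.0 * abs(p / n - a / n)
    return total

# Hypothetical 2x2 table, for illustration only.
actual = [[40, 10], [10, 40]]
# Independence-model predictions for this toy table: every cell is 50 * 50 / 100.
predicted = [[25.0, 25.0], [25.0, 25.0]]

# Each cell contributes 50 * 0.15 = 7.5, so the result is about 30.
print(index_of_dissimilarity(actual, predicted))
```

In this toy example about 30 percent of the predicted counts would have to be moved to reproduce the actual table. A model that reproduces the data exactly has ID = 0.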
Important ideas: Goodness of fit measures, hypothesis testing.
Consider the Los Angeles intermarriage dataset:
Intermarriage, LA 1990 (rows: husbands; columns: wives)

Husbands       | NH Black | Mexican | Other Hisp | All Others | NH White
Non Hisp Black |     4074 |      63 |         32 |         42 |      215
Mexican        |       25 |    3947 |        143 |         95 |     1009
Other Hispanic |       16 |     132 |        239 |         18 |      304
All Others     |       19 |      78 |         18 |       1022 |      360
Non Hisp White |      103 |    1156 |        373 |        492 |    28453
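Loglinear models treat each cell of the table as one observation, so the 5x5 table has to be reshaped into 25 records before fitting. A minimal Python sketch of that reshaping follows; the course itself uses Stata and LEM, and the field names (husband, wife, count, endogamous) are hypothetical labels chosen for illustration.

```python
groups = ["NH Black", "Mexican", "Other Hisp", "All Others", "NH White"]

# Rows are husbands, columns are wives, both in the order given in `groups`.
counts = [
    [4074, 63, 32, 42, 215],
    [25, 3947, 143, 95, 1009],
    [16, 132, 239, 18, 304],
    [19, 78, 18, 1022, 360],
    [103, 1156, 373, 492, 28453],
]

# One record per cell; `endogamous` flags the diagonal (same-group) cells.
cells = [
    {"husband": h, "wife": w, "count": counts[i][j], "endogamous": int(i == j)}
    for i, h in enumerate(groups)
    for j, w in enumerate(groups)
]

n_total = sum(c["count"] for c in cells)  # N for the BIC and ID formulas
print(len(cells), n_total)
```

The total printed here is the N that enters both the BIC and the ID calculations.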
Fill in the following Table
Model # | Model Description | Terms in model | Residual df | Goodness-of-fit chi-square | Goodness-of-fit chi-square P | BIC | ID | Notes
1 | Constant only | | | | | | |
2 | Independence Model | | | | | | |
3 | Independence plus single level of endogamy (same for all groups) | | | | | | |
4 | Independence plus separate endogamy term for each group (what Treiman refers to as Quasi-Independence) | | | | | | |
5 | Same as 4, plus Black-White and Mexican-Other Hispanic interactions (symmetric) | | | | | | |
6 | Crossings Model | | | | | | |
7 | Uniform Association Model | | | | | | |
8 | Quasi-Symmetry Model | | | | | | |
9 | RC Model (fit with LEM) | | | | | | |
10 | Your best fitting model here | | | | | | |

1) Fill in the above table for models 1-9; leave the 'Notes' column blank for now. For model 5, the Black-White and Mexican-Other Hispanic terms are gender-symmetric.
2) Verify that model 1, the 'constant' model, is the comparison model for the likelihood ratio chi-square that Stata lists as the second line of output for each subsequent model. How do you interpret that chi-square test?
3) Does racial endogamy vary significantly between groups? What is the statistical test that answers that question?
4) In model 4, which group has the strongest ethnic or racial endogamy? Which group has the weakest endogamy? Is the difference between the strongest and weakest statistically significant?
5) Generate the predicted values for Model 5. Where do the predicted values and the actual values correspond exactly?
6) How do you interpret the coefficients for Black-White and Mexican-Other Hispanic intermarriage in Model 5?
7) If you add a gender-specific dimension to Black-White intermarriage in Model 5 (that is, if you create two Black-White intermarriage terms instead of one gender-symmetric BW term), is the difference between the two BW terms statistically significant? Does adding one term improve the goodness of fit of the model?
8) Make a new model, which consists of Model 2, the independence model, plus the gender-symmetric Black-White interaction term. Compare the resulting Black-White interaction term to the same term from Model 5. Why is it different? Think about how the comparison group is different in the two cases.
9) Interpret the crossing terms in the Crossings Model. Interpret the endogamy levels in the Quasi-Independence Model. Interpret the scores in the RC Model.
10) Of models 1-9, which is the best fitting by BIC? Which fits best by the goodness-of-fit chi-square? Which fits best by Index of Dissimilarity?
11) Try to find a Poisson regression (loglinear) model that fits better than any of models 1-9, by any criterion. Explain your results. (Don’t worry too much if you cannot find a better model; there are not a lot of degrees of freedom left.)
12) Redo models 1-8, but with negative binomial models, that is, nbreg instead of poisson. Make a new table comparing all 8 loglinear models and all 8 nbreg models, based on the significance test of the alpha overdispersion factor. Include the chi-square goodness of fit of the loglinear models, and the chibar statistic reported by Stata for the nbreg models. Report on whether each nbreg model fits significantly better than its Poisson twin, or not. Comment on the circumstances that lead nbreg to fit better, versus no difference in fit (which would lead us to prefer the Poisson version). Comment on the value of the chibar test compared to the value of the Poisson model goodness-of-fit chi-square.
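Several of the questions above ask whether one model fits significantly better than a more restricted version of itself. One common approach, assuming the two models are nested, is to take the difference in their goodness-of-fit chi-squares and refer it to a chi-square distribution with degrees of freedom equal to the difference in residual df. The Python sketch below illustrates that calculation; the function names and the input values are hypothetical, and in practice Stata or another statistics package would supply the tail probability.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for the two starting cases: df = 1 and df = 2.
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(half)), 1
    else:
        q, nu = math.exp(-half), 2
    # Recurrence: Q(nu + 2) = Q(nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += math.exp((nu / 2.0) * math.log(half) - half - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return q

def lr_test(lrt_restricted, df_restricted, lrt_general, df_general):
    """Compare two nested models using their goodness-of-fit chi-squares and residual df."""
    diff = lrt_restricted - lrt_general
    ddf = df_restricted - df_general
    return diff, ddf, chi2_sf(diff, ddf)

# Hypothetical values, for illustration only (not results from the LA data).
diff, ddf, p = lr_test(lrt_restricted=30.0, df_restricted=6, lrt_general=20.0, df_general=4)
print(diff, ddf, round(p, 4))
```

The more restricted model always has the larger residual df, so both differences are taken as restricted minus general.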