HW 3, Soc 382


Rev: 2/12/2019



For reference: Treiman Chapter 12. See also Hout Chapters 2-4 and Agresti's sections on loglinear models for two-way contingency tables.


Note:  What I refer to as the independence model, Hout refers to as the model of 'perfect mobility'. My Model 4, the independence model plus one term for each diagonal cell, is Hout's Quasi-Perfect Mobility (QPM) and Treiman's Quasi-Independence model.  Model 3, with one term for all diagonal cells, is what Hout refers to on p. 30 as QPM-C.
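
To make the difference between Model 3 and Model 4 concrete, here is a minimal Python sketch of the diagonal (endogamy) indicator variables each model adds to the independence model. It is only an illustration: the number of groups K, the row/col indexing, and the variable names are my assumptions, and in the homework itself you would build the equivalent variables in Stata.

```python
# Hypothetical number of groups (assumption); use the number in your own table.
K = 5

rows = []
for row in range(K):
    for col in range(K):
        # Model 3 (QPM-C): one shared indicator, equal to 1 on every diagonal cell.
        endog_common = int(row == col)
        # Model 4 (QPM / quasi-independence): one indicator per group,
        # equal to 1 only on that group's own diagonal cell.
        endog_by_group = [int(row == col == g) for g in range(K)]
        rows.append((row, col, endog_common, endog_by_group))

# Model 3 adds 1 parameter to the independence model; Model 4 adds K parameters.
for r in rows[:7]:
    print(r)
```

Each off-diagonal cell has all indicators equal to 0, so both models leave the off-diagonal cells to be fit by the row and column terms alone.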


You will find the LEM 1.0 software at https://jeroenvermunt.nl/



BIC (since it isn't defined in either text):  BIC = LRT - df × ln(N), where LRT is the goodness of fit chi-square (the likelihood ratio test against the saturated model), df is the residual degrees of freedom, and N is the sample size of the whole dataset.  The syllabus contains references that define BIC (Raftery 1986) and critique it (Weakliem 1999).  Lower BIC indicates better fit, and BIC < 0 indicates a model that is preferred to the saturated model.
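
As a quick check on the arithmetic, the BIC formula can be sketched in Python as follows. The function name and the input numbers are hypothetical and are not taken from the intermarriage data.

```python
import math

def bic(lrt, df, n):
    """BIC = LRT - df * ln(N), relative to the saturated model."""
    return lrt - df * math.log(n)

# Hypothetical values (assumption, for illustration only):
# goodness of fit chi-square of 30.0 on 8 residual df, with N = 10000.
example = bic(30.0, 8, 10000)
print(example)  # negative here, so this made-up model would be preferred to the saturated model

# The saturated model has LRT = 0 and df = 0, so its BIC is 0 by construction.
print(bic(0.0, 0, 10000))
```

Note that the penalty df × ln(N) grows with sample size, so with a large N a model can have a significant goodness of fit chi-square and still have a negative BIC.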


ID, or Index of Dissimilarity (0 ≤ ID ≤ 100), is a simple measure of the percentage of cases that would have to be moved to a different cell for a model's predicted counts to match the actual data. ID = the sum (over all cells) of the quantity

50 × abs((predicted/N) - (actual/N)), where N is the sample size, predicted are the predicted values of the model, and actual are the actual cell counts.
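
The ID calculation, 50 times the absolute difference between the predicted and actual cell proportions, summed over all cells, can be sketched in Python like this. The function name and the four-cell counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def index_of_dissimilarity(actual, predicted):
    """Sum over cells of 50 * |predicted/N - actual/N|, on a 0-100 scale."""
    n = sum(actual)
    return sum(50 * abs(p / n - a / n) for a, p in zip(actual, predicted))

# Hypothetical 2x2 table flattened to four cells (assumption, not the LA data).
actual = [40, 10, 10, 40]
predicted = [25, 25, 25, 25]   # what an independence model would predict here

print(index_of_dissimilarity(actual, predicted))  # about 30: 30% of cases are misallocated
print(index_of_dissimilarity(actual, actual))     # 0.0 for a model that reproduces the data
```

The factor of 50 (rather than 100) is there because every case the model puts in the wrong cell is counted twice: once in the cell where it is over-predicted and once in the cell where it is under-predicted.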



Important ideas:  Goodness of fit measures, hypothesis testing.


Consider the Los Angeles intermarriage dataset:



Intermarriage, LA 1990

               | NH Black | Mexican | Other Hisp | All Others | NH White
Non Hisp Black |          |         |            |            |
Mexican        |          |         |            |            |
Other Hispanic |          |         |            |            |
All Others     |          |         |            |            |
Non Hisp White |          |         |            |            |

Fill in the following Table


Model # | Model Description | Terms in model | Residual df | Goodness of fit Chi-square | Goodness of fit Chi-square P | Notes
1 | Constant only | | | | |
2 | Independence Model | | | | |
3 | Independence plus single level of endogamy (same for all groups) | | | | |
4 | Independence plus separate endogamy term for each group (what Treiman refers to as Quasi-Independence) | | | | |
5 | Same as 4, plus Black-White and Mexican-Other Hispanic interactions (symmetric) | | | | |
6 | Crossings Model | | | | |
7 | Uniform Association Model | | | | |
8 | Quasi-Symmetry Model | | | | |
9 | RC Model (fit with LEM) | | | | |
10 | Your best fitting model here | | | | |

1) Fill in the above table for models 1-9; leave the 'notes' column blank for now.  For Model 5, the Black-White and Mexican-Other Hispanic terms are gender symmetric.


2) Verify that Model 1, the 'constant' model, is the comparison model for the likelihood ratio chi-square that Stata lists as the second line of output for each subsequent model.  How do you interpret that chi-square test?


3) Does racial endogamy vary significantly between groups?  What is the statistical test that answers that question?


4) In Model 4, which group has the strongest ethnic or racial endogamy?  Which group has the weakest endogamy?  Is the difference between the strongest and weakest statistically significant?


5) Generate the predicted values for Model 5.  Where do the predicted values and the actual values correspond exactly?


6) How do you interpret the coefficients for Black-White and Mexican-Other Hispanic intermarriage in Model 5?


7) If you add a gender-specific dimension to Black-White intermarriage in Model 5 (that is, if you create two Black-White intermarriage terms instead of one gender symmetric B-W term), is the difference between the two B-W terms statistically significant? Does adding the one extra term improve the model's goodness of fit?


8) Make a new model, which consists of Model 2, the independence model, plus the gender symmetric Black-White interaction term.  Compare the resulting Black-White interaction term to the same term from Model 5.  Why is it different?  Think about how the comparison group is different in the two cases.


9) Interpret the crossing terms in the Crossings Model. Interpret the endogamy levels in the Quasi-Independence Model. Interpret the scores in the RC Model.


10) Of models 1-9, which is the best fitting by BIC?  Which fits best by the goodness of fit chi-square? Which fits best by Index of Dissimilarity?


11) Try to find a Poisson regression (loglinear) model that fits better than any of models 1-9, by any criterion. Explain your results. (Don't worry too much if you cannot find a better model; there are not a lot of degrees of freedom left.)


12) Redo models 1-8, but with negative binomial models, that is, nbreg instead of poisson. Make a new table comparing all 8 loglinear models and all 8 nbreg models, based on the significance test of the alpha over-dispersion factor. Include the chi-square goodness of fit of the loglinear models, and the chibar statistic reported by Stata for the nbreg models. Report on whether each nbreg model fits significantly better than its Poisson twin, or not. Comment on the circumstances that lead nbreg to fit better, versus no difference in fit (which would lead us to prefer the Poisson version). Comment on the value of the chibar test compared to the value of the Poisson model goodness of fit chi-square.