HW 2, Soc 388
Due Thursday, October 18, in class
Late homeworks will generally not be accepted, because I will post answers to my website soon after the homework is due. If you're stuck, email me. If you still can't figure it out, just do the best you can and don't panic.
Note: What I refer to as the independence model, Hout refers to as the model of 'perfect mobility', and my Model 4, the
Also note that some of the characteristics of the models that Hout describes, especially the parts about coefficients summing to zero, are not characteristics of the models, but rather characteristics of the way in which some programs construct dummy variables.
Stata's built-in xi function always constructs dummy variables with one excluded category (equal to zero) in each variable. The user-written Stata function desmat can construct dummy variables in any number of ways, including the way Hout describes (this is the dev option), and by default the same way xi does it (this is the
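To make the distinction between the two dummy-variable constructions concrete, here is a minimal sketch in Python (not Stata). The function names and the three-category example are hypothetical, chosen only for illustration; the sketch does not reproduce the actual output of xi or desmat. It contrasts coding with one excluded category against deviation coding, where the reference category is coded -1 in every column so that the implied category effects sum to zero.

```python
def indicator_coding(categories, excluded):
    """One 0/1 column per category, except the excluded one (all zeros)."""
    kept = [c for c in categories if c != excluded]
    return {c: [1 if c == k else 0 for k in kept] for c in categories}


def deviation_coding(categories, reference):
    """Like indicator coding, but the reference category is -1 in every column."""
    kept = [c for c in categories if c != reference]
    return {
        c: [-1] * len(kept) if c == reference else [1 if c == k else 0 for k in kept]
        for c in categories
    }


# Hypothetical three-category variable, for illustration only.
cats = ["A", "B", "C"]
ind = indicator_coding(cats, excluded="A")
dev = deviation_coding(cats, reference="C")

for c in cats:
    print(c, ind[c], dev[c])
```

Both codings describe the same model and give the same predicted values; only the meaning of the individual coefficients changes.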
BIC (since it isn't defined in either text): BIC = LRT - df × ln(N), where LRT is the goodness-of-fit chi-square, df is the residual degrees of freedom, and N is the sample size from the whole dataset. The syllabus contains references that define BIC (Raftery 1986) and critique it (Weakliem 1999).
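As a quick check on the arithmetic, the BIC formula can be sketched in Python as follows. The function name and the input values are made up for illustration; they are not results from the homework data.

```python
import math


def bic(lrt, df, n):
    """BIC = LRT - df * ln(N), using the natural logarithm."""
    return lrt - df * math.log(n)


# Hypothetical values: LRT = 50.0 on 4 residual df, sample size 1000.
example = bic(50.0, 4, 1000)
print(round(example, 3))
```

Note that N is the total number of cases in the table, not the number of cells.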
ID, or Index of Dissimilarity, 0 ≤ ID ≤ 100, is a simple measure that describes what percentage of the predicted counts of a model would have to be changed to reach the actual data. ID = sum (over all cells) of the quantity 50 × abs((predicted/N) - (actual/N)), where N is the sample size, predicted are the predicted values of the model, and actual are the actual cell counts.
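The ID calculation can be sketched the same way. This is a minimal Python illustration with a hypothetical four-cell table; the function name is an assumption, and the cells are passed as flat lists in matching order.

```python
def index_of_dissimilarity(predicted, actual):
    """Sum over cells of 50 * |predicted/N - actual/N|, with N = total actual count."""
    n = sum(actual)
    return sum(50.0 * abs(p / n - a / n) for p, a in zip(predicted, actual))


# Hypothetical four-cell example, not the homework data.
actual = [40, 10, 10, 40]
predicted = [25, 25, 25, 25]
toy_id = index_of_dissimilarity(predicted, actual)
print(round(toy_id, 6))  # 30.0
```

In this toy case, 30 of the 100 predicted cases would have to move to a different cell to reproduce the actual counts, which matches ID = 30.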
Important ideas: Goodness of fit measures, hypothesis testing.
Consider the Los Angeles intermarriage dataset:
Intermarriage, LA 1990

                                        Wives
Husbands:        NH Black   Mexican   Other Hisp   All Others   NH White
Non Hisp Black       4074        63           32           42        215
Mexican                25      3947          143           95       1009
Other Hispanic         16       132          239           18        304
All Others             19        78           18         1022        360
Non Hisp White        103      1156          373          492      28453
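Before fitting any models, it can help to enter the table once and check the totals. The sketch below is in Python rather than Stata, and the variable names are illustrative. It stores the counts with husbands as rows and wives as columns, builds the one-record-per-cell layout that loglinear software typically expects, and computes the marginal totals and N.

```python
# Rows are husbands, columns are wives, in the order used in the table above.
groups = ["NH Black", "Mexican", "Other Hispanic", "All Others", "NH White"]
counts = [
    [4074, 63, 32, 42, 215],
    [25, 3947, 143, 95, 1009],
    [16, 132, 239, 18, 304],
    [19, 78, 18, 1022, 360],
    [103, 1156, 373, 492, 28453],
]

# Long format: one (husband, wife, count) record per cell.
cells = [
    (groups[i], groups[j], counts[i][j])
    for i in range(len(groups))
    for j in range(len(groups))
]

husband_totals = [sum(row) for row in counts]
wife_totals = [sum(col) for col in zip(*counts)]
n = sum(husband_totals)

print(len(cells))      # 25
print(husband_totals)  # [4426, 5219, 709, 1497, 30577]
print(wife_totals)     # [4237, 5376, 805, 1669, 30341]
print(n)               # 42428
```

The value of n here is the N that enters the BIC and ID formulas.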
Fill in the following table:

Model # | Model description | Terms in model | Residual df | Goodness-of-fit chi-square | Goodness-of-fit chi-square P | BIC | ID | Notes
1 | Constant only | | | | | | |
2 | Independence model | | | | | | |
3 | Independence plus single level of endogamy (same for all groups) | | | | | | |
4 | Independence plus separate endogamy term for each group | | | | | | |
5 | Same as 4, plus Black-White and Mexican-Other Hispanic interactions | | | | | | |
6 | Your best-fitting model here | | | | | | |

1) Fill in the above table for models 1-5; leave the 'Notes' column blank for now. For model 5, the Black-White and Mexican-Other Hispanic terms are gender symmetric.
2) Verify that model 1, the 'constant' model, is the comparison model for the likelihood ratio chi-square that Stata lists as the second line of output for each subsequent model. How do you interpret that chi-square test?
3) Does racial endogamy vary significantly between groups? What is the statistical test that answers that question?
4) In model 4, which is the group with the strongest ethnic or racial endogamy? Which group has the weakest endogamy? Is the difference between the strongest and weakest statistically significant?
5) Generate the predicted values for Model 5. Where do the predicted values and the actual values correspond exactly?
6) How do you interpret the coefficients for Black-White and Mexican-Other Hispanic intermarriage in Model 5?
7) If you add a gender-specific dimension to Black-White intermarriage in Model 5, is it significant?
8) Make a new model 7, which consists of Model 2, the independence model, plus the gender-symmetric Black-White interaction term. Compare the resulting Black-White interaction term to the same term from Model 5. Why is it different? Think about how the comparison group is different in the two cases.
9) Of models 1-5, which is the best fitting by BIC? Which fits best by the goodness-of-fit chi-square? Which fits best by Index of Dissimilarity?
10) Find a model that fits better than model 5 by either BIC or the goodness-of-fit chi-square.