HW 3, Soc 388, updated Oct 11.
Due Tuesday, Oct 30
Late homeworks will generally not be accepted, because I will post answers to my website soon after the homework is due. If you're stuck, email me or the TA.
NOTE: All homeworks should include an edited STATA log.
Previous Reading Assignments: Hout, Chapters 1-4; Agresti, Ch 1-2, 6.
New Reading Assignment: Agresti, Ch 3
Once again (since it isn't defined in either text): BIC = LRT - df(ln(N)), where LRT is the goodness-of-fit chi-square, df is the residual degrees of freedom, and N is the sample size from the whole dataset. The syllabus contains references that define BIC (Raftery 1986) and critique it (Weakliem 1999). Lower BIC indicates better fit, and BIC < 0 indicates a model that is preferred to the saturated model.
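As a quick check on the arithmetic, here is a minimal Python sketch of the BIC definition above, read as LRT minus df times ln(N). The function name and the numbers are made up for illustration and are not results from the homework data; in Stata you would compute the same quantity from the model output.

```python
import math

def bic(lrt, residual_df, n):
    """BIC = LRT - df * ln(N), following the definition in this handout."""
    return lrt - residual_df * math.log(n)

# Hypothetical values, for illustration only (not from the intermarriage data).
example = bic(lrt=250.0, residual_df=20, n=10_000)
print(round(example, 2))

# The saturated model has LRT = 0 and df = 0, so its BIC is 0.
print(bic(lrt=0.0, residual_df=0, n=10_000))
```

A positive value, as in the made-up example, would mean the saturated model is preferred; a negative value would favor the model being tested.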
ID, or Index of Dissimilarity (0 ≤ ID ≤ 100), is a simple measure that describes what percentage of the predicted counts of a model would have to be changed to reach the actual data. ID = sum (over all cells) of the quantity 50(abs((predicted/N) - (actual/N))), where N is the sample size, predicted are the predicted values of the model, and actual are the actual cell counts.
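The ID calculation can be sketched in Python as follows, reading the formula as 50 times the absolute difference between the predicted and actual cell proportions, summed over all cells. The function name and the four-cell counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def index_of_dissimilarity(predicted, actual):
    """Sum over cells of 50 * |predicted/N - actual/N|, where N is the sample size."""
    n = sum(actual)  # N is taken as the total of the actual cell counts
    return sum(50 * abs(p / n - a / n) for p, a in zip(predicted, actual))

# Hypothetical four-cell table, for illustration only.
actual_counts = [40, 10, 30, 20]
predicted_counts = [35, 15, 25, 25]

# Each cell is off by 5 out of N = 100, so ID = 50 * (4 * 0.05) = 10.
print(round(index_of_dissimilarity(predicted_counts, actual_counts), 6))
```

In this made-up example, 10 percent of the predicted counts would have to be moved to other cells to reproduce the actual data.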
Important ideas: Goodness of fit measures, hypothesis testing, inference across many dimensions, different kinds of controls, hierarchical variables.
The data are available from my website, as well as my public folder via ftp (/afs/ir/users/m/r/mrosenfe/public) under the name "708090 MR intermar.dta" (Stata ver 6) or "708090 MR intermar.xls" if you'd rather start with the Excel file and copy it into Stata.
The data have 225 cells, and 5 variables (not including count). There are 649,821 couples in the dataset (it's intermarriage data, surprise surprise). The data consist of married people age 20-29 at the time of the census. The variables are meth (husband's ethnicity) and feth (wife's ethnicity), with the same 5 categories we have seen before (non-Hispanic Black, non-Hispanic White, Mexican, Other Hispanic, non-Hispanic Other). There is a variable for census year (70, 80, and 90), and there is a variable for nativity of each spouse (born in the US vs foreign born). The dataset includes 3 of the possible 4 combinations of nativity; couples that are both foreign born are excluded. The number of cells = 5*5*3*3 = 225.
In the following table, BW is the gender-symmetric Black-White interaction; MOh is the gender-symmetric Mexican-Other Hispanic interaction; ethintdm is the dummy variable that treats all 5 kinds of ethnic endogamy the same; ethintct is the categorical variable that treats each kind of ethnic intermarriage differently.
In model descriptions, "year*meth*mgen" is a hierarchical description, which means that the interaction of the 3 variables, as well as all two-way interactions and single variables, are included. I'll explain more about this in class.
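To make the hierarchical shorthand concrete, here is a small Python sketch that lists every term implied by a description such as "year*meth*mgen". This is only an illustration of the bookkeeping, not how desmat works internally; the function name is made up.

```python
from itertools import combinations

def expand_hierarchical(description):
    """List the single variables and all interactions implied by a term like 'a*b*c'."""
    variables = description.split("*")
    terms = []
    # Sizes 1, 2, ..., k give the main effects, two-way interactions, and so on.
    for size in range(1, len(variables) + 1):
        for combo in combinations(variables, size):
            terms.append("*".join(combo))
    return terms

# In the output, 'year*meth' stands for the two-way interaction only;
# the lower-order terms are listed separately.
print(expand_hierarchical("year*meth*mgen"))
# → ['year', 'meth', 'mgen', 'year*meth', 'year*mgen', 'meth*mgen', 'year*meth*mgen']
```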
Note: in Model 7a, the '@' in front of year indicates (to desmat) that year should be treated as a continuous variable there.
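Fill in the following table:

Model # | Model description | Terms in model | Residual df | Goodness-of-fit chi-square | Goodness-of-fit chi-square P | BIC | ID
--------|-------------------|----------------|-------------|----------------------------|------------------------------|-----|----
1 | Constant only | | | | | |
2 | year*meth year*feth | | | | | |
3 | year*meth*mgen year*feth*fgen | | | | | |
4 | year*meth*mgen year*feth*fgen BW, MOh | | | | | |
5 | year*meth*mgen year*feth*fgen ethintdm | | | | | |
6 | year*meth*mgen year*feth*fgen ethintct | | | | | |
7a | year*meth*mgen year*feth*fgen ethintct*@year | | | | | |
7b | year*meth*mgen year*feth*fgen ethintct*year | | | | | |
8 | year*meth*mgen year*feth*fgen ethintct*year BW MOh | | | | | |
9 | Your best-fitting model here | | | | | |
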
1) Fill in the above table, models 1-8.
2) Does racial endogamy vary significantly between groups? What is the statistical test that answers that question?
3) Does racial endogamy vary significantly over time? More so for some groups than for others?
4) Does US nativity affect racial endogamy? Describe the model(s) and the results you need to answer this question.
5) Based on models 1-8, which would you say is a more powerful force in the marriage market: racial endogamy or the division between Blacks and Whites? Why?
6) Which of the models 1-8 fits the best by LRT and by BIC? Do any of them fit reasonably well?
7) What is the difference between treating year as a continuous vs categorical variable in interactions with ethnic endogamy? How do models 7a and 7b differ? How do you interpret this difference?
8) Construct a model that fits better (by BIC or LRT) than any of the models 1-8. What have you added to the previous models?
9) Now here are some more abstract questions about a hypothetical dataset with 3 variables: A (5 categories), B (4 categories), and C (3 categories). The total number of cells is 5*4*3 = 60. Fill in the following table.
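Model # | Model description | Terms in model | Residual df
--------|-------------------|----------------|------------
1 | A | |
2 | A, B | |
3 | A*B | |
4 | A*B, C | |
5 | A*B, B*C, A*C | |
6 | A*B*C | |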