Sociology 382

Homework 2

Important ideas: independence, degrees of freedom, goodness of fit, odds ratio

Consider the two datasets linked from my website:

A) Occupation by Race, USA 2000

                        Race
 Occupational Class     White     Non White
 Other                  42,012    7,146
 White Collar           17,216    2,361

and

B) Intermarriage, LA 1990

                                  Wives
 Husbands:         NH Black   Mexican   Other Hisp   All Others   NH White
 Non Hisp Black        4074        63           32           42        215
 Mexican                 25      3947          143           95       1009
 Other Hispanic          16       132          239           18        304
 All Others              19        78           18         1022        360
 Non Hisp White         103      1156          373          492      28453

1) For dataset A, calculate the log odds ratio and the standard error of the log odds ratio, using Excel.  Is the log odds ratio significantly different from zero?  What does that mean about the association between race and occupational class in America?  Besides the statistical significance of the log odds ratio, do you think the magnitude of the effect is large enough to be potentially socially significant, or not?
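As a cross-check on the Excel arithmetic, the calculation in Question 1 can be sketched in Python. This is a minimal sketch, not the required method: the function names are hypothetical, the orientation of the ratio (Whites relative to Non Whites in White Collar jobs) is an assumption, and the standard error uses the usual large-sample formula (the square root of the sum of the reciprocal cell counts).

```python
import math

# Dataset A cell counts (Occupation by Race, USA 2000)
other_white, other_nonwhite = 42012, 7146
wc_white, wc_nonwhite = 17216, 2361


def log_odds_ratio(n11, n12, n21, n22):
    """Log of the cross-product ratio (n11 * n22) / (n12 * n21)."""
    return math.log((n11 * n22) / (n12 * n21))


def se_log_odds_ratio(n11, n12, n21, n22):
    """Large-sample standard error: sqrt of the sum of reciprocal counts."""
    return math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)


# Assumed orientation: odds of a White Collar job for Whites,
# relative to the same odds for Non Whites.
cells = (wc_white, wc_nonwhite, other_white, other_nonwhite)

log_or = log_odds_ratio(*cells)
se = se_log_odds_ratio(*cells)
z = log_or / se  # compare with +/-1.96 for a two-sided test at the 5% level

print(f"log odds ratio = {log_or:.4f}")
print(f"standard error = {se:.4f}")
print(f"z statistic    = {z:.2f}")
```

The same two formulas translate directly into Excel cell formulas using LN and SQRT.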

2) For dataset A, how is the log odds ratio for non-White representation in the White collar sector related to the log odds ratio for White representation in the White collar sector?

3) For BOTH datasets A and B, use Excel to generate the 'Independence' Model.  Without using any statistics, how close do you think the 'Independence' model is to the actual data for A and B? (Any reasonable opinion is fine here.)
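The expected counts under independence can also be sketched in Python for comparison with the Excel sheet. This sketch assumes the standard definition, in which each expected cell equals its row total times its column total divided by the grand total; the function name and the row/column layout of the list are hypothetical choices.

```python
def independence_expected(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Dataset A: rows are Other, White Collar; columns are White, Non White
table_a = [
    [42012, 7146],
    [17216, 2361],
]

expected_a = independence_expected(table_a)

# Print observed and expected counts side by side for each row
for observed_row, expected_row in zip(table_a, expected_a):
    print(observed_row, [round(e, 1) for e in expected_row])
```

The expected table keeps the observed row and column totals, so any difference between the two tables lies entirely in how the counts are distributed across the cells. Passing the 5 x 5 table for dataset B to the same function works without changes.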

4) When non-statisticians talk about over-representation and under-representation, they frequently talk in terms of observed and expected percentages.  Use the 'Independence Model' (see Question 3) to generate the expected percentages of non-Whites and Whites in White Collar jobs.  Then divide the observed percentage by the expected percentage to get a crude measure of over- or under-representation.  How can you compare the measure for Whites and non-Whites?  Can you think of any reasons why this method is less satisfactory than the odds ratio method?

5) Use Stata to generate the 'Independence' model for both datasets A and B.  How many terms are in the model?  How many degrees of freedom are in the likelihood ratio chi-square test (Stata command poisgof, run after the poisson regression)?  What does the likelihood ratio chi-square test tell you about how well the 'Independence' model fits the data?  Now use the tabulate command with the lrchi2 option (and don't forget to use the weights, as in [fweight=count]).  How do these two measures of independence compare?
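To see what the likelihood ratio chi-square is measuring, here is a minimal Python sketch of the statistic, assuming the usual definition G2 = 2 * sum(observed * ln(observed / expected)) with (rows - 1) * (columns - 1) degrees of freedom for the independence model. The function names are hypothetical, and the sketch is meant as a check on the Stata output, not a replacement for it.

```python
import math


def independence_expected(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def lr_chi_square(table):
    """Return (G2, degrees of freedom) for the independence model.

    Assumes every observed cell is positive, which holds for A and B.
    """
    expected = independence_expected(table)
    g2 = 2 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return g2, df


# Dataset A: rows are Other, White Collar; columns are White, Non White
table_a = [[42012, 7146], [17216, 2361]]

# Dataset B: rows are husbands, columns are wives, in the order
# NH Black, Mexican, Other Hispanic, All Others, NH White
table_b = [
    [4074, 63, 32, 42, 215],
    [25, 3947, 143, 95, 1009],
    [16, 132, 239, 18, 304],
    [19, 78, 18, 1022, 360],
    [103, 1156, 373, 492, 28453],
]

for name, table in (("A", table_a), ("B", table_b)):
    g2, df = lr_chi_square(table)
    print(f"Dataset {name}: G2 = {g2:.1f} on {df} degrees of freedom")
```

A large G2 relative to its degrees of freedom indicates that the independence model fits the observed table poorly.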

6) Using Excel and dataset A, find the log odds ratio of White representation in White Collar jobs, from the predicted values of the Independence model (see Question 3).  How do you interpret this?

7) Use Stata to generate the 'saturated' model for dataset A, which is simply the 'Independence' model plus one additional term.  How many terms are in the model?  How many degrees of freedom are in the likelihood ratio chi-square test?  What is the value and standard error of the new interaction term?  How do these values compare to what you calculated by hand in Question 1?  Use the predict command in Stata to generate predicted values from this model.  How do the predicted values compare to the actual data?  Explain why the predicted values fit the actual data so well.