Sociology 388, First Homework Assignment

Sociology 382

Homework 2

Important ideas: independence, degrees of freedom, goodness of fit, odds ratio

Consider the two datasets linked from my website:

A) Occupation by Race, USA 2000

                              Race
Occupational Class         White   Non-White
Other                     42,012       7,146
White Collar              17,216       2,361

and

B) Intermarriage, LA 1990

                                                Wives
Husbands              NH Black     Mexican  Other Hisp  All Others    NH White
Non Hisp Black            4074          63          32          42         215
Mexican                     25        3947         143          95        1009
Other Hispanic              16         132         239          18         304
All Others                  19          78          18        1022         360
Non Hisp White             103        1156         373         492       28453

1) For dataset A, use Excel to calculate the log odds ratio and the standard error of the log odds ratio. Is the log odds ratio significantly different from zero? What does that mean about the association between race and occupational class in America? Apart from statistical significance, do you think the magnitude of the effect is large enough to be socially significant, or not?
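The arithmetic in Question 1 can be cross-checked outside Excel (where the same quantities come from LN and SQRT). The Python sketch below assumes the usual large-sample (Woolf) standard error, SE = sqrt(1/a + 1/b + 1/c + 1/d), and a two-sided Wald test against the normal critical value 1.96; the variable names are illustrative only.

```python
import math

# Dataset A cell counts (Occupation by Race, USA 2000).
white_other, nonwhite_other = 42012, 7146
white_wc, nonwhite_wc = 17216, 2361

# Odds of holding a White Collar (vs. Other) job, for each race.
odds_white = white_wc / white_other
odds_nonwhite = nonwhite_wc / nonwhite_other

# Log odds ratio of White representation in White Collar jobs.
log_or = math.log(odds_white / odds_nonwhite)

# Woolf standard error: square root of the sum of reciprocal cell counts.
se = math.sqrt(sum(1 / n for n in (white_other, nonwhite_other, white_wc, nonwhite_wc)))

# Wald z statistic; |z| > 1.96 rejects log OR = 0 at the 5% level (two-sided).
z = log_or / se

print(f"log OR = {log_or:.4f}, SE = {se:.4f}, z = {z:.2f}")
print("significant at 5%:", abs(z) > 1.96)
```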

2) For dataset A, how is the log odds ratio for non-White representation in the White collar sector related to the log odds ratio for White representation in the White collar sector?
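One way to explore Question 2 numerically: swapping which race is the focal group inverts the odds ratio, and the sketch below shows what that does on the log scale. It is a minimal Python illustration; the helper name `log_odds_ratio` is hypothetical.

```python
import math

def log_odds_ratio(focal_in, focal_out, other_in, other_out):
    """Log of (focal group's odds of being 'in') / (comparison group's odds)."""
    return math.log((focal_in / focal_out) / (other_in / other_out))

# Dataset A: White Collar ("in") vs. Other ("out") jobs, by race.
white_wc, white_other = 17216, 42012
nonwhite_wc, nonwhite_other = 2361, 7146

lor_white = log_odds_ratio(white_wc, white_other, nonwhite_wc, nonwhite_other)
lor_nonwhite = log_odds_ratio(nonwhite_wc, nonwhite_other, white_wc, white_other)

print(f"White focal:     {lor_white:+.4f}")
print(f"non-White focal: {lor_nonwhite:+.4f}")
print(f"sum:             {lor_white + lor_nonwhite:+.4f}")
```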

3) For BOTH datasets A and B, use Excel to generate the 'Independence' model. Without using any statistics, how close do you think the 'Independence' model is to the actual data for A and B? (Any reasonable opinion is fine here.)
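Under independence, each expected cell count is (row total × column total) / grand total, which is the formula one would enter in Excel. A minimal Python sketch of the same computation for both tables, using a hypothetical helper `independence_fit`:

```python
def independence_fit(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Dataset A: rows = Other, White Collar; columns = White, Non-White.
table_a = [[42012, 7146],
           [17216, 2361]]

# Dataset B: rows = husbands, columns = wives, both in the order
# NH Black, Mexican, Other Hispanic, All Others, NH White.
table_b = [[4074,   63,  32,   42,   215],
           [  25, 3947, 143,   95,  1009],
           [  16,  132, 239,   18,   304],
           [  19,   78,  18, 1022,   360],
           [ 103, 1156, 373,  492, 28453]]

for name, table in (("A", table_a), ("B", table_b)):
    print(f"Dataset {name}: observed vs. expected")
    for obs_row, exp_row in zip(table, independence_fit(table)):
        print("  ", obs_row, [round(e, 1) for e in exp_row])
```

By construction, the expected table reproduces the observed row and column totals; only the pattern inside the table changes.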

4) When non-statisticians talk about over-representation and under-representation, they frequently talk in terms of observed and expected percentages. Use the 'Independence' model (see Question 3) to generate the expected percentages of non-Whites and Whites in White Collar jobs. Then divide the observed percentage by the expected percentage to get a crude measure of over- or under-representation. How can you compare the measure for Whites and non-Whites? Can you think of any reasons why this method is less satisfactory than the odds ratio method?
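A sketch of the crude observed/expected ratio for dataset A, assuming "percentage" means the share of each race group that holds White Collar jobs (the question could also be read as the racial composition of White Collar jobs). With either reading the ratio simplifies algebraically to the cell's observed count divided by its expected count under independence, since (n_ij / n_+j) / (n_i+ / N) = n_ij N / (n_i+ n_+j) = n_ij / E_ij.

```python
# Dataset A counts.
white_other, nonwhite_other = 42012, 7146
white_wc, nonwhite_wc = 17216, 2361

n = white_other + nonwhite_other + white_wc + nonwhite_wc
white_total = white_other + white_wc
nonwhite_total = nonwhite_other + nonwhite_wc
wc_total = white_wc + nonwhite_wc

# Under independence every group has the overall White Collar share.
expected_pct = 100 * wc_total / n

# Observed share of each group that holds White Collar jobs.
observed_pct_white = 100 * white_wc / white_total
observed_pct_nonwhite = 100 * nonwhite_wc / nonwhite_total

# Crude representation index: >1 over-represented, <1 under-represented.
ratio_white = observed_pct_white / expected_pct
ratio_nonwhite = observed_pct_nonwhite / expected_pct

print(f"expected % White Collar: {expected_pct:.2f}")
print(f"White:     observed {observed_pct_white:.2f}%, ratio {ratio_white:.3f}")
print(f"non-White: observed {observed_pct_nonwhite:.2f}%, ratio {ratio_nonwhite:.3f}")
```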

5) Use Stata to generate the 'Independence' model for both datasets A and B. How many terms are in the model? How many degrees of freedom are in the likelihood ratio chi-square test (Stata command poisgof, run after the poisson regression)? What does the likelihood ratio chi-square test tell you about how well the 'Independence' model fits the data? Now use the tabulate command with the lrchi2 option (and don't forget to use the weights, as in [fweight=count]). How do these two measures of independence compare?
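The likelihood ratio chi-square compares observed counts O with fitted counts E as G² = 2 Σ O ln(O/E). For the independence model in an r × c table, the terms are an intercept, r - 1 row effects and c - 1 column effects, leaving rc - (1 + (r - 1) + (c - 1)) = (r - 1)(c - 1) degrees of freedom. The Python sketch below (hypothetical helper names) computes these by hand so they can be checked against the Stata output; it does not compute p-values.

```python
import math

def independence_fit(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def g_squared(observed, expected):
    """Likelihood ratio chi-square; cells with O = 0 contribute 0."""
    return 2 * sum(o * math.log(o / e)
                   for o_row, e_row in zip(observed, expected)
                   for o, e in zip(o_row, e_row) if o > 0)

def independence_terms_and_df(table):
    r, c = len(table), len(table[0])
    terms = 1 + (r - 1) + (c - 1)   # intercept + row effects + column effects
    return terms, r * c - terms     # df = cells - terms = (r-1)(c-1)

table_a = [[42012, 7146], [17216, 2361]]
table_b = [[4074, 63, 32, 42, 215],
           [25, 3947, 143, 95, 1009],
           [16, 132, 239, 18, 304],
           [19, 78, 18, 1022, 360],
           [103, 1156, 373, 492, 28453]]

for name, table in (("A", table_a), ("B", table_b)):
    terms, df = independence_terms_and_df(table)
    g2 = g_squared(table, independence_fit(table))
    print(f"Dataset {name}: terms = {terms}, df = {df}, G2 = {g2:.1f}")
```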

6) Using Excel and dataset A, find the log odds ratio of White representation in White Collar jobs, from the predicted values of the Independence model (see Question 3). How do you interpret this?
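The Question 6 calculation can be sketched in Python as follows, using the same row total × column total / N fit as in Question 3 and the same log odds ratio definition as in Question 1; variable names are illustrative.

```python
import math

# Dataset A: rows = Other, White Collar; columns = White, Non-White.
table_a = [[42012, 7146],
           [17216, 2361]]

row_totals = [sum(row) for row in table_a]
col_totals = [sum(col) for col in zip(*table_a)]
n = sum(row_totals)
fitted = [[r * c / n for c in col_totals] for r in row_totals]

(other_w, other_nw), (wc_w, wc_nw) = fitted

# Log odds ratio of White representation in White Collar jobs, fitted values.
log_or_fitted = math.log((wc_w / other_w) / (wc_nw / other_nw))
print(f"log OR from independence fit: {log_or_fitted:.6f}")
```

Each fitted count is a row factor times a column factor divided by N, so those factors cancel in the cross-product ratio; any nonzero printed digits are floating-point rounding.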

7) Use Stata to generate the 'saturated' model for dataset A, which is simply the "Independence" model plus one additional term. How many terms are in the model? How many degrees of freedom are in the likelihood ratio chi-square test? What is the value and standard error of the new interaction term? How do these values compare to what you calculated by hand in question 1? Use the predict command in Stata to generate predicted values from this model. How do the predicted values compare to the actual data? Explain why the predicted values fit the actual data so well.
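A Poisson fit is not strictly needed to see why the saturated model reproduces the data: with one parameter per cell, the log-linear equations can be solved exactly. The Python sketch below assumes reference (dummy) coding with Other and White as the base categories; Stata's defaults or a different category ordering may flip the sign of the interaction term. The standard error shown is the usual large-sample formula for a log odds ratio, which is what the interaction's standard error reduces to in a saturated 2 × 2 Poisson model.

```python
import math

# Dataset A cell counts, keyed by (occupational class, race).
counts = {("Other", "White"): 42012, ("Other", "Non-White"): 7146,
          ("White Collar", "White"): 17216, ("White Collar", "Non-White"): 2361}

log_n = {cell: math.log(n) for cell, n in counts.items()}

# Saturated log-linear model with base categories Other and White:
# log(mu) = b0 + b_wc*[White Collar] + b_nw*[Non-White] + b_int*[WC and NW]
b0 = log_n[("Other", "White")]
b_wc = log_n[("White Collar", "White")] - b0
b_nw = log_n[("Other", "Non-White")] - b0
b_int = log_n[("White Collar", "Non-White")] - b0 - b_wc - b_nw

# Large-sample standard error of the interaction (a log odds ratio).
se_int = math.sqrt(sum(1 / n for n in counts.values()))

# Fitted values: exponentiate the linear predictor for each cell.
fitted = {}
for occ, race in counts:
    eta = b0 + b_wc * (occ == "White Collar") + b_nw * (race == "Non-White")
    eta += b_int * (occ == "White Collar" and race == "Non-White")
    fitted[(occ, race)] = math.exp(eta)

terms = 4
df = len(counts) - terms  # one parameter per cell leaves 0 df
print(f"terms = {terms}, df = {df}")
print(f"interaction = {b_int:.4f}, SE = {se_int:.4f}")
for cell in counts:
    print(cell, counts[cell], round(fitted[cell], 6))
```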