Soc 388, notes and links


Michael J. Rosenfeld
Assistant Professor

Department of Sociology
Stanford University

450 Serra Mall

Building 120

Stanford, CA 94305
(650) 723-3958



Notes providing orientation to the class:

* A brief introduction to Poisson and chi-square distributions is here, along with a companion Excel file. We will be discussing these towards the end of the class.

* Notes on where loglinear models fit into the wider field of regression models.

* A few introductory notes about how to interpret the coefficients of loglinear models.

* Also note that I have some very basic introductory materials about Stata available on my home page, under Sociology 180. Here is a link to that introduction.

* A comprehensive Excel file that covers the first 5 classes and some topics we will return to later in the class.
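
As a rough companion to the Poisson and chi-square notes listed above, here is a minimal sketch in Python (not the Stata or Excel materials used in class). It computes a Poisson probability and a Pearson chi-square statistic for a two-way table under independence. The 2x2 counts and the function names are hypothetical, chosen only for illustration.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def expected_counts(table):
    """Expected cell counts if rows and columns are independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi2(observed, expected):
    """Sum over cells of (observed - expected)^2 / expected."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical 2x2 table of counts (illustrative numbers only).
observed = [[30, 10], [20, 40]]
expected = expected_counts(observed)
chi2 = pearson_chi2(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)                       # [[20.0, 20.0], [30.0, 30.0]]
print(round(chi2, 3), df)             # 16.667 1
print(round(poisson_pmf(0, 2.0), 4))  # 0.1353
```

The chi-square statistic is compared against a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom, which is 1 for this table.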

Practical Notes:

Note: If you want to download a file, but your browser tries to display it rather than download it, try control-click on a Mac or right-click on Windows.

Also note: The Stata data files I create in class are created in Stata Version 7 for Mac. You may have difficulty opening those datasets using Stata 6 or an earlier version. You should, however, be able to download the Excel file and use it to re-create the dataset. The Stata log files are just text files; any text editor should be able to open them once you've downloaded them. If the log file looks funny, you may have to change the font to a fixed-width (monospaced) font, like Courier.

Article Links and Literature notes (All PDF format. External links require Stanford library access or use of proxy server if off campus):
* On the subject of BIC, see Adrian Raftery 1995 in Sociological Methodology singing the praises of BIC, and an interesting critique of BIC by David Weakliem in the 1999 Sociological Methods and Research.
* On the subject of sampling zeros, structural zeros, and adding a constant to every cell to deal with sampling zeros (especially in saturated models), see Agresti (2002: 70-72; 391-400), and Leo Goodman (1970) in JASA.

* On the subject of weights in loglinear models, see Clogg and Eliason (1987) SMR "Some Common Problems in Loglinear Analysis."

* On the subject of negative binomial regression, see Long's 1997 book, and see also articles by Gary King in the 1989 AJPS, and Cameron and Trivedi 1986.

* Some loglinear modeling papers that we will be discussing in class:
  * Rosenfeld (2005) AJS, "Critique of Exchange Theory"
  * Rosenfeld (forthcoming) "Endogamy in Comparative Perspective"
  * Qian (1997) Demography "Breaking the Racial Barriers"
  * Mare (1991) ASR "5 Decades of Educational Assortative Mating"
  * Bearman (1997) AJS "Generalized Exchange"
  * Lichter (1988) AJS "Racial Differences in Underemployment"
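
As a small illustration of the sampling-zero adjustment mentioned in the literature notes above, the sketch below adds a constant to every cell of a 2x2 table so that the log odds ratio becomes computable. The table is hypothetical, and the constant 0.5 is an assumption (one commonly used choice). The cited readings discuss when, and whether, such an adjustment is appropriate.

```python
import math

def add_constant(table, constant=0.5):
    """Return a copy of the table with the constant added to every cell."""
    return [[cell + constant for cell in row] for row in table]

def log_odds_ratio(table):
    """Log odds ratio of a 2x2 table [[a, b], [c, d]]: log(a*d / (b*c))."""
    (a, b), (c, d) = table
    return math.log((a * d) / (b * c))

# Hypothetical 2x2 table with one sampling zero (illustrative numbers only).
table = [[12, 0], [5, 9]]

try:
    log_odds_ratio(table)
except ZeroDivisionError:
    print("log odds ratio is undefined when a cell is zero")

# Assumption: 0.5 is used as the added constant.
adjusted = add_constant(table, 0.5)
print(adjusted)  # [[12.5, 0.5], [5.5, 9.5]]
print(log_odds_ratio(adjusted))
```

The adjusted estimate depends on the constant chosen, so it is worth checking how sensitive the results are to that choice.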



Datasets also available via ftp at

* frogs in Stata 8 format
* the LA intermarriage dataset for Homework 1 is available in Stata 8 format; a newer numeric version of the LA intermarriage data (also Stata 8) is also available.
* the 4X4 educational intermarriage dataset from Class 4 and 5, in Stata 8 format.

* Agresti's little Death penalty dataset in Stata 8 format.
* the 1970-1990 intermarriage data for homework 3 available in Stata 6 and Excel format
* the 1980-1990 intermarriage data from Qian (1997) that we will be using in classes 7-10 (Stata 6 format and Excel format).
* Rosenfeld's data from 2005 AJS.
* Clogg and Eliason's data for proper use of weights.

IMPORTANT NOTE about desmat: We will be using the user-written 'desmat' routine to generate dummy variables, instead of the built-in 'xi' routine. Here's how to get desmat up and running.

1) IF YOU'RE USING THE SOCIOLOGY PC LAB, 'desmat' is already installed

2) IF YOU'RE USING YOUR OWN COMPUTER, you simply need to download the files once. Here's how. When your machine is connected to the internet, go to Stata and in the command line type:

ssc install desmat, replace

3) IF YOU'RE USING ANOTHER COMPUTER LAB, you will need to prevail on your system administrators to install 'desmat'. It is freeware. They will need to install 'desmat' using the command:

ssc install desmat, replace

and you will need to figure out how to tell your local machine to access the files. The command

sysdir list

will tell you where Stata thinks the additional or personal ado files should be (they'll be listed under the 'plus' or 'personal' directories).

4) THE POTENTIAL DIFFICULTY WITH #3 MAY CONVINCE YOU TO BUY A LICENSE TO STATA IF YOU DON'T ALREADY HAVE ONE. Students can now buy a full copy of Intercooled Stata for $129, or a one-year license for $89. Go to


LEM: We won't be using LEM until the end of the quarter. LEM for Windows can be obtained from the following address. Be sure to install LEM in your root directory, not in C:\Program Files.

Class Notes Fall 2007

* Class 2 log file (redone after the fact because I forgot to turn the log on when I restarted Stata...)

* Class 3 log file.

* Class 4 log file, beginning the goodness of fit analysis of the 4X4 educational intermarriage dataset

* Class 5 log file, finishing the goodness of fit of the 4X4 ed intermarriage dataset. See the comprehensive Excel file for summary statistics.

* Class 6 log file, simply how to calculate BIC and ID.

* Class 7 log file, how to contract and manage datasets.

* Class 8 log, mostly on the joys of deviation coding for dummy variables.

* Class 9 log, examining model fit and degrees of freedom with Qian's multivariate data.

* Class 10 log, mostly about stepwise regression.

* Class 11 log, about logistic and Poisson models for the death penalty data, plus dealing with zeros.

* Class 13 log, about scores and linear-by-linear association

* An example LEM session for RC model for educational intermarriage.

* Class 17 log, about loglinear and nbreg, using Rosenfeld's 2005 AJS data.

* Class 18 log, on weights using Clogg and Eliason's data.
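
For readers who want to check the BIC and ID arithmetic from the Class 6 log by hand, here is a minimal sketch in Python rather than Stata. It assumes the definitions commonly used for loglinear models: the likelihood-ratio statistic G² = 2 Σ obs × ln(obs / fitted), BIC = G² − df × ln(N), and the index of dissimilarity ID = Σ |obs − fitted| / (2N). The class log remains the authority on the exact formulas used in the course. The counts, the fitted values (from an independence model), and the function names are hypothetical.

```python
import math

def g_squared(observed, fitted):
    """Likelihood-ratio chi-square; cells with a zero count contribute 0."""
    return 2 * sum(o * math.log(o / f) for o, f in zip(observed, fitted) if o > 0)

def bic(g2, df, n):
    """BIC relative to the saturated model: G^2 - df * ln(N)."""
    return g2 - df * math.log(n)

def index_of_dissimilarity(observed, fitted):
    """Share of cases that would have to change cells to match the model."""
    n = sum(observed)
    return sum(abs(o - f) for o, f in zip(observed, fitted)) / (2 * n)

# Hypothetical cell counts and fitted values (illustrative numbers only).
observed = [30, 10, 20, 40]
fitted = [20.0, 20.0, 30.0, 30.0]
residual_df = 1  # residual degrees of freedom of the independence model

n = sum(observed)
g2 = g_squared(observed, fitted)
model_bic = bic(g2, residual_df, n)
dissimilarity = index_of_dissimilarity(observed, fitted)

print(g2, model_bic)
print(dissimilarity)  # 0.2
```

Under this definition, a negative BIC favors the model over the saturated model and a positive BIC favors the saturated model. In this made-up example the BIC is positive, so independence fits poorly.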


Archived CLASS NOTES, Fall 2005:

* Class 2 log file (our first loglinear models with Stata).

* Class 3 log file.

* Class 4 log file.

* Class 5 log file.

* Class 6 log file.

* Class 7 log file.

* Class 9 log on building multivariate models.

* Class 10 log on logistic and loglinear models for the death penalty data.

* Class 11 log on stepwise regression and r+c models for ordinal data.

* A brief class 12 log on the practical merits of 'difficult' maximization of likelihood functions.

* A class 13 log on how to incorporate continuous variables into your loglinear models.

* Class 14 log on how to incorporate weights into your loglinear models, using Clogg and Eliason's data (stata format- download then open).



Archived CLASS NOTES, Fall 2003:

* Class 2 log file (better looking now)

* Class 3 log file.

* Class 4 log file is here. See below, last year's notes, if you want to download a stata version of the HW2 dataset, rather than copying it from the excel file.

* Class 5 log file is here. We also made some reference in class to last year's class 5 log (see below) on the subject of how to handle and recognize text versus numeric coding of variables.

* Class 6 log file is here (links to the multivariate datasets at the top of the page)

* Class 8 log file is here.

* Class 10 log file, on death penalty data and the confluence between logistic and loglinear models is here. See also the comprehensive Excel file which has been updated to include a worksheet about this.

* Class 11 log file, on QS models and stepwise estimation.

* Class 12 log on merging datasets and combining categorical and continuous variables.

* Class 14 log on exact tests and on Pearson residuals. See also the updated comprehensive excel file.

* Class 15 log on using weights in loglinear models, makes extensive reference to Clogg and Eliason (1987), here's their dataset.

* Class 16, log for negative binomial regression.

* Class 17, We will discuss coefficients and what to do with them. There's a brief look at the matrix underpinnings of the linear model here. We also discussed model coefficients (using HW3 dataset again), see updated comprehensive excel file. Also see a stata log that shows how to create a modified XB count for just the interactions you want. You'll need to know some basic matrix algebra to understand it.



Even older ARCHIVED CLASS NOTES, Fall 2002:

* Introductory notes on log linear models. An Excel file that covers the first 5 classes.

* Class 2. Uses the same excel dataset as in class one (see above link), also class 2 STATA log, and the frog dataset.

* Class 3. I've updated the excel file a bit (see class 1), and there's a new stata log file.

* Class 4. I've updated the excel file again (see link in class 1). I also have links to the Stata log for the first analysis of the educational intermarriage dataset, and I have the stata dataset here as well (to save you a minute from reloading it from the excel file).

The dataset for HW2 is here, in STATA format (in this version of the dataset the ethnic groups are coded 1-5 and the labels carry the ethnic group names). If you get an error in Stata when you try to load it, update your version of STATA 7 using the method described above, on this page. Here is a second, nearly identical version of the dataset, but with actual string variables for ethnicity (meaning the ethnic groups are coded "Black", "White", etc).

* Class 5. Because Class 5 was plagued by some technical difficulties, I re-created the STATA session in order to give you a log that actually covers the material we did in class. Here is that log. The ed by ed dataset is still linked from class 4 (or from the excel file we have been using), and the racial intermarriage dataset is linked above in STATA format, or you can find it in the HW1 answers excel file.

Here is a link to an updated excel file that includes further discussion of fitting models to the educational intermarriage dataset, and how to interpret the coefficients.

* Class 7 log, with info on stepwise and other things is here.

* Class 8 log, with info on contracting datasets, and dealing with zeros, is here.

* Class 9 log, some comments on multi-dimensional models, and on where and how these models fit the data.

* Class 10 log, mostly on residuals. Note: This log got updated during class 11, with an extended idea of how you can look at residuals to find the cells that have the worst fits, and how to identify the problem and think about solutions.

* Class 11 log on different things you can do with weights, using data from Clogg and Eliason's paper.

* Class 12 log, which discusses a comparison between logit and loglinear models, as well as goodness-of-fit chi-square tests, including the exact test.

* Class 13. Some discussion of the coefficients in log linear models, and what you can say about them. Notes are here, and see also previous excel files which discuss how to combine the coefficients in a table for tabular or graphical presentation.

* Class 14. Links to a PDF version of my notes on Poisson and chi-square distributions, and an Excel file with examples.

* Class 15. Some comments on negative binomial models and robust standard errors.