Introduction to Data Analysis for Sociology Graduate Students

rev: 10/22/2018


Fall Quarter, 2018

Mondays and Wednesdays


Bldg 160, Room 329


Lab/Section once a week for homework and once a week for projects


Michael J. Rosenfeld


Department of Sociology

Building 120, Room 124

The class website is my personal Stanford website

Office Hours by appointment



Meghan Warner

Amy Johnson




            In this class you will teach yourself basic statistics (including regression), how to do statistical analysis, and how to find flaws and problems in statistical analyses.

            In the process of learning about data analysis, you will also learn about demography and stratification in the U.S., because the course dataset is the Current Population Survey of March 2000, a nationally representative survey of more than 60,000 households with lots of information about race, gender, income, occupation, place of residence, and so on. You'll also learn how to use one of the most powerful and flexible tools for data analysis, the statistical software Stata.

            Most class materials will be posted on my website. We will use an online tool for collecting and returning homework, collecting and returning presentation drafts, collecting presentation slides, posting grades, and sending group emails.


Readings and Grading Policy


Books required (available at Stanford Bookstore):

* Tufte, Edward. 2001. The Visual Display of Quantitative Information. Graphics Press. ISBN-10: 0961392142. $30

* Treiman, Donald. 2009. Quantitative Analysis: Doing Social Research to Test Ideas. Jossey-Bass. ISBN-10: 0470380039. $59

* Silver, Nate. 2012. The Signal and the Noise: Why So Many Predictions Fail- But Some Don’t. ISBN-10: 0143125087. $16



Recommended Books:

* Rice, John. 2006. Mathematical Statistics and Data Analysis. 3rd edition. Duxbury Press. ISBN-10: 0534399428. $175

* Freedman, David, Robert Pisani, and Roger Purves. 2007. Statistics. 4th edition. W.W. Norton. ISBN-10: 0393929728. $125



The most important readings for the class are the Excel files, Stata logs, and PDF documentation posted on my website. Aside from the Tufte book, which we will go over page by page in class, the other books are all supplementary. We probably won't discuss the Nate Silver book until Soc 382 in Winter quarter. In other words, you don't strictly need the supplementary books. Here, briefly, is why you should own them anyway:

* Treiman is an excellent book about social statistics (using Stata), which covers some practical aspects of data analysis that we won’t get to in this class. Treiman’s book was written for Sociology PhD students.

* Freedman is a classic introductory text about statistics, with no math but with very good plain-English explanations. If you don't have a math background, Freedman's explanations may be helpful to you. If you do have a math background, Freedman may help you explain statistics to other people. And if you end up teaching undergraduate statistics in the future, you may be teaching from Freedman.

* Rice is a classic introduction to statistics for readers who have at least a modest familiarity with calculus. Rice offers outlines of proofs, a fairly deep discussion of probability theory, and lots of great problems you can work through on your own. Rice is a great reference book that you should have on your shelf if you plan on doing any data analysis.

* Silver is a brilliant book about some practical applications and mis-applications of statistical thinking in the everyday world.



Software Required

* You will need Stata in order to do the homework for Soc 381. You have several options:

1) The least convenient and least palatable option is to use Stata over Unix. This is free but very cumbersome.

2) Stata is installed in the graduate student computer cluster, running on Windows PCs. This is a good solution, except that you won’t have access to Stata in class or when you are home.

3) The option that offers the most convenience, but also costs the most, is to buy a perpetual license for Stata/IC (Intercooled Stata); the current version is 15. A perpetual license costs $225 (the one-year license won't last long enough for your purposes). If you have some extra cash, consider buying a Stata/SE license instead, which costs $395 but allows larger data sets to be loaded and manipulated. The software comes with a short introductory Stata book. Stata/IC is sufficient for all the homework in this class, but Stata/SE might have advantages for your own projects, and Stata/IC can be upgraded to Stata/SE later. Don't bother buying Stata's massive printed reference manual collection for this class: I will teach you the Stata commands you need to know, and Stata's online help is very good.

Note that the Graduate Student Computer Lab may run an earlier version of Stata. Different versions of Stata work pretty much the same way.


Students with Disabilities:

Students with Documented Disabilities: Students who may need an academic accommodation based on the impact of a disability must initiate the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is made. Students should contact the OAE as soon as possible since timely notice is needed to coordinate accommodations. The OAE is located at 563 Salvatierra Walk (phone: 723-1066, URL:          



This course justifies an additional unit of credit beyond what would be expected from the typical assignment of class time and outside work. The additional unit represents, on average, 30 additional hours of work expected of each student during the quarter, devoted to homework and to preparing the student's research presentation.


Computer Use Policy:

* Computer use by students during class is strictly limited to following along with the data analysis examples being presented by the professor.




4 homeworks


Regular section participation


Your data project outline


In-class presentation (data analysis of dataset of your own choosing)


Final Exam





Project and Reading Assignment Timeline

(Note: my chapter and section headings for Rice are from the 2nd edition; the same material should be in the 3rd edition but you may have to look for it).




Class lecture Goals

READING (Readings in bold are required and will be discussed specifically in that class. Other readings are supplementary)



Sept 24

Introduction to Stata and Data Analysis Section


Basics of descriptive data analysis using STATA

 Read Treiman’s chapters 1-4. Read Rosenfeld’s online Stata guide




Hand out CPS HW #1


Sept 26

Observational Studies and their limitations

Freedman Ch 2, 4




Work on HW 1 and on using STATA









Oct 1

Error and bias

Freedman Ch 6

Silver Chs 1, 4



Oct 3

Probability sampling, Sample size and power, and standard errors

Freedman Ch 20;

read also Treiman Ch 9;

Rice, ch. 6 on “Distributions derived from the Normal Distribution”

HW #1 due

Hand out HW #2



Stata, and HW 2









Oct 8

More on sample size and power.

Freedman Ch 21

Rice, section 11.3 on “Comparing Paired Samples”



Oct 10

Introduction to regression with STATA

Yom Kippur; class attendance optional

Freedman Chs 9, 10

Treiman, Ch 5-6

 HW #2 Due

Hand out HW #3



Work on STATA, discuss the issues in HWs 2 and 3









Oct 15

More on regression with STATA, interpreting coefficients

Freedman, Ch 11;

Rice ch. 14, “Linear Least Squares”


Oct 17

Problems and difficulties in using regression; graphing

Freedman Ch 12




Work on STATA, discuss the issues in CPS HW #3









Oct 22

More limitations of regression analysis




Oct 24

Regression analysis



Work on STATA








Oct 29

 Logistic regression

Treiman chapter 13

Rice section 8.5, “The Method of Maximum Likelihood”

HW #3 due

Hand out HW #4


Oct 31

Other topics, including logistic regression and the likelihood ratio test

Treiman pp. 264-276;

Rice section 9.3 on the “Neyman-Pearson Lemma,” section 9.4 on “Confidence Intervals and Hypothesis Tests,” and section 9.5 on “Generalized Likelihood Ratio Tests”




work on HW 4









Nov 5

Presentation of data (class no longer cancelled)

Tufte, read the entire book (required)


Nov 7


 The Jasso v. Udry debate is required reading:

1) Jasso's original article on coital frequency; 2) Kahn and Udry's critique; 3) Jasso's response

See also: Silver, Chs 2 and 6



Work on HW 4 and projects








Nov 12

Some additional and advanced topics


 HW #4 due


Nov 14

Some additional and advanced topics


Presentation Proposals Due



Work on Projects









 Nov 19-23

Thanksgiving break









Nov 26

Student Presentations




Nov 28

Student Presentations





Work on Projects








Dec 3

Student Presentations




Dec 5

Final Exam Review





Exam review







Final Exam


In-class final exam at the regularly scheduled time and place: Thursday, Dec 13, 8:30 AM